[jira] [Commented] (APEXCORE-408) Ability to schedule Sub-DAG from running application

2017-04-14 Thread Tushar Gosavi (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968777#comment-15968777
 ] 

Tushar Gosavi commented on APEXCORE-408:


I have updated the document, please go through it. The approach I am taking is 
summarized below

- extend dag interface with following methods
  - removeOperator/removeStream
- add startTransaction method on LogicalPlan which will return a DAG object, 
which is a instance of TransactionalDAG which is clone of the dag with 
reference to operator and ports from the original dag.
  TransactionDAG maintains the reference to original dag and has a additional 
commit method. On commit  if the new DAG is valid then the same changes are 
applied to the original DAG.
- user can change the cloned dag as per his requirement. Currently I am 
providing the hooks to change the DAG through StatsListener by extending 
StatsListener interface having additional context object. later the hooks could 
also be provided through  DAGExecutionPluginContext which has access to more 
information from stram than statslistener.
- A Commit handler interface is added which commit method is call to notify 
type of changes performed to the DAG. The PhysicalPlan will be changed through 
this handler.


> Ability to schedule Sub-DAG from running application
> 
>
> Key: APEXCORE-408
> URL: https://issues.apache.org/jira/browse/APEXCORE-408
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: Thomas Weise
>Assignee: Tushar Gosavi
>
> Today it is possible to add operators to a running application from the 
> CLI/client. It should be possible to do this from within the application 
> also. This can be used to expand/remove a Sub-DAG on demand, triggered by 
> application specific logic. It will enable use cases such as batch 
> applications that perform multiple stages of processing where not all 
> resources are required at the same time or the logic of subsequent steps 
> depends on the execution of previous steps.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXCORE-408) Ability to schedule Sub-DAG from running application

2017-04-10 Thread Pramod Immaneni (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963035#comment-15963035
 ] 

Pramod Immaneni commented on APEXCORE-408:
--

How about widening the scope to also trigger the changes when major events 
happen in the execution such as operators getting killed or choosing to 
shutdown? In general whenever a stram event gets created we could potentially 
trigger the changes.

> Ability to schedule Sub-DAG from running application
> 
>
> Key: APEXCORE-408
> URL: https://issues.apache.org/jira/browse/APEXCORE-408
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: Thomas Weise
>Assignee: Tushar Gosavi
>
> Today it is possible to add operators to a running application from the 
> CLI/client. It should be possible to do this from within the application 
> also. This can be used to expand/remove a Sub-DAG on demand, triggered by 
> application specific logic. It will enable use cases such as batch 
> applications that perform multiple stages of processing where not all 
> resources are required at the same time or the logic of subsequent steps 
> depends on the execution of previous steps.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXCORE-408) Ability to schedule Sub-DAG from running application

2017-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960602#comment-15960602
 ] 

ASF GitHub Bot commented on APEXCORE-408:
-

Github user tushargosavi closed the pull request at:

https://github.com/apache/apex-core/pull/410


> Ability to schedule Sub-DAG from running application
> 
>
> Key: APEXCORE-408
> URL: https://issues.apache.org/jira/browse/APEXCORE-408
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: Thomas Weise
>Assignee: Tushar Gosavi
>
> Today it is possible to add operators to a running application from the 
> CLI/client. It should be possible to do this from within the application 
> also. This can be used to expand/remove a Sub-DAG on demand, triggered by 
> application specific logic. It will enable use cases such as batch 
> applications that perform multiple stages of processing where not all 
> resources are required at the same time or the logic of subsequent steps 
> depends on the execution of previous steps.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXCORE-408) Ability to schedule Sub-DAG from running application

2017-04-06 Thread Tushar Gosavi (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958879#comment-15958879
 ] 

Tushar Gosavi commented on APEXCORE-408:


[~thw] I have opened a pull request 
https://github.com/apache/apex-core/pull/476 for statslistener  changes. I am 
implementing the second approach from the document as it would be simple and 
won't change existing code much. There are few problems I am facing regarding 
the validation done by LogicalPlan while elements are being added to it, like 
extending a stream the port needs to be present in the DAG. When I try to 
extend an existing stream with new operator's input port , as new operator's 
input port is not in the original DAG , it throws as error as port does not 
belong to any operator in the DAG. Trying to solve this problem.

> Ability to schedule Sub-DAG from running application
> 
>
> Key: APEXCORE-408
> URL: https://issues.apache.org/jira/browse/APEXCORE-408
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: Thomas Weise
>Assignee: Tushar Gosavi
>
> Today it is possible to add operators to a running application from the 
> CLI/client. It should be possible to do this from within the application 
> also. This can be used to expand/remove a Sub-DAG on demand, triggered by 
> application specific logic. It will enable use cases such as batch 
> applications that perform multiple stages of processing where not all 
> resources are required at the same time or the logic of subsequent steps 
> depends on the execution of previous steps.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXCORE-408) Ability to schedule Sub-DAG from running application

2017-04-05 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958295#comment-15958295
 ] 

Thomas Weise commented on APEXCORE-408:
---

[~tushargosavi] any update wrt the comments in the document or the PR?

> Ability to schedule Sub-DAG from running application
> 
>
> Key: APEXCORE-408
> URL: https://issues.apache.org/jira/browse/APEXCORE-408
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: Thomas Weise
>Assignee: Tushar Gosavi
>
> Today it is possible to add operators to a running application from the 
> CLI/client. It should be possible to do this from within the application 
> also. This can be used to expand/remove a Sub-DAG on demand, triggered by 
> application specific logic. It will enable use cases such as batch 
> applications that perform multiple stages of processing where not all 
> resources are required at the same time or the logic of subsequent steps 
> depends on the execution of previous steps.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXCORE-408) Ability to schedule Sub-DAG from running application

2017-03-07 Thread Tushar Gosavi (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899916#comment-15899916
 ] 

Tushar Gosavi commented on APEXCORE-408:


removed word document, and added link to shared google drive document. Can we 
make use of https://cwiki.apache.org/confluence/display/APEX/Apache+Apex+Home 
for enhancement proposals and design documents.

> Ability to schedule Sub-DAG from running application
> 
>
> Key: APEXCORE-408
> URL: https://issues.apache.org/jira/browse/APEXCORE-408
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: Thomas Weise
>Assignee: Tushar Gosavi
>
> Today it is possible to add operators to a running application from the 
> CLI/client. It should be possible to do this from within the application 
> also. This can be used to expand/remove a Sub-DAG on demand, triggered by 
> application specific logic. It will enable use cases such as batch 
> applications that perform multiple stages of processing where not all 
> resources are required at the same time or the logic of subsequent steps 
> depends on the execution of previous steps.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXCORE-408) Ability to schedule Sub-DAG from running application

2017-03-07 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899550#comment-15899550
 ] 

Thomas Weise commented on APEXCORE-408:
---

Please remove the word documents and let's use a single drive document that 
will make it easier to collaborate. The final doc can be attached as PDF when 
the work is complete.

> Ability to schedule Sub-DAG from running application
> 
>
> Key: APEXCORE-408
> URL: https://issues.apache.org/jira/browse/APEXCORE-408
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: Thomas Weise
>Assignee: Tushar Gosavi
> Attachments: dag.docx, dag.docx
>
>
> Today it is possible to add operators to a running application from the 
> CLI/client. It should be possible to do this from within the application 
> also. This can be used to expand/remove a Sub-DAG on demand, triggered by 
> application specific logic. It will enable use cases such as batch 
> applications that perform multiple stages of processing where not all 
> resources are required at the same time or the logic of subsequent steps 
> depends on the execution of previous steps.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXCORE-408) Ability to schedule Sub-DAG from running application

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574913#comment-15574913
 ] 

ASF GitHub Bot commented on APEXCORE-408:
-

GitHub user tushargosavi opened a pull request:

https://github.com/apache/apex-core/pull/410

APEXCORE-408: Ability to schedule Sub-DAG from running application.

Pull request for dynamic dag modification through stats listener.  It 
provides following
functionality

- StatsListener can access the opearator name for easily detecting which 
opearator stats are being processed.
- StatsListener can create a instance of object through which it can submit 
dag modifications to the engine.
- StatsListener can return dag changes as a response to engine.
- PlanModifier is modified to take a DAG and apply it on the existing 
running DAG and deploy the changes.

The following functionality is not working yet.

- The new opearator does not start from the correct windowId 
(https://issues.apache.org/jira/browse/APEXCORE-532)
- Relanched application failed to start when it was killed after dynamic 
dag modification.
- There is no support for resuming operator from previous state when they 
were removed. This could be achived through
  readig state through external storage on setup.
- persist operator support is not present for newly added streams.
- Not all parts are covered through tests.

The demo application using the feature is available at
https://github.com/tushargosavi/apex-dynamic-scheduling

There are two variations of WordCount application. The first variation 
detects the presence of
new files start a disconnected DAG to process the data.

(https://github.com/tushargosavi/apex-dynamic-scheduling/blob/master/src/main/java/com/datatorrent/wordcount/WordCountApp.java)

The second application 
(https://github.com/tushargosavi/apex-dynamic-scheduling/blob/master/src/main/java/com/datatorrent/wordcount/ExtendApp.java),
initially only one reader operator is running in the DAG, and provides 
pendingFiles as auto-metric to stat listener.
On detecting pending files it attaches splitter counter and output operator 
to the read operator. Once files are processed the splitter, counter and
output operators are removed and added back again if new data files are 
added into the directory.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tushargosavi/incubator-apex-core APEXCORE-408

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/410.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #410


commit 5cecd029463de021469afb20650e3afbf75fd088
Author: Tushar R. Gosavi 
Date:   2016-08-08T10:24:41Z

APEXCORE-408: Ability to schedule Sub-DAG from running application.




> Ability to schedule Sub-DAG from running application
> 
>
> Key: APEXCORE-408
> URL: https://issues.apache.org/jira/browse/APEXCORE-408
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: Thomas Weise
>Assignee: Tushar Gosavi
> Attachments: Dynamic DAG Changes.docx
>
>
> Today it is possible to add operators to a running application from the 
> CLI/client. It should be possible to do this from within the application 
> also. This can be used to expand/remove a Sub-DAG on demand, triggered by 
> application specific logic. It will enable use cases such as batch 
> applications that perform multiple stages of processing where not all 
> resources are required at the same time or the logic of subsequent steps 
> depends on the execution of previous steps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXCORE-408) Ability to schedule Sub-DAG from running application

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574910#comment-15574910
 ] 

ASF GitHub Bot commented on APEXCORE-408:
-

Github user tushargosavi closed the pull request at:

https://github.com/apache/apex-core/pull/393


> Ability to schedule Sub-DAG from running application
> 
>
> Key: APEXCORE-408
> URL: https://issues.apache.org/jira/browse/APEXCORE-408
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: Thomas Weise
>Assignee: Tushar Gosavi
> Attachments: Dynamic DAG Changes.docx
>
>
> Today it is possible to add operators to a running application from the 
> CLI/client. It should be possible to do this from within the application 
> also. This can be used to expand/remove a Sub-DAG on demand, triggered by 
> application specific logic. It will enable use cases such as batch 
> applications that perform multiple stages of processing where not all 
> resources are required at the same time or the logic of subsequent steps 
> depends on the execution of previous steps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXCORE-408) Ability to schedule Sub-DAG from running application

2016-08-01 Thread Tushar Gosavi (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401883#comment-15401883
 ] 

Tushar Gosavi commented on APEXCORE-408:


I have dome some initial prototype which allows stat listener to
specify dag changes, and the dag changes are applied asynchronously.

The changes involved are
- Add DagChangeSet object which is inherited from DAG, supporting
methods to remove operator and streams.

- The stat listener will return this object in Response, and platform
will apply changes specified in response to the DAG.

The Apex changes
https://github.com/apache/apex-core/compare/master...tushargosavi:scheduler?expand=1

The correspondign Demo application, which one operator monitors the
directory for files, and launch the wordcount DAG in
same application master when files are available.
https://github.com/tushargosavi/apex-dynamic-scheduling/blob/master/src/main/java/com/datatorrent/wordcount/FileStatListenerSameDag.java

Let me know if this type of API is acceptable for modifying dag through java 
API.


> Ability to schedule Sub-DAG from running application
> 
>
> Key: APEXCORE-408
> URL: https://issues.apache.org/jira/browse/APEXCORE-408
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: Thomas Weise
>Assignee: Tushar Gosavi
>
> Today it is possible to add operators to a running application from the 
> CLI/client. It should be possible to do this from within the application 
> also. This can be used to expand/remove a Sub-DAG on demand, triggered by 
> application specific logic. It will enable use cases such as batch 
> applications that perform multiple stages of processing where not all 
> resources are required at the same time or the logic of subsequent steps 
> depends on the execution of previous steps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)