[jira] [Commented] (FLINK-951) Reworking of Iteration Synchronization, Accumulators and Aggregators

2022-09-20 Thread Yun Gao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606936#comment-17606936
 ] 

Yun Gao commented on FLINK-951:
---

Since we now have a new iteration library in the flink-ml repo that fits the 
unified batch / streaming processing model, I'll close this issue.

> Reworking of Iteration Synchronization, Accumulators and Aggregators
> 
>
> Key: FLINK-951
> URL: https://issues.apache.org/jira/browse/FLINK-951
> Project: Flink
> Issue Type: Improvement
> Components: API / DataSet, Runtime / Task
> Affects Versions: 0.9
> Reporter: Markus Holzemer
> Priority: Not a Priority
> Labels: auto-deprioritized-major, auto-deprioritized-minor, auto-unassigned, refactoring
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> I just realized that there is no real Jira issue for the task I am currently 
> working on. 
> I am currently reworking a few things regarding Iteration Synchronization, 
> Accumulators and Aggregators. Currently the synchronization at the end of one 
> superstep is done through channel events. That makes it hard to track the 
> current status of iterations. That is why I am changing this synchronization 
> to use RPC calls with the JobManager, so that the JobManager manages the 
> current status of all iterations.
> Currently we use Accumulators outside of iterations and Aggregators inside of 
> iterations. Both serve a similar purpose but have slightly different interfaces 
> and handling. I want to unify these two concepts. I propose that in the future 
> we stick to Accumulators only. Aggregators are therefore removed and 
> Accumulators are extended to cover the use cases that Aggregators were used 
> for before. The switch to RPC for iterations also makes it possible to send the 
> current Accumulator values at the end of each superstep, so that the 
> JobManager (and thereby the web interface) will be able to print intermediate 
> accumulation results.
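
The flow described in the issue can be sketched in plain Java without any Flink dependency. This is only an illustration under stated assumptions: the class `SuperstepCoordinator` and its methods are hypothetical names that stand in for the JobManager side and are not taken from the pull request. Each worker reports its accumulator values at the end of a superstep, and the coordinator merges them and advances once every worker has reported.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the JobManager side; not Flink code. */
final class SuperstepCoordinator {
    private final int numWorkers;
    private final Map<String, Long> totals = new TreeMap<>();
    private int superstep = 1;
    private int reported = 0;

    SuperstepCoordinator(int numWorkers) {
        this.numWorkers = numWorkers;
    }

    /**
     * Models the call a worker makes at the end of a superstep.
     * Merges the worker's accumulator values and returns true if this
     * report was the last one missing, i.e. the superstep is complete.
     */
    synchronized boolean reportEndOfSuperstep(Map<String, Long> workerValues) {
        workerValues.forEach((name, value) -> totals.merge(name, value, Long::sum));
        reported++;
        if (reported < numWorkers) {
            return false;
        }
        // Every worker has reported: intermediate results could be shown here.
        System.out.println("superstep " + superstep + " finished, accumulators = " + totals);
        superstep++;
        reported = 0;
        return true;
    }

    synchronized int currentSuperstep() {
        return superstep;
    }

    /** Returns a copy so callers cannot modify the coordinator's state. */
    synchronized Map<String, Long> snapshot() {
        return new TreeMap<>(totals);
    }
}

final class SuperstepCoordinatorDemo {
    public static void main(String[] args) {
        SuperstepCoordinator coordinator = new SuperstepCoordinator(2);
        coordinator.reportEndOfSuperstep(Map.of("records", 3L));
        coordinator.reportEndOfSuperstep(Map.of("records", 4L));
    }
}
```

A real implementation would also have to release the waiting workers and deal with failures, both of which this sketch leaves out.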



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-951) Reworking of Iteration Synchronization, Accumulators and Aggregators

2021-04-27 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334339#comment-17334339
 ] 

Flink Jira Bot commented on FLINK-951:
--

This issue was marked "stale-assigned" and has not received an update in 7 
days. It is now automatically unassigned. If you are still working on it, you 
can assign it to yourself again. Please also give an update about the status of 
the work.

> Reworking of Iteration Synchronization, Accumulators and Aggregators
> 
>
> Key: FLINK-951
> URL: https://issues.apache.org/jira/browse/FLINK-951
> Project: Flink
> Issue Type: Improvement
> Components: API / DataSet, Runtime / Task
> Affects Versions: 0.9
> Reporter: Markus Holzemer
> Assignee: Markus Holzemer
> Priority: Major
> Labels: refactoring, stale-assigned
> Original Estimate: 168h
> Remaining Estimate: 168h
>


[jira] [Commented] (FLINK-951) Reworking of Iteration Synchronization, Accumulators and Aggregators

2021-04-16 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17323746#comment-17323746
 ] 

Flink Jira Bot commented on FLINK-951:
--

This issue is assigned but has not received an update in 7 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Reworking of Iteration Synchronization, Accumulators and Aggregators
> 
>
> Key: FLINK-951
> URL: https://issues.apache.org/jira/browse/FLINK-951
> Project: Flink
> Issue Type: Improvement
> Components: API / DataSet, Runtime / Task
> Affects Versions: 0.9
> Reporter: Markus Holzemer
> Assignee: Markus Holzemer
> Priority: Major
> Labels: refactoring, stale-assigned
> Original Estimate: 168h
> Remaining Estimate: 168h
>


[jira] [Commented] (FLINK-951) Reworking of Iteration Synchronization, Accumulators and Aggregators

2016-04-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258599#comment-15258599
 ] 

ASF GitHub Bot commented on FLINK-951:
--

Github user markus-h closed the pull request at:

https://github.com/apache/flink/pull/570


> Reworking of Iteration Synchronization, Accumulators and Aggregators
> 
>
> Key: FLINK-951
> URL: https://issues.apache.org/jira/browse/FLINK-951
> Project: Flink
> Issue Type: Improvement
> Components: Iterations, Optimizer
> Affects Versions: 0.9
> Reporter: Markus Holzemer
> Assignee: Markus Holzemer
> Labels: refactoring
> Original Estimate: 168h
> Remaining Estimate: 168h
>


[jira] [Commented] (FLINK-951) Reworking of Iteration Synchronization, Accumulators and Aggregators

2016-04-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258598#comment-15258598
 ] 

ASF GitHub Bot commented on FLINK-951:
--

Github user markus-h commented on the pull request:

https://github.com/apache/flink/pull/570#issuecomment-214833775
  
Sorry for not driving this further. I think this pull request is now far 
too outdated to have any chance of being rebased onto the current master, so 
I will close it.


> Reworking of Iteration Synchronization, Accumulators and Aggregators
> 
>
> Key: FLINK-951
> URL: https://issues.apache.org/jira/browse/FLINK-951
> Project: Flink
> Issue Type: Improvement
> Components: Iterations, Optimizer
> Affects Versions: 0.9
> Reporter: Markus Holzemer
> Assignee: Markus Holzemer
> Labels: refactoring
> Original Estimate: 168h
> Remaining Estimate: 168h
>


[jira] [Commented] (FLINK-951) Reworking of Iteration Synchronization, Accumulators and Aggregators

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538269#comment-14538269
 ] 

ASF GitHub Bot commented on FLINK-951:
--

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/570#issuecomment-100996059
  
I had a look at the pull request and I like very much what it tries to do.

The problem right now is that I can hardly say without investing a lot of 
time whether this is in good shape to merge. This pull request does at least 
two very big things at the same time:
 - Move iteration synchronization to the JobManager
 - Unify aggregators and accumulators into one.

With all the example / testcase adjustments, this becomes a lot to review. 
The description of the pull request also does not make it easy, since many 
questions and decisions that arise are not explained:
 - What interface do the unified aggregators/accumulators follow: that of the 
aggregators, or that of the accumulators?
 - How is the blocking superstep synchronization currently done? With actor 
ask?
 - How is the aggregator/accumulator unification achieved, when aggregators 
are created per superstep, and accumulators once?

This is a lot for a very delicate and critical mechanism. I think if we 
want to merge this, we would need more details on how things were changed (what 
is the concept behind the changes, not just what are the code diffs).
We may need to break it into multiple self-contained changes that we can 
individually review and merge, to make sure that it gets properly checked and 
will work robustly.
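
To make the last question in the list concrete: the thread does not say how the pull request reconciled the two lifecycles, so the following is only a hypothetical sketch in plain Java. The class `SuperstepAwareCounter` is invented for illustration and is not Flink's Accumulator or Aggregator interface. It keeps one value that lives for the whole job and one that is reset at every superstep boundary.

```java
/** Hypothetical counter; not Flink's Accumulator or Aggregator interface. */
final class SuperstepAwareCounter {
    private long total;          // lives for the whole job, like an accumulator
    private long thisSuperstep;  // reset every superstep, like an aggregator

    void add(long value) {
        total += value;
        thisSuperstep += value;
    }

    /** Returns the value of the superstep that just ended and starts a fresh one. */
    long endSuperstep() {
        long result = thisSuperstep;
        thisSuperstep = 0;
        return result;
    }

    long total() {
        return total;
    }
}

final class SuperstepAwareCounterDemo {
    public static void main(String[] args) {
        SuperstepAwareCounter counter = new SuperstepAwareCounter();
        counter.add(2);
        counter.add(3);
        System.out.println("superstep 1: " + counter.endSuperstep());
        counter.add(1);
        System.out.println("superstep 2: " + counter.endSuperstep());
        System.out.println("whole job:   " + counter.total());
    }
}
```

Whether a unified interface should expose both values, or only one of them, is exactly the kind of design decision the review asks to have documented.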


> Reworking of Iteration Synchronization, Accumulators and Aggregators
>
> Key: FLINK-951
> URL: https://issues.apache.org/jira/browse/FLINK-951
> Project: Flink
> Issue Type: Improvement
> Components: Iterations, Optimizer
> Affects Versions: 0.9
> Reporter: Markus Holzemer
> Assignee: Markus Holzemer
> Labels: refactoring
> Original Estimate: 168h
> Remaining Estimate: 168h
>


[jira] [Commented] (FLINK-951) Reworking of Iteration Synchronization, Accumulators and Aggregators

2015-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532355#comment-14532355
 ] 

ASF GitHub Bot commented on FLINK-951:
--

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/570#issuecomment-99797733
  
Hey @markus-h, sorry for not reacting to your pull request (other changes 
also sometimes need a lot of time until they are merged).
I think @StephanEwen would be the best reviewer for this pull request. He 
is currently at Strata in London. Maybe he has time to take a closer look here 
once he is back.


> Reworking of Iteration Synchronization, Accumulators and Aggregators
>
> Key: FLINK-951
> URL: https://issues.apache.org/jira/browse/FLINK-951
> Project: Flink
> Issue Type: Improvement
> Components: Iterations, Optimizer
> Affects Versions: 0.9
> Reporter: Markus Holzemer
> Assignee: Markus Holzemer
> Labels: refactoring
> Original Estimate: 168h
> Remaining Estimate: 168h
>


[jira] [Commented] (FLINK-951) Reworking of Iteration Synchronization, Accumulators and Aggregators

2015-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485015#comment-14485015
 ] 

ASF GitHub Bot commented on FLINK-951:
--

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/570#issuecomment-90865992
  
Have the latest commits fixed it? :)


> Reworking of Iteration Synchronization, Accumulators and Aggregators
>
> Key: FLINK-951
> URL: https://issues.apache.org/jira/browse/FLINK-951
> Project: Flink
> Issue Type: Improvement
> Components: Iterations, Optimizer
> Affects Versions: 0.9
> Reporter: Markus Holzemer
> Assignee: Markus Holzemer
> Labels: refactoring
> Original Estimate: 168h
> Remaining Estimate: 168h
>


[jira] [Commented] (FLINK-951) Reworking of Iteration Synchronization, Accumulators and Aggregators

2015-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483752#comment-14483752
 ] 

ASF GitHub Bot commented on FLINK-951:
--

Github user markus-h commented on the pull request:

https://github.com/apache/flink/pull/570#issuecomment-90694363
  
There seems to be a race condition somewhere in my code, but I have trouble 
finding it since I cannot reproduce it locally. I thought my last change would 
fix it, but it didn't.
So if somebody has some free time and knows a bit about race conditions, 
feel free to help me :-)


> Reworking of Iteration Synchronization, Accumulators and Aggregators
>
> Key: FLINK-951
> URL: https://issues.apache.org/jira/browse/FLINK-951
> Project: Flink
> Issue Type: Improvement
> Components: Iterations, Optimizer
> Affects Versions: 0.9
> Reporter: Markus Holzemer
> Assignee: Markus Holzemer
> Labels: refactoring
> Original Estimate: 168h
> Remaining Estimate: 168h
>


[jira] [Commented] (FLINK-951) Reworking of Iteration Synchronization, Accumulators and Aggregators

2015-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396386#comment-14396386
 ] 

ASF GitHub Bot commented on FLINK-951:
--

Github user markus-h commented on the pull request:

https://github.com/apache/flink/pull/570#issuecomment-89849730
  
Thanks for your comments!
I will try to revert my formatting changes. I am used to pressing Ctrl+F while 
programming, which probably changed the formatting.

I also got rid of the Thread.sleep(). The problem was actually a different 
one. Akka delivered the same response object to threads on the same machine, 
but I thought they would be copies. Now I make a hard copy of the response, 
which seems to fix the problem.

I am not sure how to test the interaction between IterationHead and JM 
though. Is there some similar testcase that I could use as a basis?
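
The shared-response problem described above can be reproduced without Akka. Akka passes messages between actors in the same JVM by reference unless message serialization is explicitly enabled, so a mutable reply is shared by every local receiver. The sketch below is plain Java; `SuperstepReply` and its fields are hypothetical names chosen for illustration and are not the classes from the pull request.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reply message; the name and fields are illustrative only. */
final class SuperstepReply {
    final Map<String, Long> values;

    SuperstepReply(Map<String, Long> values) {
        this.values = values;
    }

    /** Defensive copy so that each receiver owns its own map. */
    SuperstepReply copy() {
        return new SuperstepReply(new HashMap<>(values));
    }
}

final class SharedReplyDemo {
    public static void main(String[] args) {
        SuperstepReply sent = new SuperstepReply(new HashMap<>(Map.of("count", 10L)));

        // Local delivery by reference: both tasks hold the very same object.
        SuperstepReply taskA = sent;
        SuperstepReply taskB = sent;
        taskA.values.put("count", 0L);      // task A modifies its reply...
        System.out.println(taskB.values);   // ...and task B sees the change too

        // With one copy per receiver, the modification stays local.
        SuperstepReply copyA = sent.copy();
        SuperstepReply copyB = sent.copy();
        copyA.values.put("count", 99L);
        System.out.println(copyB.values);   // unaffected by the change in copyA
    }
}
```

An alternative to copying on the receiving side would be to make the reply immutable, so that sharing it is harmless.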



> Reworking of Iteration Synchronization, Accumulators and Aggregators
>
> Key: FLINK-951
> URL: https://issues.apache.org/jira/browse/FLINK-951
> Project: Flink
> Issue Type: Improvement
> Components: Iterations, Optimizer
> Affects Versions: 0.9
> Reporter: Markus Holzemer
> Assignee: Markus Holzemer
> Labels: refactoring
> Original Estimate: 168h
> Remaining Estimate: 168h
>


[jira] [Commented] (FLINK-951) Reworking of Iteration Synchronization, Accumulators and Aggregators

2015-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396350#comment-14396350
 ] 

ASF GitHub Bot commented on FLINK-951:
--

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/570#issuecomment-89823498
  
Nice to see that you picked this up again. :-) I know from experience that 
it can be tricky to port the old RPC calls to Akka msgs, so kudos. ;-)

- I actually agree with your reformattings, but we had a discussion recently 
(http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Issues-with-heterogeneity-of-the-code-td4292.html#a4443)
 to refrain from reformattings. I think it makes it harder to review your 
changes. We haven't really written this down somewhere (this is actually a good 
reminder that we need to do this), but you should keep it in mind.

- I only had a quick look at the hack you pointed out. I think we should 
*not* merge it in the current state (in any case I would vote to postpone 
merging this until *after* the upcoming milestone release). Can you provide 
more information about what kind of Exception is thrown? Waiting for 10 ms is 
not robust against different timings on different machines.

I think the blocking Await is OK in this case. We should add tests for the 
basic JM - IterationHeadTask interaction though.
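
The point about the 10 ms wait can be illustrated with a small sketch. It is plain Java and only an assumption about the general pattern: `CompletableFuture.get` with a timeout stands in for the blocking await, and `AwaitInsteadOfSleep` is a hypothetical name, not code from the pull request. Instead of sleeping for a fixed time and hoping the reply has arrived, the caller blocks until the reply is actually there and fails with a clear error if it never comes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class AwaitInsteadOfSleep {
    /** Blocks until the (hypothetical) JobManager reply arrives or the timeout expires. */
    static String awaitReply(CompletableFuture<String> reply, long timeoutMillis) {
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the reply", e);
        } catch (TimeoutException e) {
            throw new IllegalStateException("no reply within " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("the reply completed with an error", e);
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        // Simulated remote side that answers after an unknown delay.
        CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            reply.complete("superstep done");
        });
        // The caller does not need to guess how long the other side takes.
        System.out.println(awaitReply(reply, 5_000));
    }
}
```

The timeout is an upper bound rather than a delay, so a fast machine returns immediately and a slow one still gets a correct result.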


> Reworking of Iteration Synchronization, Accumulators and Aggregators
>
> Key: FLINK-951
> URL: https://issues.apache.org/jira/browse/FLINK-951
> Project: Flink
> Issue Type: Improvement
> Components: Iterations, Optimizer
> Affects Versions: 0.9
> Reporter: Markus Holzemer
> Assignee: Markus Holzemer
> Labels: refactoring
> Original Estimate: 168h
> Remaining Estimate: 168h
>


[jira] [Commented] (FLINK-951) Reworking of Iteration Synchronization, Accumulators and Aggregators

2015-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395628#comment-14395628
 ] 

ASF GitHub Bot commented on FLINK-951:
--

GitHub user markus-h opened a pull request:

https://github.com/apache/flink/pull/570

[FLINK-951] Reworking of Iteration Synchronization, Accumulators and 
Aggregators

Iteration synchronization through JobManager
Unification of Accumulators and Aggregators (removal of former Aggregators)
Adjusted testcases accordingly

I redid the work of my very old pull request 
https://github.com/apache/flink/pull/36
A more detailed description can be found in jira 
https://issues.apache.org/jira/browse/FLINK-951

I came across some unexpected behaviour with Akka that made a small hack 
necessary. Perhaps somebody with more experience in Akka can find a better 
solution. See IterationHeadPactTask line 392.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markus-h/incubator-flink iterationsAndAccumulatorsRework2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/570.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #570


commit 5492487892ff99f10fccdb075404dedaa3371ff7
Author: Markus Holzemer markus.holze...@gmx.de
Date:   2015-04-02T15:56:19Z

Iteration synchronization through JobManager
Unification of Accumulators and Aggregators (removal of former Aggregators)
Adjusted testcases accordingly




> Reworking of Iteration Synchronization, Accumulators and Aggregators
>
> Key: FLINK-951
> URL: https://issues.apache.org/jira/browse/FLINK-951
> Project: Flink
> Issue Type: Improvement
> Components: Iterations, Optimizer
> Affects Versions: 0.6-incubating
> Reporter: Markus Holzemer
> Assignee: Markus Holzemer
> Labels: refactoring
> Original Estimate: 168h
> Remaining Estimate: 168h
>