[jira] [Commented] (TEZ-2161) Support CRDT aggregation models for counters
[ https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397915#comment-16397915 ]

Eric Wohlstadter commented on TEZ-2161:
---

[~gopalv] What do you think about using java.util.concurrent.atomic.LongAccumulator? Wouldn't that mean we can get rid of those final Counter aggregation locks?

Although ... we'd need to package this in some kind of -Phadoop3 profile, because Tez still advertises support for Java 1.7.

> Support CRDT aggregation models for counters
> -
>
> Key: TEZ-2161
> URL: https://issues.apache.org/jira/browse/TEZ-2161
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Eric Wohlstadter
> Priority: Major
>
> Some counters, such as last event received time, need to be handled differently
> from, say, bytes read counters. Bytes read requires a summation across all tasks
> within a vertex. The received time requires doing a max() across all the
> tasks. First event received time would likely need a min().

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
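For readers unfamiliar with the class mentioned above, here is a minimal sketch of how java.util.concurrent.atomic.LongAccumulator can fold values with max, min, or sum without an explicit lock. The class name AccumulatorSketch, the aggregate helper, and the sample values are assumptions made for illustration; none of this is Tez code. LongAccumulator was added in Java 8, which is why the Java 1.7 concern applies.

```java
import java.util.concurrent.atomic.LongAccumulator;
import java.util.stream.LongStream;

// Illustrative only: AccumulatorSketch and its sample values are hypothetical, not Tez code.
class AccumulatorSketch {
    // Feed every value into the accumulator from parallel worker threads, then read it back.
    // No explicit lock is taken; LongAccumulator handles contention internally.
    static long aggregate(LongAccumulator acc, long[] values) {
        LongStream.of(values).parallel().forEach(acc::accumulate);
        return acc.get();
    }

    public static void main(String[] args) {
        long[] perTask = {1700L, 4200L, 900L, 3100L}; // made-up per-task readings

        // The identity value must be neutral for the chosen function.
        LongAccumulator max = new LongAccumulator(Long::max, Long.MIN_VALUE);
        LongAccumulator min = new LongAccumulator(Long::min, Long.MAX_VALUE);
        LongAccumulator sum = new LongAccumulator(Long::sum, 0L);

        System.out.println("max=" + aggregate(max, perTask));
        System.out.println("min=" + aggregate(min, perTask));
        System.out.println("sum=" + aggregate(sum, perTask));
    }
}
```

The accumulator function has to be side-effect-free and insensitive to ordering, because the order in which concurrent updates are applied is not guaranteed. Max, min, and sum all satisfy that.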
[jira] [Commented] (TEZ-2161) Support CRDT aggregation models for counters
[ https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397577#comment-16397577 ]

Gopal V commented on TEZ-2161:
--

Yes, move from Counter::increment to Counter::update as polymorphism.
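One way to read the increment-to-update suggestion is sketched below. All type names (SketchCounter, SumCounter, MaxCounter, MinCounter, UpdateDemo) are hypothetical and are not Tez's actual counter classes; the point is only that the caller invokes update() and the subclass decides whether that means sum, max, or min.

```java
// Illustrative only: every type here is hypothetical, not part of Tez.
class UpdateDemo {
    // Aggregation code calls update() without knowing which rule applies.
    static long aggregate(SketchCounter target, long... perTask) {
        for (long v : perTask) {
            target.update(v);
        }
        return target.getValue();
    }

    public static void main(String[] args) {
        System.out.println("sum=" + aggregate(new SumCounter(), 5L, 9L, 2L));
        System.out.println("max=" + aggregate(new MaxCounter(), 5L, 9L, 2L));
        System.out.println("min=" + aggregate(new MinCounter(), 5L, 9L, 2L));
    }
}

abstract class SketchCounter {
    protected long value;

    SketchCounter(long initial) {
        this.value = initial;
    }

    // Each subclass defines how an incoming value is folded into the current one.
    abstract void update(long incoming);

    long getValue() {
        return value;
    }
}

class SumCounter extends SketchCounter {
    SumCounter() { super(0L); }
    @Override void update(long incoming) { value += incoming; }
}

class MaxCounter extends SketchCounter {
    MaxCounter() { super(Long.MIN_VALUE); }
    @Override void update(long incoming) { value = Math.max(value, incoming); }
}

class MinCounter extends SketchCounter {
    MinCounter() { super(Long.MAX_VALUE); }
    @Override void update(long incoming) { value = Math.min(value, incoming); }
}
```

With this shape, a summing counter behaves exactly as increment did, and new semantics arrive as new subclasses rather than as extra fields on every counter.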
[jira] [Commented] (TEZ-2161) Support CRDT aggregation models for counters
[ https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397573#comment-16397573 ]

Eric Wohlstadter commented on TEZ-2161:
---

{quote}Adding a MAX_GC_MILLIS counter with new semantics explicitly is better than messing with the existing GC_MILLIS counter.{quote}

Just to make sure I understand: you're suggesting using something like polymorphism to "parameterize" the behavior of Counters. That way the particular aggregation logic is encapsulated in specific counter classes, instead of being scattered throughout all the counter-related classes. Is that right?
[jira] [Commented] (TEZ-2161) Support CRDT aggregation models for counters
[ https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397546#comment-16397546 ]

Gopal V commented on TEZ-2161:
--

bq. DAG aggregates Vertices which aggregates Tasks which chooses "bestAttempt". And the whole thing runs in various locks. This getAllCounters() flow executes locally on the AM.

CRDT was more for the P-N counter implementation for aggregates which store both the -ve and +ve movements of the counter.

bq. My plan is to add "aggregateAllCounters" to the CounterGroup classes, which will be used similarly to "incrAllCounters", except instead of only doing SUM, it also does MIN, AVG, MAX.

The Counter needs sub-classes which declare what it needs to aggregate on - adding fields to every counter will break everything downstream that exists today.

Adding a MAX_GC_MILLIS counter with new semantics explicitly is better than messing with the existing GC_MILLIS counter.
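As background on the P-N counter mentioned above, the following is a minimal sketch of the general technique, not anything that exists in Tez. The names PNCounter and PNCounterDemo and the replica identifiers are assumptions. Each replica records its own positive and negative movements separately, and a merge takes the per-replica maximum of each.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: PNCounter and the replica names are hypothetical, not Tez code.
class PNCounterDemo {
    public static void main(String[] args) {
        PNCounter a = new PNCounter();
        PNCounter b = new PNCounter();
        a.increment("task-a", 5L);
        b.increment("task-b", 3L);
        b.decrement("task-b", 1L);

        // Merging in either order yields the same value.
        System.out.println(a.merge(b).value());
        System.out.println(b.merge(a).value());
    }
}

final class PNCounter {
    // Positive and negative movements are tracked separately, keyed by replica.
    private final Map<String, Long> increments = new HashMap<>();
    private final Map<String, Long> decrements = new HashMap<>();

    // Deltas are assumed to be non-negative.
    void increment(String replica, long delta) {
        increments.merge(replica, delta, Long::sum);
    }

    void decrement(String replica, long delta) {
        decrements.merge(replica, delta, Long::sum);
    }

    long value() {
        return total(increments) - total(decrements);
    }

    // Per-replica maximum makes the merge commutative, associative, and idempotent.
    PNCounter merge(PNCounter other) {
        PNCounter out = new PNCounter();
        mergeMax(out.increments, this.increments);
        mergeMax(out.increments, other.increments);
        mergeMax(out.decrements, this.decrements);
        mergeMax(out.decrements, other.decrements);
        return out;
    }

    private static void mergeMax(Map<String, Long> into, Map<String, Long> from) {
        from.forEach((replica, v) -> into.merge(replica, v, Long::max));
    }

    private static long total(Map<String, Long> m) {
        long t = 0L;
        for (long v : m.values()) {
            t += v;
        }
        return t;
    }
}
```

Because each per-replica tally only grows, re-merging a state that was already merged changes nothing, which is what makes duplicate or out-of-order delivery safe.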
[jira] [Commented] (TEZ-2161) Support CRDT aggregation models for counters
[ https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397524#comment-16397524 ]

Eric Wohlstadter commented on TEZ-2161:
---

[~gopalv] I'm not planning to do an actual CRDT implementation for this ticket. I do think that would be really valuable (I have used Akka's CRDT in a previous project), but it's not needed for the use-case where counters are aggregated in the getAllCounters() flow: i.e. DAG aggregates Vertices which aggregates Tasks which chooses "bestAttempt". And the whole thing runs in various locks. This getAllCounters() flow executes locally on the AM.

My plan is to add "aggregateAllCounters" to the CounterGroup classes, which will be used similarly to "incrAllCounters", except instead of only doing SUM, it also does MIN, AVG, MAX. Since MIN, AVG, MAX have very efficient incremental algorithms ("embarrassingly incremental"), the overhead should be negligible (but this will need to be profiled to verify my claim).

Since I'm not doing CRDT, do you think we should rename this ticket or should I open a different ticket?
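The "embarrassingly incremental" claim can be illustrated with a small sketch. RunningStats and RunningStatsDemo are hypothetical names, and this is not the proposed aggregateAllCounters method; it only shows that SUM, MIN, MAX, and AVG can each be maintained with constant work per incoming value.

```java
// Illustrative only: RunningStats is hypothetical, not the proposed Tez method.
class RunningStatsDemo {
    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (long v : new long[] {10L, 40L, 25L}) { // made-up per-task values
            stats.add(v);
        }
        System.out.println("sum=" + stats.sum() + " min=" + stats.min()
                + " max=" + stats.max() + " avg=" + stats.avg());
    }
}

final class RunningStats {
    private long count = 0L;
    private long sum = 0L;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    // Constant-time update: no per-task history is kept.
    void add(long v) {
        count++;
        sum += v;
        min = Math.min(min, v);
        max = Math.max(max, v);
    }

    long sum() { return sum; }
    long min() { return min; }
    long max() { return max; }

    // AVG is derived from the running sum and count rather than stored directly.
    double avg() {
        return count == 0L ? 0.0 : (double) sum / count;
    }
}
```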
[jira] [Commented] (TEZ-2161) Support CRDT aggregation models for counters
[ https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506741#comment-14506741 ]

Gopal V commented on TEZ-2161:
--

The aggregations can be done consistently if and only if any two versions of the counter group can be merged associatively & commutatively.
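A small sketch of why the associativity requirement matters, using AVG as the example. The names MergeOrderDemo and AvgState are hypothetical and the numbers are made up. Averaging two averages depends on grouping, while merging (sum, count) pairs does not.

```java
// Illustrative only: MergeOrderDemo and AvgState are hypothetical, not Tez code.
class MergeOrderDemo {
    // Naive merge: the average of two averages. This is not associative.
    static double naiveAvgMerge(double a, double b) {
        return (a + b) / 2.0;
    }

    public static void main(String[] args) {
        // Three single-sample tasks with values 10, 20, and 60.
        double left = naiveAvgMerge(naiveAvgMerge(10.0, 20.0), 60.0);
        double right = naiveAvgMerge(10.0, naiveAvgMerge(20.0, 60.0));
        System.out.println("naive: " + left + " vs " + right);

        AvgState a = new AvgState(10L, 1L);
        AvgState b = new AvgState(20L, 1L);
        AvgState c = new AvgState(60L, 1L);
        System.out.println("state: " + a.merge(b).merge(c).avg()
                + " vs " + a.merge(b.merge(c)).avg());
    }
}

final class AvgState {
    final long sum;
    final long count;

    AvgState(long sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    // Component-wise addition is associative and commutative, so grouping and order do not matter.
    AvgState merge(AvgState other) {
        return new AvgState(sum + other.sum, count + other.count);
    }

    double avg() {
        return count == 0L ? 0.0 : (double) sum / count;
    }
}
```

The same reasoning explains why SUM, MIN, and MAX merge safely as plain values, while AVG needs extra state carried alongside it.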