[jira] [Commented] (TEZ-2161) Support CRDT aggregation models for counters

2018-03-13 Thread Eric Wohlstadter (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397915#comment-16397915
 ] 

Eric Wohlstadter commented on TEZ-2161:
---

[~gopalv]

What do you think about using java.util.concurrent.atomic.LongAccumulator?

Wouldn't that mean we can get rid of those final Counter aggregation locks?

Although ... LongAccumulator only exists from Java 8 onward, so we'd need to 
package this in some kind of -Phadoop3 profile, because Tez still advertises 
support for Java 1.7.
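
To make the suggestion concrete, here is a minimal sketch of a MAX-style 
counter backed by LongAccumulator. MaxCounterSketch and MaxCounterDemo are 
hypothetical names for illustration, not existing Tez classes. Note that the 
Javadoc says get() is not an atomic snapshot while updates are in flight, so 
whether this really removes the aggregation locks would still need checking.

```java
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical holder for a MAX-aggregated counter; not an existing Tez class.
final class MaxCounterSketch {
  // Long::max is associative and commutative, as LongAccumulator expects.
  private final LongAccumulator max =
      new LongAccumulator(Long::max, Long.MIN_VALUE);

  // Folds a reported value into the running maximum without a caller-held lock.
  void update(long reported) {
    max.accumulate(reported);
  }

  long getValue() {
    return max.get();
  }
}

class MaxCounterDemo {
  public static void main(String[] args) throws InterruptedException {
    MaxCounterSketch lastEventTime = new MaxCounterSketch();
    // Two threads stand in for two tasks reporting their own values.
    Thread a = new Thread(() -> lastEventTime.update(1500L));
    Thread b = new Thread(() -> lastEventTime.update(2700L));
    a.start();
    b.start();
    a.join();
    b.join();
    System.out.println(lastEventTime.getValue()); // prints 2700
  }
}
```

A MIN counter would be the same shape with Long::min and Long.MAX_VALUE as 
the identity.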

> Support CRDT aggregation models for counters 
> --------------------------------------------
>
> Key: TEZ-2161
> URL: https://issues.apache.org/jira/browse/TEZ-2161
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Eric Wohlstadter
> Priority: Major
>
> Some counters, such as last-event-received time, need to be handled 
> differently from, say, bytes-read counters. Bytes read requires a summation 
> across all tasks within a vertex. The received time requires a max() across 
> all the tasks. First-event-received time would likely need a min().



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-2161) Support CRDT aggregation models for counters

2018-03-13 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397577#comment-16397577
 ] 

Gopal V commented on TEZ-2161:
--

Yes: move from Counter::increment to a polymorphic Counter::update.
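
A rough sketch of what a polymorphic update could look like, assuming a small 
class hierarchy in which each subclass owns its aggregation rule. All class 
names here (UpdatableCounter, SumUpdateCounter, MaxUpdateCounter, 
MinUpdateCounter) are made up for illustration and are not the Tez counter 
API.

```java
// Hypothetical sketch; these class names are illustrative, not Tez's API.
abstract class UpdatableCounter {
  protected long value;

  UpdatableCounter(long initial) {
    this.value = initial;
  }

  // Each subclass decides how a newly reported value is folded in.
  abstract void update(long reported);

  long getValue() {
    return value;
  }
}

final class SumUpdateCounter extends UpdatableCounter {
  SumUpdateCounter() { super(0L); }
  @Override void update(long reported) { value += reported; }
}

final class MaxUpdateCounter extends UpdatableCounter {
  MaxUpdateCounter() { super(Long.MIN_VALUE); }
  @Override void update(long reported) { value = Math.max(value, reported); }
}

final class MinUpdateCounter extends UpdatableCounter {
  MinUpdateCounter() { super(Long.MAX_VALUE); }
  @Override void update(long reported) { value = Math.min(value, reported); }
}

class UpdateCounterDemo {
  public static void main(String[] args) {
    UpdatableCounter[] counters = {
      new SumUpdateCounter(), new MaxUpdateCounter(), new MinUpdateCounter()
    };
    for (UpdatableCounter c : counters) {
      // The caller never branches on the aggregation kind.
      for (long v : new long[] {40L, 10L, 25L}) {
        c.update(v);
      }
      System.out.println(c.getClass().getSimpleName() + " = " + c.getValue());
    }
  }
}
```

The aggregating code only ever calls update(), so a new aggregation kind 
means a new subclass rather than another branch in the counter group classes.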



[jira] [Commented] (TEZ-2161) Support CRDT aggregation models for counters

2018-03-13 Thread Eric Wohlstadter (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397573#comment-16397573
 ] 

Eric Wohlstadter commented on TEZ-2161:
---

{quote}Adding a MAX_GC_MILLIS counter with new semantics explicitly is better 
than messing with the existing GC_MILLIS counter.{quote}
Just to make sure I understand: you're suggesting we use something like 
polymorphism to "parameterize" the behavior of Counters, so that the 
particular aggregation logic is encapsulated in specific counter classes 
instead of scattered throughout all the counter-related classes.

Is that right?

 



[jira] [Commented] (TEZ-2161) Support CRDT aggregation models for counters

2018-03-13 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397546#comment-16397546
 ] 

Gopal V commented on TEZ-2161:
--

bq. DAG aggregates Vertices which aggregates Tasks which chooses "bestAttempt". 
And the whole thing runs in various locks. This getAllCounters() flow executes 
locally on the AM.

CRDT was more about a P-N counter implementation for aggregates, which stores 
both the negative and the positive movements of the counter.
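
For reference, a minimal sketch of the textbook state-based P-N counter, 
assuming each replica (for example a task attempt) has a string id. 
PNCounterSketch and its method names are hypothetical and nothing here is 
taken from Tez.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical textbook-style PN-counter; not part of Tez.
final class PNCounterSketch {
  // Per-replica totals of positive and negative movements, kept separately.
  private final Map<String, Long> increments = new HashMap<>();
  private final Map<String, Long> decrements = new HashMap<>();

  // delta is assumed to be non-negative in both methods.
  void increment(String replicaId, long delta) {
    increments.merge(replicaId, delta, Long::sum);
  }

  void decrement(String replicaId, long delta) {
    decrements.merge(replicaId, delta, Long::sum);
  }

  long value() {
    long p = increments.values().stream().mapToLong(Long::longValue).sum();
    long n = decrements.values().stream().mapToLong(Long::longValue).sum();
    return p - n;
  }

  // Per-replica max makes the merge associative, commutative and idempotent.
  void mergeFrom(PNCounterSketch other) {
    other.increments.forEach((id, v) -> increments.merge(id, v, Long::max));
    other.decrements.forEach((id, v) -> decrements.merge(id, v, Long::max));
  }
}

class PNCounterDemo {
  public static void main(String[] args) {
    PNCounterSketch a = new PNCounterSketch();
    PNCounterSketch b = new PNCounterSketch();
    a.increment("task-1", 5L);
    a.decrement("task-1", 2L);
    b.increment("task-2", 3L);
    a.mergeFrom(b);
    a.mergeFrom(b); // merging the same state twice changes nothing
    System.out.println(a.value()); // prints 6
  }
}
```

Because both maps only ever grow, two copies of the counter can be merged in 
any order, and a repeated merge is harmless.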

bq. My plan is to add "aggregateAllCounters" to the CounterGroup classes, which 
will be used similarly to "incrAllCounters", except instead of only doing SUM, 
it also does MIN, AVG, MAX.

Counter needs subclasses that each declare how they aggregate. Adding fields 
to every counter will break everything downstream that exists today.

Adding a MAX_GC_MILLIS counter with new semantics explicitly is better than 
messing with the existing GC_MILLIS counter.



[jira] [Commented] (TEZ-2161) Support CRDT aggregation models for counters

2018-03-13 Thread Eric Wohlstadter (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397524#comment-16397524
 ] 

Eric Wohlstadter commented on TEZ-2161:
---

[~gopalv]

I'm not planning to do an actual CRDT implementation for this ticket. I do 
think that would be really valuable (I have used Akka's CRDT in a previous 
project), but it's not needed for the use-case where counters are aggregated in 
the getAllCounters() flow:

i.e. DAG aggregates Vertices which aggregates Tasks which chooses 
"bestAttempt". And the whole thing runs in various locks. This getAllCounters() 
flow executes locally on the AM.

My plan is to add "aggregateAllCounters" to the CounterGroup classes, which 
will be used similarly to "incrAllCounters", except instead of only doing SUM, 
it also does MIN, AVG, MAX. 

Since MIN, AVG, MAX have very efficient incremental algorithms ("embarrassingly 
incremental"), the overhead should be negligible (but this will need to be 
profiled to verify my claim). 
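
As a sketch of what "incremental" means here: each statistic can be updated 
in constant time per reported value, with AVG derived from a running sum and 
count. RunningStatsSketch is a hypothetical name, not the proposed 
"aggregateAllCounters" code.

```java
// Hypothetical accumulator; not the actual aggregateAllCounters implementation.
final class RunningStatsSketch {
  private long count;
  private long sum;
  private long min = Long.MAX_VALUE;
  private long max = Long.MIN_VALUE;

  // O(1) work per value: no stored history and no second pass.
  void add(long value) {
    count++;
    sum += value;
    min = Math.min(min, value);
    max = Math.max(max, value);
  }

  long getSum() { return sum; }
  long getMin() { return min; }
  long getMax() { return max; }

  // AVG is derived on demand from sum and count.
  double getAvg() {
    return count == 0 ? 0.0 : (double) sum / count;
  }
}

class RunningStatsDemo {
  public static void main(String[] args) {
    RunningStatsSketch gcMillis = new RunningStatsSketch();
    // One value per task, e.g. each task's GC time in milliseconds.
    for (long v : new long[] {10L, 30L, 20L}) {
      gcMillis.add(v);
    }
    System.out.println("sum=" + gcMillis.getSum() + " min=" + gcMillis.getMin()
        + " max=" + gcMillis.getMax() + " avg=" + gcMillis.getAvg());
  }
}
```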

Since I'm not doing CRDT, do you think we should rename this ticket or should I 
open a different ticket?



[jira] [Commented] (TEZ-2161) Support CRDT aggregation models for counters

2015-04-22 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506741#comment-14506741
 ] 

Gopal V commented on TEZ-2161:
--

The aggregations can be done consistently if and only if any two versions of 
the counter group can be merged associatively and commutatively.
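
One way to picture this, as a sketch with a made-up CounterSummarySketch 
type: SUM, MIN and MAX merge associatively and commutatively on their own, 
while an average only does if the summary carries (sum, count) rather than 
the average itself. Unlike max and min, sum and count are not idempotent, so 
this sketch assumes each task's contribution is merged exactly once.

```java
// Hypothetical immutable summary; illustrative only, not a Tez type.
final class CounterSummarySketch {
  final long count;
  final long sum;
  final long min;
  final long max;

  CounterSummarySketch(long count, long sum, long min, long max) {
    this.count = count;
    this.sum = sum;
    this.min = min;
    this.max = max;
  }

  // Summary of a single reported value.
  static CounterSummarySketch of(long value) {
    return new CounterSummarySketch(1L, value, value, value);
  }

  // Every field is combined with an associative, commutative operation.
  CounterSummarySketch merge(CounterSummarySketch o) {
    return new CounterSummarySketch(count + o.count, sum + o.sum,
        Math.min(min, o.min), Math.max(max, o.max));
  }

  double avg() {
    return count == 0 ? 0.0 : (double) sum / count;
  }

  boolean sameAs(CounterSummarySketch o) {
    return count == o.count && sum == o.sum && min == o.min && max == o.max;
  }
}

class CounterSummaryDemo {
  public static void main(String[] args) {
    CounterSummarySketch a = CounterSummarySketch.of(4L);
    CounterSummarySketch b = CounterSummarySketch.of(10L);
    CounterSummarySketch c = CounterSummarySketch.of(1L);
    // Grouping and order of the merges do not affect the result.
    System.out.println(a.merge(b).merge(c).sameAs(a.merge(b.merge(c)))); // true
    System.out.println(a.merge(b).sameAs(b.merge(a)));                   // true
    System.out.println(a.merge(b).merge(c).avg());                       // 5.0
  }
}
```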
