Li Lu commented on YARN-3816:

Hi [~djp], sorry about the delay. I've gone through the patch and here are some 
quick feedback/questions. I may come up with more comments tomorrow daytime, 
but here are some quick ones so that I'm not blocking you to refresh the patch:

- TimelineAggregationBasis.java
Shall we differentiate realtime and offline aggregations? IIUC, {{APPLICATION}} 
type represents realtime aggregation while the other two represents offline 

- TimelineMetric.java
  -- setToAggregate(), why do we need a final here? 
  -- I'm a little bit confused by the following line's comments:
  +        // set delta to 0 instead of null.
  +        Number delta = TimelineMetricCalculator.getZero(n2);
  +        // t1 = null means this is the first value get aggregated, delta 
  +        // be null.
  So in conclusion, if t1 is null, we set delta to 0 since it's the first value 
to be aggregated. Or else, we aggregate the delta? 
  -- I'm fine to keep the else part of the 
latestMetric.getType().equals(Type.SINGLE_VALUE) for now. Maybe we'd like to 
update this part when implementing TODO
  -- l.183, I noticed we're copying the whole list to rebuild the valueList, 
then only used one element there? Are we sure the values list is small enough 
all the time? 
  -- TimelineMetric and TimelineCollector have two definitions of "_AREA". Is 
it possible to only have one of them? 

- TimelineCollector.java
  -- So we put intermediate aggregation status into the collector as some hash 
maps. Will there be challenges to rebuild these status? We may need to face the 
situations like nm restart (we're an aux service inside nm). 
  -- We need a method similar to appendAggregatedMetricsToEntities in 
YARN-3817. Actually I believe this is a very helpful method in normal 
aggregation workflow. It would be great if we can find a more visible place to 
put it. 
  -- I have one question about {{doAccumulation}}: in offline aggregation we're 
using a two-staged aggregation: we perform a flow run level aggregation in the 
mapper, and then aggregate entities in the same flow again in the reducer for 
entities from different mappers. In this case, we need to set the 
{{doAccumulation}} parameter into false for the aggregation in the reducer, 

I think we can fine tune the logic inside the aggregations (SUM/REP/etc,.) 
later. Right now the basic structure of this in memory solution LGTM. 

> [Aggregation] App-level Aggregation for YARN system metrics
> -----------------------------------------------------------
>                 Key: YARN-3816
>                 URL: https://issues.apache.org/jira/browse/YARN-3816
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).

This message was sent by Atlassian JIRA

Reply via email to