[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742702#comment-14742702
 ] 

Naganarasimha G R commented on YARN-3816:
-----------------------------------------

Hi [~djp], sorry for the delay in pitching in.
A few doubts:
# I am not sure whether everything mentioned in the JIRA description is
achieved by the patch. Would it be good to update the description with what
will be completed as part of this patch? The following are not completely
achieved, right?
#* number of containers launched/completed/failed,
#* framework specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to 
show details of states in framework level.
# In the doc, the ApplicationState table (aggregated from
AppLevelTimelineCollector) has container aggregate metrics (allocated: 0,
preempted: 0, failed: 0, reuse: 0). Are these required at
AppLevelTimelineCollector? I felt they should only be aggregated from
RMTimelineCollector. Also, is time (start, last_modification, avg_execution)
required as a metric? Maybe I misread the table description?
# In the doc {{aggregation-design-discussion.pdf}}, you mentioned that
{{time average & max}} would be considered, but the patch seems to support only
{{SUM}}, neither average nor max. Is {{sum}} more important than the others (or
am I missing something)? I would also like to know the significance of this
measurement, as I felt a {{per-container average}} would be more helpful since
it can be used for calibrating the RM.
# IIUC, based on the current design, aggregation happens at the collector end.
In that case, do we require
{{TimelineWriter.aggregate(TimelineEntity data, TimelineAggregationTrack
track)}}? Is there any plan to push some of the aggregation logic to the writer?
# {{TimelineAggregationBasis}} doesn't have a value for {{queue}}. As this is
used in {{TimelineReaderWebServices}}, isn't it required for the reader?
# Will it be required to accumulate time series data with single-value data
and vice versa, or does accumulation always need to be done on the same type?
If mixing is needed, what are some real scenarios where it could happen?
# Would it be better to have a set of {{operation}} values that can be
performed on a TimelineMetric, so that accumulateTo automatically detects and
accumulates for different operations? Currently it seems to be statically set
to {{SUM}} in {{TimelineCollector}}.
# Currently, for each putEntity call in the collector, we are not only
aggregating and invoking accumulateTo but also sending the entity to the writer
to be written. But the doc mentions that it will cache for 15 seconds and then
update, right?
# I am not sure why {{pid}} was added earlier to the container CPU and memory
usage metrics, nor why we are removing it now. It seems that for a given
container we do not need the pid appended, as the metric is already unique to
that container. Is that the reason for removing it?
# Do we need to set {{aggregateTo}} to true for the container metrics
(cputotalCore% & pmemUsage) too? Also, we are currently not capturing
{{vmemUsage}}; do we need to capture it?
# The doc mentions that the "ApplicationState table" will be split into two:
??It can be split into two tables by aggregated from RMTimelineCollector or
AppLevelTimelineCollector??. Is that required?
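
To make points 3 and 7 more concrete, here is a rough plain-Java sketch of what I mean by carrying the operation with the metric instead of hard-coding {{SUM}}. All names here ({{MetricAggregationSketch}}, {{AggregationOp}}, {{aggregate}}) are hypothetical and not from the patch; it does not model TimelineMetric or accumulateTo, only the SUM/MAX/AVG semantics over each container's latest value.

```java
import java.util.Arrays;
import java.util.List;

public class MetricAggregationSketch {
    // Hypothetical set of operations a metric could carry (the patch only applies SUM).
    public enum AggregationOp { SUM, MAX, AVG }

    // Aggregates one metric across containers; the list is assumed to hold
    // each container's latest reported value.
    public static double aggregate(List<Long> perContainerValues, AggregationOp op) {
        if (perContainerValues.isEmpty()) {
            return 0.0; // nothing reported yet
        }
        switch (op) {
            case SUM:
                return perContainerValues.stream().mapToLong(Long::longValue).sum();
            case MAX:
                return perContainerValues.stream().mapToLong(Long::longValue).max().getAsLong();
            case AVG:
                // Per-container average, e.g. useful for calibrating the RM.
                return perContainerValues.stream().mapToLong(Long::longValue).average().getAsDouble();
            default:
                throw new IllegalArgumentException("Unsupported operation: " + op);
        }
    }

    public static void main(String[] args) {
        // Hypothetical memory usage (MB) of three containers of one app.
        List<Long> memoryMb = Arrays.asList(1024L, 2048L, 512L);
        for (AggregationOp op : AggregationOp.values()) {
            System.out.println(op + " = " + aggregate(memoryMb, op));
        }
    }
}
```

With this shape, AVG gives the per-container average I mentioned, and supporting another operation only means adding an enum constant and a case.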

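For point 8, this is roughly the behavior I understood from the doc: putEntity only updates in-memory state, and the app-level aggregate is written once per interval (15 seconds in the doc). This is a hypothetical sketch ({{BufferedAppAggregator}}, {{onPutEntity}} and the list standing in for TimelineWriter are my own names, not the patch's); it keeps the latest value per container so that repeated reports from one container are not double-counted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BufferedAppAggregator {
    private final long flushIntervalMs;
    // Latest reported value per container (e.g. memory in MB).
    private final Map<String, Long> latestByContainer = new HashMap<>();
    // Stands in for TimelineWriter writes; the real writer is not modeled here.
    private final List<Long> writtenAggregates = new ArrayList<>();
    private long lastFlushMs;

    public BufferedAppAggregator(long flushIntervalMs, long startMs) {
        this.flushIntervalMs = flushIntervalMs;
        this.lastFlushMs = startMs;
    }

    // Called on every putEntity: updates in-memory state and writes only
    // if the flush interval has elapsed since the last write.
    public void onPutEntity(String containerId, long value, long nowMs) {
        latestByContainer.put(containerId, value);
        if (nowMs - lastFlushMs >= flushIntervalMs) {
            flush(nowMs);
        }
    }

    // Emits one app-level aggregate: the sum of the latest per-container values.
    private void flush(long nowMs) {
        long sum = 0;
        for (long v : latestByContainer.values()) {
            sum += v;
        }
        writtenAggregates.add(sum);
        lastFlushMs = nowMs;
    }

    public List<Long> getWrittenAggregates() {
        return writtenAggregates;
    }

    public static void main(String[] args) {
        BufferedAppAggregator agg = new BufferedAppAggregator(15_000L, 0L);
        agg.onPutEntity("container_1", 1024L, 1_000L);
        agg.onPutEntity("container_2", 2048L, 5_000L);
        agg.onPutEntity("container_1", 1536L, 16_000L); // interval elapsed: one write
        System.out.println(agg.getWrittenAggregates()); // [3584]
    }
}
```

In this sketch a flush is only triggered by the next putEntity after the interval elapses; a real implementation would more likely use a scheduled timer so the last update is not held back indefinitely.
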
Some nits:
# yarn.timeline-service.aggregation.accumulation.enabled should have its
default value explicitly set to true in yarn-default.xml, matching the default
value in YarnConfiguration.
# In {{TestTimelineMetric.testAccumulationOnTimelineMetrics}}, assertEquals
should take the expected value as the first argument and the actual expression
as the second; otherwise, when it fails, the exception message will be
misleading. Also, there is an unused import in that class.
# Two static methods of TimelineCollector, aggregateMetrics(TimelineEntities)
among them, are public. Are they planned to be used by some other class? If
not, we can make them private. Also, aggregateMetrics returns a map; would a
List/Set suffice for {{appendAggregatedMetricsToEntities}}?
# EntityColumnPrefix.AGGREGATED_METRICS is not used anywhere; is it required?
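
On the assertEquals nit, the order matters because JUnit 4 reports a failure as {{expected:<X> but was:<Y>}}. The helper below is a hypothetical stand-in that only mimics that message format (it does not call JUnit), to show how swapped arguments blame the wrong value.

```java
public class AssertOrderDemo {
    // Hypothetical helper mimicking JUnit 4's assertEquals(expected, actual) failure message.
    public static String failureMessage(Object expected, Object actual) {
        return "expected:<" + expected + "> but was:<" + actual + ">";
    }

    public static void main(String[] args) {
        double expectedSum = 6.0; // value the test expects
        double actualSum = 5.0;   // pretend this came from the code under test
        // Correct order: prints "expected:<6.0> but was:<5.0>"
        System.out.println(failureMessage(expectedSum, actualSum));
        // Swapped order: prints "expected:<5.0> but was:<6.0>", which is misleading
        System.out.println(failureMessage(actualSum, expectedSum));
    }
}
```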

I am trying to set up a cluster to test the patch; if I come across more
queries, I will let you know.

> [Aggregation] App-level Aggregation for YARN system metrics
> -----------------------------------------------------------
>
>                 Key: YARN-3816
>                 URL: https://issues.apache.org/jira/browse/YARN-3816
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end users with aggregated states for each application, including:
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be
> aggregated to show details of states at the framework level.
> - Other levels of aggregation (Flow/User/Queue) can be done more efficiently
> based on application-level aggregations rather than raw entity-level data, as
> far fewer rows need to be scanned (after filtering out non-aggregated
> entities, such as events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
