[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237024#comment-15237024 ]
Varun Saxena commented on YARN-3816:
------------------------------------

I had a quick scan of the patch. There seem to be multiple aggregation operations. If we append the operation to the column qualifier, then with 4 aggregation operations we would need to create 4 single column value filters for a single metric. For example, if the metric filter says metric1 > 40, we would have to create a filter list like metric1=SUM > 40 OR metric1=AVG > 40 OR metric1=NOOP > 40, and so on.

Will these aggregation operations be required by offline aggregation (YARN-3817)? If so, can some other mechanism indicate the aggregation operation instead of appending it to the column qualifier? Configuring it in some way was a suggestion given earlier.

cc [~sjlee0]

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> ----------------------------------------------------------------------------
>
>                 Key: YARN-3816
>                 URL: https://issues.apache.org/jira/browse/YARN-3816
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Junping Du
>            Assignee: Li Lu
>              Labels: yarn-2928-1st-milestone
>         Attachments: Application Level Aggregation of Timeline Data.pdf,
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch,
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch,
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch,
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch,
> YARN-3816-YARN-2928-v5.patch, YARN-3816-feature-YARN-2928.v4.1.patch,
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application-level aggregation of Timeline data:
> - To present the end user with aggregated state for each application,
> including resource (CPU, memory) consumption across all containers, the
> number of containers launched/completed/failed, etc. We need this for apps
> while they are running as well as when they are done.
> - Framework-specific metrics, e.g. HDFS_BYTES_READ, should also be
> aggregated to show details of state at the framework level.
> - Aggregation at other levels (Flow/User/Queue) can be more efficient when
> based on application-level aggregations rather than raw entity-level data,
> since far fewer rows need to be scanned (non-aggregated entities such as
> events and configurations are filtered out).
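
To make the filter fan-out raised in the comment concrete, here is a minimal sketch in plain Java of how a single metric filter would expand when the aggregation operation is appended to the column qualifier. It is only an illustration under stated assumptions: the class and method names are hypothetical and do not come from the patch, the strings mimic the notation used in the comment rather than real HBase filter objects, and only the three operation names the comment mentions (SUM, AVG, NOOP) are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the YARN-3816 patch.
class MetricFilterExpansion {

    // Produce one filter expression per aggregation operation, using the
    // "metric=OP <comparison>" notation from the comment.
    static List<String> expand(String metric, String comparison, List<String> ops) {
        List<String> filters = new ArrayList<>();
        for (String op : ops) {
            filters.add(metric + "=" + op + " " + comparison);
        }
        return filters;
    }

    // Join the per-operation filters into a single OR-ed filter list.
    static String toFilterList(List<String> filters) {
        return String.join(" OR ", filters);
    }

    public static void main(String[] args) {
        // Operation names taken from the comment; the full set is an assumption.
        List<String> ops = Arrays.asList("SUM", "AVG", "NOOP");
        System.out.println(toFilterList(expand("metric1", "> 40", ops)));
    }
}
```

The number of filters grows linearly with the number of aggregation operations for every metric in the query, which is the cost the comment is pointing at.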