[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625703#comment-14625703 ]

Junping Du commented on YARN-3815:
----------------------------------

Hi [~sjlee0], sorry for the late reply to your comments. I have been busy delivering a quick PoC patch for app-level aggregation (system metrics only, not including the parts where ideas still conflict) in YARN-3816. I will get back to your questions once that is figured out.

> [Aggregation] Application/Flow/User/Queue Level Aggregations
> ------------------------------------------------------------
>
> Key: YARN-3815
> URL: https://issues.apache.org/jira/browse/YARN-3815
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Attachments: Timeline Service Nextgen Flow, User, Queue Level Aggregations (v1).pdf, aggregation-design-discussion.pdf, hbase-schema-proposal-for-aggregation.pdf
>
> Per previous discussions in some design documents for YARN-2928, the basic
> scenario is the query for stats can happen on:
> - Application level, expect return: an application with aggregated stats
> - Flow level, expect return: aggregated stats for a flow_run, flow_version and flow
> - User level, expect return: aggregated stats for applications submitted by user
> - Queue level, expect return: aggregated stats for applications within the Queue
> Application states is the basic building block for all other level
> aggregations. We can provide Flow/User/Queue level aggregated statistics info
> based on application states (a dedicated table for application states is
> needed which is missing from previous design documents like HBase/Phoenix
> schema design).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612798#comment-14612798 ]

Sangjin Lee commented on YARN-3815:
-----------------------------------

{quote}
I don't think we have to do it at the container level, but it is also not necessary for the AM to retain and aggregate these values. The AM could help forward the values to the per-app timeline collector without having to aggregate them. Vinod has more ideas on this from an offline discussion. [~vinodkv], can you comment on this?
{quote}
Interesting. Could you or [~vinodkv] shed light on the idea? It would still need to be captured in an entity or entities, right? I would think sending it as part of the container entities would be simpler and more consistent (in that the per-app collector can simply look at all container metrics as subject to aggregation). I'd love to hear more about this.

{quote}
I think "per-container averages" is not equal to per-container resource usage. Understanding an application's real resource consumption/usage has been one of the core use cases for the new timeline service from the beginning, so I don't think we should rule out anything important here.
{quote}
How is the per-container resource usage different from the per-container average described in the summary? Could you kindly provide its definition?

No doubt understanding applications' real resource consumption/usage is critical. Between the individual container resource usage (which is all captured), the aggregated resource usage at the app/flow level (which the basic real-time aggregation addresses), and the running averages/max of the aggregated resource usage at the app/flow level, I think we definitely cover that need. What would be the gap that's not addressed by the above data?
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612687#comment-14612687 ]

Junping Du commented on YARN-3815:
----------------------------------

Thanks [~sjlee0] for the comments!

bq. I think it is pretty natural and straightforward for AMs to aggregate and retain values at the app level, but even if they set it at the container level, it could work.

I would rather say it was "natural" before timeline service v2 came out. :) I don't think we have to do it at the container level, but it is also not necessary for the AM to retain and aggregate these values. The AM could help forward the values to the per-app timeline collector without having to aggregate them. Vinod has more ideas on this from an offline discussion. [~vinodkv], can you comment on this?

bq. Note that we're not proposing to keep the average as a time series. So I'm not sure if that is feasible.

If not, we may consider changing the proposal to support time series, given that the amount of data is not large here.

bq. We also ruled out per-container averages (explained in the summary), so per-task resource usage is not an example we're looking for.

I think "per-container averages" is not equal to per-container resource usage. Understanding an application's real resource consumption/usage has been one of the core use cases for the new timeline service from the beginning, so I don't think we should rule out anything important here.
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612684#comment-14612684 ]

Sangjin Lee commented on YARN-3815:
-----------------------------------

{quote}
The use case here should be obvious. A quick real-life example is Google Borg, a cluster management tool (http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/43438.pdf), which aggregates per-task resource usage information for usage-based charging, job debugging, and long-term capacity planning.
{quote}
Thanks [~djp]. What I'm looking for are somewhat more specific examples. That's why we spent some time during the discussion to define precisely what we mean by "averages". We discovered that there were already two different definitions of the average for gauges. We also ruled out per-container averages (explained in the summary), so per-task resource usage is not an example we're looking for.

So as for the moving (but aggregate) average, are there other examples? What we discussed during the meeting (also in the summary) was the total CPU utilization of an app/flow. Are there other examples, and how might they be useful, or is that pretty much the best example?
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612680#comment-14612680 ]

Sangjin Lee commented on YARN-3815:
-----------------------------------

bq. This approach sounds very clever. In addition, if we need the resource consumption at any point in time or over a time window (t1 to t2), we can simply compute Avg(t2) * t2 - Avg(t1) * t1. This is much better than aggregating values at each point in time at query time.

Note that we're not proposing to keep the average as a *time series*. So I'm not sure if that is feasible.
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612678#comment-14612678 ]

Sangjin Lee commented on YARN-3815:
-----------------------------------

{quote}
We may consider providing two ways here:
- For legacy applications like MR, the AM already aggregates these counters itself.
- For new applications built against YARN after timeline service v2, the AM can delegate aggregation to the YARN timeline service instead of doing it itself. Our data model and aggregation mechanism should ensure that the YARN timeline service can aggregate these framework-specific metrics without their being predefined.
{quote}
I think it's a little more complicated than that. If a new YARN application wants to delegate aggregation to the YARN timeline service, it still needs to do at least the following:
- add the framework-specific metrics to the YARN container
- do *not* add any of those metrics to the YARN application

The framework-specific metrics set on the containers would still be transmitted by the AM (not by the node managers). Then, the YARN timeline service could look at *any* container metrics and apply the uniform aggregation rules.

Hopefully YARN apps can add metric values to container entities (there should be a natural mapping from unit of work to containers); otherwise it won't work for them. I think it is pretty natural and straightforward for AMs to aggregate and retain values at the app level, but even if they set it at the container level, it could work.

On the other hand, if your app wants to own aggregation, then it should not set the metrics on the containers, or aggregation would be done twice.
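A minimal sketch of the rule described above, in plain Python rather than any real timeline service API. The function names, the container IDs, and the `MAP_OUTPUT_RECORDS` metric are hypothetical; the only assumption taken from the comment is that the collector applies one uniform rule (a simple sum) to whatever metrics appear on container entities, so a metric that the AM also aggregates at the app level ends up aggregated twice.

```python
from collections import defaultdict


def aggregate_containers(container_metrics):
    """Uniform collector-side rule: sum every container metric by name."""
    totals = defaultdict(int)
    for metrics in container_metrics.values():
        for name, value in metrics.items():
            totals[name] += value
    return dict(totals)


def doubly_aggregated(container_metrics, am_app_metrics):
    """Names of metrics that both the AM and the collector would aggregate."""
    collector_totals = aggregate_containers(container_metrics)
    return sorted(set(collector_totals) & set(am_app_metrics))


# Hypothetical framework-specific metric set on two container entities.
containers = {
    "container_1": {"MAP_OUTPUT_RECORDS": 100},
    "container_2": {"MAP_OUTPUT_RECORDS": 250},
}

# Delegated aggregation: the AM sets nothing at the app level.
delegated_totals = aggregate_containers(containers)
delegated_overlap = doubly_aggregated(containers, {})

# The AM owns aggregation but also left the metric on the containers.
owned_overlap = doubly_aggregated(containers, {"MAP_OUTPUT_RECORDS": 350})
```

In the delegated case the collector produces the app-level value on its own; in the last case the overlap check flags the metric that would be aggregated twice.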
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612648#comment-14612648 ]

Junping Du commented on YARN-3815:
----------------------------------

bq. Also, it would be GREAT if you could give a clear and compelling use case (a real life example) on why such support would be crucial. Thanks!

The use case here should be obvious. A quick real-life example is Google Borg, a cluster management tool (http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/43438.pdf), which aggregates per-task resource usage information for usage-based charging, job debugging, and long-term capacity planning.
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612629#comment-14612629 ]

Sangjin Lee commented on YARN-3815:
-----------------------------------

For gauges and their averages and max in particular, [~vinodkv], [~gtCarrera9], [~djp], could you please confirm what I captured in that document is exactly what we want to support? Could you please comment on that?

Also, it would be *GREAT* if you could give a clear and compelling use case (a real life example) on why such support would be crucial. Thanks!
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612620#comment-14612620 ]

Junping Du commented on YARN-3815:
----------------------------------

bq. app-level aggregation for framework-specific metrics will be done by the AM.

I think there is a small misunderstanding here. As I mentioned above, AMs should/could be relieved of aggregating counters themselves after timeline service v2. Legacy AMs could still push aggregated counters to the backend storage, though. Others who were also in the room, any comments here?
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612592#comment-14612592 ]

Sangjin Lee commented on YARN-3815:
-----------------------------------

Here is my take on what's consensus, what's not, and what's currently out of scope. I may have misread the discussion and your impression/understanding may be different, so please feel free to chime in and comment on this!

(consensus or not controversial)
- applications table will be split from the main entities table
- app-level aggregation for framework-specific metrics will be done by the AM
- app-level aggregation for YARN-system container metrics will be done by the per-app timeline collector
- real-time aggregation does simple sum for all types of metrics
- metrics API will be updated to differentiate gauges and counters (the type information will need to be persisted in the storage)
- for gauges, in addition to the simple sum-based aggregation, support average and max
- the flow-run table will be created to handle app-to-flow-run ("real-time") aggregation as proposed in the native HBase schema design
- auxiliary tables will be implemented as proposed in the native HBase schema design
- time-based aggregation (daily, weekly, monthly, etc.) will be done via phoenix tables to enable ad-hoc queries

(questions remaining or undecided)
- for the average/max support for gauges (see above), confirm that's exactly what we want to support
- how to implement app-to-flow-run aggregation for gauges
- how to perform the time-based aggregation (mapreduce, using co-processor endpoints, etc.)
- how to handle long-running apps for time-based aggregation
- considering adopting "null delimiters" (or other phoenix-friendly tools) to support phoenix reading data from the native HBase tables
- using flow collectors, user collectors, and queue collectors as means of performing (higher-level) aggregation

(out of scope)
- support per-container averages for gauges
- any aggregation other than time-based aggregation for flows, users, and queues
- creating a dependency on the explicit YARN flow API
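To make the gauge items in the summary above concrete, here is a rough sketch of sum, time average, and max for one gauge at the app level. It is plain Python, not timeline service code, and it rests on assumptions that are not in the thread: the class and field names are hypothetical, the summed gauge is treated as a step function (the last reported sum is held until the next update), and the average is the time-weighted one. The thread notes that more than one definition of "average" was in play, so this illustrates only one of them.

```python
class AppGaugeAggregate:
    """Tracks sum, time average, and max of one gauge across an app's containers."""

    def __init__(self, start_time):
        self.start_time = start_time
        self.last_time = start_time
        self.last_sum = 0.0   # current simple sum over containers
        self.area = 0.0       # integral of the summed gauge over time
        self.max = 0.0        # largest summed value seen so far

    def update(self, now, container_values):
        # Credit the previous sum for the interval during which it was in effect.
        self.area += self.last_sum * (now - self.last_time)
        self.last_sum = float(sum(container_values.values()))
        self.last_time = now
        self.max = max(self.max, self.last_sum)

    def time_average(self, now):
        elapsed = now - self.start_time
        if elapsed <= 0:
            return 0.0
        area = self.area + self.last_sum * (now - self.last_time)
        return area / elapsed


# Hypothetical CPU-core gauge reported for two containers.
agg = AppGaugeAggregate(start_time=0)
agg.update(0, {"container_1": 1.0, "container_2": 1.0})   # sum is 2 cores
agg.update(10, {"container_1": 3.0, "container_2": 1.0})  # sum is 4 cores
average_at_20 = agg.time_average(20)
total_usage_at_20 = average_at_20 * 20  # average times elapsed time
```

Multiplying the time average by the elapsed time recovers the total usage over the app's lifetime, which is the property the summary relies on.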
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612591#comment-14612591 ]

Junping Du commented on YARN-3815:
----------------------------------

Thanks [~sjlee0] for the nice writeup on the discussions. Most of it looks good to me. Some comments on app-level aggregation:

bq. Framework-specific metrics will be sent to the per-app collector aggregated by the AM itself.

We may consider providing two ways here:
- For legacy applications like MR, the AM already aggregates these counters itself.
- For new applications built against YARN after timeline service v2, the AM can delegate aggregation to the YARN timeline service instead of doing it itself. Our data model and aggregation mechanism should ensure that the YARN timeline service can aggregate these framework-specific metrics without their being predefined.

bq. time average & max: the average multiplied by the elapsed time of the application represents the total resource usage over time.

This approach sounds very clever. In addition, if we need the resource consumption at any point in time or over a time window (t1 to t2), we can simply compute Avg(t2) * t2 - Avg(t1) * t1. This is much better than aggregating values at each point in time at query time.
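The window formula above can be checked with a small worked example. This is a sketch under stated assumptions: t1 and t2 are elapsed times measured from application start, Avg(t) is the running time average of the gauge up to elapsed time t, and the function name and sample values are hypothetical. It also presumes that the average is available at both t1 and t2, which requires keeping it as a time series or sampling it at those points.

```python
def usage_in_window(avg_t1, t1, avg_t2, t2):
    """Resource usage accumulated between elapsed times t1 and t2.

    avg_t1 and avg_t2 are the running time averages of a gauge at elapsed
    times t1 and t2, both measured from application start.
    """
    if not 0 <= t1 <= t2:
        raise ValueError("expected 0 <= t1 <= t2")
    return avg_t2 * t2 - avg_t1 * t1


# Hypothetical gauge: 2 cores for the first 10 s, then 4 cores for the next 10 s.
avg_at_10 = (2 * 10) / 10            # running average after 10 s
avg_at_20 = (2 * 10 + 4 * 10) / 20   # running average after 20 s

window_usage = usage_in_window(avg_at_10, 10, avg_at_20, 20)
lifetime_usage = usage_in_window(0.0, 0, avg_at_20, 20)
```

Here the window from 10 s to 20 s accounts for 40 core-seconds, matching 4 cores held for 10 s, and the whole lifetime accounts for 60 core-seconds.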
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612529#comment-14612529 ]

Sangjin Lee commented on YARN-3815:
-----------------------------------

Some of us ([~gtCarrera9], [~vinodkv], [~djp], [~zjshen], [~vrushalic], and [~sjlee0]) had a face-to-face design discussion on the aggregation. I am going to post the summary of that discussion along with a proposal for an expanded native HBase schema to support aggregation.

I believe we are much closer to a consensus on the aggregation design, but some important questions still remain. For the sake of public discussion and inviting more participants and comments, we should follow up here on this JIRA.
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600411#comment-14600411 ]

Li Lu commented on YARN-3815:
-----------------------------

Sorry, I mistakenly assigned this JIRA to myself. I've assigned it back.
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598781#comment-14598781 ]

Ted Yu commented on YARN-3815:
------------------------------

[~jrottinghuis]: Your description makes sense. Cell tags have been supported since HBase 0.98, so we can use one to mark completion.
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598759#comment-14598759 ] Joep Rottinghuis commented on YARN-3815: Thanks [~ted_yu] for that link. I did find that code and I'm reading through it. Yes, it uses a coprocessor on the reading side to "collapse" values together, and it permanently "collapses" them together on compaction. I want to use a similar approach here. We cannot use the delta write directly as-is for the following reasons: - For running applications, if we wanted to write only the increment, the AM (or ATS writer) would have to keep track of the previous values. When the AM crashes and/or the ATS writer restarts, we won't know what previous value we had written (and what has already been aggregated). So, we'd have to write the increment plus the latest value. - Ergo, why don't we just write the latest value to begin with and leave off the increment? Now we cannot "collapse" the deltas / latest value until the application is done. Otherwise we would again lose track of what was previously aggregated. So the new approach would be to write the latest value for an app and indicate (using a cell tag) that the app is done and that it can be collapsed. We would use the coprocessor only on the read side, just like with the delta write, and that coprocessor would aggregate values on the fly for reads and collapse during writes. Those writes would be limited to one single row, so we wouldn't have any weird cross-region locking issues, nor delays and hiccups in the write throughput.
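The "latest value plus completion tag" approach described above could be sketched roughly as follows. This is a plain Python model, not an HBase coprocessor; the `Cell` class, its `done` flag (standing in for the cell tag), and both function names are hypothetical and only illustrate the intended read and compaction behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Cell:
    app_id: str   # application that wrote the value
    value: int    # latest cumulative value reported by that application
    done: bool    # stands in for the "app is complete" cell tag


def read_aggregate(collapsed: int, cells: List[Cell]) -> int:
    """Read side: add the latest value per app to the already collapsed sum."""
    latest: Dict[str, int] = {}
    for cell in cells:  # cells are assumed to be ordered oldest to newest
        latest[cell.app_id] = cell.value
    return collapsed + sum(latest.values())


def compact(collapsed: int, cells: List[Cell]) -> Tuple[int, List[Cell]]:
    """Compaction: fold in only apps marked done, keep running apps as-is."""
    latest: Dict[str, Cell] = {}
    for cell in cells:
        latest[cell.app_id] = cell
    finished = [c for c in latest.values() if c.done]
    running = [c for c in latest.values() if not c.done]
    return collapsed + sum(c.value for c in finished), running
```

Because only completed applications are folded into the collapsed sum, a restarted writer can simply re-send its latest value for a running app without the aggregate double counting it.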
> [Aggregation] Application/Flow/User/Queue Level Aggregations > > > Key: YARN-3815 > URL: https://issues.apache.org/jira/browse/YARN-3815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: Timeline Service Nextgen Flow, User, Queue Level > Aggregations (v1).pdf > > > Per previous discussions in some design documents for YARN-2928, the basic > scenario is the query for stats can happen on: > - Application level, expect return: an application with aggregated stats > - Flow level, expect return: aggregated stats for a flow_run, flow_version > and flow > - User level, expect return: aggregated stats for applications submitted by > user > - Queue level, expect return: aggregated stats for applications within the > Queue > Application states is the basic building block for all other level > aggregations. We can provide Flow/User/Queue level aggregated statistics info > based on application states (a dedicated table for application states is > needed which is missing from previous design documents like HBase/Phoenix > schema design). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598590#comment-14598590 ] Sangjin Lee commented on YARN-3815: --- Moving from offline discussions... Now aggregation of *time series metrics* is rather tricky, and needs to be defined. Would an aggregated metric (e.g. at the flow level) of time series metrics (e.g. at the app level) be a time series itself? I see several problems with defining that as a time series. Individual app time series may be sampled at different times, and it's not clear what time series the aggregated flow metric would be. I think it might be simpler to say that an aggregated flow metric of time series may not need to be a time series itself. On the one hand, there is a general issue of at what time the aggregated values belong, regardless of whether they are time series or not. If all leaf values are recorded at the same time, it would be unambiguous. The aggregated metric value is of the same time. However, it is rarely the case. I think the current implicit behavior in hadoop is simply to take the latest values and add them up. One example is the MR counters (task level and job level). The task level counters are obtained at different times. Still, the corresponding job counters are simply sums of all the latest task counters, although they may have been taken at different times. We're basically taking that as an approximation that's "good enough". In the end, the final numbers will become accurate. In other words, the final values would truly be the accurate aggregate values. The time series basically adds another wrinkle to this. In case of a simple value, the final values are going to be correct, so this problem is less of an issue, but time series will retain intermediate values. Furthermore, their publishing interval may have no relationship with the publishing interval of the leaf values. 
I think the baseline approach should be either (1) do not use time series for the aggregated metrics, or (2) just do a best-effort approximation by adding up the latest leaf values and storing the sum with its own timestamp. > [Aggregation] Application/Flow/User/Queue Level Aggregations > > > Key: YARN-3815 > URL: https://issues.apache.org/jira/browse/YARN-3815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: Timeline Service Nextgen Flow, User, Queue Level > Aggregations (v1).pdf > > > Per previous discussions in some design documents for YARN-2928, the basic > scenario is the query for stats can happen on: > - Application level, expect return: an application with aggregated stats > - Flow level, expect return: aggregated stats for a flow_run, flow_version > and flow > - User level, expect return: aggregated stats for applications submitted by > user > - Queue level, expect return: aggregated stats for applications within the > Queue > Application states is the basic building block for all other level > aggregations. We can provide Flow/User/Queue level aggregated statistics info > based on application states (a dedicated table for application states is > needed which is missing from previous design documents like HBase/Phoenix > schema design). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
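Option (2) above, the best-effort approximation, might look something like this minimal sketch. The `Series` alias and the `aggregate_latest` function are hypothetical names; the only idea taken from the comment is "sum the latest leaf values and stamp the result with the aggregator's own timestamp".

```python
import time
from typing import Dict, List, Optional, Tuple

# Each leaf series is a list of (timestamp, value) samples, oldest first.
Series = List[Tuple[int, float]]


def aggregate_latest(leaves: Dict[str, Series],
                     now: Optional[int] = None) -> Tuple[int, float]:
    """Sum the latest value of every leaf and stamp the sum with its own time."""
    if now is None:
        now = int(time.time())
    # Leaf timestamps are deliberately ignored: samples taken at different
    # times are added together, as with MR task and job counters.
    total = sum(series[-1][1] for series in leaves.values() if series)
    return now, total
```

The aggregated point carries the time of aggregation rather than any leaf sample time, so intermediate values are approximate while the final value converges to the accurate sum once all leaves stop changing.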
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598577#comment-14598577 ] Sangjin Lee commented on YARN-3815: --- {quote} About flow online aggregation, I am not quite sure on requirement yet. Do we really want real time for flow aggregated data or some fine-grained time interval (like 15 secs) should be good enough - if we want to show some nice metrics chart for flow, this should be fine. {quote} Yes, I agree with that. When I said "real time", it doesn't mean real time in the sense that every metric is accurate to the second. Most likely raw data themselves (e.g. container data) are written on an interval anyway. Some type of time interval for aggregation is implied. {quote} Any special reason not to handle it in the same way above - as HBase coprocessor? It just sound like gross-grained time interval. Isn't it? {quote} I do see your point in that what I called the "real time" aggregation can be considered the same type of aggregation as the "offline" aggregation only on a shorter time interval. However, we also need to think about the use cases of such aggregated data. The former type of aggregation is very much something that can be plugged into UI such as the RM UI or ambari to show more immediate data. These data may change as the user refreshes the UI. So this is closer to the raw data. On the other hand, the latter type of aggregation lends itself to more analytical and ad-hoc analysis of data. These can be used for calculating chargebacks, usage trending, reporting, etc. Perhaps it could even contain more detailed info than the "real time" aggregated data for the reporting/data mining purposes. And that's where we would like to consider using phoenix to enable arbitrary ad-hoc SQL queries. One analogy [~jrottinghuis] brings up regarding this is OLTP v. OLAP. That's why we also think it makes sense to do only "offline" (time-based) aggregation for users and queues. 
At least in our case with hRaven, there hasn't been a compelling reason to show user- or queue-aggregated data in semi-real time. It has been perfectly adequate to show time-based aggregation, as data like this tend to be used more for reporting and analysis. > [Aggregation] Application/Flow/User/Queue Level Aggregations > > > Key: YARN-3815 > URL: https://issues.apache.org/jira/browse/YARN-3815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: Timeline Service Nextgen Flow, User, Queue Level > Aggregations (v1).pdf > > > Per previous discussions in some design documents for YARN-2928, the basic > scenario is the query for stats can happen on: > - Application level, expect return: an application with aggregated stats > - Flow level, expect return: aggregated stats for a flow_run, flow_version > and flow > - User level, expect return: aggregated stats for applications submitted by > user > - Queue level, expect return: aggregated stats for applications within the > Queue > Application states is the basic building block for all other level > aggregations. We can provide Flow/User/Queue level aggregated statistics info > based on application states (a dedicated table for application states is > needed which is missing from previous design documents like HBase/Phoenix > schema design). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598556#comment-14598556 ] Sangjin Lee commented on YARN-3815: --- {quote} AM currently leverage YARN's AppTimelineCollector to forward entities to backend storage, so making AM talk directly to backend storage is not considered to be safe. {quote} Just to be clear, I'm *not* proposing AMs writing directly to the backend storage. AMs continue to write through the app-level timeline collector. My proposal is that the AMs are responsible for setting the aggregated framework-specific metric values on the *YARN application entities*. Let's consider the example of MR. MR itself would have its own entities such as job, tasks, and task attempts. These are distinct entities from the YARN entities such as application, app attempts, and containers. We can either (1) have the MR AM set framework-specific metric values at the YARN container entities and have YARN aggregate them to applications, or (2) have the MR AM set the aggregated values on the applications for itself. I feel the latter approach is conceptually cleaner. The framework is ultimately responsible for its metrics (YARN doesn't even know what metrics there are). We could decide that YARN would look at the framework-specific metrics at the app level and aggregate them from the app level onward to flows, user, and queue. In addition, most frameworks already have an aggregated view of the metrics. It would be very straightforward to emit them at the app level. In summary, option (1) asks the framework to write metrics on its own entities (job, tasks, task attempts) plus YARN container entities. Option (2) asks the framework to write metrics on its own entities (job, tasks, task attempts) plus YARN app entities. IMO, the latter is a more reliable approach. We can discuss this further... 
> [Aggregation] Application/Flow/User/Queue Level Aggregations > > > Key: YARN-3815 > URL: https://issues.apache.org/jira/browse/YARN-3815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: Timeline Service Nextgen Flow, User, Queue Level > Aggregations (v1).pdf > > > Per previous discussions in some design documents for YARN-2928, the basic > scenario is the query for stats can happen on: > - Application level, expect return: an application with aggregated stats > - Flow level, expect return: aggregated stats for a flow_run, flow_version > and flow > - User level, expect return: aggregated stats for applications submitted by > user > - Queue level, expect return: aggregated stats for applications within the > Queue > Application states is the basic building block for all other level > aggregations. We can provide Flow/User/Queue level aggregated statistics info > based on application states (a dedicated table for application states is > needed which is missing from previous design documents like HBase/Phoenix > schema design). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596616#comment-14596616 ] Ted Yu commented on YARN-3815: -- bq. in the spirit of readless increments as used in Tephra Readless increment feature is implemented in cdap, called delta write. Please take a look at: cdap-hbase-compat-0.98/src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementHandler.java cdap-hbase-compat-0.98//src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementSummingScanner.java The implementation uses hbase coprocessor, BTW > [Aggregation] Application/Flow/User/Queue Level Aggregations > > > Key: YARN-3815 > URL: https://issues.apache.org/jira/browse/YARN-3815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: Timeline Service Nextgen Flow, User, Queue Level > Aggregations (v1).pdf > > > Per previous discussions in some design documents for YARN-2928, the basic > scenario is the query for stats can happen on: > - Application level, expect return: an application with aggregated stats > - Flow level, expect return: aggregated stats for a flow_run, flow_version > and flow > - User level, expect return: aggregated stats for applications submitted by > user > - Queue level, expect return: aggregated stats for applications within the > Queue > Application states is the basic building block for all other level > aggregations. We can provide Flow/User/Queue level aggregated statistics info > based on application states (a dedicated table for application states is > needed which is missing from previous design documents like HBase/Phoenix > schema design). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
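For readers unfamiliar with the delta-write idea, here is a toy in-memory illustration. It is not the CDAP `IncrementHandler` or any HBase API; the `DeltaStore` class and its methods are hypothetical and only model the general pattern of appending deltas on write, summing on read, and collapsing on compaction.

```python
from collections import defaultdict
from typing import Dict, List


class DeltaStore:
    """Toy model of readless increments; not the CDAP or HBase API."""

    def __init__(self) -> None:
        self._cells: Dict[str, List[int]] = defaultdict(list)

    def increment(self, row: str, delta: int) -> None:
        # Write path: append the delta without reading the current total.
        self._cells[row].append(delta)

    def get(self, row: str) -> int:
        # Read path: sum whatever delta cells are currently stored.
        return sum(self._cells[row])

    def compact(self, row: str) -> None:
        # Compaction: replace all delta cells with a single summed cell.
        self._cells[row] = [sum(self._cells[row])]

    def cell_count(self, row: str) -> int:
        return len(self._cells[row])
```

The write path never performs a read, which is what keeps increments cheap; the cost is moved to reads and to compaction.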
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596173#comment-14596173 ] Ted Yu commented on YARN-3815: -- My comment is related to usage of hbase. bq. under framework_specific_metrics column family Since column family name appears in every KeyValue, it would be better to use very short column family name. e.g. f_m for framework metrics. > [Aggregation] Application/Flow/User/Queue Level Aggregations > > > Key: YARN-3815 > URL: https://issues.apache.org/jira/browse/YARN-3815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: Timeline Service Nextgen Flow, User, Queue Level > Aggregations (v1).pdf > > > Per previous discussions in some design documents for YARN-2928, the basic > scenario is the query for stats can happen on: > - Application level, expect return: an application with aggregated stats > - Flow level, expect return: aggregated stats for a flow_run, flow_version > and flow > - User level, expect return: aggregated stats for applications submitted by > user > - Queue level, expect return: aggregated stats for applications within the > Queue > Application states is the basic building block for all other level > aggregations. We can provide Flow/User/Queue level aggregated statistics info > based on application states (a dedicated table for application states is > needed which is missing from previous design documents like HBase/Phoenix > schema design). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
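A rough back-of-the-envelope calculation shows why the family name length matters. The cell count below is a made-up figure, and the estimate ignores any block encoding or compression that may reduce the real on-disk difference.

```python
def family_overhead_bytes(family: str, cell_count: int) -> int:
    """Bytes spent on the column family name across cell_count cells."""
    return len(family.encode("utf-8")) * cell_count


long_name = "framework_specific_metrics"
short_name = "f_m"
cells = 1_000_000  # hypothetical number of stored cells

# Difference in raw key bytes between the long and the short family name.
saved = (family_overhead_bytes(long_name, cells)
         - family_overhead_bytes(short_name, cells))
```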
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596129#comment-14596129 ] Junping Du commented on YARN-3815: -- Thanks [~sjlee0] and [~jrottinghuis] for the review and the detailed comments. [~jrottinghuis]'s comments are pretty long, so I can only reply to part of them now and will finish the rest tomorrow. :) bq. For framework-specific metrics, I would say this falls on the individual frameworks. The framework AM usually already aggregates them in memory (consider MR job counters for example). So for them it is straightforward to write them out directly onto the YARN app entities. Furthermore, it is problematic to add them to the sub-app YARN entities and ask YARN to aggregate them to the application. Framework’s sub-app entities may not even align with YARN’s sub-app entities. For example, in case of MR, there is a reasonable one-to-one mapping between a mapper/reducer task attempt and a container, but for other applications that may not be true. Forcing all frameworks to hang values at containers may not be practical. I think it’s far easier for frameworks to write aggregated values to the YARN app entities. AM currently leverage YARN's AppTimelineCollector to forward entities to backend storage, so making AM talk directly to backend storage is not considered to be safe. It is also unnecessary, because the real difficulty here is aggregating framework-specific metrics at the other levels (flow, user and queue): that is beyond the life cycle of the framework, so YARN has to take care of it. Instead of asking frameworks to handle their specific metrics themselves, I would like to propose treating these metrics as "anonymous": the AM would pass both the metric name and value to YARN's collector, and YARN's collector could aggregate them and store them as dynamic columns (under the framework_specific_metrics column family) in the app states table. So other (flow, user, etc.) 
level aggregation on framework metrics could happen based on this. bq. app-to-flow online aggregation. This is more or less live aggregated metrics at the flow level. This will still be based on the native HBase schema. About flow online aggregation, I am not quite sure on requirement yet. Do we really want real time for flow aggregated data or some fine-grained time interval (like 15 secs) should be good enough - if we want to show some nice metrics chart for flow, this should be fine. Even for real time, we don't have to aggregate everything from the raw entity table, and we don't have to count metrics again for finished apps. Isn't it? bq. (3) time-based flow aggregation: This is different than the online aggregation in the sense that it is aggregated along the time boundary (e.g. “daily”, “weekly”, etc.). This can be based on the Phoenix schema. This can be populated in an offline fashion (e.g. running a mapreduce job). Any special reason not to handle it in the same way above - as HBase coprocessor? It just sound like gross-grained time interval. Isn't it? bq. This is another “offline” aggregation type. Also, I believe we’re talking about only time-based aggregation. In other words, we would aggregate values for users only with a well-defined time window. There won’t be a “real-time” aggregation of values, similar to the flow aggregation. I would also call for a fine-grained time interval (close to real-time) because the aggregated resource metrics per user could be used for billing hadoop usage in a shared environment (no matter private or public cloud), so users need to know more details on resource consumption, especially at some random peak time. bq. Very much agree with separation into 2 categories "online" versus "periodic". I think this will be natural split between the native HBase tables for the former and the Phoenix approach for the latter to each emphasize their relative strengths. 
I would question the necessity of "online" again if this means "real time" rather than a fine-grained time interval. Actually, as a building block, all container metrics (cpu, memory, etc.) are generated on a time interval rather than in real time. As a result, we never know the exact snapshot of the whole system at a precise time; we can only try to get closer. > [Aggregation] Application/Flow/User/Queue Level Aggregations > > > Key: YARN-3815 > URL: https://issues.apache.org/jira/browse/YARN-3815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: Timeline Service Nextgen Flow, User, Queue Level > Aggregations (v1).pdf > > > Per previous discussions in some design documents for YARN-2928, the basic > scenario is the qu
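The "anonymous" metrics proposal above could be modeled along these lines. The `AnonymousMetricCollector` class is hypothetical, and summing the reported values is an assumption (the comment only says the collector "could aggregate" them); the column family name is the one mentioned in the comment.

```python
from collections import defaultdict
from typing import Dict, Tuple


class AnonymousMetricCollector:
    """Hypothetical per-app collector that knows nothing about metric names."""

    def __init__(self, family: str = "framework_specific_metrics") -> None:
        self.family = family
        # Running totals keyed by (application id, metric name).
        self._totals: Dict[Tuple[str, str], int] = defaultdict(int)

    def put(self, app_id: str, name: str, value: int) -> None:
        # The collector treats the name as an opaque string and sums values.
        self._totals[(app_id, name)] += value

    def app_row(self, app_id: str) -> Dict[str, int]:
        # One dynamic column per metric name, under the given column family.
        return {
            f"{self.family}:{name}": total
            for (owner, name), total in self._totals.items()
            if owner == app_id
        }
```

Since the collector never interprets metric names, flow, user, and queue level aggregation could later iterate over whatever columns exist in the application row.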
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593890#comment-14593890 ] Joep Rottinghuis commented on YARN-3815: For flow-level aggregates I'll separately write up ideas about how to do that. In short, we need to focus on write performance, plus the fact that we have to fold increments from running applications into the aggregates. This makes it tricky to do correctly, specifically when apps (and ATS writers) can crash and need to restart. We'll have to keep track of the last values written. Initially I thought that using a coprocessor to do this server side would solve the problem. The challenge is that it will be invoked in the write path of individual stats, so slow writes to a second region server (hosting the agg table/row) can have a ripple effect on many writes. Even worse, we can end up with a deadlock situation under load conditions when the agg table/row happens to be hosted on the same region server and the current write is blocked on the completion of the coprocessor, which needs to write but is blocked on a full queue on its own region server. I think the solution will be to do something in the spirit of readless increments as used in Tephra. Similarly, we'd collapse values only when flushes or compactions happen, and then aggregation is restricted to a single row which is locked without issues. On reads we collapse the pre-aggregated values plus the values from currently running jobs. The significant difference will be that we can compact only when jobs are complete. I'll try to write up a more detailed design for this. If we follow [~sjlee0]'s suggestion to make all the other aggregates periodic, then we can use mapreduce for those. The big advantage is that we can then use control records like we do in hRaven to efficiently keep track of what we have already aggregated. 
The tricky ones will be the long-running apps that we have to keep coming back to. Ideally we should be able to read the raw values once and then "spray" them out to the various aggregate tables (cluster, queue, user) per time period. Otherwise we end up scanning over the raw values over and over again. > [Aggregation] Application/Flow/User/Queue Level Aggregations > > > Key: YARN-3815 > URL: https://issues.apache.org/jira/browse/YARN-3815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: Timeline Service Nextgen Flow, User, Queue Level > Aggregations (v1).pdf > > > Per previous discussions in some design documents for YARN-2928, the basic > scenario is the query for stats can happen on: > - Application level, expect return: an application with aggregated stats > - Flow level, expect return: aggregated stats for a flow_run, flow_version > and flow > - User level, expect return: aggregated stats for applications submitted by > user > - Queue level, expect return: aggregated stats for applications within the > Queue > Application states is the basic building block for all other level > aggregations. We can provide Flow/User/Queue level aggregated statistics info > based on application states (a dedicated table for application states is > needed which is missing from previous design documents like HBase/Phoenix > schema design). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
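The "read once, spray out" idea from the comment above might be sketched like this. The record fields (`cluster`, `queue`, `user`, `ts`, `value`), the daily bucket, and the `spray` function itself are illustrative assumptions; a real job would presumably be a mapreduce job writing to HBase or Phoenix tables rather than Python dictionaries.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable


def spray(records: Iterable[dict]) -> Dict[str, Dict[tuple, int]]:
    """Scan raw records once and update every aggregate table per day."""
    tables: Dict[str, Dict[tuple, int]] = {
        "cluster": defaultdict(int),
        "queue": defaultdict(int),
        "user": defaultdict(int),
    }
    for r in records:
        # Daily bucket in UTC; the period granularity is an assumption.
        day = datetime.fromtimestamp(r["ts"], tz=timezone.utc).strftime("%Y%m%d")
        tables["cluster"][(r["cluster"], day)] += r["value"]
        tables["queue"][(r["cluster"], r["queue"], day)] += r["value"]
        tables["user"][(r["cluster"], r["user"], day)] += r["value"]
    return tables
```

Each raw record is visited exactly once, regardless of how many aggregate tables it feeds.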
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593867#comment-14593867 ] Joep Rottinghuis commented on YARN-3815: Very much agree with separation into 2 categories "online" versus "periodic". I think this will be natural split between the native HBase tables for the former and the Phoenix approach for the latter to each emphasize their relative strengths. A few thoughts around time-based aggregations: - If the aggregation time is smaller than the runtime of apps/flows we need to consider what that means for an aggregate. As an extreme example consider hourly aggregates for applications that take hours to complete. What do we actually count in that one hour? Do we only attribute to that hour the specific total metric that came in at that time, or do we try to apportion part of the increment to what happened only in that one hour? Ditto goes for daily aggregates when we have long running jobs. In hRaven we simply don't deal with this at all by making the simplifying assumption that all metrics and usage all happen in the instant that the job is completed. With ATSv2 being (near) real-time that will simply not work, so we need to consider what that means. Are we requiring apps to write at least once within each aggregation period? - If we store aggregates in columns (hourly columns, daily columns) we need to limit the growth of # columns by making the next level aggregate part of the rowkey. This would limit 24 hourly columns to a single day row. Similarly we'd have 7 dailies in a week, or perhaps just up to 31 dailies in a month. All of these considerations come from a strong need to be able to limit the range over which we scan in order to get a reasonable performance in the face of lots of data. 
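To make the "next level aggregate in the rowkey" point concrete, here is a small sketch that places hourly values as columns inside a per-day row. The `!` separator, the `h00`..`h23` qualifier format, and the `hourly_cell` function are all hypothetical; no actual schema is implied.

```python
from datetime import datetime, timezone
from typing import Tuple


def hourly_cell(cluster: str, flow: str, ts: int) -> Tuple[str, str]:
    """Return (row key, column qualifier) for an hourly aggregate value."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    # The day is part of the row key, so a row holds at most 24 hourly columns.
    row_key = f"{cluster}!{flow}!{t:%Y%m%d}"
    qualifier = f"h{t:%H}"
    return row_key, qualifier
```

With the day in the row key, a query for one week scans seven rows per flow rather than an ever-growing single row.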
{quote} Flow level: ○ expect return: aggregated stats for a flow_run, flow_version and flow {quote} I think "flow" level aggregations should really only mean flow-run level aggregation in the sense of the separation that [~sjlee0] mentioned above for HBase native online aggregations. I'm not sure that flow_version rollups even make sense. The flow_version is important to be able to pass in as a filter: give me stats for this flow only matching this version. This is useful for cases such as reducer estimation, where a job can make effective use of previous run data only if the version of the flow hasn't changed. The fact that there were three versions of a Hive query is good to know. Knowing when each version first appeared is good to know. Knowing the total cost for version 2 is probably less useful. Flow level aggregates are useful only with a particular time range in mind. What was the cost for the DailyActiveUsers job (no matter the version) for the last week? How many bytes did the SearchJob read from HDFS in the last month? Thoughts around queue level aggregation (in addition to Sangjin's comments that these should be time-based): Queue level aggregates have additional complexities. First, queues can come and go very quickly and apps can be moved from queue to queue. For the purpose of normal shorter-lived applications it might be tempting to use the final queue that a job ran in (this is the assumption we make in hRaven). With long-running apps this assumption breaks down. Now if an app runs for an hour and accumulates some value X for a metric Y, it will be recorded as such in the original queue agg. Now the application gets moved and the new value of metric Y is now Z. Are we going to aggregate Z-X in the new queue, or simply all of Z? The sums of all metrics Z in the queues will not be the same as the sums of all apps or flows. In addition, queues can grow and shrink on the fly. Are we going to record that? 
At the very least we need to prefix the cluster in the rowkey so that we can differentiate queues on different clusters. And then there are hierarchical queues. Are we thinking of rolling stats up to each level, or keeping them just in the individual leaf queue? Will we structure the rowkeys so that we can do prefix scans for queues called /cluster/parent/childa and /cluster/parent/childb? > [Aggregation] Application/Flow/User/Queue Level Aggregations > > > Key: YARN-3815 > URL: https://issues.apache.org/jira/browse/YARN-3815 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: Timeline Service Nextgen Flow, User, Queue Level > Aggregations (v1).pdf > > > Per previous discussions in some design documents for YARN-2928, the basic > scenario is the query for stats can happen on: > - Application level, expect return: an application with aggregated stats > - Flow level,
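The prefix-scan question for hierarchical queues can be illustrated with sorted string keys. The key layout (leading cluster name, `/` separators, trailing `/`) and both functions are hypothetical; a sorted Python list stands in for the lexicographically ordered HBase row space.

```python
from bisect import bisect_left
from typing import List


def queue_row_key(cluster: str, queue_path: str) -> str:
    """Build a row key such as /cluster/parent/childa/ (trailing separator)."""
    return f"/{cluster}/{queue_path.strip('/')}/"


def prefix_scan(sorted_keys: List[str], prefix: str) -> List[str]:
    """Return all keys starting with prefix from a lexicographically sorted list."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order means no later key can match
        matches.append(key)
    return matches
```

The trailing separator is a design choice in this sketch: without it, a scan for `/cluster/parent` would also pick up a sibling queue named `parent2`.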
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593649#comment-14593649 ] Sangjin Lee commented on YARN-3815: --- Thanks [~djp] for putting this together. I added comments in the offline doc, but I'll move the main one (high level comments) over here. (0) on “aggregation” Like you mentioned, I think it is helpful to make distinction on different types of aggregation we’re talking about here. These are somewhat separate functionalities. My sense of the types of aggregation is similar to yours, but not exactly the same. It would be good if we can converge on their definitions. I see 4 types of aggregation: - app-level aggregation - app-to-flow aggregation (“online” or “real time”) - time-based flow aggregation (“batch” or “periodic”) - user/queue aggregation I’ll explain my definitions in more detail below. (1) app-level aggregation This is aggregating metrics from sub-app entities (e.g. containers) to the YARN application. This can include both framework-specific metrics (e.g. HDFS bytes written for mapreduce) and YARN-system metrics (e.g. container CPU %). It would be ideal for app entities to have values for these metrics aggregated from sub-app entities. How we do that is going to be different between framework-specific metrics and YARN-system metrics. For framework-specific metrics, I would say this falls on the individual frameworks. The framework AM usually already aggregates them in memory (consider MR job counters for example). So for them it is straightforward to write them out directly onto the YARN app entities. Furthermore, it is problematic to add them to the sub-app YARN entities and ask YARN to aggregate them to the application. Framework’s sub-app entities may not even align with YARN’s sub-app entities. 
For example, in case of MR, there is a reasonable one-to-one mapping between a mapper/reducer task attempt and a container, but for other applications that may not be true. Forcing all frameworks to hang values at containers may not be practical. I think it’s far easier for frameworks to write aggregated values to the YARN app entities. For YARN-system metrics, this would need to be done by YARN. I think we can have the timeline collector aggregate the values in memory and write them out periodically. The details need to be worked out, but that is definitely one way to go. The only tricky thing is then the container metrics should flow through the per-app timeline collector, and cannot come from the RM timeline collector (Junping pointed that out already). (2) app-to-flow online aggregation This is more or less live aggregated metrics at the flow level. This will still be based on the native HBase schema. Actually doing the above for the app-level integration makes app-to-flow online aggregation simpler. It now only has to look at app entities to collect the data. Initially we were thinking of leveraging a HBase co-processor, but there are some technical challenges with that. We had a discussion on possible ways of doing this, and [~jrottinghuis] has a proposal for this. I’ll let Joep chime in on this. (3) time-based flow aggregation This is different than the online aggregation in the sense that it is aggregated along the time boundary (e.g. “daily”, “weekly”, etc.). This can be based on the Phoenix schema. This can be populated in an offline fashion (e.g. running a mapreduce job). (4) user/queue aggregation This is another “offline” aggregation type. Also, I believe we’re talking about only time-based aggregation. In other words, we would aggregate values for users only with a well-defined time window. There won’t be a “real-time” aggregation of values, similar to the flow aggregation. 
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589880#comment-14589880 ]

Junping Du commented on YARN-3815:
--

Attached the proposal for the first version. Comments are welcome!

> [Aggregation] Application/Flow/User/Queue Level Aggregations
>
> Key: YARN-3815
> URL: https://issues.apache.org/jira/browse/YARN-3815
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Attachments: Timeline Service Nextgen Flow, User, Queue Level Aggregations (v1).pdf
>
> Per previous discussions in some design documents for YARN-2928, the basic scenario is the query for stats can happen on:
> - Application level, expect return: an application with aggregated stats
> - Flow level, expect return: aggregated stats for a flow_run, flow_version and flow
> - User level, expect return: aggregated stats for applications submitted by user
> - Queue level, expect return: aggregated stats for applications within the Queue
> Application states is the basic building block for all other level aggregations. We can provide Flow/User/Queue level aggregated statistics info based on application states (a dedicated table for application states is needed which is missing from previous design documents like HBase/Phoenix schema design).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)