[jira] [Commented] (YARN-3652) A SchedulerMetrics may be need for evaluating the scheduler's performance
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701291#comment-14701291 ]

Xianyin Xin commented on YARN-3652:
---

A brief introduction to the preview patch:

SchedulerMetrics focuses on metrics related to the scheduler's performance. The following metrics are considered:

- number of waiting events in the scheduler dispatch queue;
- number of events of all kinds in the scheduler dispatch queue;
- event handling rate;
- node update handling rate;
- event adding rate;
- node update adding rate;
- statistics on the number of waiting events;
- statistics on the number of waiting node update events;
- container allocation rate;
- scheduling method execution rate, i.e., the number of scheduling attempts per second;
- app allocation call duration;
- nodeUpdate call duration;
- scheduling call duration.

These metrics give rich information about the scheduler's performance, which can be used to diagnose scheduler anomalies.

A SchedulerMetrics may be need for evaluating the scheduler's performance
---
Key: YARN-3652
URL: https://issues.apache.org/jira/browse/YARN-3652
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager, scheduler
Reporter: Xianyin Xin
Attachments: YARN-3652-preview.patch

As discussed in YARN-3630, a {{SchedulerMetrics}} may be needed for evaluating the scheduler's performance. The performance indicators include the #events waiting to be handled by the scheduler, the throughput, the scheduling delay, and/or other indicators.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
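The queue-length and rate metrics listed in the comment above could be derived from two monotonically increasing counters. The following is a minimal sketch only, not the code in YARN-3652-preview.patch: the class {{DispatchQueueCounters}}, its methods, and the snapshot-based rate calculation are all hypothetical names and choices made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; not the class from YARN-3652-preview.patch. */
final class DispatchQueueCounters {
  private final AtomicLong added = new AtomicLong();
  private final AtomicLong handled = new AtomicLong();

  /** Call when an event is put on the dispatch queue. */
  void eventAdded() { added.incrementAndGet(); }

  /** Call when the scheduler finishes handling an event. */
  void eventHandled() { handled.incrementAndGet(); }

  /** Events added but not yet handled, i.e. the waiting events. */
  long waiting() { return added.get() - handled.get(); }

  /** Captures both counters together with a caller-supplied timestamp. */
  Snapshot snapshot(long nowMillis) {
    return new Snapshot(nowMillis, added.get(), handled.get());
  }

  /** Immutable view of the counters at one point in time. */
  static final class Snapshot {
    final long timeMillis;
    final long added;
    final long handled;

    Snapshot(long timeMillis, long added, long handled) {
      this.timeMillis = timeMillis;
      this.added = added;
      this.handled = handled;
    }
  }

  /** Events added per second between two snapshots (0 if no time elapsed). */
  static double addingRate(Snapshot from, Snapshot to) {
    return perSecond(to.added - from.added, to.timeMillis - from.timeMillis);
  }

  /** Events handled per second between two snapshots (0 if no time elapsed). */
  static double handlingRate(Snapshot from, Snapshot to) {
    return perSecond(to.handled - from.handled, to.timeMillis - from.timeMillis);
  }

  private static double perSecond(long count, long elapsedMillis) {
    return elapsedMillis > 0 ? count * 1000.0 / elapsedMillis : 0.0;
  }
}
```

Keeping one such pair of counters per event kind (for example, one for node updates) would give the per-kind variants in the list; whether the patch does it this way is not stated in the comment.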
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701267#comment-14701267 ]

Xianyin Xin commented on YARN-3652:
---

In the patch I used functions from HADOOP-12338.
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701296#comment-14701296 ]

Xianyin Xin commented on YARN-3652:
---

Hi [~sunilg], [~vvasudev], would you please have a look? Any comments are welcome.
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561210#comment-14561210 ]

Varun Vasudev commented on YARN-3652:
---

My apologies for the delay, [~xinxianyin]. We do need a SchedulerMetrics class. The general idea is that SchedulerHealth should pick up values from the SchedulerMetrics class, but the SchedulerMetrics class should ideally provide more information. As an example, SchedulerHealth cares about the number of reserved containers, which the SchedulerMetrics class should provide. Ideally, though, the SchedulerMetrics class would also give me some extra information, such as the mean, the distribution, and the variance of the number of reserved containers.

I think purely for the purposes of YARN-3630, you should modify the SchedulerHealth class to expose the number of waiting events, but we can independently work on a SchedulerMetrics class as well.
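The "extra information" mentioned above (mean and variance of a sampled value such as the number of reserved containers) can be maintained incrementally without storing every sample. Below is a minimal sketch using Welford's online algorithm. The class name {{RunningStats}} is hypothetical and is not part of YARN or of any patch on this issue, and the sketch does not cover the distribution (quantiles) part of the request.

```java
/** Hypothetical sketch of running mean/variance via Welford's algorithm. */
final class RunningStats {
  private long count;
  private double mean;
  private double m2; // sum of squared deviations from the current mean

  /** Folds one sample (e.g. the current number of reserved containers) in. */
  void add(double x) {
    count++;
    double delta = x - mean;
    mean += delta / count;
    m2 += delta * (x - mean);
  }

  long count() { return count; }

  /** Mean of all samples so far; 0 before the first sample. */
  double mean() { return mean; }

  /** Population variance of all samples so far; 0 before the first sample. */
  double variance() { return count > 0 ? m2 / count : 0.0; }
}
```

This class is not thread-safe as written; a caller sampling from several threads would need external synchronization.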
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562221#comment-14562221 ]

Xianyin Xin commented on YARN-3652:
---

Thanks for your comments, [~vvasudev]. I agree with you on the general idea of the relation between SchedulerMetrics and SchedulerHealth. However, SchedulerHealth has so far only been implemented for CS (the Capacity Scheduler); if we switch to using SchedulerHealth in YARN-3630, we need to wait for SchedulerHealth support in the Fair Scheduler and expose {{getSchedulerHealth}} in YarnScheduler. Do we have any plan for that?
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560252#comment-14560252 ]

Xianyin Xin commented on YARN-3652:
---

Thanks [~vinodkv]. When I said YARN-3293 and {{SchedulerMetrics}} are similar, I meant that the two are similar in functional design, and it was not implemented yet at that time. A simple {{SchedulerMetrics}} was introduced in YARN-3630, where a {{#ofWaitingSchedulerEvent}} metric was used to evaluate the load of the scheduler. [~vvasudev], I hope to hear your thoughts. :)
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559347#comment-14559347 ]

Vinod Kumar Vavilapalli commented on YARN-3652:
---

I haven't looked at the original SchedulerMetrics patches, but pointed them out as they seemed relevant. [~vvasudev], can you please comment on this?
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549626#comment-14549626 ]

Xianyin Xin commented on YARN-3652:
---

Or should we just use {{SchedulerMetrics}} from the perspective of monitoring the scheduler's performance?
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549600#comment-14549600 ]

Xianyin Xin commented on YARN-3652:
---

Hi [~vinodkv], YARN-3293 and {{SchedulerMetrics}} are similar except that they focus on different points. Should we just leverage the existing {{SchedulerHealth}} and add the performance metrics to it?
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547435#comment-14547435 ]

Xianyin Xin commented on YARN-3652:
---

Thanks [~vinodkv], that's very helpful.
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545260#comment-14545260 ]

Sunil G commented on YARN-3652:
---

Hi [~xinxianyin],

This will be a very helpful feature; thanks for working on it. A few points:

1. *Throughput*: Do you mean the #events processed over a period of time? If so, how can we set the time window over which throughput is calculated (configurable?)? A clear benefit of this is that we can predict a possible completion time for the pending events in the dispatcher queue. Combining throughput with the #pending events may give a much better indication of RM overload.

2. Since many events come to the scheduler, a filter on event type, if possible, may help make throughput and scheduling delay more accurate.
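The two points above (throughput over a configurable time window, kept per event type, plus a predicted completion time for pending events) can be illustrated with a small sliding-window counter. This is only a sketch under stated assumptions: {{WindowedThroughput}}, its nested {{EventType}} enum, and all method names are hypothetical stand-ins, not YARN classes, and the window length is passed in by the caller rather than read from any real configuration key.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch of per-event-type throughput over a trailing window. */
final class WindowedThroughput {
  /** Stand-in event types for illustration; not the scheduler's own enum. */
  enum EventType { NODE_UPDATE, APP_ADDED, OTHER }

  private final long windowMillis;
  private final Map<EventType, ArrayDeque<Long>> handledAt =
      new EnumMap<EventType, ArrayDeque<Long>>(EventType.class);

  /** @param windowMillis length of the trailing window (the configurable part) */
  WindowedThroughput(long windowMillis) {
    if (windowMillis <= 0) {
      throw new IllegalArgumentException("windowMillis must be positive");
    }
    this.windowMillis = windowMillis;
    for (EventType t : EventType.values()) {
      handledAt.put(t, new ArrayDeque<Long>());
    }
  }

  /** Records that one event of the given type was handled at nowMillis. */
  void recordHandled(EventType type, long nowMillis) {
    handledAt.get(type).addLast(nowMillis);
  }

  /** Events of one type handled per second within (now - window, now]. */
  double ratePerSecond(EventType type, long nowMillis) {
    ArrayDeque<Long> q = handledAt.get(type);
    // Drop timestamps that have fallen out of the trailing window.
    while (!q.isEmpty() && q.peekFirst() <= nowMillis - windowMillis) {
      q.removeFirst();
    }
    return q.size() * 1000.0 / windowMillis;
  }

  /** Seconds needed to drain the pending events at the current rate. */
  double estimatedDrainSeconds(EventType type, long pending, long nowMillis) {
    double rate = ratePerSecond(type, nowMillis);
    return rate > 0 ? pending / rate : Double.POSITIVE_INFINITY;
  }
}
```

Storing one timestamp per handled event is simple but costs memory proportional to the number of events in the window; a bucketed counter would be a cheaper alternative at high event rates.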
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545607#comment-14545607 ]

Xianyin Xin commented on YARN-3652:
---

Thanks for the comments, [~sunilg].

{quote}
1. *Throughput*: Do you mean the #events processed over a period of time? If so, how can we set the time window over which throughput is calculated (configurable?)? A clear benefit of this is that we can predict a possible completion time for the pending events in the dispatcher queue. Combining throughput with the #pending events may give a much better indication of RM overload.
{quote}

In fact, the first thing that came to my mind was the #containers allocated by the scheduler per second, because container allocation is what users care about and the node update event is the most important scheduler event. The rate of processing events is also a nice indicator, just as you commented.

{quote}
2. Since many events come to the scheduler, a filter on event type, if possible, may help make throughput and scheduling delay more accurate.
{quote}

+1 for the idea. Besides, the #events processed by the scheduler per second is large, so indicators based on it are volatile. We may consider some method to smooth the fluctuation, such as sampling or statistics.
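One common way to smooth a volatile per-second indicator, as suggested at the end of the comment above, is an exponential moving average over periodic samples. The comment does not name a specific method, so this is only one possible choice; the class {{SmoothedRate}} and the smoothing factor {{alpha}} are hypothetical and not taken from any patch on this issue.

```java
/** Hypothetical sketch: exponential moving average of a sampled rate. */
final class SmoothedRate {
  private final double alpha; // weight of the newest sample, in (0, 1]
  private boolean initialized;
  private double value;

  SmoothedRate(double alpha) {
    if (!(alpha > 0.0 && alpha <= 1.0)) {
      throw new IllegalArgumentException("alpha must be in (0, 1]");
    }
    this.alpha = alpha;
  }

  /** Folds in one raw sample (e.g. events handled in the last second). */
  double update(double sample) {
    if (!initialized) {
      // The first sample seeds the average so it does not start from zero.
      value = sample;
      initialized = true;
    } else {
      value = alpha * sample + (1.0 - alpha) * value;
    }
    return value;
  }

  /** Current smoothed value; 0 before the first sample. */
  double value() { return value; }
}
```

A smaller {{alpha}} damps fluctuations more strongly but reacts more slowly to a genuine change in load, so the value would need tuning against real scheduler traces.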
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545769#comment-14545769 ]

Vinod Kumar Vavilapalli commented on YARN-3652:
---

YARN-3293 added some key metrics for the Capacity Scheduler. You may want to do one pass over that patch. /cc [~vvasudev]