[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562211#comment-14562211 ]

Xianyin Xin commented on YARN-3630:
---

Thanks for your comments, [~vvasudev]!

{quote}
your patch doesn't check if the calculated interval is greater than the ping interval to determine liveliness for the AM and the NM. Is that by design?
{quote}
It's true that we should do that. But in this patch I haven't added a mechanism for determining an upper limit for the {{nextHeartbeatInterval}}. I think the limit should be much less than the ping interval, which is 10 minutes by default. On the other hand, do you think a hard configurable limit would be acceptable?

{quote}
With respect to adaptive heartbeats for the NMs - my concern is that the proposed solution will lead to behaviour where the NMs will be told to back off - the NMs will wait for sometime - the RM will receive a flood of NM updates - leading to the NMs being told to back off and so on and so forth. We'll end up in a situation where the pings will become clustered around particular time intervals, leading to container allocation and release delays. You might be better off picking a random interval between the default interval and the calculated interval to spread out the NM pings
{quote}
Thanks for the reminder; it's a situation I hadn't thought much about. I think your suggestion is a good choice.

YARN should suggest a heartbeat interval for applications
-
Key: YARN-3630
URL: https://issues.apache.org/jira/browse/YARN-3630
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager, scheduler
Affects Versions: 2.7.0
Reporter: Zoltán Zvara
Assignee: Xianyin Xin
Priority: Minor
Attachments: Notes_for_adaptive_heartbeat_policy.pdf, YARN-3630.001.patch.patch, YARN-3630.002.patch

It seems currently applications - for example Spark - are not adaptive to RM regarding heartbeat intervals. RM should be able to suggest a desired heartbeat interval to applications.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
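For illustration, the randomized interval and the hard upper limit discussed in the comment above could be combined roughly as follows. This is only a sketch under stated assumptions: the class {{HeartbeatJitter}}, the method name, and all parameters are hypothetical and appear in neither YARN nor the attached patches.

```java
import java.util.Random;

/** Hypothetical helper; not part of YARN or of the YARN-3630 patches. */
final class HeartbeatJitter {
    private HeartbeatJitter() {}

    /**
     * Picks an interval uniformly between the default and the calculated
     * interval (both inclusive), then caps it at a hard upper limit.
     * All values are in milliseconds.
     */
    static long nextHeartbeatInterval(long defaultMs, long calculatedMs,
                                      long maxMs, Random rng) {
        long low = Math.min(defaultMs, calculatedMs);
        long high = Math.max(defaultMs, calculatedMs);
        // nextDouble() is in [0, 1), so the offset falls in [0, high - low].
        long offset = (long) (rng.nextDouble() * (high - low + 1));
        // Assumption: the cap would be a configurable value well below
        // the liveliness (ping) timeout.
        return Math.min(low + offset, maxMs);
    }
}
```

Because each node draws its own random offset, the next round of heartbeats is spread over the whole range instead of arriving at the RM at the same moment.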
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545188#comment-14545188 ]

Xianyin Xin commented on YARN-3630:
---

Thanks, [~leftnoteasy] and [~kasha].

[~leftnoteasy], I think {{SchedulerMetrics}} is a good idea, but there are a few things we should also consider:
# {{SchedulerMetrics}} would track various indicators of the scheduler, but we also have {{QueueMetrics}}. While not exactly equivalent, {{root.metrics}} already gives us most of the scheduler's information, so how should we handle the relationship between the two?
# The number of events waiting to be handled is an important indicator of the scheduler's load, but it is not owned by the scheduler; it is maintained by {{ResourceManager#SchedulerEventDispatcher}}. Who would then maintain {{SchedulerMetrics}}: the {{SchedulerEventDispatcher}} or the scheduler itself? Going by the name, {{SchedulerMetrics}} should be maintained by the scheduler.

Anyway, considering the web UI improvement you mentioned, a {{SchedulerMetrics}} is needed. I created another JIRA, YARN-3652, to discuss this.

Thanks, [~kasha], for the valuable suggestions on the policy for determining the heartbeat interval. I'll work on a draft over the next few days.
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545218#comment-14545218 ]

Sunil G commented on YARN-3630:
---

bq. Are we considering automatically slowing down the NM heartbeats as well?
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545225#comment-14545225 ]

Sunil G commented on YARN-3630:
---

bq. Are we considering automatically slowing down the NM heartbeats as well?

+1. It would be wise to slow down the heartbeats from NMs that carry less load. However, there should be a limit or range on how far they can be slowed down, even under light load. Otherwise, I feel applications could suffer more starvation.
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545807#comment-14545807 ]

Vinod Kumar Vavilapalli commented on YARN-3630:
---

NodeManagers already get a target heartbeat interval via responses. We haven't seen a real-life need yet for adaptive tuning, so we haven't done that. The same should be done here: for now, we should just send the target interval across and tune it later.
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544157#comment-14544157 ]

Wangda Tan commented on YARN-3630:
--

[~xinxianyin], thanks for sharing your thoughts. I think we can add an interface to Dispatcher to get the number of events waiting. We could add more details about waiting events for different components later, but as a first step, I think we can add it for the scheduler. Maybe we can create a SchedulerMetrics to track the scheduler's internal fields. Does that sound like a plan, [~kasha]?
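To make the idea of exposing the number of waiting events concrete, here is a minimal sketch. It assumes the events sit in an ordinary blocking queue; the class {{CountingEventQueue}} and its methods are hypothetical stand-ins and do not reflect the actual Dispatcher or {{SchedulerEventDispatcher}} code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for a scheduler event dispatcher; not YARN code. */
final class CountingEventQueue<E> {
    // Kept private, as in the discussion; only the count is exposed.
    private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();

    /** Enqueues an event for later handling. */
    void enqueue(E event) {
        queue.add(event);
    }

    /** Removes and returns the next event, or null if none is waiting. */
    E poll() {
        return queue.poll();
    }

    /** Number of events currently waiting; a metrics class could read this. */
    int getPendingEventCount() {
        return queue.size();
    }
}
```

A read-only accessor like this keeps the queue itself private while still letting a metrics class or the web UI sample its length.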
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544244#comment-14544244 ]

Karthik Kambatla commented on YARN-3630:

Number of events in the dispatcher is definitely a good indicator of how busy the scheduler is. We could use it as one of the factors.

Other factors:
# Some apps are more tolerant of longer intervals than others, and we should have a provision to specify an upper limit on the interval.
# Weight/capacity of the queue, depending on whether it is FairScheduler/CapacityScheduler.
# How starved an application is. In FairScheduler, that translates to min(fairshare - current-allocation, pending-resources).

Other things we need to consider:
# How do we plan to ensure that the scheduler doesn't hear from applications sooner than the specified interval? The AM can always choose to ignore it, right?
# Are we considering automatically slowing down the NM heartbeats as well? With continuous/asynchronous scheduling enabled, I suppose slowing down NM heartbeats could be better than slowing down AM heartbeats. We should be careful here though - we need to take into account how heavily used a node is. If a node is more heavily allocated, slowing heartbeats could lead to delays in noticing completed containers.
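The starvation measure mentioned above, min(fairshare - current-allocation, pending-resources), can be written out as a small sketch. It makes two assumptions that are not in the comment: resources are reduced to a single scalar (for example, memory in MB), and a negative shortfall is clamped to zero. The class and method names are hypothetical and are not FairScheduler code.

```java
/** Hypothetical helper illustrating the starvation formula; not FairScheduler code. */
final class Starvation {
    private Starvation() {}

    /**
     * Returns min(fairShare - currentAllocation, pending), with the
     * shortfall clamped at zero (an added assumption) so that an app
     * already above its fair share is reported as not starved.
     */
    static long starvation(long fairShare, long currentAllocation, long pending) {
        long shortfall = Math.max(0L, fairShare - currentAllocation);
        return Math.min(shortfall, pending);
    }
}
```

For example, an app with a fair share of 4096, a current allocation of 1024, and 2048 pending has a starvation of 2048, because its pending request is smaller than its shortfall of 3072.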
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542747#comment-14542747 ]

Wangda Tan commented on YARN-3630:
--

+1 for the general idea. [~xinxianyin], I think you made a very good point in https://issues.apache.org/jira/browse/YARN-3630?focusedCommentId=14539662&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14539662: showing the events waiting in the scheduler event handler queue in the web UI is more important for figuring out whether the scheduler is overloaded. That could be addressed in a separate JIRA.
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542692#comment-14542692 ]

Karthik Kambatla commented on YARN-3630:

Sounds like a reasonable idea.
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543071#comment-14543071 ]

Xianyin Xin commented on YARN-3630:
---

Thanks, [~kasha] and [~leftnoteasy]. Indeed, the number of events waiting in the scheduler event handler queue is an important indicator of the scheduler's load. Currently the queue is a private variable of the inner class {{SchedulerEventDispatcher}} and can't be accessed from outside. Should we add some methods to {{ClusterMetrics}} or other monitoring classes to expose the queue information?

If we show the events in the web UI, we should also consider which information to show, e.g., the total #events, #events per app, #events for node updates, or something similar.
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539574#comment-14539574 ]

Steve Loughran commented on YARN-3630:
--

Making it something that can be set in yarn-site for apps to pick up would be the simple way. Otherwise it would introduce extra fields in the protobuf messages on AM registration, require AMs to handle the absence of the field on older versions, need tests for all of this, etc. A documented YARN property is something that management tools can trivially add and apps can simply read.
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539631#comment-14539631 ]

Zoltán Zvara commented on YARN-3630:

I was thinking of a more adaptive solution that takes the current load into consideration. Anyway, having a global configuration parameter for the desired heartbeat would simplify things for applications. Any thoughts on implementing something like this?
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539666#comment-14539666 ]

Xianyin Xin commented on YARN-3630:
---

Hi [~Ehnalis], would you mind if I assign this JIRA to myself?
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539671#comment-14539671 ]

Zoltán Zvara commented on YARN-3630:

[~xinxianyin], I would not mind. But I think this issue should have a higher priority. On a highly contended, multi-tenant cluster this would be a problem.
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539689#comment-14539689 ]

Steve Loughran commented on YARN-3630:
--

OK, adaptive does make sense: the RM can instruct AMs to back off as it experiences load. Telling them to speed up again will take more time (since they are backed off), but that probably matters less.
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539693#comment-14539693 ]

Xianyin Xin commented on YARN-3630:
---

Agree, [~Ehnalis].
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539725#comment-14539725 ]

Xianyin Xin commented on YARN-3630:
---

It's true, [~ste...@apache.org]. But I think that, compared to the pain of lacking this feature on a large-scale cluster, we can accept the delay. Do you agree?
[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539662#comment-14539662 ]

Xianyin Xin commented on YARN-3630:
---

I think an adaptive heartbeat interval is needed, especially when the cluster is large, in which case the heartbeats will overload the RM. In fact, I was also considering this problem. We need a policy for the RM to determine the interval, and the RM should set the interval in the AllocateResponse. For the policy, I think the queue length of scheduler events could be a good indicator of the RM's load, since the scheduler is the bottleneck of the cluster's scalability. The nodeHeartbeat can also be considered, and the corresponding protobuf message field already exists in NodeHeartbeatResponse.
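One possible shape for such a policy is sketched below. The comment does not specify how queue length should map to an interval, so the linear growth, the step size, and the cap are all assumptions, and the class {{QueueLengthPolicy}} and its parameters are hypothetical names that do not exist in YARN.

```java
/** Hypothetical policy sketch; the linear mapping is an assumption, not YARN behaviour. */
final class QueueLengthPolicy {
    private QueueLengthPolicy() {}

    /**
     * Suggests a heartbeat interval in milliseconds from the number of
     * pending scheduler events: the base interval, plus one extra base
     * interval for every full block of eventsPerStep pending events,
     * capped at maxMs.
     */
    static long suggestIntervalMs(int pendingEvents, int eventsPerStep,
                                  long baseMs, long maxMs) {
        if (eventsPerStep <= 0) {
            throw new IllegalArgumentException("eventsPerStep must be positive");
        }
        // Integer division: a partially filled block adds no extra delay.
        long steps = Math.max(0, pendingEvents) / eventsPerStep;
        long suggested = baseMs + steps * baseMs;
        return Math.min(suggested, maxMs);
    }
}
```

With a base of 1000 ms, a step of 1000 events, and a cap of 10000 ms, an empty queue yields 1000 ms and 2500 pending events yield 3000 ms. The RM could then place a value computed this way in the response it sends back to the AM.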