[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518781#comment-14518781 ]

Xianyin Xin commented on YARN-2176:
---

Sorry [~jlowe], I made a mistake. What I had in mind was the FairScheduler, where we re-sort all the apps every time we schedule. With thousands of running apps, that re-sort takes hundreds of milliseconds. You're right that the overhead in CS is low.

> CapacityScheduler loops over all running applications rather than actively
> requesting apps
> --
>
> Key: YARN-2176
> URL: https://issues.apache.org/jira/browse/YARN-2176
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacityscheduler
> Affects Versions: 2.4.0
> Reporter: Jason Lowe
>
> The capacity scheduler performance is primarily dominated by
> LeafQueue.assignContainers, and that currently loops over all applications
> that are running in the queue. It would be more efficient if we looped over
> just the applications that are actively asking for resources rather than all
> applications, as there could be thousands of applications running but only a
> few hundred that are currently asking for resources.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517043#comment-14517043 ]

Jason Lowe commented on YARN-2176:
--

It is less efficient to lump them all together. As for whether we need to make the optimization, do we know that this overhead is significant? IIRC we're not re-sorting all applications from scratch each time we allocate, but rather only moving individual apps in the sort order as they are added or updated. That's an lg(N) operation, which barely changes when N grows from hundreds to thousands, especially if we don't invoke it very often.
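To make the lg(N) point concrete, here is a minimal sketch (not the actual scheduler code) of keeping apps in a balanced sorted set and repositioning only the app whose state changed: remove it, update its key, and re-insert it. Each step is O(log N); since log2(100) ≈ 6.6 and log2(1000) ≈ 10, growing from hundreds to thousands of apps adds only a few comparisons per update. `AppEntry`, its `usage` field, and `IncrementalOrder` are hypothetical stand-ins for whatever the real ordering policy compares.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical app record; "usage" stands in for whatever the ordering policy compares.
final class AppEntry {
    final long id;   // unique tie-breaker so apps with equal usage remain distinct entries
    long usage;

    AppEntry(long id, long usage) {
        this.id = id;
        this.usage = usage;
    }
}

final class IncrementalOrder {
    // Least-used first; ties broken by id.
    private final TreeSet<AppEntry> apps = new TreeSet<>(
        Comparator.comparingLong((AppEntry a) -> a.usage).thenComparingLong(a -> a.id));

    void add(AppEntry app) {
        apps.add(app);            // O(log N)
    }

    // Reposition one app instead of re-sorting everything: remove, mutate, re-insert.
    // The key must not change while the entry is in the set, or remove() cannot find it.
    void updateUsage(AppEntry app, long newUsage) {
        apps.remove(app);         // O(log N), located via the old key
        app.usage = newUsage;
        apps.add(app);            // O(log N), placed via the new key
    }

    AppEntry first() {
        return apps.first();
    }

    int size() {
        return apps.size();
    }
}
```

The remove-then-reinsert pattern is what keeps the cost logarithmic; a full re-sort on every scheduling pass would instead cost O(N log N).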
[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516808#comment-14516808 ]

Xianyin Xin commented on YARN-2176:
---

Hi [~jlowe] and [~leftnoteasy], after YARN-3361 all running apps still participate in sorting, which may be time-consuming when there are thousands of apps. Do we need an optimization for this?
[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514610#comment-14514610 ]

Wangda Tan commented on YARN-2176:
--

Makes sense. I just commented on YARN-3547: the FairScheduler should be able to use the same mechanism as CS to avoid looking into all apps.
[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514581#comment-14514581 ]

Jason Lowe commented on YARN-2176:
--

Yes, it appears most of the benefit is already there. The loop still iterates over all those applications, but it skips most of its body for the ones without pending requests. There's still the matter of the FairScheduler needing a similar optimization, which we should address either in this JIRA or in YARN-3547.
[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514506#comment-14514506 ]

Wangda Tan commented on YARN-2176:
--

[~jlowe], this problem should be resolved in CS after YARN-3361. In LeafQueue, the following check ensures that only apps with pending requests are looked into:

{code}
if (!application.hasPendingResourceRequest(resourceCalculator,
    node.getPartition(), clusterResource, schedulingMode)) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Skip app_attempt=" + application.getApplicationAttemptId()
        + ", because it doesn't need more resource, schedulingMode="
        + schedulingMode.name() + " node-label=" + node.getPartition());
  }
  continue;
}
{code}
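As a rough, self-contained illustration of the effect of that check (simplified types, not the real LeafQueue code): the loop still visits every running app, but apps with nothing pending hit `continue` before any of the expensive per-app work. `PendingApp`, `hasPendingRequest()`, `tryAllocate()` and the `appsExamined` counter are hypothetical names used only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified app: just a count of outstanding container requests.
final class PendingApp {
    final String id;
    int pendingContainers;

    PendingApp(String id, int pendingContainers) {
        this.id = id;
        this.pendingContainers = pendingContainers;
    }

    boolean hasPendingRequest() {
        return pendingContainers > 0;
    }
}

final class SkipLoopQueue {
    final List<PendingApp> runningApps = new ArrayList<>();
    int appsExamined = 0;   // how many apps reached the expensive part of the loop body

    // Iterates over every running app (still O(N)) but only does real work for requesting apps.
    // Returns the id of the app that got a container, or null if no app is asking.
    String assignContainer() {
        for (PendingApp app : runningApps) {
            if (!app.hasPendingRequest()) {
                continue;   // cheap skip, mirroring the LeafQueue check
            }
            appsExamined++;
            if (tryAllocate(app)) {
                return app.id;
            }
        }
        return null;
    }

    // Stand-in for the costly per-app locality and limit checks.
    private boolean tryAllocate(PendingApp app) {
        app.pendingContainers--;
        return true;
    }
}
```

With 1000 idle apps and one requesting app, only the requesting app reaches `tryAllocate`, although the loop still walks the whole list.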
[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039143#comment-14039143 ]

Jason Lowe commented on YARN-2176:
--

Ah, yes. AppSchedulingInfo should only be created by the built-in schedulers, so we can have it expect the new Queue interface that has the activate/deactivate app methods. While we're at it, we can remove AppSchedulingInfo's knowledge of ActiveUsersManager and have each queue update its own ActiveUsersManager instance when its activate/deactivate methods are called. That will streamline the AppSchedulingInfo code a bit.
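A minimal sketch of that design, assuming a new queue interface (called `ActivationAwareQueue` here, a hypothetical name) that carries the activate/deactivate methods. AppSchedulingInfo only notifies the queue on transitions between "requesting" and "not requesting", and the leaf queue keeps its own ActiveUsersManager-style bookkeeping current. Every class below is a simplified, hypothetical stand-in, not the real YARN type.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical queue interface carrying the activate/deactivate notifications.
interface ActivationAwareQueue {
    void activateApplication(String user, String appId);
    void deactivateApplication(String user, String appId);
}

// Simplified stand-in for ActiveUsersManager: user -> that user's currently requesting apps.
final class SimpleActiveUsers {
    private final Map<String, Set<String>> activeAppsByUser = new HashMap<>();

    void activate(String user, String appId) {
        activeAppsByUser.computeIfAbsent(user, u -> new LinkedHashSet<>()).add(appId);
    }

    void deactivate(String user, String appId) {
        Set<String> apps = activeAppsByUser.get(user);
        if (apps != null && apps.remove(appId) && apps.isEmpty()) {
            activeAppsByUser.remove(user);   // user has no requesting apps left
        }
    }

    int numActiveUsers() {
        return activeAppsByUser.size();
    }
}

// The queue owns its bookkeeping; AppSchedulingInfo never sees SimpleActiveUsers.
final class SketchLeafQueue implements ActivationAwareQueue {
    final SimpleActiveUsers activeUsers = new SimpleActiveUsers();
    final Set<String> requestingApps = new LinkedHashSet<>();

    @Override
    public void activateApplication(String user, String appId) {
        requestingApps.add(appId);
        activeUsers.activate(user, appId);
    }

    @Override
    public void deactivateApplication(String user, String appId) {
        requestingApps.remove(appId);
        activeUsers.deactivate(user, appId);
    }
}

// Simplified AppSchedulingInfo: tracks outstanding requests, notifies the queue on transitions.
final class SketchAppSchedulingInfo {
    private final ActivationAwareQueue queue;
    private final String user;
    private final String appId;
    private int outstanding = 0;

    SketchAppSchedulingInfo(ActivationAwareQueue queue, String user, String appId) {
        this.queue = queue;
        this.user = user;
        this.appId = appId;
    }

    void setOutstanding(int containers) {
        boolean wasActive = outstanding > 0;
        outstanding = containers;
        boolean isActive = outstanding > 0;
        if (!wasActive && isActive) {
            queue.activateApplication(user, appId);
        } else if (wasActive && !isActive) {
            queue.deactivateApplication(user, appId);
        }
    }
}
```

Because the constructor takes the queue interface directly, there is no registration step that a queue could forget.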
[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038045#comment-14038045 ]

Sandy Ryza commented on YARN-2176:
--

Can we merge the ActiveUsersManager stuff into an abstract SchedulerLeafQueue class that FSLeafQueue and LeafQueue extend? AppSchedulingInfo is private/unstable, so we can modify its constructor to take a SchedulerLeafQueue instead of a Queue.
[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036451#comment-14036451 ]

Jason Lowe commented on YARN-2176:
--

ActiveUsersManager doesn't have a reference to the leaf queue today, but it's created by the leaf queue and specific to the leaf queue, so it would be trivial to give it one if necessary. The end result is effectively the same: AppSchedulingInfo->ActiveUsersManager->LeafQueue is not much different from AppSchedulingInfo->LeafQueue->ActiveUsersManager as far as hops go.

In many ways ActiveUsersManager is already a callback object to the queue. It's a queue-specific object, created by the queue, that is used to do three things:
# Notify that an application is actively requesting resources, via the activateApplication method
# Notify that an application is no longer requesting, via the deactivateApplication method
# Obtain the current number of active users in the queue

This is really close to the interface we need. I'm not sure why ActiveUsersManager's methods aren't just part of the Queue interface rather than a separate object. The fact that this state is tracked in a separate object internally should be an implementation detail of the queue.

I originally proposed the ActiveUsersManager override because it would require modifying fewer entities. AppSchedulingInfo, ActiveUsersManager, Queue, and everything outside of CapacityScheduler would remain unchanged, and the implementation would be localized to the scheduler that needs it. (Actually, I think it would be localized just to LeafQueue within the CapacityScheduler itself.)

I'm not excited about the callback approach, since it's yet another interface and queues have to remember to register or it doesn't work correctly. I'd rather it be more straightforward, with AppSchedulingInfo calling the queue directly to notify it.
No init-time callback registration would be necessary, and it would be easier to understand. But we can't change Queue without breaking compatibility (yay interfaces), so that leaves us with three options: the original proposal (i.e., leverage ActiveUsersManager as the callback interface), a callback registration approach, or an RTTI-like approach (i.e., derive a new interface from Queue and have AppSchedulingInfo check whether the queue is really an instance of it, similar to how PreemptableResourceScheduler is handled today for the scheduler interface).

If we do go with the callback approach, then we can't have the leaf queue register on behalf of the ActiveUsersManager, or we risk breaking backwards compatibility. Currently AppSchedulingInfo is expected to update ActiveUsersManager directly. If we change it to stop doing that and to expect a registered callback instead, existing queues that don't register the callback (because they weren't updated along with the change) will never get their ActiveUsersManager object updated. Therefore I think we're stuck with AppSchedulingInfo always updating ActiveUsersManager or, at best, ActiveUsersManager registering the callback itself, separately from the queue.
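The RTTI-like option can be sketched as follows, assuming a hypothetical `ActivationListeningQueue` interface derived from a minimal stand-in for `Queue`. AppSchedulingInfo keeps updating the ActiveUsersManager exactly as before, so existing queues are unaffected, and only queues that implement the derived interface receive the extra notification. All types here are simplified placeholders, not the real YARN interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for the existing Queue interface, which must not change.
interface BaseQueue {
    String getQueueName();
}

// Hypothetical derived interface; only queues that opt in implement it.
interface ActivationListeningQueue extends BaseQueue {
    void applicationActivated(String appId);
    void applicationDeactivated(String appId);
}

// Stand-in for ActiveUsersManager that records calls so the behaviour is observable.
final class RecordingActiveUsersManager {
    final List<String> events = new ArrayList<>();
    void activateApplication(String user, String appId) { events.add("+" + appId); }
    void deactivateApplication(String user, String appId) { events.add("-" + appId); }
}

// Example opted-in queue that tracks which apps are currently requesting.
final class TrackingQueue implements ActivationListeningQueue {
    final List<String> requesting = new ArrayList<>();
    public String getQueueName() { return "tracking"; }
    public void applicationActivated(String appId) { requesting.add(appId); }
    public void applicationDeactivated(String appId) { requesting.remove(appId); }
}

final class RttiAppSchedulingInfo {
    private final BaseQueue queue;
    private final RecordingActiveUsersManager activeUsersManager;

    RttiAppSchedulingInfo(BaseQueue queue, RecordingActiveUsersManager aum) {
        this.queue = queue;
        this.activeUsersManager = aum;
    }

    void onActivated(String user, String appId) {
        // Existing contract: always update ActiveUsersManager directly.
        activeUsersManager.activateApplication(user, appId);
        // Extra notification only for queues implementing the derived interface.
        if (queue instanceof ActivationListeningQueue) {
            ((ActivationListeningQueue) queue).applicationActivated(appId);
        }
    }

    void onDeactivated(String user, String appId) {
        activeUsersManager.deactivateApplication(user, appId);
        if (queue instanceof ActivationListeningQueue) {
            ((ActivationListeningQueue) queue).applicationDeactivated(appId);
        }
    }
}
```

A legacy queue that knows nothing about the derived interface still gets its ActiveUsersManager updated, which is the compatibility property the comment is concerned with.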
[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036130#comment-14036130 ]

Sandy Ryza commented on YARN-2176:
--

Without the ActivationCallback, the ActiveUsersManager would need to call into the leaf queue, which it currently doesn't even have a reference to. It seems weirder to me to have an edge from the ActiveUsersManager to the leaf queue than to have an edge from the AppSchedulingInfo to the leaf queue; tracing what's going on would require more hops. What do you think about either:
* having both the ActiveUsersManager and the leaf queue register for the callback, or
* having only the leaf queue register for the callback and then be in charge of notifying the ActiveUsersManager (which it already has a reference to)?

Sorry to be nitpicky about this pretty small thing. I've ended up confused by this code multiple times and think it's worth getting right.
[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036093#comment-14036093 ]

Jason Lowe commented on YARN-2176:
--

Sure, that works if we think it's cleaner. It's a little weird that AppSchedulingInfo already calls back into an object obtained from the queue (i.e., the ActiveUsersManager instance) to report app activation state, and then we'd register a second object from the same queue to receive the same events. IMHO it'd be nice not to have two separate paths telling the queue about the same thing.
[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035925#comment-14035925 ]

Sandy Ryza commented on YARN-2176:
--

Ah, good point, you're right. In that case, what do you think about an ActivationCallback with onActiveChanged(boolean active) that the leaf queue can register with AppSchedulingInfo?
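A minimal sketch of the proposed callback, using the `ActivationCallback` and `onActiveChanged(boolean active)` names from the comment. The `registerActivationCallback` hook and the `CallbackAppSchedulingInfo` class are hypothetical simplifications; the callback fires only when the app moves between "requesting" and "not requesting".

```java
import java.util.ArrayList;
import java.util.List;

// Callback shape proposed in the comment.
interface ActivationCallback {
    void onActiveChanged(boolean active);
}

// Simplified AppSchedulingInfo that fires callbacks only on active/inactive transitions.
final class CallbackAppSchedulingInfo {
    private final List<ActivationCallback> callbacks = new ArrayList<>();
    private int outstanding = 0;

    // Hypothetical registration hook; a leaf queue would call this when the app is added.
    void registerActivationCallback(ActivationCallback cb) {
        callbacks.add(cb);
    }

    void setOutstanding(int containers) {
        boolean wasActive = outstanding > 0;
        outstanding = containers;
        boolean isActive = outstanding > 0;
        if (wasActive != isActive) {
            for (ActivationCallback cb : callbacks) {
                cb.onActiveChanged(isActive);
            }
        }
    }
}
```

The trade-off discussed in the replies is visible here: a queue that never calls `registerActivationCallback` simply never hears about transitions.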
[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035905#comment-14035905 ]

Jason Lowe commented on YARN-2176:
--

That proposal would work for the deactivate path, but how does it work for the activate case? If the queue is not normally looping over the deactivated apps, then it is not going to call hasPendingRequests() on them, and we won't ever add them back to the list of active apps to iterate. If we do always loop over the deactivated apps to call it, that defeats a large portion of the optimization. The leaf queue needs more than a predicate function to call, unless I'm missing something.
[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035902#comment-14035902 ]

Sandy Ryza commented on YARN-2176:
--

The Fair Scheduler should probably avoid looping over all running applications as well. Would a derived ActiveUsersManager be necessary? I've always found the degree to which AppSchedulingInfo talks to ActiveUsersManager kind of weird. Could we just expose a hasPendingRequests() method in AppSchedulingInfo? The leaf queue would check it after making an allocation and then make any necessary adjustments. I suppose if an application cancels requests, this wouldn't be reflected immediately in the leaf queue's bookkeeping, but the leaf queue could make the adjustment as soon as it observes the change, which would lead to equivalent run time.
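A minimal sketch of the predicate-only idea, assuming a hypothetical `hasPendingRequests()` on a simplified app object: the queue iterates only its active list and drops an app as soon as the predicate fails, either right after an allocation or when it observes that the app cancelled its requests. `PredicateApp` and `PredicateLeafQueue` are illustrative names, not real scheduler classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical simplified app exposing only a pending-request predicate.
final class PredicateApp {
    final String id;
    int pending;

    PredicateApp(String id, int pending) {
        this.id = id;
        this.pending = pending;
    }

    boolean hasPendingRequests() {
        return pending > 0;
    }
}

final class PredicateLeafQueue {
    final List<PredicateApp> activeApps = new ArrayList<>();

    // Give one container to the first active app that still wants one, lazily
    // dropping apps whose predicate is false. Returns the app id, or null.
    String allocateOne() {
        Iterator<PredicateApp> it = activeApps.iterator();
        while (it.hasNext()) {
            PredicateApp app = it.next();
            if (!app.hasPendingRequests()) {
                it.remove();          // e.g. the app cancelled its requests
                continue;
            }
            app.pending--;            // stand-in for the real allocation
            if (!app.hasPendingRequests()) {
                it.remove();          // checked right after the allocation
            }
            return app.id;
        }
        return null;
    }
}
```

Note that in this sketch an app dropped from `activeApps` that later asks for more resources is never reconsidered, because nothing evaluates the predicate for inactive apps; that is exactly the activate-path gap raised in the reply.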
[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps
[ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035799#comment-14035799 ]

Jason Lowe commented on YARN-2176:
--

AppSchedulingInfo already determines when an app is actively requesting, in order to update the QueueMetrics.activeApplications metric. (Confusingly, LeafQueue also has an activeApplications collection, which actually holds all running applications, not just the ones requesting.) It would be nice to leverage the work AppSchedulingInfo already does: it calls the ActiveUsersManager activateApplication and deactivateApplication methods when necessary. CapacityScheduler could have a derived ActiveUsersManager class that also notifies the LeafQueue, so the queue can track requesting and non-requesting apps separately. To preserve allocation semantics, we'd have to track the original order of the applications, so that activating an application inserts it into the list of requesting applications in the same relative order with respect to the other requesting applications, regardless of how many times it has been activated or deactivated.
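One way to sketch this proposal, under stated assumptions: a simplified stand-in for ActiveUsersManager is subclassed so that, besides its normal bookkeeping, it forwards activate/deactivate events to the leaf queue. To preserve allocation order, the queue assigns each app a sequence number once, at submission, and keeps requesting apps in a `TreeMap` keyed by that number, so re-activation always restores the app's original relative position. The class names and the sequence-number scheme are illustrative, not actual CapacityScheduler code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified stand-in for ActiveUsersManager: active-app count per user.
// Assumes calls arrive only on real transitions, as AppSchedulingInfo makes them.
class BaseActiveUsersManager {
    private final Map<String, Integer> activeAppsPerUser = new HashMap<>();

    void activateApplication(String user, String appId) {
        activeAppsPerUser.merge(user, 1, Integer::sum);
    }

    void deactivateApplication(String user, String appId) {
        // Returning null removes the user once their last active app is gone.
        activeAppsPerUser.computeIfPresent(user, (u, n) -> n > 1 ? n - 1 : null);
    }

    int getNumActiveUsers() {
        return activeAppsPerUser.size();
    }
}

// Leaf queue that tracks requesting apps separately while preserving submission order.
final class OrderedLeafQueue {
    private final Map<String, Long> submissionSeq = new HashMap<>();
    private final TreeMap<Long, String> requesting = new TreeMap<>();
    private long nextSeq = 0;

    void submit(String appId) {
        submissionSeq.put(appId, nextSeq++);   // relative order fixed once, at submission
    }

    void appActivated(String appId) {
        requesting.put(submissionSeq.get(appId), appId);
    }

    void appDeactivated(String appId) {
        requesting.remove(submissionSeq.get(appId));
    }

    // Apps that assignContainers would iterate, in original submission order.
    List<String> requestingApps() {
        return new ArrayList<>(requesting.values());
    }
}

// Derived manager: keeps the base bookkeeping and also notifies the queue.
final class QueueNotifyingActiveUsersManager extends BaseActiveUsersManager {
    private final OrderedLeafQueue queue;

    QueueNotifyingActiveUsersManager(OrderedLeafQueue queue) {
        this.queue = queue;
    }

    @Override
    void activateApplication(String user, String appId) {
        super.activateApplication(user, appId);
        queue.appActivated(appId);
    }

    @Override
    void deactivateApplication(String user, String appId) {
        super.deactivateApplication(user, appId);
        queue.appDeactivated(appId);
    }
}
```

Keying by a fixed submission sequence, rather than appending on each activation, is what keeps an app's position stable however often it toggles between requesting and idle.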