[jira] [Assigned] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned YARN-3635: Assignee: Sandy Ryza (was: Wangda Tan) Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Wangda Tan Assignee: Sandy Ryza Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch Currently, both of fair/capacity scheduler support queue mapping, which makes scheduler can change queue of an application after submitted to scheduler. One issue of doing this in specific scheduler is: If the queue after mapping has different maximum_allocation/default-node-label-expression of the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make the queue mapping as a common interface of scheduler, and RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-3635: - Assignee: Tan, Wangda (was: Sandy Ryza) Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Wangda Tan Assignee: Tan, Wangda Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch Currently, both of fair/capacity scheduler support queue mapping, which makes scheduler can change queue of an application after submitted to scheduler. One issue of doing this in specific scheduler is: If the queue after mapping has different maximum_allocation/default-node-label-expression of the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make the queue mapping as a common interface of scheduler, and RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629141#comment-14629141 ] Sandy Ryza commented on YARN-3635: -- BTW I got all this from QueuePlacementPolicy and QueuePlacementRule, which are pretty quick reads if you want to take a look. Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Wangda Tan Assignee: Tan, Wangda Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch Currently, both of fair/capacity scheduler support queue mapping, which makes scheduler can change queue of an application after submitted to scheduler. One issue of doing this in specific scheduler is: If the queue after mapping has different maximum_allocation/default-node-label-expression of the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make the queue mapping as a common interface of scheduler, and RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629139#comment-14629139 ] Sandy Ryza commented on YARN-3635: -- [~leftnoteasy], apologies for this quick drive-by review - I am currently traveling. The JIRA appears to be lacking a design-doc and I wasn't able to find documentation in the patch. The patch should ultimately include some detailed documentation, but I don't want to ask this of you before OKing the approach. In light of this, a few questions: * What steps are required for the Fair Scheduler to integrate with this? * Is a common way of configuration proposed? * How does this differ from the current Fair Scheduler model? To summarize: ** The FS model consists of a sequence of placement rules that the app is passed through. ** Each placement rule gets the chance to assign the app to a queue, reject the app, or pass. If it passes, the next rule gets a chance. ** A placement rule can base its decision on: *** The submitting user. *** The set of groups the submitting user belongs to. *** The queue requested in the app submission. *** A set of configuration options that are specific to the rule. *** The set of queues given in the Fair Scheduler configuration. ** Rules are marked as terminal if they will never pass. This helps to avoid misconfigurations where users place rules after terminal rules. ** Rules have a create attribute which determines whether they can create a new queue or whether they must assign to existing queues. ** Currently the set of placement rules is limited to what's implemented in YARN. I.e. there's no public pluggable rule support. I noticed from Vinod's comment that this patch follows a similar structure. Are there places where my summary would not describe what's going on in this patch? Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Wangda Tan Assignee: Tan, Wangda Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch Currently, both of fair/capacity scheduler support queue mapping, which makes scheduler can change queue of an application after submitted to scheduler. One issue of doing this in specific scheduler is: If the queue after mapping has different maximum_allocation/default-node-label-expression of the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make the queue mapping as a common interface of scheduler, and RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
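For readers less familiar with the Fair Scheduler side, the rule-chain model summarized in the comment above can be sketched roughly as follows. This is a simplified illustration with assumed names and return conventions, not the actual QueuePlacementPolicy/QueuePlacementRule code:
{code}
import java.util.Collection;
import java.util.Map;
import java.util.Set;

// Simplified sketch of the Fair Scheduler placement-rule chain described above.
abstract class PlacementRuleSketch {
  // The "create" attribute: may this rule create a queue that doesn't exist yet?
  protected boolean create;

  // A rule sees the requested queue, the submitting user, the user's groups, its own
  // configuration options, and the queues defined in the Fair Scheduler configuration.
  // In this sketch it returns a queue name to place the app, or null to pass so the
  // next rule in the chain gets a chance (rejection is omitted for brevity).
  abstract String assignAppToQueue(String requestedQueue, String user, Set<String> groups,
      Map<String, String> ruleOptions, Collection<String> configuredQueues);

  // Terminal rules never pass; configuring rules after a terminal rule is flagged as a
  // misconfiguration.
  boolean isTerminal() {
    return false;
  }
}
{code}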
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14623047#comment-14623047 ] Sandy Ryza commented on YARN-3866: -- Hi [~jianhe]. Most application writers should be using AMRMClient, so not dealing with this interface directly. That said, given that they are separate data types, I think two different methods would be preferable. AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866.1.patch, YARN-3866.2.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
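A sketch of what "two different methods" could look like on AllocateRequest; the type and method names below are illustrative placeholders, not the API that was ultimately committed:
{code}
import java.util.List;

// Hypothetical sketch only: separate accessors for the two request types.
public abstract class AllocateRequestSketch {
  // Increase and decrease requests are distinct data types, so they get distinct setters
  // rather than being folded into a single combined list.
  public abstract void setIncreaseRequests(List<ContainerResourceIncreaseRequest> increases);

  public abstract void setDecreaseRequests(List<ContainerResourceDecreaseRequest> decreases);
}
{code}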
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591055#comment-14591055 ] Sandy Ryza commented on YARN-1197: -- The latest proposal makes sense to me as well. Thanks [~wangda] and [~mding]! Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588478#comment-14588478 ] Sandy Ryza commented on YARN-1197: -- bq. I think this assumes cluster is quite idle, I understand the low latency could be achieved, but it's not guaranteed since we don't support oversubscribing, etc. If the cluster is fully contended we certainly won't get this performance. But as long as there is a decent chunk of space, which is common in many settings, we can. The cluster doesn't need to be fully idle by any means. More broadly, just because YARN is not good at hitting sub-second latencies doesn't mean that it isn't a design goal. I strongly oppose any argument that uses the current slowness of YARN as a justification for why we should make architectural decisions that could compromise latencies. That said, I still don't have a strong grasp on the kind of complexity we're introducing in the AM, so would like to try to understand that before arguing against you further. Is the main problem we're grappling still the one Meng brought up here: https://issues.apache.org/jira/browse/YARN-1197?focusedCommentId=14556803page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14556803? I.e. that an AM can receive an increase from the RM, then issue a decrease to the NM, and then use its increase to get resources it doesn't deserve? Or is the idea that, even if we didn't have this JIRA, NMClient is too complicated, and we'd like to reduce that? Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14586249#comment-14586249 ] Sandy Ryza commented on YARN-1197: -- Sorry, I've been quiet here for a while, but I'd be concerned about a design that requires going through the ResourceManager for decreases. If I understand correctly, this would be considerable hit to performance, which could be prohibitive for frameworks like Spark that might use container-resizing for allocating per-task resources. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14586687#comment-14586687 ] Sandy Ryza commented on YARN-1197: -- bq. Going through RM directly is better as the RM will immediately know that the resource is available for future allocations Is the idea that the RM would make allocations using the space before receiving acknowledgement from the NodeManager that it has resized the container (adjusted cgroups)? Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587127#comment-14587127 ] Sandy Ryza commented on YARN-1197: -- Option (a) can occur in the low hundreds of milliseconds if the cluster is tuned properly, independent of cluster size. 1) Submit increase request to RM. Poll RM 100 milliseconds later after continuous scheduling thread has run in order to pick up the increase token. 2) Send increase token to NM. Why does the AM need to poll the NM about increase status before taking action? Does the NM need to do anything other than update its tracking of the resources allotted to the container? Also, it's not unlikely that schedulers will be improved to return the increase token on the same heartbeat that it's requested. So this could all happen in 2 RPCs + a scheduler decision, and no additional wait time. Anything more than this is probably prohibitively expensive for a framework like Spark to submit an increase request before running each task. Would option (b) ever be able to achieve this kind of latency? Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
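To make the two-step flow above concrete, here is a rough client-side sketch. The increase-related calls are hypothetical placeholders (no such methods existed in AMRMClient/NMClient at the time of this discussion); only allocate() and Resource.newInstance() are real APIs:
{code}
// Step 1: submit the increase request to the RM, then poll the next allocate response
// (with continuous scheduling, roughly 100 ms later) to pick up the granted increase token.
amRmClient.requestContainerResourceIncrease(container, Resource.newInstance(4096, 2)); // hypothetical
AllocateResponse response = amRmClient.allocate(0.5f);

// Step 2: hand each granted increase token directly to the NM hosting the container,
// without waiting on an NM-RM heartbeat.
for (Container increased : response.getIncreasedContainers()) {   // hypothetical accessor
  nmClient.increaseContainerResource(increased);                   // hypothetical call
}
{code}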
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587072#comment-14587072 ] Sandy Ryza commented on YARN-1197: -- bq. RM still needs to wait for an acknowledgement from NM to confirm that the increase is done before sending out response to AM. This will take two heartbeat cycles, but this is not much worse than giving out a token to AM first, and then letting AM initiating the increase. I would argue that waiting for an NM-RM heartbeat is much worse than waiting for an AM-RM heartbeat. With continuous scheduling, the RM can make decisions in millisecond time, and the AM can regulate its heartbeats according to the application's needs to get fast responses. If an NM-RM heartbeat is involved, the application is at the mercy of the cluster settings, which should be in the multi-second range for large clusters. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587067#comment-14587067 ] Sandy Ryza commented on YARN-1197: -- Is my understanding correct that the broader plan is to move stopping containers out of the AM-NM protocol? Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587168#comment-14587168 ] Sandy Ryza commented on YARN-1197: -- bq. If you consider all now/future optimizations, such as continous-scheduling / scheduler make decision at same AM-RM heart-beat. (b) needs one more NM-RM heart-beat interval. I agree with you, it could be hundreds of milli-seconds (a) vs. multi-seconds (b). when the cluster is idle. To clarify: with proper tuning, we can currently get low hundreds of milliseconds without adding any new scheduler features. With the new scheduler feature I'm imagining, we'd only be limited by the RPC + scheduler time, so we could get 10s of milliseconds with proper tuning. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587174#comment-14587174 ] Sandy Ryza commented on YARN-1197: -- Regarding complexity in the AM, the NMClient utility so far has been an API that's fairly easy for app developers to interact with. I've used it more than once and had no issues. Would we not be able to handle most of the additional complexity behind it? Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557912#comment-14557912 ] Sandy Ryza commented on YARN-314: - Do we have applications that need this capability? Schedulers should allow resource requests of different sizes at the same priority and location -- Key: YARN-314 URL: https://issues.apache.org/jira/browse/YARN-314 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Karthik Kambatla Attachments: yarn-314-prelim.patch Currently, resource requests for the same container and locality are expected to all be the same size. While it doesn't look like it's needed for apps currently, and can be circumvented by specifying different priorities if absolutely necessary, it seems to me that the ability to request containers with different resource requirements at the same priority level should be there for the future and for completeness' sake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
[ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542897#comment-14542897 ] Sandy Ryza commented on YARN-3633: -- Another thought is that we could say the max AM share only applies after the first AM. With Fair Scheduler, cluster can logjam when there are too many queues -- Key: YARN-3633 URL: https://issues.apache.org/jira/browse/YARN-3633 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: Rohit Agarwal Priority: Critical It's possible to logjam a cluster by submitting many applications at once in different queues. For example, let's say there is a cluster with 20GB of total memory. Let's say 4 users submit applications at the same time. The fair share of each queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most 2.5GB memory for AMs. If all the users request AMs of size 3GB, the cluster logjams. Nothing gets scheduled even when 20GB of resources are available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
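A minimal sketch of that idea, with assumed FSLeafQueue-style names rather than actual patch code:
{code}
// Sketch: always admit a queue's first AM; apply the maxAMShare check only to later AMs.
boolean canRunAppAM(Resource amResource) {
  if (!isAMRunning()) {
    // No AM in the queue yet: admit it, so a small fair share can never starve the queue.
    return true;
  }
  Resource maxAMResource = Resources.multiply(getFairShare(), maxAMShare);
  return Resources.fitsIn(Resources.add(amResourceUsage, amResource), maxAMResource);
}
{code}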
[jira] [Commented] (YARN-5) Add support for FifoScheduler to schedule CPU along with memory.
[ https://issues.apache.org/jira/browse/YARN-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535008#comment-14535008 ] Sandy Ryza commented on YARN-5: --- +1 to Vinod's point Add support for FifoScheduler to schedule CPU along with memory. Key: YARN-5 URL: https://issues.apache.org/jira/browse/YARN-5 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Assignee: Arun C Murthy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-810) Support CGroup ceiling enforcement on CPU
[ https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-810: Assignee: (was: Sandy Ryza) Support CGroup ceiling enforcement on CPU - Key: YARN-810 URL: https://issues.apache.org/jira/browse/YARN-810 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.1.0-beta, 2.0.5-alpha Reporter: Chris Riccomini Labels: BB2015-05-TBR Attachments: YARN-810-3.patch, YARN-810-4.patch, YARN-810-5.patch, YARN-810-6.patch, YARN-810.patch, YARN-810.patch Problem statement: YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. Containers are then allowed to request vcores between the minimum and maximum defined in the yarn-site.xml. In the case where a single-threaded container requests 1 vcore, with a pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of the core it's using, provided that no other container is also using it. This happens, even though the only guarantee that YARN/CGroups is making is that the container will get at least 1/4th of the core. If a second container then comes along, the second container can take resources from the first, provided that the first container is still getting at least its fair share (1/4th). There are certain cases where this is desirable. There are also certain cases where it might be desirable to have a hard limit on CPU usage, and not allow the process to go above the specified resource requirement, even if it's available. Here's an RFC that describes the problem in more detail: http://lwn.net/Articles/336127/ Solution: As it happens, when CFS is used in combination with CGroups, you can enforce a ceiling using two files in cgroups: {noformat} cpu.cfs_quota_us cpu.cfs_period_us {noformat} The usage of these two files is documented in more detail here: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html Testing: I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, it behaves as described above (it is a soft cap, and allows containers to use more than they asked for). I then tested CFS CPU quotas manually with YARN. First, you can see that CFS is in use in the CGroup, based on the file names: {noformat} [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/ total 0 -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks [criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us 10 [criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us -1 {noformat} Oddly, it appears that the cfs_period_us is set to .1s, not 1s. We can place processes in hard limits. I have process 4370 running YARN container container_1371141151815_0003_01_03 on a host. By default, it's running at ~300% cpu usage. {noformat} CPU 4370 criccomi 20 0 1157m 551m 14m S 240.3 0.8 87:10.91 ... 
{noformat} When I set the CFS quota: {noformat} echo 1000 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us CPU 4370 criccomi 20 0 1157m 563m 14m S 1.0 0.8 90:08.39 ... {noformat} It drops to 1% usage, and you can see the box has room to spare: {noformat} Cpu(s): 2.4%us, 1.0%sy, 0.0%ni, 92.2%id, 4.2%wa, 0.0%hi, 0.1%si, 0.0%st {noformat} Turning the quota back to -1: {noformat} echo -1 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us {noformat} Burns the cores again: {noformat} Cpu(s): 11.1%us, 1.7%sy, 0.0%ni, 83.9%id, 3.1%wa, 0.0%hi, 0.2%si, 0.0%st CPU 4370 criccomi 20 0 1157m 563m 14m S 253.9 0.8 89:32.31 ... {noformat} On my dev box, I was testing CGroups by running a Python process eight times, to burn through all the cores, since it was doing as described above (giving extra CPU to the process, even with a cpu.shares limit). Toggling the cfs_quota_us seems to enforce a hard limit. Implementation: What do you guys
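For reference on the arithmetic exercised in the YARN-810 description above: cpu.cfs_quota_us is the CPU time a cgroup may consume per cpu.cfs_period_us, so a hard ceiling works out to roughly (cores the container is entitled to) times the period. A small sketch of that calculation, with assumed names and a 100 ms period, not the NodeManager's actual implementation:
{code}
// Sketch: derive a cpu.cfs_quota_us value that caps a container at its vcore share.
public final class CfsQuotaSketch {
  private static final int CFS_PERIOD_US = 100000; // assume a 100 ms enforcement period

  static long quotaMicros(int containerVCores, int nodeVCores, int nodePhysicalCores) {
    // Fraction of the node's CPU the container is entitled to, in physical cores.
    double cores = (double) containerVCores / nodeVCores * nodePhysicalCores;
    // e.g. 1 of 8 vcores on a 4-core box -> 0.5 cores -> quota of 50,000 us per 100,000 us period
    return (long) (cores * CFS_PERIOD_US);
  }
}
{code}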
[jira] [Commented] (YARN-3485) FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies
[ https://issues.apache.org/jira/browse/YARN-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518099#comment-14518099 ] Sandy Ryza commented on YARN-3485: -- It looks like the patch computes the headroom as min(cluster total - cluster consumed, queue max resource). Do we not want it to be min(cluster total - cluster consumed, queue max resource - queue consumed)? FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies Key: YARN-3485 URL: https://issues.apache.org/jira/browse/YARN-3485 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-3485-1.patch, yarn-3485-prelim.patch FairScheduler's headroom calculations consider the fairshare and cluster-available-resources, and the fairshare has maxResources. However, for Fifo and Fairshare policies, the fairshare is used only for memory and not cpu. So, the scheduler ends up showing a higher headroom than is actually available. This could lead to applications waiting for resources far longer than then intend to. e.g. MAPREDUCE-6302. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
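Spelled out with the Resources helpers, the headroom being asked about would look roughly like the following (variable names assumed, not taken from the patch):
{code}
// Sketch: take the component-wise minimum of what the cluster has free and what the
// queue may still use, rather than the queue's full max resource.
Resource headroom = Resources.componentwiseMin(
    Resources.subtract(clusterCapacity, clusterUsage),
    Resources.subtract(queueMaxShare, queueUsage));
{code}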
[jira] [Commented] (YARN-3485) FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies
[ https://issues.apache.org/jira/browse/YARN-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518222#comment-14518222 ] Sandy Ryza commented on YARN-3485: -- One nit: {code} +return Math.min( Math.min(value1, value2), value3); {code} has an extra space. Otherwise +1. FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies Key: YARN-3485 URL: https://issues.apache.org/jira/browse/YARN-3485 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: yarn-3485-1.patch, yarn-3485-2.patch, yarn-3485-prelim.patch FairScheduler's headroom calculations consider the fairshare and cluster-available-resources, and the fairshare has maxResources. However, for Fifo and Fairshare policies, the fairshare is used only for memory and not cpu. So, the scheduler ends up showing a higher headroom than is actually available. This could lead to applications waiting for resources far longer than then intend to. e.g. MAPREDUCE-6302. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-3415: - Summary: Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue (was: Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue -- Key: YARN-3415 URL: https://issues.apache.org/jira/browse/YARN-3415 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: zhihai xu Priority: Critical Attachments: YARN-3415.000.patch, YARN-3415.001.patch, YARN-3415.002.patch We encountered this problem while running a spark cluster. The amResourceUsage for a queue became artificially high and then the cluster got deadlocked because the maxAMShare constrain kicked in and no new AM got admitted to the cluster. I have described the problem in detail here: https://github.com/apache/spark/pull/5233#issuecomment-87160289 In summary - the condition for adding the container's memory towards amResourceUsage is fragile. It depends on the number of live containers belonging to the app. We saw that the spark AM went down without explicitly releasing its requested containers and then one of those containers memory was counted towards amResource. cc - [~sandyr] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391218#comment-14391218 ] Sandy Ryza commented on YARN-3415: -- +1 Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue - Key: YARN-3415 URL: https://issues.apache.org/jira/browse/YARN-3415 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: zhihai xu Priority: Critical Attachments: YARN-3415.000.patch, YARN-3415.001.patch We encountered this problem while running a spark cluster. The amResourceUsage for a queue became artificially high and then the cluster got deadlocked because the maxAMShare constrain kicked in and no new AM got admitted to the cluster. I have described the problem in detail here: https://github.com/apache/spark/pull/5233#issuecomment-87160289 In summary - the condition for adding the container's memory towards amResourceUsage is fragile. It depends on the number of live containers belonging to the app. We saw that the spark AM went down without explicitly releasing its requested containers and then one of those containers memory was counted towards amResource. cc - [~sandyr] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391624#comment-14391624 ] Sandy Ryza commented on YARN-3415: -- [~ragarwal] did you have any more comments before I commit this? Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue - Key: YARN-3415 URL: https://issues.apache.org/jira/browse/YARN-3415 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: zhihai xu Priority: Critical Attachments: YARN-3415.000.patch, YARN-3415.001.patch We encountered this problem while running a spark cluster. The amResourceUsage for a queue became artificially high and then the cluster got deadlocked because the maxAMShare constrain kicked in and no new AM got admitted to the cluster. I have described the problem in detail here: https://github.com/apache/spark/pull/5233#issuecomment-87160289 In summary - the condition for adding the container's memory towards amResourceUsage is fragile. It depends on the number of live containers belonging to the app. We saw that the spark AM went down without explicitly releasing its requested containers and then one of those containers memory was counted towards amResource. cc - [~sandyr] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387425#comment-14387425 ] Sandy Ryza commented on YARN-3415: -- This looks mostly reasonable. A few comments: * In FSAppAttempt, can we change the "If this container is used to run AM" comment to "If not running unmanaged, the first container we allocate is always the AM. Update the leaf queue's AM usage"? * The four lines of comment in FSLeafQueue could be reduced to "If isAMRunning is true, we're not running an unmanaged AM." * Would it make sense to move the call to setAMResource that's currently in FairScheduler next to the call to getQueue().addAMResourceUsage() so that the queue and attempt resource usage get updated at the same time? Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue - Key: YARN-3415 URL: https://issues.apache.org/jira/browse/YARN-3415 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: zhihai xu Priority: Critical Attachments: YARN-3415.000.patch We encountered this problem while running a spark cluster. The amResourceUsage for a queue became artificially high and then the cluster got deadlocked because the maxAMShare constraint kicked in and no new AM got admitted to the cluster. I have described the problem in detail here: https://github.com/apache/spark/pull/5233#issuecomment-87160289 In summary - the condition for adding the container's memory towards amResourceUsage is fragile. It depends on the number of live containers belonging to the app. We saw that the spark AM went down without explicitly releasing its requested containers and then one of those containers' memory was counted towards amResource. cc - [~sandyr] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14385347#comment-14385347 ] Sandy Ryza commented on YARN-3415: -- Thanks for filing this [~ragarwal] and for taking this up [~zxu]. This seems like a fairly serious issue. Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue - Key: YARN-3415 URL: https://issues.apache.org/jira/browse/YARN-3415 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: zhihai xu We encountered this problem while running a spark cluster. The amResourceUsage for a queue became artificially high and then the cluster got deadlocked because the maxAMShare constrain kicked in and no new AM got admitted to the cluster. I have described the problem in detail here: https://github.com/apache/spark/pull/5233#issuecomment-87160289 In summary - the condition for adding the container's memory towards amResourceUsage is fragile. It depends on the number of live containers belonging to the app. We saw that the spark AM went down without explicitly releasing its requested containers and then one of those containers memory was counted towards amResource. cc - [~sandyr] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-3415: - Target Version/s: 2.7.0, 2.6.1 Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue - Key: YARN-3415 URL: https://issues.apache.org/jira/browse/YARN-3415 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: zhihai xu Priority: Critical We encountered this problem while running a spark cluster. The amResourceUsage for a queue became artificially high and then the cluster got deadlocked because the maxAMShare constrain kicked in and no new AM got admitted to the cluster. I have described the problem in detail here: https://github.com/apache/spark/pull/5233#issuecomment-87160289 In summary - the condition for adding the container's memory towards amResourceUsage is fragile. It depends on the number of live containers belonging to the app. We saw that the spark AM went down without explicitly releasing its requested containers and then one of those containers memory was counted towards amResource. cc - [~sandyr] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-3415: - Priority: Critical (was: Major) Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue - Key: YARN-3415 URL: https://issues.apache.org/jira/browse/YARN-3415 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: zhihai xu Priority: Critical We encountered this problem while running a spark cluster. The amResourceUsage for a queue became artificially high and then the cluster got deadlocked because the maxAMShare constrain kicked in and no new AM got admitted to the cluster. I have described the problem in detail here: https://github.com/apache/spark/pull/5233#issuecomment-87160289 In summary - the condition for adding the container's memory towards amResourceUsage is fragile. It depends on the number of live containers belonging to the app. We saw that the spark AM went down without explicitly releasing its requested containers and then one of those containers memory was counted towards amResource. cc - [~sandyr] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2990) FairScheduler's delay-scheduling always waits for node-local and rack-local delays, even for off-rack-only requests
[ https://issues.apache.org/jira/browse/YARN-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310034#comment-14310034 ] Sandy Ryza commented on YARN-2990: -- +1. Sorry for the delay in getting to this. FairScheduler's delay-scheduling always waits for node-local and rack-local delays, even for off-rack-only requests --- Key: YARN-2990 URL: https://issues.apache.org/jira/browse/YARN-2990 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2990-0.patch, yarn-2990-1.patch, yarn-2990-2.patch, yarn-2990-test.patch Looking at the FairScheduler, it appears the node/rack locality delays are used for all requests, even those that are only off-rack. More details in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307602#comment-14307602 ] Sandy Ryza commented on YARN-3101: -- +1 FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-3101: - Summary: In Fair Scheduler, fix canceling of reservations for exceeding max share (was: Fix canceling of reservations for exceeding max share) In Fair Scheduler, fix canceling of reservations for exceeding max share Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3101) Fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-3101: - Summary: Fix canceling of reservations for exceeding max share (was: FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it ) Fix canceling of reservations for exceeding max share - Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298887#comment-14298887 ] Sandy Ryza commented on YARN-3101: -- [~adhoot] is this the same condition that's evaluated when reserving a resource in the first place? I.e. might we ever make a reservation and then immediately end up canceling it? Also, I believe [~l201514] is correct that reservedAppSchedulable.getResource(reservedPriority))) will not return the right quantity and node.getReservedContainer().getReservedResource() is correct. Last of all, while we're at it, can we rename fitInMaxShare to fitsInMaxShare? FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
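To make the second point concrete, a sketch of the check being suggested, with assumed helper names rather than the patch's actual code:
{code}
// Sketch: when validating a reservation, count the reserved container's actual resource
// on top of the queue's current usage and compare against the queue's max share.
Resource reserved = node.getReservedContainer().getReservedResource();
boolean keepReservation = Resources.fitsIn(
    Resources.add(queue.getResourceUsage(), reserved), queue.getMaxShare());
{code}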
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299428#comment-14299428 ] Sandy Ryza commented on YARN-3101: -- In that case it sounds like the behavior is that we can go one container over the max resources. While this might be worth changing in a separate JIRA, we should maintain that behavior with the reservations. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2990) FairScheduler's delay-scheduling always waits for node-local and rack-local delays, even for off-rack-only requests
[ https://issues.apache.org/jira/browse/YARN-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287871#comment-14287871 ] Sandy Ryza commented on YARN-2990: -- Other than the addition of the anyLocalRequests check here: {code} + if (offSwitchRequest.getNumContainers() > 0 && + (!anyLocalRequests(priority) + || allowedLocality.equals(NodeType.OFF_SWITCH))) { {code} are the other changes core to the fix? If not, given that this is touchy code, can we leave things the way they are or make the changes in a separate cleanup JIRA? Also, a couple nits: * Need some extra indentation in the snippet above * anyLocalRequests is kind of a confusing name for that method, because "any" often means off-switch when thinking about locality. Maybe hasNodeOrRackRequests. FairScheduler's delay-scheduling always waits for node-local and rack-local delays, even for off-rack-only requests --- Key: YARN-2990 URL: https://issues.apache.org/jira/browse/YARN-2990 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2990-0.patch, yarn-2990-1.patch, yarn-2990-test.patch Looking at the FairScheduler, it appears the node/rack locality delays are used for all requests, even those that are only off-rack. More details in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234035#comment-14234035 ] Sandy Ryza commented on YARN-2910: -- Using a CopyOnWriteArrayList would make adding an application an O(n) operation. On many clusters, this happens quite frequently. Acquiring a lock is cheap when there is no contention. if app submissions are frequent, I'd rather slow down requests for queue info than the submissions themselves. Otherwise, the former shouldn't have a large effect on the performance of the latter. FSLeafQueue can throw ConcurrentModificationException - Key: YARN-2910 URL: https://issues.apache.org/jira/browse/YARN-2910 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.5.0 Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg Attachments: FSLeafQueue_concurrent_exception.txt, YARN-2910.patch The list that maintains the runnable and the non runnable apps are a standard ArrayList but there is no guarantee that it will only be manipulated by one thread in the system. This can lead to the following exception: {noformat} 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM. java.util.ConcurrentModificationException: java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859) at java.util.ArrayList$Itr.next(ArrayList.java:831) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516) {noformat} Full stack trace in the attached file. We should guard against that by using a thread safe version from java.util.concurrent.CopyOnWriteArrayList -- This message was sent by Atlassian JIRA (v6.3.4#6332)
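To illustrate the trade-off, a simplified sketch (not the actual FSLeafQueue code) of guarding the list with a read/write lock so iteration stays safe without paying an O(n) copy on every application add:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt;
import org.apache.hadoop.yarn.util.resource.Resources;

// Simplified sketch: readers take the read lock while iterating; addApp takes the write
// lock, so adds stay cheap instead of copying the whole list as CopyOnWriteArrayList would.
class RunnableAppsSketch {
  private final List<FSAppAttempt> runnableApps = new ArrayList<FSAppAttempt>();
  private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

  void addApp(FSAppAttempt app) {
    rwLock.writeLock().lock();
    try {
      runnableApps.add(app);
    } finally {
      rwLock.writeLock().unlock();
    }
  }

  Resource getResourceUsage() {
    Resource usage = Resources.createResource(0);
    rwLock.readLock().lock();
    try {
      for (FSAppAttempt app : runnableApps) {
        Resources.addTo(usage, app.getResourceUsage());
      }
    } finally {
      rwLock.readLock().unlock();
    }
    return usage;
  }
}
{code}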
[jira] [Updated] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2910: - Assignee: Wilfred Spiegelenburg FSLeafQueue can throw ConcurrentModificationException - Key: YARN-2910 URL: https://issues.apache.org/jira/browse/YARN-2910 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.5.0 Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg Attachments: FSLeafQueue_concurrent_exception.txt, YARN-2910.patch The list that maintains the runnable and the non runnable apps are a standard ArrayList but there is no guarantee that it will only be manipulated by one thread in the system. This can lead to the following exception: 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM. java.util.ConcurrentModificationException: java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859) at java.util.ArrayList$Itr.next(ArrayList.java:831) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516) Full stack trace in the attached file. We should guard against that by using a thread safe version from java.util.concurrent.CopyOnWriteArrayList -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2669) FairScheduler: queueName shouldn't allow periods the allocation.xml
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221513#comment-14221513 ] Sandy Ryza commented on YARN-2669: -- This is looking good. A few comments. Can we add documentation for this behavior in FairScheduler.apt.vm? We should be doing the same conversion for group names, right? {code} " submitted by user " + user + " with an illegal queue name (" + queueName + ")." {code} Nit: I think it's better not to surround the queue name with parentheses. {code} + return queueName + "." + convertUsername(user); {code} Can we call convertUsername something like cleanUsername to be a little more descriptive? FairScheduler: queueName shouldn't allow periods the allocation.xml --- Key: YARN-2669 URL: https://issues.apache.org/jira/browse/YARN-2669 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2669-1.patch, YARN-2669-2.patch, YARN-2669-3.patch, YARN-2669-4.patch For an allocation file like: {noformat} <allocations> <queue name="root.q1"> <minResources>4096mb,4vcores</minResources> </queue> </allocations> {noformat} Users may wish to configure minResources for a queue with the full path root.q1. However, right now, the fair scheduler will treat this configuration as belonging to the queue with full name root.root.q1. We need to print out a warning message to notify users about this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
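As a purely illustrative sketch of the cleanUsername idea mentioned above; the "_dot_" replacement token is an assumption for illustration, not necessarily what the patch does:
{code}
// Sketch: periods are the queue-hierarchy separator, so rewrite them before building a
// per-user queue name. The "_dot_" token is an assumed convention, not the patch's.
static String cleanUsername(String username) {
  return username.replace(".", "_dot_");
}

// e.g. user "first.last" would map to queue "root.first_dot_last" instead of accidentally
// creating the nested queue root.first.last.
{code}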
[jira] [Commented] (YARN-2669) FairScheduler: queueName shouldn't allow periods the allocation.xml
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221604#comment-14221604 ] Sandy Ryza commented on YARN-2669: -- +1 FairScheduler: queueName shouldn't allow periods the allocation.xml --- Key: YARN-2669 URL: https://issues.apache.org/jira/browse/YARN-2669 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2669-1.patch, YARN-2669-2.patch, YARN-2669-3.patch, YARN-2669-4.patch, YARN-2669-5.patch For an allocation file like: {noformat} allocations queue name=root.q1 minResources4096mb,4vcores/minResources /queue /allocations {noformat} Users may wish to config minResources for a queue with full path root.q1. However, right now, fair scheduler will treat this configureation for the queue with full name root.root.q1. We need to print out a warning msg to notify users about this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2669) FairScheduler: queue names shouldn't allow periods
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2669: - Summary: FairScheduler: queue names shouldn't allow periods (was: FairScheduler: queueName shouldn't allow periods the allocation.xml) FairScheduler: queue names shouldn't allow periods -- Key: YARN-2669 URL: https://issues.apache.org/jira/browse/YARN-2669 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2669-1.patch, YARN-2669-2.patch, YARN-2669-3.patch, YARN-2669-4.patch, YARN-2669-5.patch For an allocation file like: {noformat} allocations queue name=root.q1 minResources4096mb,4vcores/minResources /queue /allocations {noformat} Users may wish to config minResources for a queue with full path root.q1. However, right now, fair scheduler will treat this configureation for the queue with full name root.root.q1. We need to print out a warning msg to notify users about this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2669) FairScheduler: queue names shouldn't allow periods
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2669: - Priority: Major (was: Minor) FairScheduler: queue names shouldn't allow periods -- Key: YARN-2669 URL: https://issues.apache.org/jira/browse/YARN-2669 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2669-1.patch, YARN-2669-2.patch, YARN-2669-3.patch, YARN-2669-4.patch, YARN-2669-5.patch For an allocation file like: {noformat} allocations queue name=root.q1 minResources4096mb,4vcores/minResources /queue /allocations {noformat} Users may wish to config minResources for a queue with full path root.q1. However, right now, fair scheduler will treat this configureation for the queue with full name root.root.q1. We need to print out a warning msg to notify users about this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4
[ https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212015#comment-14212015 ] Sandy Ryza commented on YARN-2811: -- This looks almost good to go - the last thing is that we should use Resources.fitsIn instead of Resources.lessThanOrEqual(RESOURCE_CALCULATOR...), as the latter will only consider memory. Fair Scheduler is violating max memory settings in 2.4 -- Key: YARN-2811 URL: https://issues.apache.org/jira/browse/YARN-2811 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, YARN-2811.v3.patch, YARN-2811.v4.patch, YARN-2811.v5.patch, YARN-2811.v6.patch, YARN-2811.v7.patch This has been seen on several queues showing the allocated MB going significantly above the max MB and it appears to have started with the 2.4 upgrade. It could be a regression bug from 2.0 to 2.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
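To illustrate the fitsIn versus lessThanOrEqual distinction from the comment above, here is a self-contained sketch; the Res class is a stand-in for YARN's Resource, not the real API. With the default memory-only calculator, lessThanOrEqual behaves like the first method below, whereas fitsIn requires every dimension to fit.
{code}
// Simplified stand-in for YARN's Resource: just memory (MB) and vcores.
final class Res {
  final long memoryMb;
  final int vcores;
  Res(long memoryMb, int vcores) {
    this.memoryMb = memoryMb;
    this.vcores = vcores;
  }
}

public class MaxShareCheck {
  // Mirrors a memory-only comparison: vcores are ignored entirely.
  static boolean memoryOnlyLessThanOrEqual(Res usage, Res max) {
    return usage.memoryMb <= max.memoryMb;
  }

  // Mirrors fitsIn: every dimension must fit.
  static boolean fitsIn(Res usage, Res max) {
    return usage.memoryMb <= max.memoryMb && usage.vcores <= max.vcores;
  }

  public static void main(String[] args) {
    Res usage = new Res(4096, 10);
    Res max = new Res(8192, 8);
    // The memory-only check passes even though the vcore cap is exceeded.
    System.out.println(memoryOnlyLessThanOrEqual(usage, max)); // true
    System.out.println(fitsIn(usage, max));                    // false
  }
}
{code}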
[jira] [Commented] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4
[ https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14213045#comment-14213045 ] Sandy Ryza commented on YARN-2811: -- +1 Fair Scheduler is violating max memory settings in 2.4 -- Key: YARN-2811 URL: https://issues.apache.org/jira/browse/YARN-2811 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, YARN-2811.v3.patch, YARN-2811.v4.patch, YARN-2811.v5.patch, YARN-2811.v6.patch, YARN-2811.v7.patch, YARN-2811.v8.patch, YARN-2811.v9.patch This has been seen on several queues showing the allocated MB going significantly above the max MB and it appears to have started with the 2.4 upgrade. It could be a regression bug from 2.0 to 2.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2811) In Fair Scheduler, reservation fulfillments shouldn't ignore max share
[ https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2811: - Summary: In Fair Scheduler, reservation fulfillments shouldn't ignore max share (was: Fair Scheduler is violating max memory settings in 2.4) In Fair Scheduler, reservation fulfillments shouldn't ignore max share -- Key: YARN-2811 URL: https://issues.apache.org/jira/browse/YARN-2811 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, YARN-2811.v3.patch, YARN-2811.v4.patch, YARN-2811.v5.patch, YARN-2811.v6.patch, YARN-2811.v7.patch, YARN-2811.v8.patch, YARN-2811.v9.patch This has been seen on several queues showing the allocated MB going significantly above the max MB and it appears to have started with the 2.4 upgrade. It could be a regression bug from 2.0 to 2.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4
[ https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208403#comment-14208403 ] Sandy Ryza commented on YARN-2811: -- IIUC, this looks like it will check the immediate parent of the queue, but won't go any farther up in the hierarchy. Can fitsIn be given a more descriptive name, like fitsInMaxShares? Last, to avoid code duplication, can the check be moved into this same if statement: {code} if (!reservedAppSchedulable.hasContainerForNode(reservedPriority, node)) { {code} Fair Scheduler is violating max memory settings in 2.4 -- Key: YARN-2811 URL: https://issues.apache.org/jira/browse/YARN-2811 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, YARN-2811.v3.patch, YARN-2811.v4.patch, YARN-2811.v5.patch This has been seen on several queues showing the allocated MB going significantly above the max MB and it appears to have started with the 2.4 upgrade. It could be a regression bug from 2.0 to 2.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
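A rough sketch of the hierarchical check being requested, as a hypothetical fitsInMaxShares that walks from the leaf queue up through every ancestor rather than stopping at the immediate parent. The Queue class below is a stand-in, not the actual FSQueue API.
{code}
// Stand-in queue node: current usage, configured max share, parent pointer.
final class Queue {
  final Queue parent;
  long usageMb;
  long maxShareMb;
  Queue(Queue parent, long maxShareMb) {
    this.parent = parent;
    this.maxShareMb = maxShareMb;
  }
}

public class FitsInMaxShares {
  // Rejects the proposed container if it would push the leaf queue or any
  // ancestor over its max share.
  static boolean fitsInMaxShares(Queue leaf, long containerMb) {
    for (Queue q = leaf; q != null; q = q.parent) {
      if (q.usageMb + containerMb > q.maxShareMb) {
        return false;
      }
    }
    return true;
  }
}
{code}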
[jira] [Commented] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4
[ https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203685#comment-14203685 ] Sandy Ryza commented on YARN-2811: -- I just realized an issue with this. maxResources can be set on parent queues as well, so checking the maxResources of the leaf queue that the app is part of is not enough. Sorry for not catching this earlier. A couple more style nitpicks: remember to keep lines close to 80 characters and to put a space after the double slashes that initiate a comment. Also, FSQueue has a getMaxShare method, so you don't need to go to the trouble of getting the name and passing it to the map in the allocation configuration. Fair Scheduler is violating max memory settings in 2.4 -- Key: YARN-2811 URL: https://issues.apache.org/jira/browse/YARN-2811 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, YARN-2811.v3.patch, YARN-2811.v4.patch This has been seen on several queues showing the allocated MB going significantly above the max MB and it appears to have started with the 2.4 upgrade. It could be a regression bug from 2.0 to 2.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4
[ https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201884#comment-14201884 ] Sandy Ryza commented on YARN-2811: -- Cool, thanks for the updated patch. Are you able to add a test to verify the behavior? A couple nits:
{code}
+if (Resources.fitsIn(queue.getResourceUsage(), queue.scheduler
+    .getAllocationConfiguration().getMaxResources(queue.getName()))) {
{code}
Since we're in FairScheduler, can we just access the allocation configuration directly?
{code}
//Don't hold the reservation if queue reaches its maximum
{code}
Double slashes should have a space after them. Fair Scheduler is violating max memory settings in 2.4 -- Key: YARN-2811 URL: https://issues.apache.org/jira/browse/YARN-2811 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, YARN-2811.v3.patch This has been seen on several queues showing the allocated MB going significantly above the max MB and it appears to have started with the 2.4 upgrade. It could be a regression bug from 2.0 to 2.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4
[ https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199980#comment-14199980 ] Sandy Ryza commented on YARN-2811: -- Thanks for uncovering this [~l201514]. I think that in this case, in addition to not assigning the container, the application should release the reservation so that other apps can get to the node. Fair Scheduler is violating max memory settings in 2.4 -- Key: YARN-2811 URL: https://issues.apache.org/jira/browse/YARN-2811 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch This has been seen on several queues showing the allocated MB going significantly above the max MB and it appears to have started with the 2.4 upgrade. It could be a regression bug from 2.0 to 2.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2669) FairScheduler: print out a warning log when users provider a queueName starting with root. in the allocation.xml
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165496#comment-14165496 ] Sandy Ryza commented on YARN-2669: -- Might it make more sense to just throw a validation error and crash? Users usually don't look in the RM logs unless something is wrong. FairScheduler: print out a warning log when users provider a queueName starting with root. in the allocation.xml -- Key: YARN-2669 URL: https://issues.apache.org/jira/browse/YARN-2669 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor For an allocation file like:
{noformat}
<allocations>
  <queue name="root.q1">
    <minResources>4096mb,4vcores</minResources>
  </queue>
</allocations>
{noformat}
Users may wish to configure minResources for a queue with the full path root.q1. However, right now, the fair scheduler will treat this configuration as belonging to the queue with the full name root.root.q1. We need to print out a warning message to notify users about this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2669) FairScheduler: print out a warning log when users provider a queueName starting with root. in the allocation.xml
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165642#comment-14165642 ] Sandy Ryza commented on YARN-2669: -- We shouldn't allow configured queue names to have periods in them. I believe we already don't accept queues named root, but if we do, we shouldn't. FairScheduler: print out a warning log when users provider a queueName starting with root. in the allocation.xml -- Key: YARN-2669 URL: https://issues.apache.org/jira/browse/YARN-2669 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor For an allocation file like: {noformat} allocations queue name=root.q1 minResources4096mb,4vcores/minResources /queue /allocations {noformat} Users may wish to config minResources for a queue with full path root.q1. However, right now, fair scheduler will treat this configureation for the queue with full name root.root.q1. We need to print out a warning msg to notify users about this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158328#comment-14158328 ] Sandy Ryza commented on YARN-2635: -- Parametrized should be spelled Paramet *e* rized. Can you fix that on commit? Otherwise, +1. TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS -- Key: YARN-2635 URL: https://issues.apache.org/jira/browse/YARN-2635 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch, yarn-2635-4.patch If we change the scheduler from Capacity Scheduler to Fair Scheduler, the TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2643) Don't create a new DominantResourceCalculator on every FairScheduler.allocate call
Sandy Ryza created YARN-2643: Summary: Don't create a new DominantResourceCalculator on every FairScheduler.allocate call Key: YARN-2643 URL: https://issues.apache.org/jira/browse/YARN-2643 Project: Hadoop YARN Issue Type: Improvement Reporter: Sandy Ryza Assignee: Karthik Kambatla Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157336#comment-14157336 ] Sandy Ryza commented on YARN-1414: -- Awesome with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs - Key: YARN-1414 URL: https://issues.apache.org/jira/browse/YARN-1414 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157624#comment-14157624 ] Sandy Ryza commented on YARN-2635: -- This seems like a good idea. A few stylistic comments. Can we rename RMSchedulerParametrizedTestBase to ParameterizedSchedulerTestBase? The former confuses me a little because it like something that happened, rather than a noun, and RM doesn't seem necessary. Also, Parameterized as spelled in the JUnit class name has three e's. Lastly, can the class include some header comments on what it's doing? {code} + protected void configScheduler(YarnConfiguration conf) throws IOException { +// Configure scheduler {code} Just name the method configureScheduler instead of an abbreviation then comment. {code} + private void configFifoScheduler(YarnConfiguration conf) { +conf.set(YarnConfiguration.RM_SCHEDULER, FifoScheduler.class.getName()); + } + + private void configCapacityScheduler(YarnConfiguration conf) { +conf.set(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class.getName()); + } {code} These are only one line - can we just inline them? {code} + protected YarnConfiguration conf = null; {code} I think better to make this private and expose it through a getConfig method. Running the tests without FIFO seems reasonable to me. One last thought - not sure how feasible this is, but the code might be simpler if we get rid of SchedulerType and just have the parameters be Configuration objects? TestRMRestart should run with all schedulers Key: YARN-2635 URL: https://issues.apache.org/jira/browse/YARN-2635 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch If we change the scheduler from Capacity Scheduler to Fair Scheduler, the TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
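For readers unfamiliar with JUnit's Parameterized runner, here is a rough sketch of the shape such a base class could take. It is simplified and hypothetical — the committed class would also need per-scheduler setup (for example, writing a fair-scheduler allocation file), and the exact structure shown here is an assumption, not the actual patch.
{code}
import java.util.Arrays;
import java.util.Collection;

import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

// Concrete test classes extend this base and run once per scheduler.
@RunWith(Parameterized.class)
public abstract class ParameterizedSchedulerTestBase {

  @Parameters
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] {
        {CapacityScheduler.class.getName()},
        {FairScheduler.class.getName()}});
  }

  private final YarnConfiguration conf = new YarnConfiguration();

  // Subclasses need a matching constructor that passes the parameter through.
  protected ParameterizedSchedulerTestBase(String schedulerClassName) {
    conf.set(YarnConfiguration.RM_SCHEDULER, schedulerClassName);
  }

  protected YarnConfiguration getConf() {
    return conf;
  }
}
{code}
Making the parameters Configuration objects themselves, as suggested at the end of the comment, would fold the per-scheduler configure methods directly into the schedulers() list.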
[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156020#comment-14156020 ] Sandy Ryza commented on YARN-1414: -- [~jrottinghuis] I will take a look. [~l201514] mind rebasing so that the patch will apply? with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs - Key: YARN-1414 URL: https://issues.apache.org/jira/browse/YARN-1414 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Fix For: 2.2.0 Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2596) TestWorkPreservingRMRestart for FairScheduler failed on trunk
[ https://issues.apache.org/jira/browse/YARN-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14146921#comment-14146921 ] Sandy Ryza commented on YARN-2596: -- +1 pending jenkins TestWorkPreservingRMRestart for FairScheduler failed on trunk - Key: YARN-2596 URL: https://issues.apache.org/jira/browse/YARN-2596 Project: Hadoop YARN Issue Type: Test Reporter: Junping Du Assignee: Karthik Kambatla Attachments: yarn-2596-1.patch As test result from YARN-668, the test failure can be reproduce locally without apply new patch to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
[ https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144424#comment-14144424 ] Sandy Ryza commented on YARN-2252: -- +1 Intermittent failure for testcase TestFairScheduler.testContinuousScheduling Key: YARN-2252 URL: https://issues.apache.org/jira/browse/YARN-2252 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: trunk-win Reporter: Ratandeep Ratti Labels: hadoop2, scheduler, yarn Attachments: YARN-2252-1.patch, yarn-2252-2.patch This test-case is failing sporadically on my machine. I think I have a plausible explanation for this. It seems that when the Scheduler is being asked for resources, the resource requests that are being constructed have no preference for the hosts (nodes). The two mock hosts constructed, both have a memory of 8192 mb. The containers(resources) being requested each require a memory of 1024mb, hence a single node can execute both the resource requests for the application. In the end of the test-case it is being asserted that the containers (resource requests) be executed on different nodes, but since we haven't specified any preferences for nodes when requesting the resources, the scheduler (at times) executes both the containers (requests) on the same node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2555) Effective max-allocation-* should consider biggest node
[ https://issues.apache.org/jira/browse/YARN-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135013#comment-14135013 ] Sandy Ryza commented on YARN-2555: -- [~gp.leftnoteasy], this isn't the same as having an NM variable affect the RM conf. Considering the effective max allocation as the biggest node means rejecting requests that won't fit on any node, which I believe is the correct behavior. The issue I had with YARN-2422 was handling at this at the configuration level, rather than properly handling this for heterogeneous clusters. Thanks for pointing that out [~agentvindo.dev] - agreed that this duplicates YARN-56. I think something like the approach outlined here probably makes the most sense for that JIRA. Effective max-allocation-* should consider biggest node --- Key: YARN-2555 URL: https://issues.apache.org/jira/browse/YARN-2555 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla The effective max-allocation-mb should be min(admin-configured-max-allocation-mb, max-mb-on-one-node), so we can reject container requests for resources larger than any node. Today, these requests wait forever. We should do this for all resources and update the effective value on node updates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131908#comment-14131908 ] Sandy Ryza commented on YARN-415: - Awesome to see this go in! Capture aggregate memory allocation at the app-level for chargeback --- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.5.0 Reporter: Kendall Thrapp Assignee: Eric Payne Fix For: 2.6.0 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.201408181938.txt, YARN-415.201408212033.txt, YARN-415.201409040036.txt, YARN-415.201409092204.txt, YARN-415.201409102216.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
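A tiny worked example of the MB-seconds accounting described above, with made-up container sizes and lifetimes:
{code}
public class MemorySecondsExample {
  public static void main(String[] args) {
    // Container 1: 2048 MB reserved for 600 seconds.
    // Container 2: 4096 MB reserved for 300 seconds.
    long mbSeconds = 2048L * 600 + 4096L * 300;
    System.out.println(mbSeconds + " MB-seconds"); // prints "2457600 MB-seconds"
  }
}
{code}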
[jira] [Commented] (YARN-2154) FairScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126005#comment-14126005 ] Sandy Ryza commented on YARN-2154: -- I'd like to add another constraint that I've been thinking about into the mix. We don't necessarily need to implement it in this JIRA, but I think it's worth considering how it would affect the approach. A queue should only be able to preempt a container from another queue if every queue between the starved queue and their least common ancestor is starved. This essentially means that we consider preemption and fairness hierarchically. If the marketing and engineering queues are square in terms of resources, starved teams in engineering shouldn't be able to take resources from queues in marketing - they should only be able to preempt from queues within engineering. FairScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request -- Key: YARN-2154 URL: https://issues.apache.org/jira/browse/YARN-2154 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Today, FairScheduler uses a spray-gun approach to preemption. Instead, it should only preempt resources that would satisfy the incoming request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
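A sketch of how the hierarchical constraint described above could be evaluated; the QNode class and mayPreempt method are purely illustrative, not part of the FairScheduler preemption code.
{code}
import java.util.ArrayList;
import java.util.List;

// Minimal queue node with a parent pointer and a starvation flag.
final class QNode {
  final QNode parent;
  final boolean starved;
  QNode(QNode parent, boolean starved) {
    this.parent = parent;
    this.starved = starved;
  }
}

public class HierarchicalPreemptionCheck {
  // True only if every queue on the path from the starved leaf up to (but not
  // including) its least common ancestor with the victim leaf is starved.
  static boolean mayPreempt(QNode starvedLeaf, QNode victimLeaf) {
    List<QNode> victimAncestors = new ArrayList<>();
    for (QNode q = victimLeaf; q != null; q = q.parent) {
      victimAncestors.add(q);
    }
    for (QNode q = starvedLeaf; q != null; q = q.parent) {
      if (victimAncestors.contains(q)) {
        return true;  // reached the least common ancestor
      }
      if (!q.starved) {
        return false; // a non-starved ancestor blocks preemption
      }
    }
    return false;     // no common ancestor; should not happen in one tree
  }
}
{code}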
[jira] [Commented] (YARN-2486) FileSystem counters can overflow for large number of readOps, largeReadOps, writeOps
[ https://issues.apache.org/jira/browse/YARN-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117800#comment-14117800 ] Sandy Ryza commented on YARN-2486: -- Unfortunately these methods were made public in 2.5, so we can't change their signatures. We can, however, add versions with new names that return longs. FileSystem counters can overflow for large number of readOps, largeReadOps, writeOps Key: YARN-2486 URL: https://issues.apache.org/jira/browse/YARN-2486 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0, 2.4.1 Reporter: Swapnil Daingade Priority: Minor The org.apache.hadoop.fs.FileSystem.Statistics.StatisticsData class defines readOps, largeReadOps, writeOps as int. Also the The org.apache.hadoop.fs.FileSystem.Statistics class has methods like getReadOps(), getLargeReadOps() and getWriteOps() that return int. These int values can overflow if the exceed 2^31-1 showing negative values. It would be nice if these can be changed to long. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
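A hypothetical sketch of the compatible change suggested above: keep the existing int-returning accessor and add a long-returning one under a new name. This is illustrative only, not the actual FileSystem.Statistics code.
{code}
import java.util.concurrent.atomic.AtomicLong;

public class OpCounters {
  private final AtomicLong readOps = new AtomicLong();

  public void incrementReadOps(int count) {
    readOps.addAndGet(count);
  }

  // The existing public signature stays for compatibility, even though values
  // past 2^31 - 1 appear negative here.
  public int getReadOps() {
    return (int) readOps.get();
  }

  // A new method with a different name exposes the full 64-bit value.
  public long getReadOpsLong() {
    return readOps.get();
  }
}
{code}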
[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116828#comment-14116828 ] Sandy Ryza commented on YARN-2448: -- As Karthik mentioned, the ResourceCalculator is an abstraction used by the Capacity Scheduler that isn't a great fit for the Fair Scheduler, which always enforces CPU limits but can be configured with a different fairness policy at each queue in the hierarchy. If this is necessary, can we provide a narrower interface such as a boolean indicating whether the scheduler considers CPU in its decisions? RM should expose the name of the ResourceCalculator being used when AMs register Key: YARN-2448 URL: https://issues.apache.org/jira/browse/YARN-2448 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better decisions when scheduling. MapReduce for example, only looks at memory when deciding it's scheduling, even though the RM could potentially be using the DominantResourceCalculator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2422) yarn.scheduler.maximum-allocation-mb should not be hard-coded in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101274#comment-14101274 ] Sandy Ryza commented on YARN-2422: -- I think it's weird to have a nodemanager property impact what goes on in the ResourceManager. Using this property would be especially weird on heterogeneous clusters where resources vary from node to node. Preferable would be to, independently of yarn.scheduler.maximum-allocation-mb, make the ResourceManager reject any requests that are larger than the largest node in the cluster. And then default yarn.scheduler.maximum-allocaiton-mb to infinite. yarn.scheduler.maximum-allocation-mb should not be hard-coded in yarn-default.xml - Key: YARN-2422 URL: https://issues.apache.org/jira/browse/YARN-2422 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Gopal V Priority: Minor Attachments: YARN-2422.1.patch Cluster with 40Gb NM refuses to run containers 8Gb. It was finally tracked down to yarn-default.xml hard-coding it to 8Gb. In case of lack of a better override, it should default to - ${yarn.nodemanager.resource.memory-mb} instead of a hard-coded 8Gb. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2430) FairShareComparator: cache the results of getResourceUsage()
[ https://issues.apache.org/jira/browse/YARN-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101732#comment-14101732 ] Sandy Ryza commented on YARN-2430: -- I believe #3 is the best approach as it's more performant than #1 and #2 has correctness issues. I actually implemented it a little while ago as part of YARN-1297 and will try to get that in. FairShareComparator: cache the results of getResourceUsage() Key: YARN-2430 URL: https://issues.apache.org/jira/browse/YARN-2430 Project: Hadoop YARN Issue Type: Improvement Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh The compare of FairShareComparator has 3 invocation of getResourceUsage per comparable object. In the case of queues, the implementation of getResourceUsage requires iterating over the apps and adding up their current usage. The compare method can reuse the result of getResourceUsage to reduce the load by third. However, to further reduce the load the result of getResourceUsage can be cached in FSLeafQueue. This would be more efficient since the invocation of compare method on each Comparable object is = 1. -- This message was sent by Atlassian JIRA (v6.2#6252)
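The numbered approaches referenced above are not reproduced in this thread, so the following is only an illustrative sketch of the general caching idea (not the YARN-1297 implementation): the queue keeps an aggregate that a periodic update recomputes, and the comparator reads the cached value in O(1).
{code}
import java.util.ArrayList;
import java.util.List;

public class CachedUsageQueue {
  private final List<Long> appUsagesMb = new ArrayList<>();
  private long cachedUsageMb;

  public synchronized void addApp(long usageMb) {
    appUsagesMb.add(usageMb);
  }

  // Called from a periodic scheduler update, not from the comparator.
  public synchronized void recomputeUsage() {
    long total = 0;
    for (long mb : appUsagesMb) {
      total += mb;
    }
    cachedUsageMb = total;
  }

  // The comparator reads the cached aggregate instead of re-summing the apps.
  public synchronized long getResourceUsage() {
    return cachedUsageMb;
  }
}
{code}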
[jira] [Commented] (YARN-2420) Fair Scheduler: change yarn.scheduler.fair.assignmultiple from boolean to integer
[ https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097937#comment-14097937 ] Sandy Ryza commented on YARN-2420: -- Does yarn.scheduler.fair.max.assign satisfy what you're looking for? Fair Scheduler: change yarn.scheduler.fair.assignmultiple from boolean to integer - Key: YARN-2420 URL: https://issues.apache.org/jira/browse/YARN-2420 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2420) Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load
[ https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097961#comment-14097961 ] Sandy Ryza commented on YARN-2420: -- Cool. Regarding adjusting maxassign dynamically, my view has been that this isn't needed when continuous scheduling is turned on, and eventually we expect everyone to switch over to continuous scheduling. Thoughts? Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load --- Key: YARN-2420 URL: https://issues.apache.org/jira/browse/YARN-2420 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
[ https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094598#comment-14094598 ] Sandy Ryza commented on YARN-2399: -- +1 FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt Key: YARN-2399 URL: https://issues.apache.org/jira/browse/YARN-2399 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.5.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2399-1.patch, yarn-2399-2.patch, yarn-2399-3.patch FairScheduler has two data structures for an application, making the code hard to track. We should merge these for better maintainability in the long-term. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2413) capacity scheduler will overallocate vcores
[ https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094885#comment-14094885 ] Sandy Ryza commented on YARN-2413: -- The capacity scheduler truncates all vcore requests to 0 if the DominantResourceCalculator is not used. I think in this case it also doesn't make an effort to respect node vcore capacities at all. capacity scheduler will overallocate vcores --- Key: YARN-2413 URL: https://issues.apache.org/jira/browse/YARN-2413 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 3.0.0, 2.2.0 Reporter: Allen Wittenauer Priority: Critical It doesn't appear that the capacity scheduler is properly allocating vcores when making scheduling decisions, which may result in overallocation of CPU resources. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2413) capacity scheduler will overallocate vcores
[ https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094899#comment-14094899 ] Sandy Ryza commented on YARN-2413: -- I believe this is the expected behavior (i.e. Capacity Scheduler by default doesn't use vcores in scheduling). capacity scheduler will overallocate vcores --- Key: YARN-2413 URL: https://issues.apache.org/jira/browse/YARN-2413 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 3.0.0, 2.2.0 Reporter: Allen Wittenauer Priority: Critical It doesn't appear that the capacity scheduler is properly allocating vcores when making scheduling decisions, which may result in overallocation of CPU resources. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2413) capacity scheduler will overallocate vcores
[ https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094903#comment-14094903 ] Sandy Ryza commented on YARN-2413: -- I don't have an opinion on whether we should keep this as the default behavior, just wanted to clear up that it's what's expected. capacity scheduler will overallocate vcores --- Key: YARN-2413 URL: https://issues.apache.org/jira/browse/YARN-2413 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 3.0.0, 2.2.0 Reporter: Allen Wittenauer Priority: Critical It doesn't appear that the capacity scheduler is properly allocating vcores when making scheduling decisions, which may result in overallocation of CPU resources. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
[ https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093592#comment-14093592 ] Sandy Ryza commented on YARN-2399: -- I noticed in FSAppAttempt there are some instance variables mixed in with the functions. Not sure if it was like this already, but can we move them up to the top? FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt Key: YARN-2399 URL: https://issues.apache.org/jira/browse/YARN-2399 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.5.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2399-1.patch, yarn-2399-2.patch FairScheduler has two data structures for an application, making the code hard to track. We should merge these for better maintainability in the long-term. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
[ https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093597#comment-14093597 ] Sandy Ryza commented on YARN-2399: -- Also, can we move all the methods that implement methods in Schedulable together?
{code}
+  // TODO (KK): Rename these
{code}
Rename these?
{code}
-new ConcurrentHashMap<ApplicationId,SchedulerApplication<FSSchedulerApp>>();
+new ConcurrentHashMap<ApplicationId,SchedulerApplication<FSAppAttempt>>();
{code}
Mind adding a space here after ApplicationId because you're fixing this line anyway?
{code}
+  private FSAppAttempt mockAppSched(long startTime) {
+    FSAppAttempt schedApp = mock(FSAppAttempt.class);
+    when(schedApp.getStartTime()).thenReturn(startTime);
+    return schedApp;
   }
{code}
Call this mockAppAttempt? Otherwise, LGTM FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt Key: YARN-2399 URL: https://issues.apache.org/jira/browse/YARN-2399 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.5.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2399-1.patch, yarn-2399-2.patch FairScheduler has two data structures for an application, making the code hard to track. We should merge these for better maintainability in the long-term. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting
[ https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090420#comment-14090420 ] Sandy Ryza commented on YARN-807: - bq. If you think it's a bug, we can resolve it in YARN-2385. bq. We may need to create a Map<queue-name, app-id> in RMContext. It's also worth considering only holding this map for completed applications, so we don't need to keep two maps for running applications. When querying apps by queue, iterating over all apps is inefficient and limiting - Key: YARN-807 URL: https://issues.apache.org/jira/browse/YARN-807 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: YARN-807-1.patch, YARN-807-2.patch, YARN-807-3.patch, YARN-807-4.patch, YARN-807.patch The question "which apps are in queue x" can be asked via the RM REST APIs, through the ClientRMService, and through the command line. In all these cases, the question is answered by scanning through every RMApp and filtering by the app's queue name. All schedulers maintain a mapping of queues to applications. I think it would make more sense to ask the schedulers which applications are in a given queue. This is what was done in MR1. This would also have the advantage of allowing a parent queue to return all the applications on leaf queues under it, and allow queue name aliases, as in the way that root.default and default refer to the same queue in the fair scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting
[ https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090447#comment-14090447 ] Sandy Ryza commented on YARN-807: - I just remembered a couple reasons why it's important that we go through the scheduler: * *Getting all the apps underneath a parent queue* - the scheduler holds queue hierarchy information that allows us to return applications in all leaf queues underneath a parent queue. * *Aliases* - In the Fair Scheduler, default is shorthand for root.default, so querying on either of these names should return applications in that queue. I'm open to approaches that don't require going through the scheduler, but I think we should make sure they keep supporting these capabilities. When querying apps by queue, iterating over all apps is inefficient and limiting - Key: YARN-807 URL: https://issues.apache.org/jira/browse/YARN-807 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: YARN-807-1.patch, YARN-807-2.patch, YARN-807-3.patch, YARN-807-4.patch, YARN-807.patch The question "which apps are in queue x" can be asked via the RM REST APIs, through the ClientRMService, and through the command line. In all these cases, the question is answered by scanning through every RMApp and filtering by the app's queue name. All schedulers maintain a mapping of queues to applications. I think it would make more sense to ask the schedulers which applications are in a given queue. This is what was done in MR1. This would also have the advantage of allowing a parent queue to return all the applications on leaf queues under it, and allow queue name aliases, as in the way that root.default and default refer to the same queue in the fair scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
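An illustrative sketch of the two capabilities listed above — resolving the default/root.default alias and collecting the apps from every leaf queue under a parent. The map-based lookup below stands in for the scheduler's queue hierarchy; it is not actual ResourceManager code.
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueueAppLookup {
  // Leaf queue name (e.g. "root.eng.etl") -> application ids in that queue.
  private final Map<String, List<String>> appsByLeafQueue = new HashMap<>();

  public List<String> getAppsUnder(String queueName) {
    // Alias handling: "default" is shorthand for "root.default".
    String resolved = (queueName.equals("root") || queueName.startsWith("root."))
        ? queueName : "root." + queueName;

    List<String> result = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : appsByLeafQueue.entrySet()) {
      // Hierarchy handling: match the queue itself or any descendant leaf.
      if (e.getKey().equals(resolved) || e.getKey().startsWith(resolved + ".")) {
        result.addAll(e.getValue());
      }
    }
    return result;
  }
}
{code}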
[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance
[ https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089874#comment-14089874 ] Sandy Ryza commented on YARN-2352: -- My only comment is that I think it would make more sense to call these metrics FSOpDurations. Otherwise LGTM. FairScheduler: Collect metrics on duration of critical methods that affect performance -- Key: YARN-2352 URL: https://issues.apache.org/jira/browse/YARN-2352 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: fs-perf-metrics.png, yarn-2352-1.patch, yarn-2352-2.patch, yarn-2352-2.patch, yarn-2352-3.patch, yarn-2352-4.patch We need more metrics for better visibility into FairScheduler performance. At the least, we need to do this for (1) handle node events, (2) update, (3) compute fairshares, (4) preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance
[ https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089880#comment-14089880 ] Sandy Ryza commented on YARN-2352: -- And also - is there a reason we need to change all the clocks to getClock()s? FairScheduler: Collect metrics on duration of critical methods that affect performance -- Key: YARN-2352 URL: https://issues.apache.org/jira/browse/YARN-2352 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: fs-perf-metrics.png, yarn-2352-1.patch, yarn-2352-2.patch, yarn-2352-2.patch, yarn-2352-3.patch, yarn-2352-4.patch We need more metrics for better visibility into FairScheduler performance. At the least, we need to do this for (1) handle node events, (2) update, (3) compute fairshares, (4) preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance
[ https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089997#comment-14089997 ] Sandy Ryza commented on YARN-2352: -- +1 FairScheduler: Collect metrics on duration of critical methods that affect performance -- Key: YARN-2352 URL: https://issues.apache.org/jira/browse/YARN-2352 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: fs-perf-metrics.png, yarn-2352-1.patch, yarn-2352-2.patch, yarn-2352-2.patch, yarn-2352-3.patch, yarn-2352-4.patch, yarn-2352-5.patch We need more metrics for better visibility into FairScheduler performance. At the least, we need to do this for (1) handle node events, (2) update, (3) compute fairshares, (4) preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting
[ https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090172#comment-14090172 ] Sandy Ryza commented on YARN-807: - Hi [~leftnoteasy], I think the expected behavior should be to include both active and pending apps. If that changed with this patch, then I introduced a bug. Perhaps more worryingly, it appears that this patch makes it so that completed apps aren't returned when querying by queue, which I don't think is necessarily desirable behavior. When querying apps by queue, iterating over all apps is inefficient and limiting - Key: YARN-807 URL: https://issues.apache.org/jira/browse/YARN-807 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: YARN-807-1.patch, YARN-807-2.patch, YARN-807-3.patch, YARN-807-4.patch, YARN-807.patch The question which apps are in queue x can be asked via the RM REST APIs, through the ClientRMService, and through the command line. In all these cases, the question is answered by scanning through every RMApp and filtering by the app's queue name. All schedulers maintain a mapping of queues to applications. I think it would make more sense to ask the schedulers which applications are in a given queue. This is what was done in MR1. This would also have the advantage of allowing a parent queue to return all the applications on leaf queues under it, and allow queue name aliases, as in the way that root.default and default refer to the same queue in the fair scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance
[ https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087887#comment-14087887 ] Sandy Ryza commented on YARN-2352: -- IIUC, this patch will only record the duration. If we go that route, I think we should call these metrics lastNodeUpdateDuration etc.. However, would it make sense to go with an approach that records more historical information? For example, RPCMetrics uses a MutableRate to keep stats on the processing time for RPCs, and I think a similar model could work here. Last, is there any need to make the FSPerfMetrics instance static? Right now I think the Fair Scheduler has managed to avoid any mutable static variables. FairScheduler: Collect metrics on duration of critical methods that affect performance -- Key: YARN-2352 URL: https://issues.apache.org/jira/browse/YARN-2352 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: fs-perf-metrics.png, yarn-2352-1.patch, yarn-2352-2.patch, yarn-2352-2.patch We need more metrics for better visibility into FairScheduler performance. At the least, we need to do this for (1) handle node events, (2) update, (3) compute fairshares, (4) preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
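A rough sketch of the MutableRate idea being suggested, patterned on how RPC metrics track processing time. The class name echoes the FSOpDurations rename proposed later in the thread, the metric is an instance field rather than a static, and the exact metrics2 registry calls shown here are an assumption rather than the committed code.
{code}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableRate;

class FSOpDurationsSketch {
  private final MetricsRegistry registry = new MetricsRegistry("FSOpDurations");

  // MutableRate keeps a running count and average, not just the last value.
  private final MutableRate nodeUpdateDuration =
      registry.newRate("NodeUpdateDuration", "Duration of node update calls");

  void recordNodeUpdate(long startTimeMs, long endTimeMs) {
    nodeUpdateDuration.add(endTimeMs - startTimeMs);
  }
}
{code}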
[jira] [Resolved] (YARN-2367) Make ResourceCalculator configurable for FairScheduler and FifoScheduler like CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved YARN-2367. -- Resolution: Not a Problem Hi Swapnil, The Fair Scheduler supports this through a different interface. Scheduling policies can be configured at any queue level in the hierarchy. In general, the FIFO scheduler lacks most of the advanced functionality of the Fair and Capacity schedulers. My opinion is that achieving parity is a non-goal. If you think this shouldn't be the case, feel free to reopen this JIRA under a name like Support multi-resource scheduling in the FIFO scheduler and we can discuss whether that's worth embarking on. Make ResourceCalculator configurable for FairScheduler and FifoScheduler like CapacityScheduler --- Key: YARN-2367 URL: https://issues.apache.org/jira/browse/YARN-2367 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.2.0, 2.3.0, 2.4.1 Reporter: Swapnil Daingade Priority: Minor The ResourceCalculator used by CapacityScheduler is read from a configuration file entry capacity-scheduler.xml yarn.scheduler.capacity.resource-calculator. This allows for custom implementations that implement the ResourceCalculator interface to be plugged in. It would be nice to have the same functionality in FairScheduler and FifoScheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2328) FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped
[ https://issues.apache.org/jira/browse/YARN-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071361#comment-14071361 ] Sandy Ryza commented on YARN-2328: --
{code}
-if (node != null && Resources.fitsIn(minimumAllocation,
-    node.getAvailableResource())) {
+if (node != null &&
+    Resources.fitsIn(minimumAllocation, node.getAvailableResource())) {
{code}
This looks unrelated. +1 otherwise. FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped Key: YARN-2328 URL: https://issues.apache.org/jira/browse/YARN-2328 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor Attachments: yarn-2328-1.patch FairScheduler threads can use a little cleanup and tests. To begin with, the update and continuous-scheduling threads should extend Thread and handle being interrupted. We should have tests for starting and stopping them as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2313) Livelock can occur in FairScheduler when there are lots of running apps
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2313: - Summary: Livelock can occur in FairScheduler when there are lots of running apps (was: Livelock can occur on FairScheduler when there are lots of running apps) Livelock can occur in FairScheduler when there are lots of running apps --- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, YARN-2313.4.patch, rm-stack-trace.txt Observed livelock on FairScheduler when there are lots entry in queue. After my investigating code, following case can occur: 1. {{update()}} called by UpdateThread takes longer times than UPDATE_INTERVAL(500ms) if there are lots queue. 2. UpdateThread goes busy loop. 3. Other threads(AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2313) Livelock can occur on FairScheduler when there are lots entry in queue
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067998#comment-14067998 ] Sandy Ryza commented on YARN-2313: -- Thanks for reporting this [~ozawa]. A couple nits: * The new configuration should be defined in FairSchedulerConfiguration like other fair scheduler props * If I understand correctly, the race described in the findbugs could never actually happen. For code readability, I think it's better to add a findbugs exclude than an unnecessary synchronization. * In the warning, replace use with using * Extra space after DEFAULT_RM_SCHEDULER_FS_UPDATE_INTERVAL_MS Eventually, I think we should try to be smarter about the work that goes on in update(). In most cases, the fair shares will stay the same, or will only change for apps in a particular queue, so we can avoid recomputation. Livelock can occur on FairScheduler when there are lots entry in queue -- Key: YARN-2313 URL: https://issues.apache.org/jira/browse/YARN-2313 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, rm-stack-trace.txt Observed livelock on FairScheduler when there are lots entry in queue. After my investigating code, following case can occur: 1. {{update()}} called by UpdateThread takes longer times than UPDATE_INTERVAL(500ms) if there are lots queue. 2. UpdateThread goes busy loop. 3. Other threads(AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068017#comment-14068017 ] Sandy Ryza commented on YARN-796: - I'm worried that the proposal is becoming too complex. Can we try to whittle the proposal down to a minimum viable feature? I'm not necessarily opposed to the more advanced parts of it like queue label policies and updating labels on the fly, and the design should aim to make them possible in the future, but I don't think they need to be part of the initial implementation. To me it seems like the essential requirements here are: * A way for nodes to be tagged with labels * A way to make scheduling requests based on these labels I'm also skeptical about the need for adding/removing labels dynamically. Do we have concrete use cases for this? Lastly, as BC and Sunil have pointed out, specifying the labels in the NodeManager confs greatly simplifies configuration when nodes are being added. Are there advantages to a centralized configuration? Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2323) FairShareComparator creates too much Resource object
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068143#comment-14068143 ] Sandy Ryza commented on YARN-2323: -- As it's a static final variable, ONE should be all caps. Otherwise, LGTM. FairShareComparator creates too much Resource object Key: YARN-2323 URL: https://issues.apache.org/jira/browse/YARN-2323 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2323.patch Each call of {{FairShareComparator}} creates a new Resource object one: {code} Resource one = Resources.createResource(1); {code} At the volume of 1000 nodes and 1000 apps, the comparator will be called more than 10 million times per second, thus creating more than 10 million object one, which is unnecessary. Since the object one is read-only and is never referenced outside of comparator, we could make it static. -- This message was sent by Atlassian JIRA (v6.2#6252)
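In sketch form, the change under review hoists the read-only helper Resource into a static constant instead of rebuilding it on every compare call. Resources.createResource(1) is the call quoted in the description; the surrounding comparator logic is omitted and the class name below is illustrative.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class FairShareComparatorSketch {
  // Built once; as a static final constant it is conventionally named in caps.
  private static final Resource ONE = Resources.createResource(1);

  // ... compare() would reference ONE wherever it previously created a fresh
  // Resource object per invocation ...
}
{code}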
[jira] [Updated] (YARN-2323) FairShareComparator creates too many Resource objects
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2323: - Summary: FairShareComparator creates too many Resource objects (was: FairShareComparator creates too much Resource object) FairShareComparator creates too many Resource objects - Key: YARN-2323 URL: https://issues.apache.org/jira/browse/YARN-2323 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2323-2.patch, YARN-2323.patch Each call of {{FairShareComparator}} creates a new Resource object, one: {code} Resource one = Resources.createResource(1); {code} At the volume of 1000 nodes and 1000 apps, the comparator will be called more than 10 million times per second, thus creating more than 10 million of these objects, which is unnecessary. Since the object one is read-only and is never referenced outside of the comparator, we could make it static. -- This message was sent by Atlassian JIRA (v6.2#6252)
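The change under discussion is small: hoist the constant out of {{compare()}} so it is allocated once rather than on every comparison. A rough sketch of its shape, following the all-caps naming nit above (this is not the committed patch, and the comparison body is elided):
{code}
private static final Resource ONE = Resources.createResource(1);

@Override
public int compare(Schedulable s1, Schedulable s2) {
  // ... existing fair-share comparison logic, reading ONE instead of
  // allocating a new Resource on every call ...
  return 0; // placeholder; the real method compares usage, shares, and start time
}
{code}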
[jira] [Commented] (YARN-2257) Add user to queue mappings to automatically place users' apps into specific queues
[ https://issues.apache.org/jira/browse/YARN-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063749#comment-14063749 ] Sandy Ryza commented on YARN-2257: -- [~wangda] I agree with you that expecting admins to recompile Hadoop is unreasonable. I don't think we would expect admins to add rules. The idea is more to have a small library of rules we provide that fit into a common configuration framework. If we were to add a QueuePlacementRule that accepts a list of user-queue mappings, and wanted to express "accept the user's queue if they specify it in the ApplicationSubmissionContext; otherwise look for a user-queue mapping; if none is found, use the default queue", configuring it according to the current Fair Scheduler format would look something like this:
{code}
<queuePlacementPolicy>
  <rule name="specified" />
  <rule name="userToQueue">
    <mapping name="sally" queue="queue1" />
    <mapping name="emilio" queue="queue2" />
  </rule>
  <rule name="default" />
</queuePlacementPolicy>
{code}
Add user to queue mappings to automatically place users' apps into specific queues -- Key: YARN-2257 URL: https://issues.apache.org/jira/browse/YARN-2257 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Patrick Liu Assignee: Vinod Kumar Vavilapalli Labels: features Currently, the fair-scheduler supports two modes, default queue or individual queue for each user. Apparently, the default queue is not a good option, because the resources cannot be managed for each user or group. However, individual queue for each user is not good enough. Especially when connecting yarn with hive. There will be increasing hive users in a corporate environment. If we create a queue for a user, the resource management will be hard to maintain. I think the problem can be solved like this: 1. Define user-queue mapping in Fair-Scheduler.xml. Inside each queue, use aclSubmitApps to control user's ability. 2. Each time a user submit an app to yarn, if the user has mapped to a queue, the app will be scheduled to that queue; otherwise, the app will be submitted to default queue. 3. If the user cannot pass aclSubmitApps limits, the app will not be accepted. -- This message was sent by Atlassian JIRA (v6.2#6252)
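To make the idea of a shared rule library concrete, here is a purely illustrative sketch of how a user-to-queue rule could consume the mapping elements above. The interface below is hypothetical (it is not the actual QueuePlacementRule API); it only shows that such a rule is a small amount of code once the configuration framework exists.
{code}
import java.util.HashMap;
import java.util.Map;

public class UserToQueueRuleSketch {
  private final Map<String, String> userToQueue = new HashMap<String, String>();

  // Populated from the mapping elements in the allocation file.
  public void addMapping(String user, String queue) {
    userToQueue.put(user, queue);
  }

  // Returns the mapped queue, or null to fall through to the next rule.
  public String assignQueue(String user) {
    return userToQueue.get(user);
  }
}
{code}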
[jira] [Commented] (YARN-2257) Add user to queue mappings to automatically place users' apps into specific queues
[ https://issues.apache.org/jira/browse/YARN-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060982#comment-14060982 ] Sandy Ryza commented on YARN-2257: -- Wangda, That policy would work well for some situations, but I don't think it covers many reasonable scenarios. For example, we might want to ignore the queue that the user defines entirely. Or admins might want to be able to just send apps to queues named with the user's group, instead of specifying the mapping for every group. Or a subdivision in an organization might want to make placements based on group, while a different subdivision using the same cluster might want to make placements based on user. Would you mind taking a look at the "Automatically placing applications in queues" section and corresponding configuration example in http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/FairScheduler.html My opinion is that this is a good fit for YARN in general. Add user to queue mappings to automatically place users' apps into specific queues -- Key: YARN-2257 URL: https://issues.apache.org/jira/browse/YARN-2257 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Patrick Liu Assignee: Vinod Kumar Vavilapalli Labels: features Currently, the fair-scheduler supports two modes, default queue or individual queue for each user. Apparently, the default queue is not a good option, because the resources cannot be managed for each user or group. However, individual queue for each user is not good enough. Especially when connecting yarn with hive. There will be increasing hive users in a corporate environment. If we create a queue for a user, the resource management will be hard to maintain. I think the problem can be solved like this: 1. Define user-queue mapping in Fair-Scheduler.xml. Inside each queue, use aclSubmitApps to control user's ability. 2. Each time a user submit an app to yarn, if the user has mapped to a queue, the app will be scheduled to that queue; otherwise, the app will be submitted to default queue. 3. If the user cannot pass aclSubmitApps limits, the app will not be accepted. -- This message was sent by Atlassian JIRA (v6.2#6252)
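For reference, the placement policies in that documentation already cover cases like the ones above without enumerating every group. Something along these lines (rule names as in the Fair Scheduler docs, shown here only as an example) falls back from the requested queue, to a queue named after the submitter's primary group, to the default queue:
{code}
<queuePlacementPolicy>
  <rule name="specified" />
  <rule name="primaryGroup" create="false" />
  <rule name="default" />
</queuePlacementPolicy>
{code}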
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059074#comment-14059074 ] Sandy Ryza commented on YARN-796: - +1 on reducing the complexity of the label predicates. We should only use OR if we can think of a few concrete use cases where we would need it. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations
[ https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059292#comment-14059292 ] Sandy Ryza commented on YARN-2274: --
{code}
+if (--updatesToSkipForDebug < 0) {
+  updatesToSkipForDebug = UPDATE_DEBUG_FREQUENCY;
+  if (LOG.isDebugEnabled()) {
+    LOG.debug("Cluster Capacity: " + clusterResource +
+        " Allocations: " + rootMetrics.getAllocatedResources() +
+        " Availability: " + Resource.newInstance(
+            rootMetrics.getAvailableMB(),
+            rootMetrics.getAvailableVirtualCores()) +
+        " Demand: " + rootQueue.getDemand());
+  }
+}
{code}
Moving the if (LOG.isDebugEnabled) to the outside of this chunk would make it easier for readers who don't care about what's debug logged to realize they can skip this whole segment. If you're OK with that change, +1 and it can be fixed on commit? FairScheduler: Add debug information about cluster capacity, availability and reservations -- Key: YARN-2274 URL: https://issues.apache.org/jira/browse/YARN-2274 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Attachments: yarn-2274-1.patch, yarn-2274-2.patch FairScheduler logs have little information on cluster capacity and availability. Need this information to debug production issues. -- This message was sent by Atlassian JIRA (v6.2#6252)
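For clarity, the restructuring suggested in the comment above would look roughly like this (a sketch, not the committed patch): the {{isDebugEnabled()}} guard wraps the whole block, so readers can skip it at a glance and the counter bookkeeping only happens when debug logging is on.
{code}
if (LOG.isDebugEnabled()) {
  if (--updatesToSkipForDebug < 0) {
    updatesToSkipForDebug = UPDATE_DEBUG_FREQUENCY;
    LOG.debug("Cluster Capacity: " + clusterResource +
        " Allocations: " + rootMetrics.getAllocatedResources() +
        " Availability: " + Resource.newInstance(
            rootMetrics.getAvailableMB(),
            rootMetrics.getAvailableVirtualCores()) +
        " Demand: " + rootQueue.getDemand());
  }
}
{code}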
[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios
[ https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058009#comment-14058009 ] Sandy Ryza commented on YARN-2026: -- I think Ashwin makes a good point. I think displaying both is reasonable if we present it in a careful way. For example, it might make sense to add tooltips that explain the difference. Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios -- Key: YARN-2026 URL: https://issues.apache.org/jira/browse/YARN-2026 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: scheduler Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt Problem1- While using hierarchical queues in fair scheduler,there are few scenarios where we have seen a leaf queue with least fair share can take majority of the cluster and starve a sibling parent queue which has greater weight/fair share and preemption doesn’t kick in to reclaim resources. The root cause seems to be that fair share of a parent queue is distributed to all its children irrespective of whether its an active or an inactive(no apps running) queue. Preemption based on fair share kicks in only if the usage of a queue is less than 50% of its fair share and if it has demands greater than that. When there are many queues under a parent queue(with high fair share),the child queue’s fair share becomes really low. As a result when only few of these child queues have apps running,they reach their *tiny* fair share quickly and preemption doesn’t happen even if other leaf queues(non-sibling) are hogging the cluster. This can be solved by dividing fair share of parent queue only to active child queues. Here is an example describing the problem and proposed solution: root.lowPriorityQueue is a leaf queue with weight 2 root.HighPriorityQueue is parent queue with weight 8 root.HighPriorityQueue has 10 child leaf queues : root.HighPriorityQueue.childQ(1..10) Above config,results in root.HighPriorityQueue having 80% fair share and each of its ten child queue would have 8% fair share. Preemption would happen only if the child queue is < 4% (0.5*8=4). Lets say at the moment no apps are running in any of the root.HighPriorityQueue.childQ(1..10) and few apps are running in root.lowPriorityQueue which is taking up 95% of the cluster. Up till this point,the behavior of FS is correct. Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% of the cluster. It would get only the available 5% in the cluster and preemption wouldn't kick in since its above 4%(half fair share).This is bad considering childQ1 is under a highPriority parent queue which has *80% fair share*. Until root.lowPriorityQueue starts relinquishing containers,we would see the following allocation on the scheduler page: *root.lowPriorityQueue = 95%* *root.HighPriorityQueue.childQ1=5%* This can be solved by distributing a parent’s fair share only to active queues. So in the example above,since childQ1 is the only active queue under root.HighPriorityQueue, it would get all its parent’s fair share i.e. 80%. This would cause preemption to reclaim the 30% needed by childQ1 from root.lowPriorityQueue after fairSharePreemptionTimeout seconds. Problem2 - Also note that similar situation can happen between root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2,if childQ2 hogs the cluster. childQ2 can take up 95% cluster and childQ1 would be stuck at 5%,until childQ2 starts relinquishing containers. 
We would like each of childQ1 and childQ2 to get half of root.HighPriorityQueue fair share ie 40%,which would ensure childQ1 gets upto 40% resource if needed through preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations
[ https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058085#comment-14058085 ] Sandy Ryza commented on YARN-2274: -- Demanded resources could also be a useful statistic to report. The update thread typically runs twice every second, so it might make sense to only log every 5th update or something to avoid a flood of messages. FairScheduler: Add debug information about cluster capacity, availability and reservations -- Key: YARN-2274 URL: https://issues.apache.org/jira/browse/YARN-2274 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.4.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Attachments: yarn-2274-1.patch FairScheduler logs have little information on cluster capacity and availability. Need this information to debug production issues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2257) Add user to queue mappings to automatically place users' apps into specific queues
[ https://issues.apache.org/jira/browse/YARN-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054617#comment-14054617 ] Sandy Ryza commented on YARN-2257: -- To add some background: since MR1, the Fair Scheduler has been able to place apps into queues named with the username or group of the submitter. Last year, YARN-1392 extended this to accept more general policies - essentially any function of (submitter's username, submitter's groups, requested queue), with a structure that allows phrasing the policy in terms of simple rules and fallbacks. This generality is useful because different organizations have a variety of ways they organize their users on their Hadoop clusters. Some administrators want to be able to decide which queue a user's job goes into by placing them into a unix group, while others want a queue for every user, with the option for certain users to intentionally submit their jobs to certain queues. The Fair Scheduler in particular needs some added complexity here, because it models users within a queue with their own queues, unlike the Capacity Scheduler, which has a construct for this. We chose to put these queue placement policies in the Fair Scheduler because other schedulers didn't have a precedent for placing apps in queues other than the one requested, but my opinion is that they could be a useful feature for YARN. If not, we should at least add user-queue mappings in a way that's compatible with more general mappings. Add user to queue mappings to automatically place users' apps into specific queues -- Key: YARN-2257 URL: https://issues.apache.org/jira/browse/YARN-2257 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Patrick Liu Assignee: Vinod Kumar Vavilapalli Labels: features Currently, the fair-scheduler supports two modes, default queue or individual queue for each user. Apparently, the default queue is not a good option, because the resources cannot be managed for each user or group. However, individual queue for each user is not good enough. Especially when connecting yarn with hive. There will be increasing hive users in a corporate environment. If we create a queue for a user, the resource management will be hard to maintain. I think the problem can be solved like this: 1. Define user-queue mapping in Fair-Scheduler.xml. Inside each queue, use aclSubmitApps to control user's ability. 2. Each time a user submit an app to yarn, if the user has mapped to a queue, the app will be scheduled to that queue; otherwise, the app will be submitted to default queue. 3. If the user cannot pass aclSubmitApps limits, the app will not be accepted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios
[ https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054629#comment-14054629 ] Sandy Ryza commented on YARN-2026: -- I had a conversation with [~kkambatl] about this, and he convinced me that we should turn this on in all cases - i.e. modify FairSharePolicy and DominantResourceFairnessPolicy instead of creating additional policies. Sorry to vacillate on this. Some additional comments on the code:
{code}
+return this.getNumRunnableApps() > 0;
{code}
{code}
+ || (sched instanceof FSQueue && ((FSQueue) sched).isActive())) {
{code}
Instead of using instanceof, can we add an isActive method to Schedulable, and always return true for it in AppSchedulable?
{code}
+out.println("<queue name=\"childA1\" />");
+out.println("<queue name=\"childA2\" />");
+out.println("<queue name=\"childA3\" />");
+out.println("<queue name=\"childA4\" />");
+out.println("<queue name=\"childA5\" />");
+out.println("<queue name=\"childA6\" />");
+out.println("<queue name=\"childA7\" />");
+out.println("<queue name=\"childA8\" />");
{code}
Do we need this many children?
{code}
+out.println("</queue>");
+
+out.println("</allocations>");
{code}
Unnecessary newline
{code}
+  public void testFairShareActiveOnly_ShareResetsToZeroWhenAppsComplete()
{code}
Take out underscore.
{code}
+  private void setupCluster(int mem, int vCores) throws IOException {
{code}
Give this method a name that's more descriptive of the kind of configuration it's setting up.
{code}
+  private void setupCluster(int nodeMem) throws IOException {
{code}
Can this call the setupCluster that takes two arguments? To help with the fight against TestFairScheduler becoming a monstrosity, the tests should go into a new test file. TestFairSchedulerPreemption is a good example of how to do this.
{code}
+int nodeVcores = 10;
{code}
Nit: nodeVCores Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios -- Key: YARN-2026 URL: https://issues.apache.org/jira/browse/YARN-2026 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: scheduler Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt Problem1- While using hierarchical queues in fair scheduler,there are few scenarios where we have seen a leaf queue with least fair share can take majority of the cluster and starve a sibling parent queue which has greater weight/fair share and preemption doesn’t kick in to reclaim resources. The root cause seems to be that fair share of a parent queue is distributed to all its children irrespective of whether its an active or an inactive(no apps running) queue. Preemption based on fair share kicks in only if the usage of a queue is less than 50% of its fair share and if it has demands greater than that. When there are many queues under a parent queue(with high fair share),the child queue’s fair share becomes really low. As a result when only few of these child queues have apps running,they reach their *tiny* fair share quickly and preemption doesn’t happen even if other leaf queues(non-sibling) are hogging the cluster. This can be solved by dividing fair share of parent queue only to active child queues. 
Here is an example describing the problem and proposed solution: root.lowPriorityQueue is a leaf queue with weight 2 root.HighPriorityQueue is parent queue with weight 8 root.HighPriorityQueue has 10 child leaf queues : root.HighPriorityQueue.childQ(1..10) Above config,results in root.HighPriorityQueue having 80% fair share and each of its ten child queue would have 8% fair share. Preemption would happen only if the child queue is < 4% (0.5*8=4). Lets say at the moment no apps are running in any of the root.HighPriorityQueue.childQ(1..10) and few apps are running in root.lowPriorityQueue which is taking up 95% of the cluster. Up till this point,the behavior of FS is correct. Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% of the cluster. It would get only the available 5% in the cluster and preemption wouldn't kick in since its above 4%(half fair share).This is bad considering childQ1 is under a highPriority parent queue which has *80% fair share*. Until root.lowPriorityQueue starts relinquishing containers,we would see the following allocation on the scheduler page: *root.lowPriorityQueue = 95%* *root.HighPriorityQueue.childQ1=5%* This can be solved by distributing a parent’s fair share only to active queues. So in the example above,since childQ1 is the only active queue under root.HighPriorityQueue, it would get
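To make the isActive suggestion from the review above concrete, here is a sketch. The class names mirror the Fair Scheduler ones, but the bodies and the surrounding interface are illustrative only, not the committed change:
{code}
abstract class Schedulable {
  // ... existing interface (getDemand, getResourceUsage, assignContainer, ...)
  public abstract boolean isActive();
}

class AppSchedulable extends Schedulable {
  @Override
  public boolean isActive() {
    return true;   // an app is always counted as active
  }
}

class FSLeafQueue extends Schedulable {
  private int numRunnableApps;

  @Override
  public boolean isActive() {
    return numRunnableApps > 0;   // matches the getNumRunnableApps() > 0 check above
  }
}
{code}
Callers could then replace the instanceof check with a plain {{sched.isActive()}} call.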
[jira] [Commented] (YARN-2257) Add user to queue mapping in Fair-Scheduler
[ https://issues.apache.org/jira/browse/YARN-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053846#comment-14053846 ] Sandy Ryza commented on YARN-2257: -- Definitely needed. This should be implemented as a QueuePlacementRule. Add user to queue mapping in Fair-Scheduler --- Key: YARN-2257 URL: https://issues.apache.org/jira/browse/YARN-2257 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Patrick Liu Labels: features Currently, the fair-scheduler supports two modes, default queue or individual queue for each user. Apparently, the default queue is not a good option, because the resources cannot be managed for each user or group. However, individual queue for each user is not good enough. Especially when connecting yarn with hive. There will be increasing hive users in a corporate environment. If we create a queue for a user, the resource management will be hard to maintain. I think the problem can be solved like this: 1. Define user-queue mapping in Fair-Scheduler.xml. Inside each queue, use aclSubmitApps to control user's ability. 2. Each time a user submit an app to yarn, if the user has mapped to a queue, the app will be scheduled to that queue; otherwise, the app will be submitted to default queue. 3. If the user cannot pass aclSubmitApps limits, the app will not be accepted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2250) FairScheduler.findLowestCommonAncestorQueue returns null when queues not identical
[ https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052497#comment-14052497 ] Sandy Ryza commented on YARN-2250: -- +1. A couple of lines go over 80 characters, and the "names are identical" comment still applies. Fixing these on commit. FairScheduler.findLowestCommonAncestorQueue returns null when queues not identical -- Key: YARN-2250 URL: https://issues.apache.org/jira/browse/YARN-2250 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Attachments: YARN-2250-1.patch, YARN-2250-2.patch We need to update the queue metrics until to lowest common ancestor of the target and source queue. This method fails to retrieve the right queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2250) Moving apps between queues - FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2250: - Target Version/s: 2.6.0 Fix Version/s: (was: 3.0.0) Moving apps between queues - FairScheduler -- Key: YARN-2250 URL: https://issues.apache.org/jira/browse/YARN-2250 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath We need to update the queue metrics until to lowest common ancestor of the target and source queue. This method fails to retrieve the right queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2250) Moving apps between queues - FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051169#comment-14051169 ] Sandy Ryza commented on YARN-2250: -- Hi Krisztian, Would you mind including an example of a situation where the metrics become off? Moving apps between queues - FairScheduler -- Key: YARN-2250 URL: https://issues.apache.org/jira/browse/YARN-2250 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath We need to update the queue metrics until to lowest common ancestor of the target and source queue. This method fails to retrieve the right queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2250) Moving apps between queues - FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051692#comment-14051692 ] Sandy Ryza commented on YARN-2250: -- I think the bug can be fixed by replacing name1.substring(lastPeriodIndex) with name1.substring(0, lastPeriodIndex). I tried this out and all your tests passed. Moving apps between queues - FairScheduler -- Key: YARN-2250 URL: https://issues.apache.org/jira/browse/YARN-2250 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Attachments: YARN-2250-1.patch We need to update the queue metrics until to lowest common ancestor of the target and source queue. This method fails to retrieve the right queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
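The following is not the actual findLowestCommonAncestorQueue implementation, just a small self-contained illustration of why the fix above matters: {{substring(0, lastPeriodIndex)}} keeps the parent prefix ("root.a.b" becomes "root.a"), whereas {{substring(lastPeriodIndex)}} keeps the trailing ".b" instead and can never match the other queue's name.
{code}
public class AncestorSketch {
  static String parentOf(String queueName) {
    int lastPeriodIndex = queueName.lastIndexOf('.');
    return queueName.substring(0, lastPeriodIndex);
  }

  public static void main(String[] args) {
    // All Fair Scheduler queue names share the "root" prefix, so repeatedly
    // taking the parent of the longer name terminates at a common ancestor.
    String a = "root.parent.child1";
    String b = "root.parent.child2.grandchild";
    while (!a.equals(b)) {
      if (a.length() >= b.length()) {
        a = parentOf(a);
      } else {
        b = parentOf(b);
      }
    }
    System.out.println(a);   // prints "root.parent"
  }
}
{code}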
[jira] [Updated] (YARN-2250) FairScheduler.findLowestCommonAncestorQueue returns null when queues not identical
[ https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2250: - Summary: FairScheduler.findLowestCommonAncestorQueue returns null when queues not identical (was: Moving apps between queues - FairScheduler) FairScheduler.findLowestCommonAncestorQueue returns null when queues not identical -- Key: YARN-2250 URL: https://issues.apache.org/jira/browse/YARN-2250 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Attachments: YARN-2250-1.patch We need to update the queue metrics until to lowest common ancestor of the target and source queue. This method fails to retrieve the right queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2214) preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness
[ https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044066#comment-14044066 ] Sandy Ryza commented on YARN-2214: -- Makes sense. preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness --- Key: YARN-2214 URL: https://issues.apache.org/jira/browse/YARN-2214 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0 Reporter: Ashwin Shankar preemptContainerPreCheck() in FSParentQueue rejects preemption requests if the parent queue is below fair share. This can cause a delay in converging towards fairness when the starved leaf queue and the queue above fairshare belong under a non-root parent queue(ie their least common ancestor is a parent queue which is not root). Here is an example : root.parent has fair share = 80% and usage = 80% root.parent.child1 has fair share =40% usage = 80% root.parent.child2 has fair share=40% usage=0% Now a job is submitted to child2 and the demand is 40%. Preemption will kick in and try to reclaim all the 40% from child1. When it preempts the first container from child1,the usage of root.parent will become < 80%, which is less than root.parent's fair share,causing preemption to stop.So only one container gets preempted in this round although the need is a lot more. child2 would eventually get to half its fair share but only after multiple rounds of preemption. Solution is to remove preemptContainerPreCheck() in FSParentQueue and keep it only in FSLeafQueue(which is already there). -- This message was sent by Atlassian JIRA (v6.2#6252)