[jira] [Comment Edited] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used
[ https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126100#comment-17126100 ]

Eric Payne edited comment on YARN-10283 at 6/4/20, 7:44 PM:

-[~Jim_Brennan] it looks like the patch applies cleanly all the way back to 2.10.- Sorry, this was placed in the wrong JIRA.

was (Author: eepayne):
-[~Jim_Brennan], it looks like the patch applies cleanly all the way back to 2.10.] Sorry, this was placed in the wrong JIRA.

> Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10283
>                 URL: https://issues.apache.org/jira/browse/YARN-10283
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-10283-POC01.patch, YARN-10283-ReproTest.patch, YARN-10283-ReproTest2.patch
>
>
> Recently we've been investigating a scenario where applications submitted to a lower priority queue could not get scheduled because a higher priority queue in the same hierarchy could not satisfy the allocation request. Both queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a container for the request, even if it should not have.
>
> *Example:*
> * Cluster total resources: 3 nodes, 15 GB, 24 vcores (5 GB / 8 vcores per node)
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priority = 40) were added to the partition
> * Both queues have a limit of
> * Using DominantResourceCalculator
>
> Setup:
> Submit a distributed shell application to highprio with the switches "-num_containers 3 -container_vcores 4". The memory allocation is 512 MB per container.
>
> Chain of events:
> 1. The queue is filled with containers until it reaches a usage of vCores:5
> 2. A node update event is pushed to CS from a node which is part of the partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because the usage is smaller than the current limit resource
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an allocated container for
> 5. But we can't commit the resource request because we would have 9 vcores in total (5 already in use plus 4 for the new container), violating the limit.
>
> The problem is that we always try to assign a container for the same application in each heartbeat from "highprio". Applications in "lowprio" cannot make progress.
>
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case well. We only reject the allocation if this condition is satisfied:
> {noformat}
> if (rmContainer == null && reservationsContinueLooking
>     && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we enter a different code path and succeed with the allocation if there's room for a container.
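For illustration, the setup described above roughly corresponds to a Capacity Scheduler configuration along the lines of the sketch below. Only the queue names, priorities, the "shared" partition name and the resource calculator come from the description; the capacity values are placeholders, not the actual cluster settings.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch of the repro configuration described in this issue.
// Capacity values are placeholders; adjust to match the real cluster.
public class Yarn10283ReproConf {
  public static Configuration build() {
    Configuration conf = new Configuration();
    conf.set("yarn.scheduler.capacity.root.queues", "lowprio,highprio");

    // Queue priorities from the description
    conf.setInt("yarn.scheduler.capacity.root.lowprio.priority", 20);
    conf.setInt("yarn.scheduler.capacity.root.highprio.priority", 40);

    // Both queues can access the "shared" partition
    conf.set("yarn.scheduler.capacity.root.lowprio.accessible-node-labels", "shared");
    conf.set("yarn.scheduler.capacity.root.highprio.accessible-node-labels", "shared");

    // Capacities in the default and "shared" partitions (placeholder split)
    conf.set("yarn.scheduler.capacity.root.lowprio.capacity", "50");
    conf.set("yarn.scheduler.capacity.root.highprio.capacity", "50");
    conf.set("yarn.scheduler.capacity.root.lowprio.accessible-node-labels.shared.capacity", "50");
    conf.set("yarn.scheduler.capacity.root.highprio.accessible-node-labels.shared.capacity", "50");

    // DominantResourceCalculator, as used in the repro
    conf.set("yarn.scheduler.capacity.resource-calculator",
        "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator");
    return conf;
  }
}
{code}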
[jira] [Comment Edited] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used
[ https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126100#comment-17126100 ]

Eric Payne edited comment on YARN-10283 at 6/4/20, 7:43 PM:

-[~Jim_Brennan], it looks like the patch applies cleanly all the way back to 2.10.] Sorry, this was placed in the wrong JIRA.

was (Author: eepayne):
[~Jim_Brennan], it looks like the patch applies cleanly all the way back to 2.10.
[jira] [Comment Edited] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used
[ https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114298#comment-17114298 ]

Peter Bacsko edited comment on YARN-10283 at 5/22/20, 6:06 PM:
---

I uploaded a new repro test patch. I was able to prove my theory.

If there are no node labels, the same allocation errors occur if reservationsContinueLooking == false AND minimum-allocation-mb == 512. However, if you increase minimum-allocation-mb to 1024, then we are saved because {{checkHeadroom()}} returns false and no allocation occurs.

This can be reproduced with the new test case.

was (Author: pbacsko):
I uploaded a new repro test patch. I was able to prove my theory.

If there are no node labels, the same thing occurs if reservationsContinueLooking == false AND minimum-allocation-mb == 512. However, if you increase minimum-allocation-mb to 1024, then we are saved because {{checkHeadroom()}} returns false and no allocation occurs.

This can be reproduced with the new test case.
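The two settings varied in the experiment above can be toggled as in the sketch below. This is only an illustration of the knobs involved, not the actual repro test patch.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class ReservationLookupSettings {
  public static Configuration withoutContinueLooking(int minAllocMb) {
    Configuration conf = new Configuration();
    // With node labels disabled, turning this off reproduces the same
    // allocation errors described in the comment above.
    conf.setBoolean("yarn.scheduler.capacity.reservations-continue-look-all-nodes", false);
    // 512 MB reproduces the problem; 1024 MB makes checkHeadroom() reject the
    // request first, so the bogus allocation never happens.
    conf.setInt("yarn.scheduler.minimum-allocation-mb", minAllocMb);
    return conf;
  }
}
{code}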
[jira] [Comment Edited] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used
[ https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113342#comment-17113342 ]

Tarun Parimi edited comment on YARN-10283 at 5/21/20, 4:31 PM:
---

Thanks [~pbacsko] for the repro test patch.

The POC patch changes the behavior to include partitions while doing {{reservationsContinueLooking}} in {{RegularContainerAllocator.java}}. Similar conditions that check for node labels are present in several places, such as {{AbstractCSQueue.java}}, since {{reservationsContinueLooking}} was implemented only for the non-node-label scenario. Ideally we will have to consider fixing YARN-9903 in this scenario.

was (Author: tarunparimi):
Thanks for the repro test patch.

The POC patch changes the behavior to include partitions while doing {{reservationsContinueLooking}} in {{RegularContainerAllocator.java}}. Similar conditions that check for node labels are present in several places, such as {{AbstractCSQueue.java}}, since {{reservationsContinueLooking}} was implemented only for the non-node-label scenario. Ideally we will have to consider fixing YARN-9903 in this scenario.
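To make the "include partitions" idea concrete: the guard quoted in the description bails out only when the node has no labels; a partition-aware variant would apply the same kind of headroom check for whatever partition the node belongs to. The sketch below is purely illustrative and is not the actual POC patch; the method name and the partitionHeadroom parameter are made up for this example.

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative only -- not the actual YARN-10283 POC patch.
// Idea: instead of running the "should we stop here?" check only when
// node.getLabels().isEmpty(), run it against the node's partition as well.
public class PartitionAwareGuard {
  static boolean shouldRejectAllocation(RMContainer rmContainer,
      boolean reservationsContinueLooking,
      Resource required, Resource partitionHeadroom) {
    if (rmContainer == null && reservationsContinueLooking) {
      // Reject when the request no longer fits into the headroom left for
      // this partition, regardless of whether the node is labeled.
      return !Resources.fitsIn(required, partitionHeadroom);
    }
    return false;
  }
}
{code}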
[jira] [Comment Edited] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used
[ https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113063#comment-17113063 ]

Prabhu Joseph edited comment on YARN-10283 at 5/21/20, 10:55 AM:
---

[~pbacsko] Can you write a test case which reproduces the issue, so that it will be easier to understand the issue and validate the fix?

was (Author: prabhu joseph):
[~pbacsko] Can you write a test case which reproduces the issue, so that it will be easier to understand the issue?
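For reference, the repro tests later attached to this JIRA (YARN-10283-ReproTest.patch / YARN-10283-ReproTest2.patch) address exactly this request. A MockRM-based test for this kind of scenario typically has the shape sketched below; this is only a skeleton with assumed queue and app names, and the "shared" partition / node-label wiring that the real repro patches set up is omitted here.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.MockNM;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.junit.Test;

public class TestHighPrioQueueStarvation {

  @Test
  public void testLowPrioQueueIsNotStarved() throws Exception {
    Configuration conf = new YarnConfiguration();
    conf.set(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class.getName());
    // ... queue setup as in the earlier configuration sketch
    // ... node-label / "shared" partition setup omitted in this skeleton

    MockRM rm = new MockRM(conf);
    rm.start();
    try {
      // Three 5 GB / 8 vcore nodes, matching the cluster in the description
      MockNM nm1 = rm.registerNode("h1:1234", 5 * 1024, 8);
      MockNM nm2 = rm.registerNode("h2:1234", 5 * 1024, 8);
      MockNM nm3 = rm.registerNode("h3:1234", 5 * 1024, 8);

      // Fill "highprio" until its vcore limit is nearly reached, then submit
      // an app to "lowprio" and assert it still gets containers after a few
      // node heartbeats (with the bug present, it does not).
      rm.submitApp(512, "highprio-app", "user", null, "highprio");
      rm.submitApp(512, "lowprio-app", "user", null, "lowprio");
      nm1.nodeHeartbeat(true);
      nm2.nodeHeartbeat(true);
      nm3.nodeHeartbeat(true);
      // ... assertions on the lowprio app's allocated containers go here
    } finally {
      rm.stop();
    }
  }
}
{code}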