[jira] [Commented] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling

Tao Jie (JIRA) Mon, 21 Dec 2015 00:22:51 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066184#comment-15066184
 ]


Tao Jie commented on YARN-4477:
-------------------------------

[~asuresh], thanks for your comments!
I should explain the modification to
testQueueMaxAMShareWithContainerReservation.
Without the modification, this test fails with:
{code}
java.lang.AssertionError: Application8's AM resource shouldn't be updated expected:<0> but was:<1024>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueMaxAMShareWithContainerReservation(TestFairScheduler.java:4410)
{code}
Admittedly, I am not very familiar with the logic of this test case. Its comment
describes the steps as follows:
{code}
   * 8. Remove APP3.
   * 9. APP6 failed to reserve a 10G container on Node1 due to AMShare limit.
   * 10. APP7 allocated 1G on Node1.
   * 11. Remove APP4 and APP5.
   * 12. APP6 reserved 10G on Node1 and Node2.
   * 13. APP8 failed to allocate a 1G container on Node1 and Node2 because
   *     APP6 reserved Node1 and Node2.
{code}
When I debugged this test, I found that APP6 was not actually reserved on
Node1 because of the limitation check. I am not sure why this test passed
before this patch. Since APP6 is expected to be reserved at step 12, I modified
the configuration for this test case to ensure that APP6 is actually reserved there.
I will also refine the comments as you suggested. Thank you!

> FairScheduler: encounter infinite loop in attemptScheduling
> -----------------------------------------------------------
>
>                 Key: YARN-4477
>                 URL: https://issues.apache.org/jira/browse/YARN-4477
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: Tao Jie
>            Assignee: Tao Jie
>         Attachments: YARN-4477.001.patch, YARN-4477.002.patch, 
> YARN-4477.003.patch
>
>
> This problem was introduced by YARN-4270, which added a limit on reservations.
> In FSAppAttempt.reserve():
> {code}
> if (!reservationExceedsThreshold(node, type)) {
>       LOG.info("Making reservation: node=" + node.getNodeName() +
>               " app_id=" + getApplicationId());
>       if (!alreadyReserved) {
>         getMetrics().reserveResource(getUser(), container.getResource());
>         RMContainer rmContainer =
>                 super.reserve(node, priority, null, container);
>         node.reserveResource(this, priority, rmContainer);
>         setReservation(node);
>       } else {
>         RMContainer rmContainer = node.getReservedContainer();
>         super.reserve(node, priority, rmContainer, container);
>         node.reserveResource(this, priority, rmContainer);
>         setReservation(node);
>       }
>     }
> {code}
> If the reservation exceeds the threshold, no reservation is set on the current node.
> However, in FairScheduler.attemptScheduling():
> {code}
>       while (node.getReservedContainer() == null) {
>         boolean assignedContainer = false;
>         if (!queueMgr.getRootQueue().assignContainer(node).equals(
>             Resources.none())) {
>           assignedContainers++;
>           assignedContainer = true;
>           
>         }
>         
>         if (!assignedContainer) { break; }
>         if (!assignMultiple) { break; }
>         if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
>       }
> {code}
> assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not
> equal to Resources.none().
> As a result, if multiple assignment is enabled and maxAssign is unlimited, this
> while loop never breaks: node.getReservedContainer() stays null, and every
> iteration counts as a successful assignment.
> I suggest that assignContainer(node) should return Resources.none() rather than
> CONTAINER_RESERVED when the attempt does not take the reservation because of
> the limitation.
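The non-terminating loop described above can be reproduced outside YARN with a small toy model. Everything in the sketch below is hypothetical and is not the real FairScheduler or FSAppAttempt API. This includes `Node`, `tryReserve`, the string stand-ins `NONE` and `CONTAINER_RESERVED`, and the `safetyCap` parameter. The model assumes that every reservation exceeds the threshold, that multiple assignment is enabled, and that maxAssign is unlimited. The `applyFix` flag switches on the behavior suggested in the description, which is to return "none" when no reservation was actually taken. The committed patch may implement this differently.

```java
import java.util.Objects;

// Hypothetical toy model of the YARN-4477 loop; none of these names are real YARN APIs.
public class ReservationLoopSketch {

    // Stand-ins for Resources.none() and FairScheduler.CONTAINER_RESERVED.
    static final String NONE = "none";
    static final String CONTAINER_RESERVED = "reserved";

    // Minimal node: only tracks whether a container is reserved on it.
    static final class Node {
        Object reservedContainer = null;
    }

    // Assumption: the reservation always exceeds the threshold, so nothing is reserved.
    static boolean tryReserve(Node node) {
        boolean exceedsThreshold = true;
        if (!exceedsThreshold) {
            node.reservedContainer = new Object();
            return true;
        }
        return false;
    }

    // applyFix = false mimics the reported behavior (always CONTAINER_RESERVED);
    // applyFix = true returns NONE when no reservation was actually taken.
    static String assignContainer(Node node, boolean applyFix) {
        boolean reserved = tryReserve(node);
        if (!reserved && applyFix) {
            return NONE;
        }
        return CONTAINER_RESERVED;
    }

    // Mirrors the attemptScheduling loop with assignMultiple enabled and an
    // unlimited maxAssign. safetyCap exists only so this demo terminates.
    // Returns the number of loop iterations executed.
    static int attemptScheduling(Node node, boolean applyFix, int safetyCap) {
        boolean assignMultiple = true;
        int maxAssign = -1; // <= 0 means unlimited
        int assignedContainers = 0;
        int iterations = 0;
        while (node.reservedContainer == null) {
            if (iterations == safetyCap) {
                break; // without the cap, the unfixed loop would spin forever
            }
            iterations++;
            boolean assignedContainer = false;
            if (!Objects.equals(assignContainer(node, applyFix), NONE)) {
                assignedContainers++;
                assignedContainer = true;
            }
            if (!assignedContainer) { break; }
            if (!assignMultiple) { break; }
            if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + attemptScheduling(new Node(), false, 1000)
                + " iterations (stopped by safety cap)");
        System.out.println("with fix: " + attemptScheduling(new Node(), true, 1000)
                + " iteration(s)");
    }
}
```

Running `main` prints `without fix: 1000 iterations (stopped by safety cap)` and `with fix: 1 iteration(s)`. The unfixed variant stops only at the artificial cap. The fixed variant leaves the loop as soon as an attempt declines to reserve.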



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
