[jira] [Commented] (YARN-4477) FairScheduler: Handle condition which can result in an infinite loop in attemptScheduling.
[ https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550895#comment-15550895 ] Ryan Williams commented on YARN-4477: - Looking at my notes, I think I was being fooled by copious log-spam of "Reservation Exceeds …" messages, which I thought were coming from an infinite loop that was printing that message, but in reality was just a symptom of the existence of some resource-requests that were too large for the RM to satisfy, resulting in it printing a ton of debug messages about it. > FairScheduler: Handle condition which can result in an infinite loop in > attemptScheduling. > -- > > Key: YARN-4477 > URL: https://issues.apache.org/jira/browse/YARN-4477 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Tao Jie >Assignee: Tao Jie > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: YARN-4477.001.patch, YARN-4477.002.patch, > YARN-4477.003.patch, YARN-4477.004.patch > > > This problem is introduced by YARN-4270 which add limitation on reservation. > In FSAppAttempt.reserve(): > {code} > if (!reservationExceedsThreshold(node, type)) { > LOG.info("Making reservation: node=" + node.getNodeName() + > " app_id=" + getApplicationId()); > if (!alreadyReserved) { > getMetrics().reserveResource(getUser(), container.getResource()); > RMContainer rmContainer = > super.reserve(node, priority, null, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } else { > RMContainer rmContainer = node.getReservedContainer(); > super.reserve(node, priority, rmContainer, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } > } > {code} > If reservation over threshod, current node will not set reservation. > But in attemptScheduling in FairSheduler: > {code} > while (node.getReservedContainer() == null) { > boolean assignedContainer = false; > if (!queueMgr.getRootQueue().assignContainer(node).equals( > Resources.none())) { > assignedContainers++; > assignedContainer = true; > > } > > if (!assignedContainer) { break; } > if (!assignMultiple) { break; } > if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; } > } > {code} > assignContainer(node) still return FairScheduler.CONTAINER_RESERVED, which not > equals to Resources.none(). > As a result, if multiple assign is enabled and maxAssign is unlimited, this > while loop would never break. > I suppose that assignContainer(node) should return Resource.none rather than > CONTAINER_RESERVED when the attempt doesn't take the reservation because of > the limitation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4477) FairScheduler: Handle condition which can result in an infinite loop in attemptScheduling.
[ https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550502#comment-15550502 ] Tony Peng commented on YARN-4477: - I'm also getting this problem with assignMultiple=false. [~kasha] [~rdub] what was your offline discussion? > FairScheduler: Handle condition which can result in an infinite loop in > attemptScheduling. > -- > > Key: YARN-4477 > URL: https://issues.apache.org/jira/browse/YARN-4477 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Tao Jie >Assignee: Tao Jie > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: YARN-4477.001.patch, YARN-4477.002.patch, > YARN-4477.003.patch, YARN-4477.004.patch > > > This problem is introduced by YARN-4270 which add limitation on reservation. > In FSAppAttempt.reserve(): > {code} > if (!reservationExceedsThreshold(node, type)) { > LOG.info("Making reservation: node=" + node.getNodeName() + > " app_id=" + getApplicationId()); > if (!alreadyReserved) { > getMetrics().reserveResource(getUser(), container.getResource()); > RMContainer rmContainer = > super.reserve(node, priority, null, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } else { > RMContainer rmContainer = node.getReservedContainer(); > super.reserve(node, priority, rmContainer, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } > } > {code} > If reservation over threshod, current node will not set reservation. > But in attemptScheduling in FairSheduler: > {code} > while (node.getReservedContainer() == null) { > boolean assignedContainer = false; > if (!queueMgr.getRootQueue().assignContainer(node).equals( > Resources.none())) { > assignedContainers++; > assignedContainer = true; > > } > > if (!assignedContainer) { break; } > if (!assignMultiple) { break; } > if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; } > } > {code} > assignContainer(node) still return FairScheduler.CONTAINER_RESERVED, which not > equals to Resources.none(). > As a result, if multiple assign is enabled and maxAssign is unlimited, this > while loop would never break. > I suppose that assignContainer(node) should return Resource.none rather than > CONTAINER_RESERVED when the attempt doesn't take the reservation because of > the limitation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4477) FairScheduler: Handle condition which can result in an infinite loop in attemptScheduling.
[ https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219085#comment-15219085 ] Karthik Kambatla commented on YARN-4477: Worked with [~rdub] offline. Turning off assignMultiple should workaround this issue. > FairScheduler: Handle condition which can result in an infinite loop in > attemptScheduling. > -- > > Key: YARN-4477 > URL: https://issues.apache.org/jira/browse/YARN-4477 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Tao Jie >Assignee: Tao Jie > Fix For: 2.8.0 > > Attachments: YARN-4477.001.patch, YARN-4477.002.patch, > YARN-4477.003.patch, YARN-4477.004.patch > > > This problem is introduced by YARN-4270 which add limitation on reservation. > In FSAppAttempt.reserve(): > {code} > if (!reservationExceedsThreshold(node, type)) { > LOG.info("Making reservation: node=" + node.getNodeName() + > " app_id=" + getApplicationId()); > if (!alreadyReserved) { > getMetrics().reserveResource(getUser(), container.getResource()); > RMContainer rmContainer = > super.reserve(node, priority, null, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } else { > RMContainer rmContainer = node.getReservedContainer(); > super.reserve(node, priority, rmContainer, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } > } > {code} > If reservation over threshod, current node will not set reservation. > But in attemptScheduling in FairSheduler: > {code} > while (node.getReservedContainer() == null) { > boolean assignedContainer = false; > if (!queueMgr.getRootQueue().assignContainer(node).equals( > Resources.none())) { > assignedContainers++; > assignedContainer = true; > > } > > if (!assignedContainer) { break; } > if (!assignMultiple) { break; } > if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; } > } > {code} > assignContainer(node) still return FairScheduler.CONTAINER_RESERVED, which not > equals to Resources.none(). > As a result, if multiple assign is enabled and maxAssign is unlimited, this > while loop would never break. > I suppose that assignContainer(node) should return Resource.none rather than > CONTAINER_RESERVED when the attempt doesn't take the reservation because of > the limitation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4477) FairScheduler: Handle condition which can result in an infinite loop in attemptScheduling.
[ https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207567#comment-15207567 ] Ryan Williams commented on YARN-4477: - {quote} if multiple assign is enabled and maxAssign is unlimited, this while loop would never break. {quote} I am seeing this with multiple assign disabled; is that known to be possible? Running 2.6.0-cdh5.5.1. > FairScheduler: Handle condition which can result in an infinite loop in > attemptScheduling. > -- > > Key: YARN-4477 > URL: https://issues.apache.org/jira/browse/YARN-4477 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Tao Jie >Assignee: Tao Jie > Fix For: 2.8.0 > > Attachments: YARN-4477.001.patch, YARN-4477.002.patch, > YARN-4477.003.patch, YARN-4477.004.patch > > > This problem is introduced by YARN-4270 which add limitation on reservation. > In FSAppAttempt.reserve(): > {code} > if (!reservationExceedsThreshold(node, type)) { > LOG.info("Making reservation: node=" + node.getNodeName() + > " app_id=" + getApplicationId()); > if (!alreadyReserved) { > getMetrics().reserveResource(getUser(), container.getResource()); > RMContainer rmContainer = > super.reserve(node, priority, null, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } else { > RMContainer rmContainer = node.getReservedContainer(); > super.reserve(node, priority, rmContainer, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } > } > {code} > If reservation over threshod, current node will not set reservation. > But in attemptScheduling in FairSheduler: > {code} > while (node.getReservedContainer() == null) { > boolean assignedContainer = false; > if (!queueMgr.getRootQueue().assignContainer(node).equals( > Resources.none())) { > assignedContainers++; > assignedContainer = true; > > } > > if (!assignedContainer) { break; } > if (!assignMultiple) { break; } > if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; } > } > {code} > assignContainer(node) still return FairScheduler.CONTAINER_RESERVED, which not > equals to Resources.none(). > As a result, if multiple assign is enabled and maxAssign is unlimited, this > while loop would never break. > I suppose that assignContainer(node) should return Resource.none rather than > CONTAINER_RESERVED when the attempt doesn't take the reservation because of > the limitation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4477) FairScheduler: Handle condition which can result in an infinite loop in attemptScheduling.
[ https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067629#comment-15067629 ] Arun Suresh commented on YARN-4477: --- Thanks for updating the patch [~Tao Jie]. Verified that the tests run fine locally.. committed to trunk, branch-2 and branch-2.8 > FairScheduler: Handle condition which can result in an infinite loop in > attemptScheduling. > -- > > Key: YARN-4477 > URL: https://issues.apache.org/jira/browse/YARN-4477 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Tao Jie >Assignee: Tao Jie > Fix For: 2.8.0 > > Attachments: YARN-4477.001.patch, YARN-4477.002.patch, > YARN-4477.003.patch, YARN-4477.004.patch > > > This problem is introduced by YARN-4270 which add limitation on reservation. > In FSAppAttempt.reserve(): > {code} > if (!reservationExceedsThreshold(node, type)) { > LOG.info("Making reservation: node=" + node.getNodeName() + > " app_id=" + getApplicationId()); > if (!alreadyReserved) { > getMetrics().reserveResource(getUser(), container.getResource()); > RMContainer rmContainer = > super.reserve(node, priority, null, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } else { > RMContainer rmContainer = node.getReservedContainer(); > super.reserve(node, priority, rmContainer, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } > } > {code} > If reservation over threshod, current node will not set reservation. > But in attemptScheduling in FairSheduler: > {code} > while (node.getReservedContainer() == null) { > boolean assignedContainer = false; > if (!queueMgr.getRootQueue().assignContainer(node).equals( > Resources.none())) { > assignedContainers++; > assignedContainer = true; > > } > > if (!assignedContainer) { break; } > if (!assignMultiple) { break; } > if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; } > } > {code} > assignContainer(node) still return FairScheduler.CONTAINER_RESERVED, which not > equals to Resources.none(). > As a result, if multiple assign is enabled and maxAssign is unlimited, this > while loop would never break. > I suppose that assignContainer(node) should return Resource.none rather than > CONTAINER_RESERVED when the attempt doesn't take the reservation because of > the limitation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4477) FairScheduler: Handle condition which can result in an infinite loop in attemptScheduling.
[ https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067646#comment-15067646 ] Hudson commented on YARN-4477: -- FAILURE: Integrated in Hadoop-trunk-Commit #9012 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9012/]) YARN-4477. FairScheduler: Handle condition which can result in an (arun suresh: rev e88422df45550f788ae8dd73aec84bde28012aeb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java > FairScheduler: Handle condition which can result in an infinite loop in > attemptScheduling. > -- > > Key: YARN-4477 > URL: https://issues.apache.org/jira/browse/YARN-4477 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Tao Jie >Assignee: Tao Jie > Fix For: 2.8.0 > > Attachments: YARN-4477.001.patch, YARN-4477.002.patch, > YARN-4477.003.patch, YARN-4477.004.patch > > > This problem is introduced by YARN-4270 which add limitation on reservation. > In FSAppAttempt.reserve(): > {code} > if (!reservationExceedsThreshold(node, type)) { > LOG.info("Making reservation: node=" + node.getNodeName() + > " app_id=" + getApplicationId()); > if (!alreadyReserved) { > getMetrics().reserveResource(getUser(), container.getResource()); > RMContainer rmContainer = > super.reserve(node, priority, null, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } else { > RMContainer rmContainer = node.getReservedContainer(); > super.reserve(node, priority, rmContainer, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } > } > {code} > If reservation over threshod, current node will not set reservation. > But in attemptScheduling in FairSheduler: > {code} > while (node.getReservedContainer() == null) { > boolean assignedContainer = false; > if (!queueMgr.getRootQueue().assignContainer(node).equals( > Resources.none())) { > assignedContainers++; > assignedContainer = true; > > } > > if (!assignedContainer) { break; } > if (!assignMultiple) { break; } > if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; } > } > {code} > assignContainer(node) still return FairScheduler.CONTAINER_RESERVED, which not > equals to Resources.none(). > As a result, if multiple assign is enabled and maxAssign is unlimited, this > while loop would never break. > I suppose that assignContainer(node) should return Resource.none rather than > CONTAINER_RESERVED when the attempt doesn't take the reservation because of > the limitation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)