[jira] [Commented] (YARN-4477) FairScheduler: Handle condition which can result in an infinite loop in attemptScheduling.

2016-10-05 Thread Ryan Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550895#comment-15550895
 ] 

Ryan Williams commented on YARN-4477:
-

Looking at my notes, I think I was fooled by copious log spam of 
"Reservation Exceeds …" messages. I assumed they came from an infinite loop 
printing that message, but in reality they were just a symptom of some 
resource requests being too large for the RM to satisfy, which made it print a 
ton of debug messages about them.

> FairScheduler: Handle condition which can result in an infinite loop in 
> attemptScheduling.
> --
>
> Key: YARN-4477
> URL: https://issues.apache.org/jira/browse/YARN-4477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Tao Jie
>Assignee: Tao Jie
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-4477.001.patch, YARN-4477.002.patch, 
> YARN-4477.003.patch, YARN-4477.004.patch
>
>
> This problem was introduced by YARN-4270, which adds a limit on reservations.
> In FSAppAttempt.reserve():
> {code}
> if (!reservationExceedsThreshold(node, type)) {
>   LOG.info("Making reservation: node=" + node.getNodeName() +
>       " app_id=" + getApplicationId());
>   if (!alreadyReserved) {
>     getMetrics().reserveResource(getUser(), container.getResource());
>     RMContainer rmContainer =
>         super.reserve(node, priority, null, container);
>     node.reserveResource(this, priority, rmContainer);
>     setReservation(node);
>   } else {
>     RMContainer rmContainer = node.getReservedContainer();
>     super.reserve(node, priority, rmContainer, container);
>     node.reserveResource(this, priority, rmContainer);
>     setReservation(node);
>   }
> }
> {code}
> If the reservation exceeds the threshold, no reservation is set on the
> current node.
> But in attemptScheduling in FairScheduler:
> {code}
> while (node.getReservedContainer() == null) {
>   boolean assignedContainer = false;
>   if (!queueMgr.getRootQueue().assignContainer(node).equals(
>       Resources.none())) {
>     assignedContainers++;
>     assignedContainer = true;
>   }
>
>   if (!assignedContainer) { break; }
>   if (!assignMultiple) { break; }
>   if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
> }
> {code}
> assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which
> is not equal to Resources.none().
> As a result, if multiple assign is enabled and maxAssign is unlimited, this 
> while loop never breaks.
> I suppose that assignContainer(node) should return Resources.none() rather
> than CONTAINER_RESERVED when the attempt doesn't take the reservation because
> of the limit.
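
The loop condition described in the issue can be reproduced in isolation. The following is only a minimal sketch, not the Hadoop code: `Node`, the integer constants standing in for `Resources.none()` and `FairScheduler.CONTAINER_RESERVED`, the `patched` flag, and the iteration cap are all hypothetical simplifications introduced here so the demo terminates. It models the case where the reservation is rejected by the threshold check, and compares returning CONTAINER_RESERVED with returning none, as the description proposes.

```java
public class AttemptSchedulingSketch {
    // Hypothetical stand-ins for Resources.none() and
    // FairScheduler.CONTAINER_RESERVED; the real values are Resource objects.
    static final int NONE = 0;
    static final int CONTAINER_RESERVED = -1;

    /** Simplified node that only tracks whether a reservation was placed. */
    static final class Node {
        boolean reserved = false;
        // Models reservationExceedsThreshold(node, type) being true,
        // which is the case the issue describes.
        final boolean reservationExceedsThreshold = true;
    }

    /**
     * Models an assignContainer call for a request that does not fit.
     * With patched == false the caller is told CONTAINER_RESERVED even
     * though no reservation was set; with patched == true it gets NONE.
     */
    static int assignContainer(Node node, boolean patched) {
        if (!node.reservationExceedsThreshold) {
            node.reserved = true;
            return CONTAINER_RESERVED;
        }
        return patched ? NONE : CONTAINER_RESERVED;
    }

    /**
     * Mirrors the while loop quoted from attemptScheduling. The cap is an
     * addition for this demo only. Returns the number of iterations run.
     */
    static int attemptScheduling(boolean patched, boolean assignMultiple,
                                 int maxAssign, int cap) {
        Node node = new Node();
        int assignedContainers = 0;
        int iterations = 0;
        while (!node.reserved && iterations < cap) {
            iterations++;
            boolean assignedContainer = false;
            if (assignContainer(node, patched) != NONE) {
                assignedContainers++;
                assignedContainer = true;
            }
            if (!assignedContainer) { break; }
            if (!assignMultiple) { break; }
            if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
        }
        return iterations;
    }

    public static void main(String[] args) {
        // Unpatched, multiple assign on, maxAssign unlimited: runs until the cap.
        System.out.println(attemptScheduling(false, true, -1, 1000)); // 1000
        // Returning NONE lets the first break fire.
        System.out.println(attemptScheduling(true, true, -1, 1000));  // 1
        // With multiple assign off, the second break fires after one pass.
        System.out.println(attemptScheduling(false, false, -1, 1000)); // 1
    }
}
```

In this model the loop can only spin when assignMultiple is true and maxAssign is not positive, which matches the condition stated in the description.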



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4477) FairScheduler: Handle condition which can result in an infinite loop in attemptScheduling.

2016-10-05 Thread Tony Peng (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550502#comment-15550502
 ] 

Tony Peng commented on YARN-4477:
-

I'm also getting this problem with assignMultiple=false. [~kasha] [~rdub] what 
was your offline discussion?




[jira] [Commented] (YARN-4477) FairScheduler: Handle condition which can result in an infinite loop in attemptScheduling.

2016-03-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219085#comment-15219085
 ] 

Karthik Kambatla commented on YARN-4477:


Worked with [~rdub] offline. Turning off assignMultiple should work around this 
issue. 
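
For reference, a sketch of what that workaround could look like, assuming the FairScheduler settings are read from yarn-site.xml and that assignMultiple corresponds to the `yarn.scheduler.fair.assignmultiple` property (the file location and how it is deployed are assumptions about the cluster, not something stated in this thread):

```xml
<!-- yarn-site.xml fragment: disable multiple container assignment per heartbeat -->
<property>
  <name>yarn.scheduler.fair.assignmultiple</name>
  <value>false</value>
</property>
```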



[jira] [Commented] (YARN-4477) FairScheduler: Handle condition which can result in an infinite loop in attemptScheduling.

2016-03-22 Thread Ryan Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207567#comment-15207567
 ] 

Ryan Williams commented on YARN-4477:
-

{quote} if multiple assign is enabled and maxAssign is unlimited, this while 
loop would never break. {quote}

I am seeing this with multiple assign disabled; is that known to be possible? 
Running 2.6.0-cdh5.5.1.



[jira] [Commented] (YARN-4477) FairScheduler: Handle condition which can result in an infinite loop in attemptScheduling.

2015-12-21 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067629#comment-15067629
 ] 

Arun Suresh commented on YARN-4477:
---

Thanks for updating the patch [~Tao Jie].
Verified that the tests run fine locally.

Committed to trunk, branch-2 and branch-2.8.



[jira] [Commented] (YARN-4477) FairScheduler: Handle condition which can result in an infinite loop in attemptScheduling.

2015-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067646#comment-15067646
 ] 

Hudson commented on YARN-4477:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9012 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9012/])
YARN-4477. FairScheduler: Handle condition which can result in an (arun suresh: 
rev e88422df45550f788ae8dd73aec84bde28012aeb)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java

