[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2015-12-17 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063619#comment-15063619
 ] 

Sunil G commented on YARN-4479:
---

Thanks [~rohithsharma] for raising this. As discussed offline, all running 
attempts during recovery (attempt state *NOT* in a final state and *NOT* null) 
could be considered for this list. Such apps can be activated first so that the 
scheduler stays in sync with the state before recovery. Starvation is still 
possible, but it would be the same as what was happening before the HA failover.
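As a rough sketch of that filter (illustrative only; the accessor name and the set of 
final states below are assumptions, not a claim about the exact API used):
{code}
// Sketch: a recovered attempt counts as "previously running" if it exists and
// its recovered state is not one of the terminal states.
boolean wasRunningBeforeRecovery(RMAppAttempt attempt) {
  return attempt != null
      && !EnumSet.of(RMAppAttemptState.FINISHED,
                     RMAppAttemptState.FAILED,
                     RMAppAttemptState.KILLED)
             .contains(attempt.getAppAttemptState());
}
{code}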

> Retrospect app-priority in pendingOrderingPolicy during recovering 
> applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> Currently, the same ordering policy is used for pending applications and active 
> applications. When priority is configured for applications, high-priority 
> applications get activated first during recovery. It is possible that a 
> low-priority job had been submitted and was in the running state. 
> This causes the low-priority job to starve after recovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2015-12-17 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063610#comment-15063610
 ] 

Rohith Sharma K S commented on YARN-4479:
-

Thinking that, while recovering the applications, previously running applications 
should be added to a separate ordering policy such as 
{{pendingOrderingPolicyRecovery}}, which is then used to activate applications 
before iterating over {{pendingOrderingPolicy}}.
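A rough sketch of that activation order (the names and the {{activate()}} helper are 
illustrative placeholders, not the actual patch):
{code}
// Sketch: drain the recovery policy first so previously running applications
// regain their active state before any newly pending applications.
private void activateApplications() {
  for (FiCaSchedulerApp app :
      pendingOrderingPolicyRecovery.getSchedulableEntities()) {
    activate(app);   // recovered (previously running) apps go first
  }
  for (FiCaSchedulerApp app :
      pendingOrderingPolicy.getSchedulableEntities()) {
    activate(app);   // then regular pending apps, subject to the usual limits
  }
}
{code}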

> Retrospect app-priority in pendingOrderingPolicy during recovering 
> applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> Currently, the same ordering policy is used for pending applications and active 
> applications. When priority is configured for applications, high-priority 
> applications get activated first during recovery. It is possible that a 
> low-priority job had been submitted and was in the running state. 
> This causes the low-priority job to starve after recovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2015-12-17 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4479:

Target Version/s: 2.8.0

> Retrospect app-priority in pendingOrderingPolicy during recovering 
> applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> Currently, the same ordering policy is used for pending applications and active 
> applications. When priority is configured for applications, high-priority 
> applications get activated first during recovery. It is possible that a 
> low-priority job had been submitted and was in the running state. 
> This causes the low-priority job to starve after recovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2015-12-17 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-4479:
---

 Summary: Retrospect app-priority in pendingOrderingPolicy during 
recovering applications
 Key: YARN-4479
 URL: https://issues.apache.org/jira/browse/YARN-4479
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S


Currently, the same ordering policy is used for pending applications and active 
applications. When priority is configured for applications, high-priority 
applications get activated first during recovery. It is possible that a 
low-priority job had been submitted and was in the running state. 
This causes the low-priority job to starve after recovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling

2015-12-17 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063592#comment-15063592
 ] 

Arun Suresh commented on YARN-4477:
---

Good catch [~Tao Jie] and Thanks for the patch !!
Is it possible to add a test case for this too ?

> FairScheduler: encounter infinite loop in attemptScheduling
> ---
>
> Key: YARN-4477
> URL: https://issues.apache.org/jira/browse/YARN-4477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Tao Jie
>Assignee: Tao Jie
> Attachments: YARN-4477.001.patch
>
>
> This problem is introduced by YARN-4270, which adds a limitation on reservations.
> In FSAppAttempt.reserve():
> {code}
> if (!reservationExceedsThreshold(node, type)) {
>   LOG.info("Making reservation: node=" + node.getNodeName() +
>   " app_id=" + getApplicationId());
>   if (!alreadyReserved) {
> getMetrics().reserveResource(getUser(), container.getResource());
> RMContainer rmContainer =
> super.reserve(node, priority, null, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   } else {
> RMContainer rmContainer = node.getReservedContainer();
> super.reserve(node, priority, rmContainer, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   }
> }
> {code}
> If the reservation exceeds the threshold, the current node will not set a reservation.
> But in attemptScheduling in FairScheduler:
> {code}
>   while (node.getReservedContainer() == null) {
> boolean assignedContainer = false;
> if (!queueMgr.getRootQueue().assignContainer(node).equals(
> Resources.none())) {
>   assignedContainers++;
>   assignedContainer = true;
>   
> }
> 
> if (!assignedContainer) { break; }
> if (!assignMultiple) { break; }
> if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
>   }
> {code}
> assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not
> equal to Resources.none().
> As a result, if multiple assign is enabled and maxAssign is unlimited, this 
> while loop never breaks.
> I suppose that assignContainer(node) should return Resources.none() rather than 
> CONTAINER_RESERVED when the attempt doesn't take the reservation because of 
> the limitation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling

2015-12-17 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie reassigned YARN-4477:
-

Assignee: Tao Jie

> FairScheduler: encounter infinite loop in attemptScheduling
> ---
>
> Key: YARN-4477
> URL: https://issues.apache.org/jira/browse/YARN-4477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Tao Jie
>Assignee: Tao Jie
> Attachments: YARN-4477.001.patch
>
>
> This problem is introduced by YARN-4270, which adds a limitation on reservations.
> In FSAppAttempt.reserve():
> {code}
> if (!reservationExceedsThreshold(node, type)) {
>   LOG.info("Making reservation: node=" + node.getNodeName() +
>   " app_id=" + getApplicationId());
>   if (!alreadyReserved) {
> getMetrics().reserveResource(getUser(), container.getResource());
> RMContainer rmContainer =
> super.reserve(node, priority, null, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   } else {
> RMContainer rmContainer = node.getReservedContainer();
> super.reserve(node, priority, rmContainer, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   }
> }
> {code}
> If the reservation exceeds the threshold, the current node will not set a reservation.
> But in attemptScheduling in FairScheduler:
> {code}
>   while (node.getReservedContainer() == null) {
> boolean assignedContainer = false;
> if (!queueMgr.getRootQueue().assignContainer(node).equals(
> Resources.none())) {
>   assignedContainers++;
>   assignedContainer = true;
>   
> }
> 
> if (!assignedContainer) { break; }
> if (!assignMultiple) { break; }
> if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
>   }
> {code}
> assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not
> equal to Resources.none().
> As a result, if multiple assign is enabled and maxAssign is unlimited, this 
> while loop never breaks.
> I suppose that assignContainer(node) should return Resources.none() rather than 
> CONTAINER_RESERVED when the attempt doesn't take the reservation because of 
> the limitation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling

2015-12-17 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063577#comment-15063577
 ] 

Tao Jie commented on YARN-4477:
---

In the attached patch, a return value is added to reserve() so the caller can tell 
whether the reservation was actually set despite the limitation. Only when the 
reservation is really set does assignContainer return CONTAINER_RESERVED; otherwise 
it returns Resources.none().
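Roughly, the shape of the change (a sketch only, not the exact patch; the boolean 
return and the parameter order below are assumptions):
{code}
// Sketch: reserve() reports whether a reservation was actually placed.
private boolean reserve(Priority priority, FSSchedulerNode node,
    Container container, NodeType type, boolean alreadyReserved) {
  if (reservationExceedsThreshold(node, type)) {
    return false;   // threshold exceeded: nothing was reserved on this node
  }
  // ... existing reservation logic quoted in the description ...
  return true;
}

// Sketch of the caller in assignContainer(): only report CONTAINER_RESERVED when
// a reservation was really placed, otherwise return Resources.none() so that
// FairScheduler.attemptScheduling() can break out of its while loop.
return reserve(priority, node, container, type, alreadyReserved)
    ? FairScheduler.CONTAINER_RESERVED
    : Resources.none();
{code}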

> FairScheduler: encounter infinite loop in attemptScheduling
> ---
>
> Key: YARN-4477
> URL: https://issues.apache.org/jira/browse/YARN-4477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Tao Jie
> Attachments: YARN-4477.001.patch
>
>
> This problem is introduced by YARN-4270, which adds a limitation on reservations.
> In FSAppAttempt.reserve():
> {code}
> if (!reservationExceedsThreshold(node, type)) {
>   LOG.info("Making reservation: node=" + node.getNodeName() +
>   " app_id=" + getApplicationId());
>   if (!alreadyReserved) {
> getMetrics().reserveResource(getUser(), container.getResource());
> RMContainer rmContainer =
> super.reserve(node, priority, null, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   } else {
> RMContainer rmContainer = node.getReservedContainer();
> super.reserve(node, priority, rmContainer, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   }
> }
> {code}
> If the reservation exceeds the threshold, the current node will not set a reservation.
> But in attemptScheduling in FairScheduler:
> {code}
>   while (node.getReservedContainer() == null) {
> boolean assignedContainer = false;
> if (!queueMgr.getRootQueue().assignContainer(node).equals(
> Resources.none())) {
>   assignedContainers++;
>   assignedContainer = true;
>   
> }
> 
> if (!assignedContainer) { break; }
> if (!assignMultiple) { break; }
> if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
>   }
> {code}
> assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not
> equal to Resources.none().
> As a result, if multiple assign is enabled and maxAssign is unlimited, this 
> while loop never breaks.
> I suppose that assignContainer(node) should return Resources.none() rather than 
> CONTAINER_RESERVED when the attempt doesn't take the reservation because of 
> the limitation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-17 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063576#comment-15063576
 ] 

sandflee commented on YARN-4138:


{quote}
We should not update lastConfirmedResource in this scenario. This is the exact 
case we want to cover in this ticket, where the resource increase token may 
expire, and we need to roll back to the old resource. The only time we want to 
update lastConfirmedResource during resource increase is when 
Resources.equals(nmContainerResource, rmContainerResource).
{quote}
1. AM increases the container resource from 1G -> 2G and gets container token A.
2. AM increases it from 2G -> 4G and gets container token B.
3. AM sends token A to the NM, and never sends token B.
4. RM will decrease the resource back to 1G; is this the expected behavior? It seems 
we could set the lastConfirmed resource to 2G.

{quote}
Sorry I don't understand the question. Can you elaborate?
{quote}
When I use requestContainerResourceChange(Container container, Resource 
capability), I have a question about which container we should pass to this 
function.
1. AM requests resources and gets a container A.
2. AM sends an increase request and gets a container B.
3. When the AM wants to decrease the resource, should we pass container A or B to 
requestContainerResourceChange? It seems we should pass container B, even if 
container B has not been sent to the NM?
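For clarity, the first scenario written against the 
{{requestContainerResourceChange(Container, Resource)}} signature mentioned above 
(purely illustrative; the {{amRMClient}} variable and the resource sizes are assumptions):
{code}
// Step 1: grow the container from 1 GB to 2 GB -> RM returns increase token A.
amRMClient.requestContainerResourceChange(container, Resource.newInstance(2048, 1));
// Step 2: grow again from 2 GB to 4 GB -> RM returns increase token B.
amRMClient.requestContainerResourceChange(container, Resource.newInstance(4096, 1));
// Step 3: only token A is handed to the NM; token B is never used and expires.
// The question above: on expiry of token B, should the RM roll back to 1 GB,
// or keep 2 GB since the NM already confirmed that size via token A?
{code}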

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-12-17 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063574#comment-15063574
 ] 

Naganarasimha G R commented on YARN-2934:
-

Hi [~jira.shegalov],
bq. Minor repetition is not big of a deal, IMO.
Could you please have a look at the latest patch? If it is fine I will not rework it; 
otherwise I will try to update it as per your comment.

bq.  Right now we are blindly grabbing file 0. It would however make much more 
sense to grab the most recent (highest mtime) non-empty file.
Are you suggesting this change for this patch, or just mentioning that it is 
good to have? If required I can rework it, since it would be better to go for the 
latest modified file!
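If it helps, picking the latest modified non-empty file could look roughly like this 
(a sketch using the standard Hadoop {{FileStatus}} API; the helper name is made up):
{code}
// Sketch: among the candidate stderr files, take the most recently modified
// non-empty one instead of blindly taking index 0.
static Path pickLatestNonEmpty(FileStatus[] candidates) {
  FileStatus best = null;
  for (FileStatus status : candidates) {
    if (status.getLen() > 0
        && (best == null
            || status.getModificationTime() > best.getModificationTime())) {
      best = status;
    }
  }
  return best == null ? null : best.getPath();
}
{code}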

> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, 
> YARN-2934.v1.006.patch, YARN-2934.v1.007.patch, YARN-2934.v1.008.patch, 
> YARN-2934.v2.001.patch, YARN-2934.v2.002.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling

2015-12-17 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated YARN-4477:
--
Attachment: (was: YARN-4477.001.patch)

> FairScheduler: encounter infinite loop in attemptScheduling
> ---
>
> Key: YARN-4477
> URL: https://issues.apache.org/jira/browse/YARN-4477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Tao Jie
> Attachments: YARN-4477.001.patch
>
>
> This problem is introduced by YARN-4270, which adds a limitation on reservations.
> In FSAppAttempt.reserve():
> {code}
> if (!reservationExceedsThreshold(node, type)) {
>   LOG.info("Making reservation: node=" + node.getNodeName() +
>   " app_id=" + getApplicationId());
>   if (!alreadyReserved) {
> getMetrics().reserveResource(getUser(), container.getResource());
> RMContainer rmContainer =
> super.reserve(node, priority, null, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   } else {
> RMContainer rmContainer = node.getReservedContainer();
> super.reserve(node, priority, rmContainer, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   }
> }
> {code}
> If the reservation exceeds the threshold, the current node will not set a reservation.
> But in attemptScheduling in FairScheduler:
> {code}
>   while (node.getReservedContainer() == null) {
> boolean assignedContainer = false;
> if (!queueMgr.getRootQueue().assignContainer(node).equals(
> Resources.none())) {
>   assignedContainers++;
>   assignedContainer = true;
>   
> }
> 
> if (!assignedContainer) { break; }
> if (!assignMultiple) { break; }
> if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
>   }
> {code}
> assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not
> equal to Resources.none().
> As a result, if multiple assign is enabled and maxAssign is unlimited, this 
> while loop never breaks.
> I suppose that assignContainer(node) should return Resources.none() rather than 
> CONTAINER_RESERVED when the attempt doesn't take the reservation because of 
> the limitation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling

2015-12-17 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated YARN-4477:
--
Attachment: YARN-4477.001.patch

> FairScheduler: encounter infinite loop in attemptScheduling
> ---
>
> Key: YARN-4477
> URL: https://issues.apache.org/jira/browse/YARN-4477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Tao Jie
> Attachments: YARN-4477.001.patch, YARN-4477.001.patch
>
>
> This problem is introduced by YARN-4270, which adds a limitation on reservations.
> In FSAppAttempt.reserve():
> {code}
> if (!reservationExceedsThreshold(node, type)) {
>   LOG.info("Making reservation: node=" + node.getNodeName() +
>   " app_id=" + getApplicationId());
>   if (!alreadyReserved) {
> getMetrics().reserveResource(getUser(), container.getResource());
> RMContainer rmContainer =
> super.reserve(node, priority, null, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   } else {
> RMContainer rmContainer = node.getReservedContainer();
> super.reserve(node, priority, rmContainer, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   }
> }
> {code}
> If the reservation exceeds the threshold, the current node will not set a reservation.
> But in attemptScheduling in FairScheduler:
> {code}
>   while (node.getReservedContainer() == null) {
> boolean assignedContainer = false;
> if (!queueMgr.getRootQueue().assignContainer(node).equals(
> Resources.none())) {
>   assignedContainers++;
>   assignedContainer = true;
>   
> }
> 
> if (!assignedContainer) { break; }
> if (!assignMultiple) { break; }
> if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
>   }
> {code}
> assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not
> equal to Resources.none().
> As a result, if multiple assign is enabled and maxAssign is unlimited, this 
> while loop never breaks.
> I suppose that assignContainer(node) should return Resources.none() rather than 
> CONTAINER_RESERVED when the attempt doesn't take the reservation because of 
> the limitation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling

2015-12-17 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated YARN-4477:
--
Attachment: YARN-4477.001.patch

> FairScheduler: encounter infinite loop in attemptScheduling
> ---
>
> Key: YARN-4477
> URL: https://issues.apache.org/jira/browse/YARN-4477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Tao Jie
> Attachments: YARN-4477.001.patch
>
>
> This problem is introduced by YARN-4270, which adds a limitation on reservations.
> In FSAppAttempt.reserve():
> {code}
> if (!reservationExceedsThreshold(node, type)) {
>   LOG.info("Making reservation: node=" + node.getNodeName() +
>   " app_id=" + getApplicationId());
>   if (!alreadyReserved) {
> getMetrics().reserveResource(getUser(), container.getResource());
> RMContainer rmContainer =
> super.reserve(node, priority, null, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   } else {
> RMContainer rmContainer = node.getReservedContainer();
> super.reserve(node, priority, rmContainer, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   }
> }
> {code}
> If the reservation exceeds the threshold, the current node will not set a reservation.
> But in attemptScheduling in FairScheduler:
> {code}
>   while (node.getReservedContainer() == null) {
> boolean assignedContainer = false;
> if (!queueMgr.getRootQueue().assignContainer(node).equals(
> Resources.none())) {
>   assignedContainers++;
>   assignedContainer = true;
>   
> }
> 
> if (!assignedContainer) { break; }
> if (!assignMultiple) { break; }
> if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
>   }
> {code}
> assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not
> equal to Resources.none().
> As a result, if multiple assign is enabled and maxAssign is unlimited, this 
> while loop never breaks.
> I suppose that assignContainer(node) should return Resources.none() rather than 
> CONTAINER_RESERVED when the attempt doesn't take the reservation because of 
> the limitation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4476) Matcher for complex node label expressions

2015-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063549#comment-15063549
 ] 

Hadoop QA commented on YARN-4476:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} Patch generated 156 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 0, now 156). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 21s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 introduced 7 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 10s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 40s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 23s 
{color} | {color:red} Patch generated 8 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 147m 31s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  org.apache.hadoop.yarn.server.resourcemanager.nodelabels.BE defines 
equals and uses Object.hashCode()  At BE.java:Object.hashCode()  At 
BE.java:[lines 24-30] |
|  |  org.apache.hadoop.yarn.server.resourcemanager.nodelabels.BE$And inherits 
equals and uses Object.hashCode()  At BE.ja

[jira] [Commented] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling

2015-12-17 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063538#comment-15063538
 ] 

Xianyin Xin commented on YARN-4477:
---

cc [~asuresh].

> FairScheduler: encounter infinite loop in attemptScheduling
> ---
>
> Key: YARN-4477
> URL: https://issues.apache.org/jira/browse/YARN-4477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Tao Jie
>
> This problem is introduced by YARN-4270, which adds a limitation on reservations.
> In FSAppAttempt.reserve():
> {code}
> if (!reservationExceedsThreshold(node, type)) {
>   LOG.info("Making reservation: node=" + node.getNodeName() +
>   " app_id=" + getApplicationId());
>   if (!alreadyReserved) {
> getMetrics().reserveResource(getUser(), container.getResource());
> RMContainer rmContainer =
> super.reserve(node, priority, null, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   } else {
> RMContainer rmContainer = node.getReservedContainer();
> super.reserve(node, priority, rmContainer, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   }
> }
> {code}
> If the reservation exceeds the threshold, the current node will not set a reservation.
> But in attemptScheduling in FairScheduler:
> {code}
>   while (node.getReservedContainer() == null) {
> boolean assignedContainer = false;
> if (!queueMgr.getRootQueue().assignContainer(node).equals(
> Resources.none())) {
>   assignedContainers++;
>   assignedContainer = true;
>   
> }
> 
> if (!assignedContainer) { break; }
> if (!assignMultiple) { break; }
> if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
>   }
> {code}
> assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not
> equal to Resources.none().
> As a result, if multiple assign is enabled and maxAssign is unlimited, this 
> while loop never breaks.
> I suppose that assignContainer(node) should return Resources.none() rather than 
> CONTAINER_RESERVED when the attempt doesn't take the reservation because of 
> the limitation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4164) Retrospect update ApplicationPriority API return type

2015-12-17 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063518#comment-15063518
 ] 

Rohith Sharma K S commented on YARN-4164:
-

Looking into the HadoopQA result, 
bq. -1  cc  16m 49s root-jdk1.8.0_66 with JDK v1.8.0_66 generated 2 new 
issues (was 16, now 16).
Not caused by this patch
bq. -1  javac   25m 36s root-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new 
issues (was 723, now 723).
Not caused by this patch

And the test failures are not related to this patch. 
YARN-side test case failure JIRAs are tracked by the umbrella JIRA YARN-4478.
MapReduce test case failures are tracked by MAPREDUCE-6580 and MAPREDUCE-6579.

> Retrospect update ApplicationPriority API return type
> -
>
> Key: YARN-4164
> URL: https://issues.apache.org/jira/browse/YARN-4164
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4164.patch, 0002-YARN-4164.patch, 
> 0003-YARN-4164.patch, 0004-YARN-4164.patch
>
>
> Currently the {{ApplicationClientProtocol#updateApplicationPriority()}} API 
> returns an empty UpdateApplicationPriorityResponse.
> But the RM updates the priority to cluster.max-priority if the given priority is 
> greater than cluster.max-priority. In this scenario, the client needs to be informed 
> of the updated priority rather than keeping quiet, where the client 
> assumes that the given priority itself was taken.
> The same scenario can also happen during application submission, but I feel that when 
> updateApplicationPriority() is explicitly invoked via ApplicationClientProtocol, 
> the response should carry the updated priority.
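To make the ask in the description concrete, the server-side change would amount to 
something like the sketch below (illustrative only; the {{newInstance(Priority)}} 
factory on the response and the {{clusterMaxPriority}} field are assumptions):
{code}
// Sketch: clamp the requested priority to the cluster maximum and echo the
// effective value back in the response instead of returning an empty response.
Priority requested = request.getApplicationPriority();
Priority effective =
    requested.getPriority() > clusterMaxPriority.getPriority()
        ? clusterMaxPriority
        : requested;
// ... apply 'effective' to the application as today ...
return UpdateApplicationPriorityResponse.newInstance(effective);
{code}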



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently

2015-12-17 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4393:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4478

> TestResourceLocalizationService#testFailedDirsResourceRelease fails 
> intermittently
> --
>
> Key: YARN-4393
> URL: https://issues.apache.org/jira/browse/YARN-4393
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.3
>
> Attachments: YARN-4393.01.patch
>
>
> [~ozawa] pointed out this failure on YARN-4380.
> Check 
> https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773
> {noformat}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>  Time elapsed: 0.093 sec <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> eventHandler.handle(
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> Actual invocation has different arguments:
> eventHandler.handle(
> EventType: APPLICATION_INITED
> );
> -> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4351) Tests in h.y.c.TestGetGroups get failed on trunk

2015-12-17 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4351:

Issue Type: Sub-task  (was: Bug)
Parent: YARN-4478

> Tests in h.y.c.TestGetGroups get failed on trunk
> 
>
> Key: YARN-4351
> URL: https://issues.apache.org/jira/browse/YARN-4351
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>
> From test report: 
> https://builds.apache.org/job/PreCommit-YARN-Build/9661/testReport/, we can 
> see there are several test failures for TestGetGroups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN

2015-12-17 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063489#comment-15063489
 ] 

Rohith Sharma K S commented on YARN-4478:
-

A few suggestions on tracking test failures and timed-out test cases:
# Timed-out tests might be a bug, or may be because the test timeout is too 
aggressive.
# If these test case failures are real issues in the code, then we can either convert 
the same JIRA into a bug or raise a new JIRA and link to it.

> [Umbrella] : Track all the Test failures in YARN
> 
>
> Key: YARN-4478
> URL: https://issues.apache.org/jira/browse/YARN-4478
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Rohith Sharma K S
>
> Recently many test cases are failing, either timing out or being impacted by new 
> bug fixes. Many test failure JIRAs have been raised and are in progress.
> This JIRA is to track all the test failure JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-12-17 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063486#comment-15063486
 ] 

Gera Shegalov commented on YARN-2934:
-

Minor repetition is not a big deal, IMO. 

The reason I thought of printing file statuses is that you see the file size. 
Which brings us to the following point in the fanciness area. Right now we are 
blindly grabbing file 0. It would however make much more sense to grab the most 
recent (highest mtime) non-empty file.

> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, 
> YARN-2934.v1.006.patch, YARN-2934.v1.007.patch, YARN-2934.v1.008.patch, 
> YARN-2934.v2.001.patch, YARN-2934.v2.002.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-12-17 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063487#comment-15063487
 ] 

Gera Shegalov commented on YARN-2934:
-

Minor repetition is not a big deal, IMO. 

The reason I thought of printing file statuses is that you see the file size. 
Which brings us to the following point in the fanciness area. Right now we are 
blindly grabbing file 0. It would however make much more sense to grab the most 
recent (highest mtime) non-empty file.

> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, 
> YARN-2934.v1.006.patch, YARN-2934.v1.007.patch, YARN-2934.v1.008.patch, 
> YARN-2934.v2.001.patch, YARN-2934.v2.002.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2934) Improve handling of container's stderr

2015-12-17 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2934:

Attachment: YARN-2934.v2.002.patch

Hi [~jira.shegalov],
I have incorporated all the comments in the latest patch; can you please take a 
look!

> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, 
> YARN-2934.v1.006.patch, YARN-2934.v1.007.patch, YARN-2934.v1.008.patch, 
> YARN-2934.v2.001.patch, YARN-2934.v2.002.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4318) Test failure: TestAMAuthorization

2015-12-17 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4318:

Issue Type: Sub-task  (was: Bug)
Parent: YARN-4478

> Test failure: TestAMAuthorization
> -
>
> Key: YARN-4318
> URL: https://issues.apache.org/jira/browse/YARN-4318
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.0.0
> Environment: jenkins
>Reporter: Tsuyoshi Ozawa
>Assignee: Kuhu Shukla
>
> {quote}
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.208 sec  <<< ERROR!
> java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
> destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For 
> more details see:  http://wiki.apache.org/hadoop/UnknownHost
>   at org.apache.hadoop.ipc.Client$Connection.(Client.java:403)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1400)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN

2015-12-17 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063472#comment-15063472
 ] 

Sunil G commented on YARN-4478:
---

Thanks for taking this initiative. I will add a few test failure tickets under 
this which need attention.

> [Umbrella] : Track all the Test failures in YARN
> 
>
> Key: YARN-4478
> URL: https://issues.apache.org/jira/browse/YARN-4478
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Rohith Sharma K S
>
> Recently many test cases are failing, either timing out or being impacted by new 
> bug fixes. Many test failure JIRAs have been raised and are in progress.
> This JIRA is to track all the test failure JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4306) Test failure: TestClientRMTokens

2015-12-17 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4306:

Issue Type: Sub-task  (was: Bug)
Parent: YARN-4478

> Test failure: TestClientRMTokens
> 
>
> Key: YARN-4306
> URL: https://issues.apache.org/jira/browse/YARN-4306
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Reporter: Sunil G
>Assignee: Sunil G
>
> Tests are failing locally as well. As part of the HADOOP-12321 jenkins run, 
> I see the same error:
> {noformat}testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 0.638 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:363)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:316)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient

2015-12-17 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4352:

Issue Type: Sub-task  (was: Bug)
Parent: YARN-4478

> Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
> 
>
> Key: YARN-4352
> URL: https://issues.apache.org/jira/browse/YARN-4352
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>
> From 
> https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt,
>  we can see that the tests in TestYarnClient, TestAMRMClient and TestNMClient time 
> out, which can be reproduced locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4478) [Umbrella] : Track all the Test failures in YARN

2015-12-17 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-4478:
---

 Summary: [Umbrella] : Track all the Test failures in YARN
 Key: YARN-4478
 URL: https://issues.apache.org/jira/browse/YARN-4478
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Rohith Sharma K S


Recently many test cases are failing, either timing out or being impacted by new 
bug fixes. Many test failure JIRAs have been raised and are in progress.
This JIRA is to track all the test failure JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling

2015-12-17 Thread Tao Jie (JIRA)
Tao Jie created YARN-4477:
-

 Summary: FairScheduler: encounter infinite loop in 
attemptScheduling
 Key: YARN-4477
 URL: https://issues.apache.org/jira/browse/YARN-4477
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Tao Jie


This problem is introduced by YARN-4270, which adds a limitation on reservations.
In FSAppAttempt.reserve():
{code}
if (!reservationExceedsThreshold(node, type)) {
  LOG.info("Making reservation: node=" + node.getNodeName() +
  " app_id=" + getApplicationId());
  if (!alreadyReserved) {
getMetrics().reserveResource(getUser(), container.getResource());
RMContainer rmContainer =
super.reserve(node, priority, null, container);
node.reserveResource(this, priority, rmContainer);
setReservation(node);
  } else {
RMContainer rmContainer = node.getReservedContainer();
super.reserve(node, priority, rmContainer, container);
node.reserveResource(this, priority, rmContainer);
setReservation(node);
  }
}
{code}
If the reservation exceeds the threshold, the current node will not set a reservation.
But in attemptScheduling in FairScheduler:
{code}
  while (node.getReservedContainer() == null) {
boolean assignedContainer = false;
if (!queueMgr.getRootQueue().assignContainer(node).equals(
Resources.none())) {
  assignedContainers++;
  assignedContainer = true;
  
}

if (!assignedContainer) { break; }
if (!assignMultiple) { break; }
if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
  }
{code}
assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not 
equal to Resources.none().
As a result, if multiple assign is enabled and maxAssign is unlimited, this 
while loop never breaks.

I suppose that assignContainer(node) should return Resources.none() rather than 
CONTAINER_RESERVED when the attempt doesn't take the reservation because of the 
limitation.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2015-12-17 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063424#comment-15063424
 ] 

Bibin A Chundatt commented on YARN-4465:


Hi [~leftnoteasy]/[~sunilg]
Thank you for looking into the issue

The below 2 points will be fixed as part of this JIRA:
# When labels are not found in the cluster, throw an exception with a clear message.
# When node labels are not enabled, SchedulerUtils.normalizeAndValidateRequest 
will reset the labels to the default with proper logging.

Thoughts??
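A rough sketch of point 2 (illustrative only; {{nodeLabelsEnabled}} stands for the 
yarn.node-labels.enabled setting, and the exact placement inside SchedulerUtils may differ):
{code}
// Sketch inside SchedulerUtils.normalizeAndValidateRequest(): when node labels
// are disabled, reset any label expression to the default instead of failing.
if (!nodeLabelsEnabled && resReq.getNodeLabelExpression() != null) {
  LOG.info("Node labels are disabled; resetting label expression '"
      + resReq.getNodeLabelExpression() + "' to default for " + resReq);
  resReq.setNodeLabelExpression(RMNodeLabelsManager.NO_LABEL);
}
{code}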

> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Disable node labels on the RM side (yarn.nodelabel.enable=false).
> The capacity scheduler label configuration for the queue is as below:
> default label for queue b1 = 3, and accessible labels = 1,3.
> Submit an application to queue A.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore the default label expression when labels are disabled, *or*
> # In NormalizeResourceRequest we can set the label expression to  
> when node labels are not enabled, *or*
> # Improve the message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2575) Consider creating separate ACLs for Reservation create/update/delete/list ops

2015-12-17 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-2575:
--
Attachment: YARN-2575.v3.patch

Added the mentioned changes.

> Consider creating separate ACLs for Reservation create/update/delete/list ops
> -
>
> Key: YARN-2575
> URL: https://issues.apache.org/jira/browse/YARN-2575
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Sean Po
> Attachments: YARN-2575.v1.patch, YARN-2575.v2.1.patch, 
> YARN-2575.v2.patch, YARN-2575.v3.patch
>
>
> YARN-1051 introduces the ReservationSystem and in the current implementation 
> anyone who can submit applications can also submit reservations. This JIRA is 
> to evaluate creating separate ACLs for Reservation create/update/delete ops.
> Depends on YARN-4340



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2015-12-17 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063361#comment-15063361
 ] 

Chris Douglas commented on YARN-4195:
-

Posted a draft to YARN-4476

> Support of node-labels in the ReservationSystem "Plan"
> --
>
> Key: YARN-4195
> URL: https://issues.apache.org/jira/browse/YARN-4195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4195.patch
>
>
> As part of YARN-4193 we need to enhance the InMemoryPlan (and related 
> classes) to track the per-label available resources, as well as the per-label
> reservation-allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4476) Matcher for complex node label expressions

2015-12-17 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-4476:

Attachment: YARN-4476-0.patch

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: YARN-4476-0.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4476) Matcher for complex node label expressions

2015-12-17 Thread Chris Douglas (JIRA)
Chris Douglas created YARN-4476:
---

 Summary: Matcher for complex node label expressions
 Key: YARN-4476
 URL: https://issues.apache.org/jira/browse/YARN-4476
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Chris Douglas
Assignee: Chris Douglas


Implementation of a matcher for complex node label expressions based on a 
[paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4003) ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is not consistent

2015-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063345#comment-15063345
 ] 

Hadoop QA commented on YARN-4003:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} YARN-4003 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12748114/YARN-4003.patch |
| JIRA Issue | YARN-4003 |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10030/console |


This message was automatically generated.



> ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is 
> not consistent
> 
>
> Key: YARN-4003
> URL: https://issues.apache.org/jira/browse/YARN-4003
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4003.patch
>
>
> The inherited behavior from LeafQueue (limit AM % based on capacity) is not a 
> good fit for ReservationQueue (which has highly dynamic capacity). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4198) CapacityScheduler locking / synchronization improvements

2015-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063346#comment-15063346
 ] 

Hadoop QA commented on YARN-4198:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 3s {color} 
| {color:red} YARN-4198 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12778405/YARN-4198-v1.patch |
| JIRA Issue | YARN-4198 |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10031/console |


This message was automatically generated.



> CapacityScheduler locking / synchronization improvements
> 
>
> Key: YARN-4198
> URL: https://issues.apache.org/jira/browse/YARN-4198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Alexey Tumanov
> Attachments: YARN-4198-v1.patch
>
>
> In the context of YARN-4193 (which stresses the RM/CS performance) we found 
> several performance problems in the locking/synchronization of the 
> CapacityScheduler, as well as inconsistencies that do not normally surface 
> (incorrect locking order of queues protected by CS locks, etc.). This JIRA 
> proposes several refactorings to improve this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4194) Extend Reservation Definition Language (RDL) extensions to support node labels

2015-12-17 Thread Alexey Tumanov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063334#comment-15063334
 ] 

Alexey Tumanov commented on YARN-4194:
--

Carlo, Subru, thanks for your review. I'll add documentation.
Carlo, I took a quick look at 
ReservationInputValidator.validateReservationDefinition(). It checks boundary 
conditions on the numerical fields of the resDef. For the new String node label 
expression field, the best validation we can do requires the parsing code for OR 
expressions, which is in the works. If we decide it's a good idea to parse the HRDL 
node label expression at validation time, it would probably be best to 
synchronize that code change with the new OR expression parser.

Thanks,
Alexey
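
As an illustration of the idea only (this is not from any posted patch; NodeLabelExpressionParser is a placeholder for the future OR-expression parser), such a validation hook could look roughly like this:

{code}
// Illustrative sketch only. NodeLabelExpressionParser is a placeholder for the
// future OR-expression parser; it does not exist yet.
private void validateNodeLabelExpressions(ReservationDefinition resDef)
    throws YarnException {
  for (ReservationRequest request :
      resDef.getReservationRequests().getReservationResources()) {
    String expr = request.getNodeLabelExpression(); // new String field from this patch
    if (expr == null || expr.trim().isEmpty()) {
      continue; // an empty expression is always acceptable
    }
    try {
      NodeLabelExpressionParser.parse(expr); // hypothetical parser hook
    } catch (IllegalArgumentException e) {
      throw new YarnException("Invalid node label expression: " + expr, e);
    }
  }
}
{code}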

> Extend Reservation Definition Language (RDL) extensions to support node labels
> --
>
> Key: YARN-4194
> URL: https://issues.apache.org/jira/browse/YARN-4194
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Alexey Tumanov
> Attachments: YARN-4194-v1.patch, YARN-4194-v2.patch
>
>
> This JIRA tracks changes to the APIs to the reservation system to support
> the expressivity of node-labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4198) CapacityScheduler locking / synchronization improvements

2015-12-17 Thread Alexey Tumanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Tumanov updated YARN-4198:
-
Attachment: YARN-4198-v1.patch

Please find attached the patch that relaxes some of the synchronized methods in 
the YARN RM. Most of the synchronization relaxation is in the handling of child 
queues. This code was tested on a large cluster in September 2015 along with the 
rest of the code written for node-label-aware reservations (YARN-4193). In this 
patch, we isolate just the synchronization changes we've made. We're in the 
process of re-testing this isolated delta and will update this JIRA when the 
testing is complete. Please treat this post as a request for code review and 
comments.

Thanks,
Alexey
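
To give reviewers a feel for the kind of relaxation involved, here is a minimal, purely illustrative sketch (not taken from the patch): replacing a coarse synchronized accessor on a parent queue with a read/write lock so read-mostly getters do not serialize behind scheduling writes.

{code}
// Illustrative sketch only, not the actual patch.
// Uses java.util.concurrent.locks.ReentrantReadWriteLock.
private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

// Before: public synchronized List<CSQueue> getChildQueues() { ... }
public List<CSQueue> getChildQueues() {
  rwLock.readLock().lock();
  try {
    return new ArrayList<CSQueue>(childQueues); // defensive copy taken under the read lock
  } finally {
    rwLock.readLock().unlock();
  }
}

// Mutations of the child-queue list take the write lock instead of the object monitor.
void setChildQueues(Collection<CSQueue> newChildQueues) {
  rwLock.writeLock().lock();
  try {
    childQueues.clear();
    childQueues.addAll(newChildQueues);
  } finally {
    rwLock.writeLock().unlock();
  }
}
{code}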

> CapacityScheduler locking / synchronization improvements
> 
>
> Key: YARN-4198
> URL: https://issues.apache.org/jira/browse/YARN-4198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Alexey Tumanov
> Attachments: YARN-4198-v1.patch
>
>
> In the context of YARN-4193 (which stresses the RM/CS performance) we found 
> several performance problems in the locking/synchronization of the 
> CapacityScheduler, as well as inconsistencies that do not normally surface 
> (incorrect locking order of queues protected by CS locks, etc.). This JIRA 
> proposes several refactorings to improve this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-12-17 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063302#comment-15063302
 ] 

Naganarasimha G R commented on YARN-2934:
-

[~jira.shegalov],
Ok, got your intention. Just one suggestion: shall I add {{Error file(s): 
[error.log, stderr.1, stderr.2]}} only when there are multiple files? If not, 
the file name just gets repeated twice.
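
In other words, the suggestion amounts to roughly the following (a sketch with placeholder variable names, not the actual patch):

{code}
// Sketch only: exitCode, errorFileNames (List<String>), tailSizeInBytes and tailText
// are placeholders for values already computed by the surrounding code.
StringBuilder diagnostics = new StringBuilder(
    "Container exited with a non-zero exit code " + exitCode + ".");
if (errorFileNames.size() > 1) {
  // Mention the full list only when more than one file matched the pattern,
  // so the single-file case does not repeat the same name twice.
  diagnostics.append(" Error file(s): ").append(errorFileNames);
}
diagnostics.append("\nLast ").append(tailSizeInBytes)
    .append(" bytes of ").append(errorFileNames.get(0)).append(" :\n")
    .append(tailText);
{code}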

> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, 
> YARN-2934.v1.006.patch, YARN-2934.v1.007.patch, YARN-2934.v1.008.patch, 
> YARN-2934.v2.001.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4475) Add the possibility to recover the state of an ongoing upgrade in case of RM restart

2015-12-17 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-4475:
--

 Summary: Add the possibility to recover the state of an ongoing 
upgrade in case of RM restart
 Key: YARN-4475
 URL: https://issues.apache.org/jira/browse/YARN-4475
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Giovanni Matteo Fumarola






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4474) Implementation of the application master upgrade logic

2015-12-17 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-4474:
--

 Summary: Implementation of the application master upgrade logic
 Key: YARN-4474
 URL: https://issues.apache.org/jira/browse/YARN-4474
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Giovanni Matteo Fumarola






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4473) Add version information for the application and the application attempts

2015-12-17 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-4473:
--

 Summary: Add version information for the application and the 
application attempts
 Key: YARN-4473
 URL: https://issues.apache.org/jira/browse/YARN-4473
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Giovanni Matteo Fumarola
Assignee: Marco Rabozzi


In order to allow an application master to be upgraded across different attempts, 
we need to keep track of each attempt's version and provide a means to 
temporarily store the upgrade information until the upgrade completes.

Concretely, we would add (see the sketch below):
- A version identifier for each attempt
- A temporary upgrade context for each application
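
A rough sketch of what those two additions could look like as plain data holders (class and field names are hypothetical, not from any posted patch):

{code}
// Hypothetical sketch only: class and field names are illustrative.
public class ApplicationUpgradeContext {
  private final int currentAttemptVersion;  // version identifier of the running attempt
  private final int targetVersion;          // version the application is being upgraded to
  // New AM launch context, stored temporarily until the upgrade completes.
  private final ContainerLaunchContext newAmLaunchContext;

  public ApplicationUpgradeContext(int currentAttemptVersion, int targetVersion,
      ContainerLaunchContext newAmLaunchContext) {
    this.currentAttemptVersion = currentAttemptVersion;
    this.targetVersion = targetVersion;
    this.newAmLaunchContext = newAmLaunchContext;
  }

  public int getCurrentAttemptVersion() { return currentAttemptVersion; }
  public int getTargetVersion() { return targetVersion; }
  public ContainerLaunchContext getNewAmLaunchContext() { return newAmLaunchContext; }
}
{code}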




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4472) Introduce additional states in the app and app attempt state machines to keep track of the upgrade process

2015-12-17 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-4472:
--

 Summary: Introduce additional states in the app and app attempt 
state machines to keep track of the upgrade process
 Key: YARN-4472
 URL: https://issues.apache.org/jira/browse/YARN-4472
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Giovanni Matteo Fumarola
Assignee: Marco Rabozzi






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4471) Add a public API to request an application upgrade

2015-12-17 Thread Marco Rabozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Rabozzi reassigned YARN-4471:
---

Assignee: Marco Rabozzi

> Add a public API to request an application upgrade
> --
>
> Key: YARN-4471
> URL: https://issues.apache.org/jira/browse/YARN-4471
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Marco Rabozzi
>
> This JIRA tracks the definition of a new public API for YARN that allows 
> users to upgrade the AM. This is part of the AM in-place upgrade enhancement 
> proposed in YARN-4470.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4471) Add a public API to request an application upgrade

2015-12-17 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-4471:
--

 Summary: Add a public API to request an application upgrade
 Key: YARN-4471
 URL: https://issues.apache.org/jira/browse/YARN-4471
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Giovanni Matteo Fumarola


This JIRA tracks the definition of a new public API for YARN that allows 
users to upgrade the AM. This is part of the AM in-place upgrade enhancement 
proposed in YARN-4470.
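
To make the discussion concrete, one possible shape for such an API (purely hypothetical; none of these types exist yet) would be a single client-facing call carrying an upgrade request and returning a response:

{code}
// Hypothetical API sketch: UpgradeApplicationRequest/Response do not exist yet.
public interface ApplicationUpgradeProtocol {
  /**
   * Ask the RM to upgrade the AM of a running application in place,
   * keeping the work done by its running containers.
   */
  UpgradeApplicationResponse upgradeApplication(UpgradeApplicationRequest request)
      throws YarnException, IOException;
}
{code}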



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2015-12-17 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063282#comment-15063282
 ] 

Chris Douglas commented on YARN-4195:
-

bq. a better version of this, which uses a cool algorithm which skips the 
conversion to DNF

The implementation is based on a SIGMOD 2010 
[paper|http://dl.acm.org/citation.cfm?id=1807171] that converts boolean 
expressions to intervals. I'll adapt it for Hadoop and post a patch.

> Support of node-labels in the ReservationSystem "Plan"
> --
>
> Key: YARN-4195
> URL: https://issues.apache.org/jira/browse/YARN-4195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4195.patch
>
>
> As part of YARN-4193 we need to enhance the InMemoryPlan (and related 
> classes) to track the per-label available resources, as well as the per-label
> reservation-allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4470) Application Master in-place upgrade

2015-12-17 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-4470:
---
Attachment: AM in-place upgrade design. rev1.pdf

> Application Master in-place upgrade
> ---
>
> Key: YARN-4470
> URL: https://issues.apache.org/jira/browse/YARN-4470
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
> Attachments: AM in-place upgrade design. rev1.pdf
>
>
> It would be nice if clients could ask for an AM in-place upgrade.
> This would give YARN the ability to upgrade the AM without losing the work
> done within its containers, allowing bug fixes and new AM versions to be
> deployed without incurring long service downtimes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4470) Application Master in-place upgrade

2015-12-17 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-4470:
--

 Summary: Application Master in-place upgrade
 Key: YARN-4470
 URL: https://issues.apache.org/jira/browse/YARN-4470
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Giovanni Matteo Fumarola
Assignee: Giovanni Matteo Fumarola


It would be nice if clients could ask for an AM in-place upgrade.
This would give YARN the ability to upgrade the AM without losing the work
done within its containers, allowing bug fixes and new AM versions to be
deployed without incurring long service downtimes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4293) ResourceUtilization should be a part of yarn node CLI

2015-12-17 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063270#comment-15063270
 ] 

Sunil G commented on YARN-4293:
---

Thank you [~leftnoteasy] for the review and commit and thank you [~kasha] for 
the review. 

> ResourceUtilization should be a part of yarn node CLI
> -
>
> Key: YARN-4293
> URL: https://issues.apache.org/jira/browse/YARN-4293
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4293.patch, 0002-YARN-4293.patch, 
> 0003-YARN-4293.patch
>
>
> In order to make resource utilization information easier to obtain, the "yarn node" 
> CLI should include resource utilization on the node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4468) Document the general ReservationSystem functionality, and the REST API

2015-12-17 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063268#comment-15063268
 ] 

Carlo Curino commented on YARN-4468:


The attached patch adds documentation for the REST API of YARN-4248. More 
should be done to generally document the reservation system functionality.

> Document the general ReservationSystem functionality, and the REST API
> --
>
> Key: YARN-4468
> URL: https://issues.apache.org/jira/browse/YARN-4468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
> Attachments: YARN-4468.rest-only.patch
>
>
> This JIRA tracks effort to document the ReservationSystem functionality, and 
> the REST API access to it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4468) Document the general ReservationSystem functionality, and the REST API

2015-12-17 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4468:
---
Attachment: YARN-4468.rest-only.patch

> Document the general ReservationSystem functionality, and the REST API
> --
>
> Key: YARN-4468
> URL: https://issues.apache.org/jira/browse/YARN-4468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
> Attachments: YARN-4468.rest-only.patch
>
>
> This JIRA tracks effort to document the ReservationSystem functionality, and 
> the REST API access to it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-12-17 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063267#comment-15063267
 ] 

Sunil G commented on YARN-3226:
---

Thank you very much [~djp] for the review and commit, and thank you 
[~rohithsharma] for the review!

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, 0004-YARN-3226.patch, 0005-YARN-3226.patch, 
> ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2575) Consider creating separate ACLs for Reservation create/update/delete/list ops

2015-12-17 Thread Sean Po (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063254#comment-15063254
 ] 

Sean Po commented on YARN-2575:
---

I'm working on simplifying the patch based on offline comments about 
consolidating the ReservationsACLsManager constructor using an abstract 
interface, and moving the ReservationsACLsManager into the 
AbstractReservationSystem. I am also adding reservation ACL checks in submit 
application when a reservationId is specified.
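
As a rough illustration of the last point only (not the actual patch; the ACL manager interface and the ReservationACL constant shown here are assumptions about the in-progress design), the submit path could look roughly like this:

{code}
// Illustrative sketch only: checkAccess() and ReservationACL.SUBMIT_RESERVATIONS
// are assumptions about the in-progress ACL manager, not an existing API.
ReservationId reservationId = submissionContext.getReservationID();
if (reservationId != null && reservationACLsManager != null) {
  if (!reservationACLsManager.checkAccess(
      callerUGI, ReservationACL.SUBMIT_RESERVATIONS, queueName)) {
    throw new YarnException("User " + callerUGI.getShortUserName()
        + " is not allowed to submit an application to reservation "
        + reservationId + " in queue " + queueName);
  }
}
{code}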

> Consider creating separate ACLs for Reservation create/update/delete/list ops
> -
>
> Key: YARN-2575
> URL: https://issues.apache.org/jira/browse/YARN-2575
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Sean Po
> Attachments: YARN-2575.v1.patch, YARN-2575.v2.1.patch, 
> YARN-2575.v2.patch
>
>
> YARN-1051 introduces the ReservationSystem and in the current implementation 
> anyone who can submit applications can also submit reservations. This JIRA is 
> to evaluate creating separate ACLs for Reservation create/update/delete ops.
> Depends on YARN-4340



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-12-17 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063227#comment-15063227
 ] 

Gera Shegalov commented on YARN-2934:
-

Regarding my comment about the user, I meant the YARN app user. Users don't 
look at the NM logs; they look at the exceptions in the web UI and on the client 
side. If the exception says 

{code}
Container exited with a non-zero exit code 127. Error file(s): [error.log, 
stderr.1, stderr.2]
Last 4096 bytes of error.log :
/bin/bash: /no/jvm/here/bin/java: No such file or directory
{code}

the user will know to also check those files.

> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, 
> YARN-2934.v1.006.patch, YARN-2934.v1.007.patch, YARN-2934.v1.008.patch, 
> YARN-2934.v2.001.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not avaialable

2015-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063222#comment-15063222
 ] 

Hadoop QA commented on YARN-4428:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 9m 33s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server-jdk1.7.0_91 
with JDK v1.7.0_91 generated 1 new issues (was 7, now 7). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 52s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 27s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server (total was 162, now 162). 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 42s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 33s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 19s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
40s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {col

[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-12-17 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063217#comment-15063217
 ] 

Gera Shegalov commented on YARN-2934:
-

The message looks good to me.

> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, 
> YARN-2934.v1.006.patch, YARN-2934.v1.007.patch, YARN-2934.v1.008.patch, 
> YARN-2934.v2.001.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-17 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063209#comment-15063209
 ] 

MENG DING commented on YARN-4138:
-

Thanks [~sandflee] for the review.

bq. use Resources.fitsin(targetResource, lastConfirmedResource)?
Will do

bq. update lastConfirmedResource in RMContainer? and log debug to log info?
We should not update lastConfirmedResource in this scenario. This is the exact 
case we want to cover in this ticket, where the resource increase token may 
expire, and we need to roll back to the old resource. The only time we want to 
update lastConfirmedResource during resource increase is when 
Resources.equals(nmContainerResource, rmContainerResource).

bq. If am increase a containerA 1G -> 2G, and recieved a new container B, and 
have not told NM if am wants to decrease it to 500M, when using 
requestContainerResourceChange(Container container, Resource capability) , 
seems we should use container B?
Sorry I don't understand the question. Can you elaborate?

Thanks,
Meng
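
In code terms, that rule is roughly the following guard when the NM reports the container size back to the RM (a sketch using the naming from this thread; setLastConfirmedResource() is assumed, not an existing method):

{code}
// Sketch: confirm the increase only once the NM-reported size matches the RM allocation.
Resource nmContainerResource = nmReportedContainer.getResource();
Resource rmContainerResource = rmContainer.getAllocatedResource();
if (Resources.equals(nmContainerResource, rmContainerResource)) {
  // NM has caught up with the RM-side allocation, so the confirmed baseline can move.
  rmContainer.setLastConfirmedResource(rmContainerResource); // assumed setter
}
// Otherwise keep the previous lastConfirmedResource so that an expired increase
// token can roll the allocation back to it.
{code}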


> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-12-17 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063197#comment-15063197
 ] 

Naganarasimha G R commented on YARN-2934:
-

[~jira.shegalov] thanks for the comments,

bq. good to see the patch lose 3kb, most of all there are no more changes to 
the common Configuration class.
:)

bq. Still don't see any value in this, please drop: ??if (listStatus.length > 
1) ??
I added it based on my understanding of your comment ??Not sure how fancy we need to be 
with the case where multiple log files qualify for the pattern, but maybe at 
least mention to the user there are more files to look at.?? But I felt it is better 
to have a log message than to show it in the diagnostic message; thoughts?

bq. We can FileUtil.stat2Paths or add a loop here to extract just the last path 
component.
Going by probability, in most cases there will be only one error 
file matching the pattern; in that case the earlier approach you suggested 
would repeat the name of the file twice, which I felt was redundant, hence I 
removed it. The message would look something like:
{code}
Container exited with a non-zero exit code 127. Error file(s): [error.log]
Last 4096 bytes of error.log :
/bin/bash: /no/jvm/here/bin/java: No such file or directory
{code}
Please share your thoughts; if required, I can add it. I am working on the 
other comments.



> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, 
> YARN-2934.v1.006.patch, YARN-2934.v1.007.patch, YARN-2934.v1.008.patch, 
> YARN-2934.v2.001.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2575) Consider creating separate ACLs for Reservation create/update/delete/list ops

2015-12-17 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-2575:
--
Attachment: YARN-2575.v2.1.patch

In YARN-2575.v2.1.patch, the findbugs warnings are fixed, but the logic remains the same.

> Consider creating separate ACLs for Reservation create/update/delete/list ops
> -
>
> Key: YARN-2575
> URL: https://issues.apache.org/jira/browse/YARN-2575
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Sean Po
> Attachments: YARN-2575.v1.patch, YARN-2575.v2.1.patch, 
> YARN-2575.v2.patch
>
>
> YARN-1051 introduces the ReservationSystem and in the current implementation 
> anyone who can submit applications can also submit reservations. This JIRA is 
> to evaluate creating separate ACLs for Reservation create/update/delete ops.
> Depends on YARN-4340



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4467) Shell.checkIsBashSupported swallowed an interrupted exception

2015-12-17 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063178#comment-15063178
 ] 

Wei-Chiu Chuang commented on YARN-4467:
---

Both test failures look unrelated to this patch.

> Shell.checkIsBashSupported swallowed an interrupted exception
> -
>
> Key: YARN-4467
> URL: https://issues.apache.org/jira/browse/YARN-4467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Wei-Chiu Chuang
>  Labels: shell, supportability
> Attachments: HADOOP-12652.001.patch, YARN-4467.001.patch
>
>
> Shell.checkIsBashSupported() creates a bash shell command to verify if the 
> system supports bash. However, its error message is misleading, and the logic 
> should be updated.
> If the shell command throws an IOException, it does not necessarily imply that 
> bash did not run successfully: if the shell command process was interrupted, its 
> internal logic throws an InterruptedIOException, which is a subclass of 
> IOException.
> {code:title=Shell.checkIsBashSupported|borderStyle=solid}
> ShellCommandExecutor shexec;
> boolean supported = true;
> try {
>   String[] args = {"bash", "-c", "echo 1000"};
>   shexec = new ShellCommandExecutor(args);
>   shexec.execute();
> } catch (IOException ioe) {
>   LOG.warn("Bash is not supported by the OS", ioe);
>   supported = false;
> }
> {code}
> An example of it appeared in a recent jenkins job
> https://builds.apache.org/job/PreCommit-HADOOP-Build/8257/testReport/org.apache.hadoop.ipc/TestRPCWaitForProxy/testInterruptedWaitForProxy/
> The test logic in TestRPCWaitForProxy.testInterruptedWaitForProxy starts a 
> thread, wait it for 1 second, and interrupt the thread, expecting the thread 
> to terminate. However, the method Shell.checkIsBashSupported swallowed the 
> interrupt, and therefore failed.
> {noformat}
> 2015-12-16 21:31:53,797 WARN  util.Shell 
> (Shell.java:checkIsBashSupported(718)) - Bash is not supported by the OS
> java.io.InterruptedIOException: java.lang.InterruptedException
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
>   at org.apache.hadoop.util.Shell.run(Shell.java:838)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
>   at org.apache.hadoop.util.Shell.checkIsBashSupported(Shell.java:716)
>   at org.apache.hadoop.util.Shell.(Shell.java:705)
>   at org.apache.hadoop.util.StringUtils.(StringUtils.java:79)
>   at 
> org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:639)
>   at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
>   at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:803)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:773)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:646)
>   at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:397)
>   at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:350)
>   at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:330)
>   at 
> org.apache.hadoop.ipc.TestRPCWaitForProxy$RpcThread.run(TestRPCWaitForProxy.java:115)
> Caused by: java.lang.InterruptedException
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:503)
>   at java.lang.UNIXProcess.waitFor(UNIXProcess.java:264)
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:920)
>   ... 15 more
> {noformat}
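
One common way to avoid swallowing the interrupt (a general sketch of the fix pattern, not necessarily what the attached patch does) is to handle InterruptedIOException separately and restore the thread's interrupt status:

{code}
// Sketch: distinguish "bash really failed" from "the probing thread was interrupted".
ShellCommandExecutor shexec;
boolean supported = true;
try {
  String[] args = {"bash", "-c", "echo 1000"};
  shexec = new ShellCommandExecutor(args);
  shexec.execute();
} catch (InterruptedIOException iioe) {
  LOG.warn("Interrupted, so unable to determine if bash is supported", iioe);
  Thread.currentThread().interrupt(); // propagate the interrupt instead of swallowing it
  supported = false;
} catch (IOException ioe) {
  LOG.warn("Bash is not supported by the OS", ioe);
  supported = false;
}
{code}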



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063175#comment-15063175
 ] 

Hadoop QA commented on YARN-4234:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 23s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s 
{color} | {color:red} Patch generated 67 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 237, now 301). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
54s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 4 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 12s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 2s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 47s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 6s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 17s 
{color} | {color:green} hadoop-yarn-common in the 

[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-17 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063172#comment-15063172
 ] 

Naganarasimha G R commented on YARN-4416:
-

Thanks for the review and commit [~wangda] & [~sunilg]

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> YARN-4416.v2.001.patch, YARN-4416.v2.002.patch, YARN-4416.v2.003.patch, 
> deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario where I had to find out 
> the name of the queue, but every time I tried to inspect the queue it hung. 
> Looking at the stack I realized there was a deadlock, but on analysis found 
> that it occurred only because of *queue.toString()* during debugging, since 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Hence we need to ensure the following:
> # queueCapacity and resource-usage have their own read/write locks, hence 
> synchronization is not required.
> # numContainers is volatile, hence synchronization is not required.
> # A read/write lock could be added to OrderingPolicy. Read operations don't 
> need to be synchronized, so {{getNumApplications}} doesn't need to be synchronized. 
> (The first two will be handled in this JIRA and the third will be handled in 
> YARN-4443.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4469) yarn application -status should not show a stack trace for an unknown application ID

2015-12-17 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-4469:
--

 Summary: yarn application -status should not show a stack trace 
for an unknown application ID
 Key: YARN-4469
 URL: https://issues.apache.org/jira/browse/YARN-4469
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Affects Versions: 2.7.1
Reporter: Daniel Templeton
Assignee: Daniel Templeton


For example:

{noformat}
# yarn application -status application_1234567890_12345
Exception in thread "main" 
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
with id 'application_1234567890_12345' doesn't exist in RM.
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:324)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:170)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:401)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:190)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:399)
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:429)
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:154)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:77)
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException):
 Application with id 'application_1234567890_12345' doesn't exist in RM.
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:324)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:170)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:401)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

at org.apache.hadoop.ipc.Client.call(Client.java:1466)
at org.apache.hadoop.ipc.Client.c

[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-12-17 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063138#comment-15063138
 ] 

Gera Shegalov commented on YARN-2934:
-

Thanks for the latest patch. Good to see the patch lose 3 KB; most of all, there 
are no more changes to the common Configuration class.

One checkstyle issue, the 80-column warning, comes from the patch around:
{code}
long tailSizeInBytes = conf.getLong(
YarnConfiguration.NM_CONTAINER_ERROR_FILE_TAIL_SIZE_IN_BYTES,

YarnConfiguration.DEFAULT_NM_CONTAINER_ERROR_FILE_TAIL_SIZE_IN_BYTES);
{code}

Those are pretty long names. Can we use: 
container.stderr.tail.bytes 
NM_CONTAINER_STDERR_BYTES
and the corresponding default? Having stderr in the name is also great for users to 
understand which error file is meant in 99% of the cases.
The same goes for container.stderr.pattern.

Still don't see any value in this, please drop:
{code}
if (listStatus.length > 1) {
  LOG.error("Multiple files in " + containerLogDir
  + ", seems to match the error file name pattern configured. "
  + Arrays.toString(listStatus));
}
{code}

Let us not guard the tail read with:
{code}
if (fileSize != 0) {
{code}
There is value in seeing, already on the client side, that the file is empty.

Instead of 
{code}
IOUtils.closeStream(errorFileIS) 
{code}

call cleanup so we can pass the logger
{code}
IOUtils.cleanup(LOG, errorFileIS)
{code}

Since trunk is on JDK7 minimum, we can drop the constant UTF_8 and use:
{code}
new String(tailBytes, StandardCharsets.UTF_8)
{code}

listStatus as a name for a variable is not intuitive. Maybe use errFileStatus 
for that. 

Obviously I meant tailSizeInBytes, thanks for paying attention. Agree that RLFS 
file status toString might look too ugly.

We can FileUtil.stat2Paths or add a loop here to extract just the last path 
component. 

I am also realizing that we should have a low cap on the tail size, to prevent a 
misconfiguration from knocking out the NM with an OOM on container failures, since we do:
{code}
byte[] tailBytes = new byte[bufferSize];
{code}

One can easily see why I initially mistook tailBytes for an int. It should be 
called something along the lines of {code}tailBuffer{code}.
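
Pulling several of these points together, the tail read could end up looking roughly like this (a sketch reflecting the review comments, not the posted patch; the shortened configuration keys and MAX_TAIL_SIZE_IN_BYTES cap are suggested names, not existing constants):

{code}
// Sketch only: conf, fileSystem, errFileStatus, diagnostics and LOG come from the
// surrounding code; NM_CONTAINER_STDERR_BYTES and MAX_TAIL_SIZE_IN_BYTES are the
// names suggested above, not existing constants.
long tailSizeInBytes = Math.min(
    conf.getLong(YarnConfiguration.NM_CONTAINER_STDERR_BYTES,
        YarnConfiguration.DEFAULT_NM_CONTAINER_STDERR_BYTES),
    MAX_TAIL_SIZE_IN_BYTES); // hard cap so a misconfiguration cannot OOM the NM

FSDataInputStream errorFileIS = null;
try {
  long fileSize = errFileStatus.getLen();
  long startPosition = Math.max(0, fileSize - tailSizeInBytes);
  byte[] tailBuffer = new byte[(int) (fileSize - startPosition)];
  errorFileIS = fileSystem.open(errFileStatus.getPath());
  errorFileIS.readFully(startPosition, tailBuffer);
  // An empty file yields an empty tail, which is itself useful information client-side.
  diagnostics.append(new String(tailBuffer, StandardCharsets.UTF_8));
} finally {
  IOUtils.cleanup(LOG, errorFileIS); // closes the stream and logs any failure to close
}
{code}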






> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, 
> YARN-2934.v1.006.patch, YARN-2934.v1.007.patch, YARN-2934.v1.008.patch, 
> YARN-2934.v2.001.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063118#comment-15063118
 ] 

Hudson commented on YARN-3226:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8991 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8991/])
YARN-3226. UI changes for decommissioning node. Contributed by Sunil G. 
(junping_du: rev 1de56b0448d332717c8316c621b4f6af542a85cc)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java


> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, 0004-YARN-3226.patch, 0005-YARN-3226.patch, 
> ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4340) Add "list" API to reservation system

2015-12-17 Thread Sean Po (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063107#comment-15063107
 ] 

Sean Po commented on YARN-4340:
---

I made a comment about the test failures before, but to make it easier to find, 
I'll paste it here again:

This patch addresses the remaining findbugs and checkstyle errors.
The following unit tests are failing and have associated JIRA tickets:
hadoop.yarn.server.resourcemanager.TestClientRMTokens 
– YARN-4306
hadoop.yarn.server.resourcemanager.TestAMAuthorization 
– YARN-4318
hadoop.yarn.client.TestGetGroups 
– YARN-4351
The following unit tests are passing locally and flakiness may be related to 
YARN-4352:
org.apache.hadoop.yarn.client.api.impl.TestYarnClient
– testShouldNotRetryForeverForNonNetworkExceptions actually 
fails, but it also fails locally on trunk
– testAMMRToken
org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
org.apache.hadoop.yarn.client.api.impl.TestNMClient
The following tests are passing locally:
hadoop.mapreduce.v2.TestMRJobsWithProfiler

There are three new test failures above, but when I ran the tests locally, they 
all passed:
hadoop.mapred.TestJobSysDirWithDFS
hadoop.mapred.TestNetworkedJob
hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler

If you also look at the Hadoop QA comment from two days ago, where the code 
only changed with regards to style, there were very few test failures, which 
indicates that some of the above tests are flaky:

JDK v1.8.0_66 Failed junit tests
hadoop.yarn.server.resourcemanager.TestAMAuthorization
hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
hadoop.yarn.server.resourcemanager.TestClientRMTokens


> Add "list" API to reservation system
> 
>
> Key: YARN-4340
> URL: https://issues.apache.org/jira/browse/YARN-4340
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Sean Po
> Attachments: YARN-4340.v1.patch, YARN-4340.v10.patch, 
> YARN-4340.v2.patch, YARN-4340.v3.patch, YARN-4340.v4.patch, 
> YARN-4340.v5.patch, YARN-4340.v6.patch, YARN-4340.v7.patch, 
> YARN-4340.v8.patch, YARN-4340.v9.patch
>
>
> This JIRA tracks changes to the APIs of the reservation system, and enables 
> querying the reservation system on which reservation exists by "time-range, 
> reservation-id".
> YARN-4420 has a dependency on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2575) Consider creating separate ACLs for Reservation create/update/delete/list ops

2015-12-17 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-2575:
--
Attachment: YARN-2575.v2.patch

Pre-test-patch version of the patch. It removes the logic from the queue hierarchy and 
also adds reservation ACLs for the Fair Scheduler.

> Consider creating separate ACLs for Reservation create/update/delete/list ops
> -
>
> Key: YARN-2575
> URL: https://issues.apache.org/jira/browse/YARN-2575
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Sean Po
> Attachments: YARN-2575.v1.patch, YARN-2575.v2.patch
>
>
> YARN-1051 introduces the ReservationSystem and in the current implementation 
> anyone who can submit applications can also submit reservations. This JIRA is 
> to evaluate creating separate ACLs for Reservation create/update/delete ops.
> Depends on YARN-4340



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4194) Extend Reservation Definition Language (RDL) extensions to support node labels

2015-12-17 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063074#comment-15063074
 ] 

Subru Krishnan commented on YARN-4194:
--

[~atumanov], thanks for the patch. I took a look at it and it LGTM. One 
minor comment: can you add Javadocs for the new getter/setter for the node 
label expression in *ReservationRequest*?

> Extend Reservation Definition Language (RDL) extensions to support node labels
> --
>
> Key: YARN-4194
> URL: https://issues.apache.org/jira/browse/YARN-4194
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Alexey Tumanov
> Attachments: YARN-4194-v1.patch, YARN-4194-v2.patch
>
>
> This JIRA tracks changes to the APIs to the reservation system to support
> the expressivity of node-labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-17 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063066#comment-15063066
 ] 

MENG DING commented on YARN-4138:
-

Hi, [~jianhe]

Thanks for reviewing the code. 

* I think you are right that there could be a race condition where 
rmContainer.getLastConfirmedResource() (called first) is 2G, and the 
rmContainer.getAllocatedResource() (called next) becomes 1G, causing the 
resource delta to become positive.

I think the solution is to synchronize the following block, such that both 
rmContainer.getLastConfirmedResource() and rmContainer.getAllocatedResource() 
will be 1G, so the resource delta is 0, and the decreaseContainer call will be 
skipped.
{code}
SchedContainerChangeRequest decreaseRequest =
    new SchedContainerChangeRequest(schedulerNode, rmContainer,
        rmContainer.getLastConfirmedResource());
decreaseContainer(decreaseRequest,
    getCurrentAttemptForContainer(containerId));
{code}
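
For illustration, a rough sketch of the synchronization being suggested; the 
lock object and surrounding method are assumptions, not the actual YARN-1197 
branch code:
{code}
// Hedged sketch only: locking on rmContainer is an assumption. The point is
// that both reads and the decrease happen atomically, so the allocated
// resource cannot change between getLastConfirmedResource() and
// getAllocatedResource().
synchronized (rmContainer) {
  SchedContainerChangeRequest decreaseRequest =
      new SchedContainerChangeRequest(schedulerNode, rmContainer,
          rmContainer.getLastConfirmedResource());
  decreaseContainer(decreaseRequest,
      getCurrentAttemptForContainer(containerId));
}
{code}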

* I don't quite understand the concern about the API semantics. If the above is 
fixed, are the API semantics still a concern to you?

* bq. revert format only changes in RMContainerChangeResourceEvent
Will do


> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager

2015-12-17 Thread Daniel Zhi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063033#comment-15063033
 ] 

Daniel Zhi commented on YARN-914:
-

Yes. (Another related blog: 
https://aws.amazon.com/blogs/aws/amazon-emr-release-4-1-0-spark-1-5-0-hue-3-7-1-hdfs-encryption-presto-oozie-zeppelin-improved-resizing/)
 

Just to clarify my question: my current patch is on top of Hadoop 2.7.1. However, 
I see branches "trunk", "branch-2.8", "branch-2.7.2" in 
git://git.apache.org/hadoop.git. It would require extra preparation to make a 
patch against these branches, and it's unclear to me which branch to prepare the 
patch against.

> (Umbrella) Support graceful decommission of nodemanager
> ---
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: graceful
>Affects Versions: 2.0.4-alpha
>Reporter: Luke Lu
>Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf, 
> GracefullyDecommissionofNodeManagerv3.pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to 
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
> map outputs are not fetched by the reducers of the job, these map tasks will 
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-12-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063021#comment-15063021
 ] 

Junping Du commented on YARN-3226:
--

+1. Committing patch in.

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, 0004-YARN-3226.patch, 0005-YARN-3226.patch, 
> ClusterMetricsOnNodes_UI.png
>
>
> Some initial thoughts:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager

2015-12-17 Thread Parvez (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063002#comment-15063002
 ] 

Parvez commented on YARN-914:
-

Hi Daniel,

Thank you for the reply. Yes AWS released the latest AMI version that supports 
graceful decommissioning of nodes. I guess you are referring to 
http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-manage-resize.html#graceful-shrink

Sorry, I don't have much idea about the specific branch.

> (Umbrella) Support graceful decommission of nodemanager
> ---
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: graceful
>Affects Versions: 2.0.4-alpha
>Reporter: Luke Lu
>Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf, 
> GracefullyDecommissionofNodeManagerv3.pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to 
> be rescheduled on other NMs. Further more, for finished map tasks, if their 
> map output are not fetched by the reducers of the job, these map tasks will 
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager

2015-12-17 Thread Daniel Zhi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062997#comment-15062997
 ] 

Daniel Zhi commented on YARN-914:
-

Maybe you are already aware of this: the EMR team has implemented graceful 
decommission in recent AMIs (for example, AMI 3.10.0 or 4.2.0). In these new 
AMIs, when you resize the cluster down, the control logic will select the best 
candidates and gracefully decommission them instead of terminating them right 
away as before. You can move on to AMI 3.10.0 (which is Hadoop 2.6.0).

> (Umbrella) Support graceful decommission of nodemanager
> ---
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: graceful
>Affects Versions: 2.0.4-alpha
>Reporter: Luke Lu
>Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf, 
> GracefullyDecommissionofNodeManagerv3.pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to 
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
> map outputs are not fetched by the reducers of the job, these map tasks will 
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager

2015-12-17 Thread Daniel Zhi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062988#comment-15062988
 ] 

Daniel Zhi commented on YARN-914:
-

AWS EMR (Elastic MapReduce) has implemented graceful decommission of YARN nodes 
and included it in the several most recent AMI releases. The implementation has 
been verified in thousands of customer clusters. We would like to contribute the 
implementation back to Apache Hadoop.

Internally we have the code on both Hadoop 2.6.0 and Hadoop 2.7.1. To prepare 
for contributing it back to Apache Hadoop, which branch should we prepare the 
code against?

> (Umbrella) Support graceful decommission of nodemanager
> ---
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: graceful
>Affects Versions: 2.0.4-alpha
>Reporter: Luke Lu
>Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf, 
> GracefullyDecommissionofNodeManagerv3.pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to 
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
> map outputs are not fetched by the reducers of the job, these map tasks will 
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062959#comment-15062959
 ] 

Hadoop QA commented on YARN-4304:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 18 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 266, now 274). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 1s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 37s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 155m 25s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL 

[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-17 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062956#comment-15062956
 ] 

Xuan Gong commented on YARN-4234:
-

Thanks for the review, [~djp]. Attached a new patch to address all your comments.

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-09.patch, 
> YARN-4234.2015-12-17.1.patch, YARN-4234.20151109.patch, 
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-17 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4234:

Attachment: YARN-4234.2015-12-17.1.patch

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-09.patch, 
> YARN-4234.2015-12-17.1.patch, YARN-4234.20151109.patch, 
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4257) Move scheduler validateConf method to AbstractYarnScheduler and make it protected

2015-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062915#comment-15062915
 ] 

Hadoop QA commented on YARN-4257:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 18s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 22s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 137m 57s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12778309/YARN-4257.patch |
| JIRA Issue | YARN-4257 |
| Optional Tests |  asf

[jira] [Updated] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not available

2015-12-17 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4428:
---
Attachment: YARN-4428.2.patch

The .2 patch adds a unit test for the change in RMAppAttemptImpl. The web 
redirect is hard to cover with a unit test, so I tested it manually: first I ran 
a sleep job with invalid options, such as sleep 
-Dyarn.app.mapreduce.am.command-opts="-abc" -m 1, so that the job crashes during 
launch. Then I shut down the RM, cleared the state store, and brought the RM 
back up. Finally, I visited the crashed app in the RM UI and verified that I was 
redirected to the AHS page for that app.
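
For illustration, a minimal sketch of the fallback idea against the plain 
servlet API; the class and helper names (isKnownToRM, ahsAppPageUrl) are 
hypothetical and not the actual RM web app / WebAppProxyServlet change:
{code}
// Hedged sketch only: isKnownToRM() and ahsAppPageUrl() are placeholders.
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class AppPageFallbackServlet extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    String appId = req.getParameter("appId");
    if (isKnownToRM(appId)) {
      // Normal path: render the RM's own app page.
      resp.getWriter().println("render RM app page for " + appId);
    } else {
      // RM has forgotten the app (e.g. after a restart with a cleared state
      // store): fall back to the Application History Server page.
      resp.sendRedirect(ahsAppPageUrl(appId));
    }
  }

  private boolean isKnownToRM(String appId) {
    return false; // placeholder: a real implementation would query the RM
  }

  private String ahsAppPageUrl(String appId) {
    return "/applicationhistory/app/" + appId;
  }
}
{code}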

> Redirect RM page to AHS page when AHS turned on and RM page is not available
> -
>
> Key: YARN-4428
> URL: https://issues.apache.org/jira/browse/YARN-4428
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4428.1.2.patch, YARN-4428.1.patch, YARN-4428.2.patch
>
>
> When AHS is turned on, if we can't view an application in the RM page, the RM 
> page should redirect us to the AHS page. For example, when you go to 
> cluster/app/application_1 and the RM no longer remembers the application, we 
> will simply get "Failed to read the application application_1"; it would be 
> better for the RM UI to try to redirect to the AHS UI at 
> /applicationhistory/app/application_1 to see if it's there. This kind of 
> redirect already exists for logs in the nodemanager UI.
> Also, when AHS is enabled, WebAppProxyServlet should fall back to redirecting 
> to the AHS page when the RM does not remember the app. YARN-3975 tried to do 
> this only when the original tracking URL is not set. But in many cases, such 
> as when the app failed at launch, the original tracking URL will be set to 
> point to the RM page, so the redirect to the AHS page won't work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4360) Improve GreedyReservationAgent to support "early" allocations, and performance improvements

2015-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062877#comment-15062877
 ] 

Hadoop QA commented on YARN-4360:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 15 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 0, now 15). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 27 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 27s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 10s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 13s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 141m 23s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Should 
org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.IterativePlanner$StageProvider
 be a _static_ inner class?  At IterativePlanner.java:inner class?  At 
IterativePlanner.java:[lines 385-423] |
| JDK v1.8.0

[jira] [Commented] (YARN-4467) Shell.checkIsBashSupported swallowed an interrupted exception

2015-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062842#comment-15062842
 ] 

Hadoop QA commented on YARN-4467:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 51s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
0s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 0s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 48s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 43s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 27s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 40s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 1s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 93m 5s {color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
 |
| JDK v1.7.0_91 Failed junit tests | had

[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers

2015-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062744#comment-15062744
 ] 

Hudson commented on YARN-1856:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8986 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8986/])
YARN-1856. Added cgroups based memory monitoring for containers as (vinodkv: 
rev 4e7d32c0db69882cde854ef581056142a997c005)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsMemoryResourceHandlerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsHandler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/MemoryResourceHandler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestCGroupsMemoryResourceHandlerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java
* hadoop-yarn-project/CHANGES.txt


> cgroups based memory monitoring for containers
> --
>
> Key: YARN-1856
> URL: https://issues.apache.org/jira/browse/YARN-1856
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Varun Vasudev
> Fix For: 2.9.0
>
> Attachments: YARN-1856.001.patch, YARN-1856.002.patch, 
> YARN-1856.003.patch, YARN-1856.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers

2015-12-17 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062704#comment-15062704
 ] 

Vinod Kumar Vavilapalli commented on YARN-1856:
---

The style warnings are mostly non-fixable, what with long files and package 
names.

+1 for splitting out the yarn-default.xml changes.

The latest patch looks good to me, +1, checking this in.

> cgroups based memory monitoring for containers
> --
>
> Key: YARN-1856
> URL: https://issues.apache.org/jira/browse/YARN-1856
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Varun Vasudev
> Attachments: YARN-1856.001.patch, YARN-1856.002.patch, 
> YARN-1856.003.patch, YARN-1856.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2015-12-17 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062697#comment-15062697
 ] 

Carlo Curino commented on YARN-1051:


YARN-2609 also provides an example of how to invoke this from the Java API. 

> YARN Admission Control/Planner: enhancing the resource allocation model with 
> time.
> --
>
> Key: YARN-1051
> URL: https://issues.apache.org/jira/browse/YARN-1051
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager, scheduler
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.6.0
>
> Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
> YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
> techreport.pdf
>
>
> In this umbrella JIRA we propose to extend the YARN RM to handle time 
> explicitly, allowing users to "reserve" capacity over time. This is an 
> important step towards SLAs, long-running services, workflows, and helps for 
> gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2015-12-17 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062692#comment-15062692
 ] 

Carlo Curino commented on YARN-1051:


[~lars_francke], you can refer to the general tech-report for more of the 
top-level ideas/design. As for how-to-use documentation, you are right that it 
is long overdue. A couple of follow-up umbrella JIRAs, YARN-2573 (HA for the 
reservation system) and YARN-2572 (various improvements/extensions and REST 
API work), can give you some more context on what is brewing. In particular, as 
part of the umbrella JIRA YARN-2572, I have just opened YARN-4468, which is 
intended to provide general documentation of the reservation system and its 
(recently added) REST API. We will try to get to it soon.

> YARN Admission Control/Planner: enhancing the resource allocation model with 
> time.
> --
>
> Key: YARN-1051
> URL: https://issues.apache.org/jira/browse/YARN-1051
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager, scheduler
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.6.0
>
> Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
> YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
> techreport.pdf
>
>
> In this umbrella JIRA we propose to extend the YARN RM to handle time 
> explicitly, allowing users to "reserve" capacity over time. This is an 
> important step towards SLAs, long-running services, workflows, and helps for 
> gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4468) Document the general ReservationSystem functionality, and the REST API

2015-12-17 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-4468:
--

 Summary: Document the general ReservationSystem functionality, and 
the REST API
 Key: YARN-4468
 URL: https://issues.apache.org/jira/browse/YARN-4468
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino


This JIRA tracks effort to document the ReservationSystem functionality, and 
the REST API access to it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2015-12-17 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062667#comment-15062667
 ] 

Wangda Tan commented on YARN-4465:
--

Thanks [~bibinchundatt]/[~sunilg] for looking at this issue.

I agree with most of what [~sunilg] said: this is a misconfiguration, so it's 
fine to report an error, but we need to improve the message. Changing the 
default behavior (like ignoring the default node label expression) could hide 
misconfigurations which should be fixed.

How about throwing an exception with a different message when labels are not 
found in the cluster?
{code}
if (null != rmContext) {
  RMNodeLabelsManager nlm = rmContext.getNodeLabelManager();
  if (nlm != null && !nlm.containsNodeLabel(str)) {
    return false;
  }
}
{code}
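
For illustration, a rough sketch of how the two failure modes could surface 
different messages; checkQueueAccess and the surrounding variables (resReq, 
queueInfo, queueName) are hypothetical stand-ins, not the proposed patch:
{code}
// Hedged sketch: only the exception message differs per failure mode.
String labelExp = resReq.getNodeLabelExpression();
if (labelExp != null && !labelExp.trim().isEmpty() && rmContext != null) {
  RMNodeLabelsManager nlm = rmContext.getNodeLabelManager();
  if (nlm != null && !nlm.containsNodeLabel(labelExp)) {
    // Label simply does not exist in the cluster (e.g. labels disabled or
    // never added): say so explicitly instead of blaming queue ACLs.
    throw new InvalidResourceRequestException("Invalid resource request:"
        + " node label '" + labelExp + "' is not present in the cluster.");
  }
  if (!checkQueueAccess(queueInfo, labelExp)) {  // hypothetical queue-ACL check
    throw new InvalidResourceRequestException("Invalid resource request:"
        + " queue=" + queueName + " doesn't have permission to access label '"
        + labelExp + "'. Queue labels=" + queueInfo.getAccessibleNodeLabels());
  }
}
{code}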

> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Disable label from rm side yarn.nodelabel.enable=false
> Capacity scheduler label configuration for queue is available as below
> default label for queue = b1 as 3 and accessible labels as 1,3
> Submit application to queue A .
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore default label expression when label is disabled *or*
> # NormalizeResourceRequest we can set label expression to  
> when node label is not enabled *or*
> # Improve message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4257) Move scheduler validateConf method to AbstractYarnScheduler and make it protected

2015-12-17 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062590#comment-15062590
 ] 

Rich Haase commented on YARN-4257:
--

I've uploaded a new patch that removes the failing tests for zero MB containers 
in the FairScheduler.  I don't see a reason why the FairScheduler would need to 
support zero MB containers, but maybe the right thing to do would be to 
override the validateConf in FairScheduler to maintain the existing behavior.  
Any thoughts?   
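
For what it's worth, a rough method-level sketch of that alternative (an 
override that keeps FairScheduler's permissive 0 MB minimum); this assumes 
validateConf(Configuration) is pulled up to AbstractYarnScheduler as proposed, 
and is not the attached patch:
{code}
// Hedged sketch of a FairScheduler override of the shared validateConf.
@Override
protected void validateConf(Configuration conf) {
  int minMem = conf.getInt(
      YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
      YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
  int maxMem = conf.getInt(
      YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB,
      YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB);
  // FairScheduler keeps allowing a 0 MB minimum (zero-capacity NMs), so only
  // reject negative minimums or a maximum below the minimum.
  if (minMem < 0 || minMem > maxMem) {
    throw new YarnRuntimeException("Invalid resource scheduler memory"
        + " allocation configuration: minimum=" + minMem
        + ", maximum=" + maxMem);
  }
}
{code}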
 

> Move scheduler validateConf method to AbstractYarnScheduler and make it 
> protected
> -
>
> Key: YARN-4257
> URL: https://issues.apache.org/jira/browse/YARN-4257
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Swapnil Daingade
>Assignee: Rich Haase
>  Labels: easyfix
> Attachments: YARN-4257.patch
>
>
> Currently FairScheduler, CapacityScheduler and FifoScheduler each have a 
> method private void validateConf(Configuration conf).
> All three methods validate the minimum and maximum scheduler allocations for 
> cpu and memory (with minor difference). FairScheduler supports 0 as minimum 
> allocation for cpu and memory, while CapacityScheduler and FifoScheduler do 
> not. We can move this code to AbstractYarnScheduler (avoids code duplication) 
> and make it protected for individual schedulers to override.
> Why do we care about a minimum allocation of 0 for cpu and memory?
> We contribute to a project called Apache Myriad that runs YARN on Mesos. 
> Myriad supports a feature called fine-grained scaling (FGS). In FGS, a NM is 
> launched with zero capacity (0 cpu and 0 mem). When a yarn container is to be 
> run on the NM, a mesos offer for that node is accepted and the NM capacity is 
> dynamically scaled up to match the accepted mesos offer. On completion of the 
> yarn container, resources are returned back to Mesos and the NM capacity is 
> scaled down back to zero (cpu & mem). 
> In ResourceTrackerService.registerNodeManager, yarn checks if the NM capacity 
> is at-least as much as yarn.scheduler.minimum-allocation-mb and 
> yarn.scheduler.minimum-allocation-vcores. These values can be set to 0 in 
> yarn-site.xml (so a zero capacity NM is possible). However, the validateConf 
> methods in CapacityScheduler and FifoScheduler do not allow for 0 values for 
> these properties (the FairScheduler one does allow for 0). This behaviour 
> should be consistent, or at least be overridable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4257) Move scheduler validateConf method to AbstractYarnScheduler and make it protected

2015-12-17 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated YARN-4257:
-
Attachment: YARN-4257.patch

> Move scheduler validateConf method to AbstractYarnScheduler and make it 
> protected
> -
>
> Key: YARN-4257
> URL: https://issues.apache.org/jira/browse/YARN-4257
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Swapnil Daingade
>Assignee: Rich Haase
>  Labels: easyfix
> Attachments: YARN-4257.patch
>
>
> Currently FairScheduler, CapacityScheduler and FifoScheduler each have a 
> method private void validateConf(Configuration conf).
> All three methods validate the minimum and maximum scheduler allocations for 
> cpu and memory (with minor difference). FairScheduler supports 0 as minimum 
> allocation for cpu and memory, while CapacityScheduler and FifoScheduler do 
> not. We can move this code to AbstractYarnScheduler (avoids code duplication) 
> and make it protected for individual schedulers to override.
> Why do we care about a minimum allocation of 0 for cpu and memory?
> We contribute to a project called Apache Myriad that runs YARN on Mesos. 
> Myriad supports a feature called fine-grained scaling (FGS). In FGS, a NM is 
> launched with zero capacity (0 cpu and 0 mem). When a yarn container is to be 
> run on the NM, a mesos offer for that node is accepted and the NM capacity is 
> dynamically scaled up to match the accepted mesos offer. On completion of the 
> yarn container, resources are returned back to Mesos and the NM capacity is 
> scaled down back to zero (cpu & mem). 
> In ResourceTrackerService.registerNodeManager, yarn checks if the NM capacity 
> is at-least as much as yarn.scheduler.minimum-allocation-mb and 
> yarn.scheduler.minimum-allocation-vcores. These values can be set to 0 in 
> yarn-site.xml (so a zero capacity NM is possible). However, the validateConf 
> methods in CapacityScheduler and FifoScheduler do not allow for 0 values for 
> these properties (the FairScheduler one does allow for 0). This behaviour 
> should be consistent, or at least be overridable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4257) Move scheduler validateConf method to AbstractYarnScheduler and make it protected

2015-12-17 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated YARN-4257:
-
Attachment: (was: YARN-4257.patch)

> Move scheduler validateConf method to AbstractYarnScheduler and make it 
> protected
> -
>
> Key: YARN-4257
> URL: https://issues.apache.org/jira/browse/YARN-4257
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Swapnil Daingade
>Assignee: Rich Haase
>  Labels: easyfix
> Attachments: YARN-4257.patch
>
>
> Currently FairScheduler, CapacityScheduler and FifoScheduler each have a 
> method private void validateConf(Configuration conf).
> All three methods validate the minimum and maximum scheduler allocations for 
> cpu and memory (with minor difference). FairScheduler supports 0 as minimum 
> allocation for cpu and memory, while CapacityScheduler and FifoScheduler do 
> not. We can move this code to AbstractYarnScheduler (avoids code duplication) 
> and make it protected for individual schedulers to override.
> Why do we care about a minimum allocation of 0 for cpu and memory?
> We contribute to a project called Apache Myriad that runs YARN on Mesos. 
> Myriad supports a feature called fine-grained scaling (FGS). In FGS, a NM is 
> launched with zero capacity (0 cpu and 0 mem). When a yarn container is to be 
> run on the NM, a mesos offer for that node is accepted and the NM capacity is 
> dynamically scaled up to match the accepted mesos offer. On completion of the 
> yarn container, resources are returned back to Mesos and the NM capacity is 
> scaled down back to zero (cpu & mem). 
> In ResourceTrackerService.registerNodeManager, yarn checks if the NM capacity 
> is at-least as much as yarn.scheduler.minimum-allocation-mb and 
> yarn.scheduler.minimum-allocation-vcores. These values can be set to 0 in 
> yarn-site.xml (so a zero capacity NM is possible). However, the validateConf 
> methods in CapacityScheduler and FifoScheduler do not allow for 0 values for 
> these properties (the FairScheduler one does allow for 0). This behaviour 
> should be consistent, or at least be overridable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4467) Shell.checkIsBashSupported swallowed an interrupted exception

2015-12-17 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-4467:
--
Attachment: YARN-4467.001.patch

Rev02: the original design is not desirable, as it swallowed a potential 
interrupt, causing TestRPCWaitForProxy.testInterruptedWaitForProxy to fail. 
Unfortunately, Java does not allow this static method to throw an exception 
while it initializes a static member. Rev02 therefore removes the static member 
variable, so that the method can throw the interrupted exception.
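
For reference, a minimal sketch of the rethrow idea (not the attached patch; 
the exact handling, naming, and log message are assumptions):
{code}
// Hedged sketch: catch InterruptedIOException separately and propagate it
// instead of folding it into the generic "bash is not supported" branch.
// Assumes Shell's existing LOG and ShellCommandExecutor, plus
// java.io.IOException and java.io.InterruptedIOException imports.
private static boolean checkIsBashSupported() throws InterruptedIOException {
  boolean supported = true;
  try {
    String[] args = {"bash", "-c", "echo 1000"};
    ShellCommandExecutor shexec = new ShellCommandExecutor(args);
    shexec.execute();
  } catch (InterruptedIOException iioe) {
    LOG.warn("Interrupted, unable to determine if bash is supported", iioe);
    throw iioe;
  } catch (IOException ioe) {
    LOG.warn("Bash is not supported by the OS", ioe);
    supported = false;
  }
  return supported;
}
{code}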

> Shell.checkIsBashSupported swallowed an interrupted exception
> -
>
> Key: YARN-4467
> URL: https://issues.apache.org/jira/browse/YARN-4467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Wei-Chiu Chuang
>  Labels: shell, supportability
> Attachments: HADOOP-12652.001.patch, YARN-4467.001.patch
>
>
> Shell.checkIsBashSupported() creates a bash shell command to verify if the 
> system supports bash. However, its error message is misleading, and the logic 
> should be updated.
> If the shell command throws an IOException, it does not imply the bash did 
> not run successfully. If the shell command process was interrupted, its 
> internal logic throws an InterruptedIOException, which is a subclass of 
> IOException.
> {code:title=Shell.checkIsBashSupported|borderStyle=solid}
> ShellCommandExecutor shexec;
> boolean supported = true;
> try {
>   String[] args = {"bash", "-c", "echo 1000"};
>   shexec = new ShellCommandExecutor(args);
>   shexec.execute();
> } catch (IOException ioe) {
>   LOG.warn("Bash is not supported by the OS", ioe);
>   supported = false;
> }
> {code}
> An example of it appeared in a recent jenkins job
> https://builds.apache.org/job/PreCommit-HADOOP-Build/8257/testReport/org.apache.hadoop.ipc/TestRPCWaitForProxy/testInterruptedWaitForProxy/
> The test logic in TestRPCWaitForProxy.testInterruptedWaitForProxy starts a 
> thread, waits for 1 second, and interrupts the thread, expecting the thread 
> to terminate. However, the method Shell.checkIsBashSupported swallowed the 
> interrupt, and therefore failed.
> {noformat}
> 2015-12-16 21:31:53,797 WARN  util.Shell 
> (Shell.java:checkIsBashSupported(718)) - Bash is not supported by the OS
> java.io.InterruptedIOException: java.lang.InterruptedException
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
>   at org.apache.hadoop.util.Shell.run(Shell.java:838)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
>   at org.apache.hadoop.util.Shell.checkIsBashSupported(Shell.java:716)
>   at org.apache.hadoop.util.Shell.(Shell.java:705)
>   at org.apache.hadoop.util.StringUtils.(StringUtils.java:79)
>   at 
> org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:639)
>   at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
>   at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:803)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:773)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:646)
>   at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:397)
>   at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:350)
>   at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:330)
>   at 
> org.apache.hadoop.ipc.TestRPCWaitForProxy$RpcThread.run(TestRPCWaitForProxy.java:115)
> Caused by: java.lang.InterruptedException
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:503)
>   at java.lang.UNIXProcess.waitFor(UNIXProcess.java:264)
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:920)
>   ... 15 more
> {noformat}
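
When the interrupt cannot be rethrown at all, the other standard idiom is to 
restore the thread's interrupt status so that callers such as the test above can 
still observe it; a generic sketch (not Hadoop code):

{code:title=Generic sketch: restoring interrupt status|borderStyle=solid}
try {
  process.waitFor();  // any blocking call, e.g. waiting on a child process
} catch (InterruptedException ie) {
  // Re-set the interrupted flag so code further up the stack can detect the
  // interrupt instead of it being silently swallowed.
  Thread.currentThread().interrupt();
}
{code}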



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-4467) Shell.checkIsBashSupported swallowed an interrupted exception

2015-12-17 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang moved HADOOP-12652 to YARN-4467:


   Assignee: (was: Wei-Chiu Chuang)
Component/s: (was: util)
 nodemanager
Key: YARN-4467  (was: HADOOP-12652)
Project: Hadoop YARN  (was: Hadoop Common)

> Shell.checkIsBashSupported swallowed an interrupted exception
> -
>
> Key: YARN-4467
> URL: https://issues.apache.org/jira/browse/YARN-4467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Wei-Chiu Chuang
>  Labels: shell, supportability
> Attachments: HADOOP-12652.001.patch
>
>
> Shell.checkIsBashSupported() creates a bash shell command to verify whether the 
> system supports bash. However, its error message is misleading, and the logic 
> should be updated.
> If the shell command throws an IOException, that does not necessarily mean that 
> bash did not run successfully. If the shell command process was interrupted, its 
> internal logic throws an InterruptedIOException, which is a subclass of 
> IOException.
> {code:title=Shell.checkIsBashSupported|borderStyle=solid}
> ShellCommandExecutor shexec;
> boolean supported = true;
> try {
>   String[] args = {"bash", "-c", "echo 1000"};
>   shexec = new ShellCommandExecutor(args);
>   shexec.execute();
> } catch (IOException ioe) {
>   LOG.warn("Bash is not supported by the OS", ioe);
>   supported = false;
> }
> {code}
> An example of this appeared in a recent Jenkins job:
> https://builds.apache.org/job/PreCommit-HADOOP-Build/8257/testReport/org.apache.hadoop.ipc/TestRPCWaitForProxy/testInterruptedWaitForProxy/
> The test logic in TestRPCWaitForProxy.testInterruptedWaitForProxy starts a 
> thread, waits for 1 second, and then interrupts the thread, expecting it to 
> terminate. However, Shell.checkIsBashSupported swallowed the interrupt, and the 
> test therefore failed.
> {noformat}
> 2015-12-16 21:31:53,797 WARN  util.Shell 
> (Shell.java:checkIsBashSupported(718)) - Bash is not supported by the OS
> java.io.InterruptedIOException: java.lang.InterruptedException
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
>   at org.apache.hadoop.util.Shell.run(Shell.java:838)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
>   at org.apache.hadoop.util.Shell.checkIsBashSupported(Shell.java:716)
>   at org.apache.hadoop.util.Shell.<clinit>(Shell.java:705)
>   at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
>   at 
> org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:639)
>   at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
>   at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:803)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:773)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:646)
>   at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:397)
>   at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:350)
>   at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:330)
>   at 
> org.apache.hadoop.ipc.TestRPCWaitForProxy$RpcThread.run(TestRPCWaitForProxy.java:115)
> Caused by: java.lang.InterruptedException
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:503)
>   at java.lang.UNIXProcess.waitFor(UNIXProcess.java:264)
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:920)
>   ... 15 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4257) Move scheduler validateConf method to AbstractYarnScheduler and make it protected

2015-12-17 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062558#comment-15062558
 ] 

Rich Haase commented on YARN-4257:
--

Reviewing failed tests.

> Move scheduler validateConf method to AbstractYarnScheduler and make it 
> protected
> -
>
> Key: YARN-4257
> URL: https://issues.apache.org/jira/browse/YARN-4257
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Swapnil Daingade
>Assignee: Rich Haase
>  Labels: easyfix
> Attachments: YARN-4257.patch
>
>
> Currently FairScheduler, CapacityScheduler and FifoScheduler each have a 
> private method, void validateConf(Configuration conf).
> All three methods validate the minimum and maximum scheduler allocations for 
> cpu and memory (with minor differences). FairScheduler supports 0 as the 
> minimum allocation for cpu and memory, while CapacityScheduler and FifoScheduler 
> do not. We can move this code to AbstractYarnScheduler (avoiding code 
> duplication) and make it protected for individual schedulers to override.
> Why do we care about a minimum allocation of 0 for cpu and memory?
> We contribute to a project called Apache Myriad that runs YARN on Mesos. 
> Myriad supports a feature called fine-grained scaling (FGS). In FGS, an NM is 
> launched with zero capacity (0 cpu and 0 mem). When a YARN container is to be 
> run on the NM, a Mesos offer for that node is accepted and the NM capacity is 
> dynamically scaled up to match the accepted offer. On completion of the YARN 
> container, resources are returned to Mesos and the NM capacity is scaled down 
> back to zero (cpu & mem). 
> In ResourceTrackerService.registerNodeManager, YARN checks whether the NM 
> capacity is at least as much as yarn.scheduler.minimum-allocation-mb and 
> yarn.scheduler.minimum-allocation-vcores. These values can be set to 0 in 
> yarn-site.xml (so a zero-capacity NM is possible). However, the validateConf 
> methods in CapacityScheduler and FifoScheduler do not allow 0 values for 
> these properties (the FairScheduler one does allow 0). This behaviour 
> should be consistent or at least be overridable.
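
A rough sketch of the refactoring described above, assuming a shared protected 
method in AbstractYarnScheduler whose lower bound individual schedulers can 
override; the configuration keys are the existing ones, but the code is 
illustrative and not the attached patch:

{code:title=Illustrative sketch of a shared, overridable validateConf|borderStyle=solid}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;

// Inside AbstractYarnScheduler: the base implementation accepts a minimum of 0
// (as FairScheduler does today); a scheduler that requires a positive minimum
// can override this method and tighten the check.
protected void validateConf(Configuration conf) {
  int minMem = conf.getInt(
      YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
      YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
  int maxMem = conf.getInt(
      YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB,
      YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB);
  if (minMem < 0 || minMem > maxMem) {
    throw new YarnRuntimeException("Invalid resource scheduler memory"
        + " allocation configuration: minimum-allocation-mb=" + minMem
        + ", maximum-allocation-mb=" + maxMem);
  }
  // A vcores check of the same shape would follow here.
}
{code}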



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4257) Move scheduler validateConf method to AbstractYarnScheduler and make it protected

2015-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062545#comment-15062545
 ] 

Hadoop QA commented on YARN-4257:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 55s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 42s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 155m 47s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/1

[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios

2015-12-17 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062542#comment-15062542
 ] 

Naganarasimha G R commented on YARN-4350:
-

Hi [~sjlee0] & [~varun_saxena],
The same test case failed again on Java 1.8:
{code}
java.lang.AssertionError: expected:<2> but was:<3>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV1(TestDistributedShell.java:356)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:317)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:195)
{code}
From the logs I was also able to make out that 4 containers in total are getting 
launched, though it should be three (1 AM & 2 containers); it seems 3 containers 
are getting assigned and the AM launches 3 containers with the command, because 
of which 3 events are published. This appears to be a scheduling issue. Can we 
handle it in another JIRA on trunk (like YARN-4385) and get this patch to 
completion?
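
If extra containers are considered benign for this check, one possible way to 
decouple the assertion from scheduler timing is to accept at least the expected 
number of events; this is purely illustrative, uses a hypothetical helper, and is 
not a proposal from the patch:

{code:title=Hypothetical tolerant check (illustrative only)|borderStyle=solid}
// countContainerFinishedEvents() is a hypothetical helper that reads the
// timeline store used by the test; the point is only the ">=" comparison.
long containerFinishedEvents = countContainerFinishedEvents();
org.junit.Assert.assertTrue(
    "Expected at least 2 container finished events, got " + containerFinishedEvents,
    containerFinishedEvents >= 2);
{code}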

> TestDistributedShell fails for V2 scenarios
> ---
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch, 
> YARN-4350-feature-YARN-2928.002.patch, YARN-4350-feature-YARN-2928.003.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not when run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> These failures started happening after YARN-4129. I suspect this might have to 
> do with a timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-17 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4304:
--
Attachment: 0005-YARN-4304.patch

The test case failures were related. Uploading a new patch addressing them.

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> 0005-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition max AM resource percentage configuration, 
> the UI and various metrics also need to display the correct configuration. 
> For example, the current UI still shows the am-resource percentage at queue 
> level; this should be updated correctly when label configuration is used.
> - Display max-am-percentage per-partition in Scheduler UI (label also) and in 
> ClusterMetrics page
> - Update queue/partition related metrics w.r.t per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios

2015-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062493#comment-15062493
 ] 

Hadoop QA commented on YARN-4350:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
23s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 51s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 6s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
35s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 5m 12s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66 with JDK v1.8.0_66 
generated 1 new issues (was 15, now 15). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 44s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 7m 57s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.7.0_91 with JDK v1.7.0_91 
generated 1 new issues (was 15, now 15). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 24s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 57s {color} 
| {color:red} hadoop-yarn-applications-distributedshell in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 31s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 43s 
{color} | {color:green} hadoop-yarn-applications-distributedshell in the patch 
passed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {c

[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2015-12-17 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062464#comment-15062464
 ] 

Varun Saxena commented on YARN-3816:


Just to elaborate further: the main concern is that if we keep adding prefixes 
or postfixes to the column qualifier, several HBase filters may be required for 
each column qualifier match. There is also the problem of identifying whether a 
metric is TIME_SERIES vs SINGLE_VALUE; even in that case, a column 
prefix/postfix may be required. So the thought was: would it be feasible to 
indicate whether to aggregate or not without adding a postfix to the column 
qualifier (via config, say)?

We can, however, revisit this later so that this JIRA goes through quickly.
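
To make the filter concern concrete, a small sketch of the kind of scan this 
implies; the prefixes such as {{m!cpu}} are made-up examples, not the actual 
column naming, and every additional prefix/postfix convention adds another 
filter to the list:

{code:title=Illustrative HBase scan with per-prefix filters|borderStyle=solid}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.util.Bytes;

// One ColumnPrefixFilter per qualifier prefix we need to match; the list grows
// with every new prefix/postfix encoded into the column qualifier.
FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ONE,
    new ColumnPrefixFilter(Bytes.toBytes("m!cpu")),
    new ColumnPrefixFilter(Bytes.toBytes("m!memory")));
Scan scan = new Scan();
scan.setFilter(filters);
{code}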

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
>  Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, 
> YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Aggregation at other levels (Flow/User/Queue) can be more efficient when 
> based on application-level aggregations rather than raw entity-level data, as 
> far fewer rows need to be scanned (after filtering out non-aggregated entities 
> such as events, configurations, etc.).
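
On the first point about rolling container metrics up to the application level, 
a simplified illustration of the accumulation idea, using plain Java maps rather 
than the timeline service API:

{code:title=Simplified illustration of app-level SUM accumulation|borderStyle=solid}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Roll per-container metric values up into one application-level value so that
// flow/user/queue aggregation can work from far fewer rows.
static Map<String, Long> aggregateToAppLevel(List<Map<String, Long>> containerMetrics) {
  Map<String, Long> appMetrics = new HashMap<>();
  for (Map<String, Long> oneContainer : containerMetrics) {
    for (Map.Entry<String, Long> metric : oneContainer.entrySet()) {
      // SUM accumulation; MAX or AVG would be other accumulation choices.
      appMetrics.merge(metric.getKey(), metric.getValue(), Long::sum);
    }
  }
  return appMetrics;
}
{code}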



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4385) TestDistributedShell times out

2015-12-17 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062444#comment-15062444
 ] 

Naganarasimha G R commented on YARN-4385:
-

Hi [~ozawa], 
Please confirm whether this is still reproducible; if it is not, I am planning 
to disable the test.

> TestDistributedShell times out
> --
>
> Key: YARN-4385
> URL: https://issues.apache.org/jira/browse/YARN-4385
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Tsuyoshi Ozawa
>Assignee: Naganarasimha G R
> Attachments: 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4463) Container launch failure when yarn.nodemanager.log-dirs directory path contains space

2015-12-17 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062461#comment-15062461
 ] 

Sunil G commented on YARN-4463:
---

I am not very sure how you are planning to fix this. Encoding the space as 
{{%20}}, as in a URL, is one option, correct?
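
For contrast, a purely illustrative snippet (hypothetical path, not NM code) of 
why URL-style encoding and shell quoting are not interchangeable here: encoding 
changes the path that ends up on disk, while quoting only protects the real path 
from word splitting when bash runs the launch script.

{code:title=Illustrative contrast: %20 encoding vs shell quoting|borderStyle=solid}
String logDir = "/data/nm logs/container_01";    // hypothetical dir with a space

// URL-style encoding produces a literally different directory name on disk.
String urlEncoded = logDir.replace(" ", "%20");  // "/data/nm%20logs/container_01"

// Shell quoting keeps the real path but stops bash from splitting it into words.
String shellQuoted = "'" + logDir + "'";         // '/data/nm logs/container_01'
{code}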

> Container launch failure when yarn.nodemanager.log-dirs directory path 
> contains space
> -
>
> Key: YARN-4463
> URL: https://issues.apache.org/jira/browse/YARN-4463
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> If the container log directory path contains a space, container launch fails.
> Even with DEBUG logs enabled, the only log we are able to get is:
> {noformat}
> Container id: container_e32_1450233925719_0009_01_22
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:912)
> at org.apache.hadoop.util.Shell.run(Shell.java:823)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1102)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:225)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We should make container launch support an NM log directory path that contains a space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2015-12-17 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062451#comment-15062451
 ] 

Sunil G commented on YARN-4465:
---

[~bibinchundatt] Agree that the message is not clear here. However, this is a 
negative scenario where the feature is disabled at the top level while some label 
configuration is still in place. I feel we could reset such expressions in the 
resource request (either an explicit label from the request or the default 
expression on the queue) to a commonly agreed default label. This is harmless; 
however, since we are changing some default behavior, proper logging is needed to 
inform the user/admin that the default behavior took effect. [~leftnoteasy], 
thoughts?
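
A minimal sketch of the reset being discussed, assuming it happens during request 
normalization; {{RMNodeLabelsManager.NO_LABEL}} denotes the default (empty) 
partition, and the method shape is illustrative rather than the eventual fix:

{code:title=Illustrative sketch of falling back to the default label|borderStyle=solid}
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager;

// When node labels are disabled cluster-wide, ignore any label expression coming
// from the request or from the queue default, instead of failing validation.
static void resetLabelIfDisabled(ResourceRequest req, boolean nodeLabelsEnabled) {
  if (!nodeLabelsEnabled) {
    req.setNodeLabelExpression(RMNodeLabelsManager.NO_LABEL);
  }
}
{code}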

> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Disable labels from the RM side (yarn.nodelabel.enable=false).
> The capacity scheduler label configuration for the queue is as below:
> default label for queue b1 = 3, accessible labels = 1,3
> Submit an application to queue A.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore the default label expression when node labels are disabled, *or*
> # In NormalizeResourceRequest we can set the label expression to  
> when node labels are not enabled, *or*
> # Improve the message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

