[jira] [Updated] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run

2016-01-14 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-3998:
---
Attachment: YARN-3998.04.patch

Sorry, there is something wrong with YARN-3998.03.patch. Attached a new patch 
YARN-3998.04.patch.

> Add retry-times to let NM re-launch container when it fails to run
> --
>
> Key: YARN-3998
> URL: https://issues.apache.org/jira/browse/YARN-3998
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3998.01.patch, YARN-3998.02.patch, 
> YARN-3998.03.patch, YARN-3998.04.patch
>
>
> I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM 
> launches containers, it could specify the value. The NM will then re-launch 
> the container up to 'retry-times' times when it fails to run (e.g. exit code 
> is not 0). This saves a lot of time: it avoids container localization, the 
> RM does not need to re-schedule the container, and local files in the 
> container's working directory are left for re-use. (If the container has 
> downloaded some big files, it does not need to re-download them when running 
> again.) 
> We find this useful in systems like Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4559) Make leader elector and zk store share the same curator client

2016-01-14 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4559:
--
Attachment: YARN-4559.3.patch

> Make leader elector and zk store share the same curator client
> --
>
> Key: YARN-4559
> URL: https://issues.apache.org/jira/browse/YARN-4559
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4559.1.patch, YARN-4559.2.patch, YARN-4559.3.patch
>
>
> After YARN-4438, we can reuse the same curator client for leader elector and 
> zk store
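
For context, a minimal sketch of what sharing one client looks like with the 
Curator API (the connection string, retry values, and ZK path below are 
placeholders, not the patch's actual values):

{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.RetryNTimes;

// Sketch: one CuratorFramework instance shared by both components.
CuratorFramework client = CuratorFrameworkFactory.newClient(
    zkHostPort, new RetryNTimes(1000, 500));
client.start();
// The same client backs the leader elector ...
LeaderLatch elector = new LeaderLatch(client, "/yarn-leader-election/cluster1");
// ... and is handed to the ZK-backed RM state store instead of letting it
// open its own connection.
{code}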



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run

2016-01-14 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101383#comment-15101383
 ] 

Jun Gong commented on YARN-3998:


[~vvasudev] Thanks for the detailed review and suggestions! I just attached a 
new patch to address the above problems.

{quote}
In your implementation, the relaunched container will go through the 
launchContainer call which will try to setup the container launch environment 
again(like creating the local and log dirs, creating tokens, etc). Won't this 
lead to FileAlreadyExistsException being thrown as part of the launchContainer 
call? In addition, this also means that on a node with more than one local dir, 
different attempts could get allocated to different local dirs. I wonder if 
it's better to move the retry logic into the launchContainer function instead 
of adding a new state transition?
{quote}
The reasons for adding a new state transition are as follows:
1. During the retry interval, the container is not actually running, so it 
seems more reasonable to keep it in the LOCALIZED state.
2. For NM restart, it is not enough to just add retry logic into 
*ContainerLaunch#call()*. When the NM restarts, it calls 
*RecoveredContainerLaunch#call*, so we would need to add retry logic there as 
well, otherwise the container might exit with failure and never be retried. 
Adding a state transition makes the logic clearer and avoids duplicated code.

In order to avoid FileAlreadyExistsException, I added some code 
(*cleanupContainerFilesForRelaunch*) to clean up files (the token file and the 
launch script). We also need to clean up the previous PID file, since the NM 
will try to get the PID through this file when it restarts.

In order to reuse the same container working directory and log directory, we 
would need to record these paths and store them in the NMStateStore for the NM 
restart case. Instead, following [~vvasudev]'s suggestion, we use a simple 
heuristic: if a good work directory with the container tokens file already 
exists, use that directory; otherwise use a new one. That way we don't need to 
worry about storing the directories in the state store. However, there is no 
file like the tokens file for the log directory, so we use the file 'stdout' 
for this purpose, assuming that 'stdout' exists in most containers' log 
directories. A sketch of the heuristic follows.
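
(Class, helper, and parameter names below are illustrative, not the actual 
patch code.)

{code}
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch with hypothetical names, not the actual patch code.
class RelaunchDirPicker {
  // Reuse the work directory that already holds the container tokens file;
  // otherwise fall back to a freshly chosen good directory.
  static Path pickWorkDir(List<Path> goodWorkDirs, String tokensFileName) {
    for (Path dir : goodWorkDirs) {
      if (Files.exists(dir.resolve(tokensFileName))) {
        return dir; // relaunch where the previous attempt ran
      }
    }
    return goodWorkDirs.get(0); // first launch: any good directory works
  }
}
{code}

The log directory is chosen the same way, with 'stdout' as the marker file.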

> Add retry-times to let NM re-launch container when it fails to run
> --
>
> Key: YARN-3998
> URL: https://issues.apache.org/jira/browse/YARN-3998
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3998.01.patch, YARN-3998.02.patch, 
> YARN-3998.03.patch
>
>
> I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM 
> launches containers, it could specify the value. The NM will then re-launch 
> the container up to 'retry-times' times when it fails to run (e.g. exit code 
> is not 0). This saves a lot of time: it avoids container localization, the 
> RM does not need to re-schedule the container, and local files in the 
> container's working directory are left for re-use. (If the container has 
> downloaded some big files, it does not need to re-download them when running 
> again.) 
> We find this useful in systems like Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run

2016-01-14 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-3998:
---
Attachment: YARN-3998.03.patch

> Add retry-times to let NM re-launch container when it fails to run
> --
>
> Key: YARN-3998
> URL: https://issues.apache.org/jira/browse/YARN-3998
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3998.01.patch, YARN-3998.02.patch, 
> YARN-3998.03.patch
>
>
> I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM 
> launches containers, it could specify the value. The NM will then re-launch 
> the container up to 'retry-times' times when it fails to run (e.g. exit code 
> is not 0). This saves a lot of time: it avoids container localization, the 
> RM does not need to re-schedule the container, and local files in the 
> container's working directory are left for re-use. (If the container has 
> downloaded some big files, it does not need to re-download them when running 
> again.) 
> We find this useful in systems like Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4598) Invalid event: RESOURCE_FAILED at CONTAINER_CLEANEDUP_AFTER_KILL

2016-01-14 Thread tangshangwen (JIRA)
tangshangwen created YARN-4598:
--

 Summary: Invalid event: RESOURCE_FAILED at 
CONTAINER_CLEANEDUP_AFTER_KILL
 Key: YARN-4598
 URL: https://issues.apache.org/jira/browse/YARN-4598
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: tangshangwen
Assignee: tangshangwen


In our cluster, I found that the container has some problems in its state 
transitions; here is my log:
{noformat}
2016-01-12 17:42:50,088 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1452588902899_0001_01_87 transitioned from 
CONTAINER_CLEANEDUP_AFTER_KILL to DONE
2016-01-12 17:42:50,088 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Can't handle this event at current state: Current: 
[CONTAINER_CLEANEDUP_AFTER_KILL], eventType: [RESOURCE_FAILED]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
RESOURCE_FAILED at CONTAINER_CLEANEDUP_AFTER_KILL   

at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
  
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1127)
   
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:83)
 
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1078)
  
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1071)
  
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) 
 
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
 
at java.lang.Thread.run(Thread.java:744)


2016-01-12 17:42:50,089 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1452588902899_0001_01_94 transitioned from 
CONTAINER_CLEANEDUP_AFTER_KILL to null
2016-01-12 17:42:50,089 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop   
OPERATION=Container Finished - Killed   TARGET=ContainerImpl
RESULT=SUCCESS  APPID=application_1452588902899_0001
CONTAINERID=container_1452588902899_0001_01_94  

2016-01-12 17:42:50,089 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1452588902899_0001_01_94 transitioned from 
CONTAINER_CLEANEDUP_AFTER_KILL to DONE 
{noformat}
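
For reference, the NM container state machine is declared via 
org.apache.hadoop.yarn.state.StateMachineFactory, and an event with no 
registered arc at the current state throws the InvalidStateTransitonException 
above. A sketch of how the event could be tolerated, if that turns out to be 
the right fix (this is an assumption, not a committed patch), is a no-op arc 
in ContainerImpl's factory chain:

{code}
// Fragment of the StateMachineFactory builder chain (sketch): treat
// RESOURCE_FAILED as a no-op once the container is already cleaned up.
.addTransition(ContainerState.CONTAINER_CLEANEDUP_AFTER_KILL,
    ContainerState.CONTAINER_CLEANEDUP_AFTER_KILL,
    ContainerEventType.RESOURCE_FAILED)
{code}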



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4502) Sometimes Two AM containers get launched

2016-01-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101256#comment-15101256
 ] 

Wangda Tan commented on YARN-4502:
--

Looks good to me, +1. Pending Jenkins.

> Sometimes Two AM containers get launched
> 
>
> Key: YARN-4502
> URL: https://issues.apache.org/jira/browse/YARN-4502
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
>     Attachments: YARN-4502-20160114.txt, YARN-4502-20160212.txt
>
>
> Scenario : 
> * set yarn.resourcemanager.am.max-attempts = 2
> * start dshell application
> {code}
>  yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> hadoop-yarn-applications-distributedshell-*.jar 
> -attempt_failures_validity_interval 6 -shell_command "sleep 150" 
> -num_containers 16
> {code}
> * Kill AM pid
> * Print container list for 2nd attempt
> {code}
> yarn container -list appattempt_1450825622869_0001_02
> INFO impl.TimelineClientImpl: Timeline service address: 
> http://xxx:port/ws/v1/timeline/
> INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10:
> Total number of containers :2
> Container-Id Start Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa
> container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa
> {code}
> * look for new AM pid 
> Here, the 2nd AM container was supposed to be started on 
> container_e12_1450825622869_0001_02_01. But the AM was not launched on 
> container_e12_1450825622869_0001_02_01; it was in the ACQUIRED state. 
> On the other hand, container_e12_1450825622869_0001_02_02 got the AM running. 
> Expected behavior: the RM should not start 2 containers for starting the AM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4589) Diagnostics for localization timeouts is lacking

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101240#comment-15101240
 ] 

Hadoop QA commented on YARN-4589:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 1s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 12 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 19s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
42s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 6m 42s 
{color} | {color:red} root in trunk failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 2m 23s 
{color} | {color:red} root in trunk failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 58s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 
58s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 13s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
14s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 44s 
{color} | {color:red} root in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 0m 44s {color} | 
{color:red} root in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 44s {color} 
| {color:red} root in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 50s 
{color} | {color:red} root in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 0m 50s {color} | 
{color:red} root in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 50s {color} 
| {color:red} root in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 20s 
{color} | {color:red} Patch generated 4 new checkstyle issues in root (total 
was 728, now 730). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 4s 
{color} | {color:red} 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
introduced 1 new FindBugs issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 44s 
{color} | {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 17s 
{color} | {color:red} hadoop-mapreduce-client-app in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 9m 45s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.7.0_91 with JDK v1.7.0_91 
generated 1 new issues (was 0, now 1). {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 9m 45s 
{color} | {color:red} 
hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.7.0_91
 with JDK v1.7.0_91 generated 1 new issues (was 0, now 1).

[jira] [Updated] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-14 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4538:
---
Attachment: 0005-YARN-4538.patch

[~leftnoteasy]
Thanks for the review.
Sorry for missing the javadoc; uploaded a patch correcting it.
Please review.

> QueueMetrics pending  cores and memory metrics wrong
> 
>
> Key: YARN-4538
> URL: https://issues.apache.org/jira/browse/YARN-4538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4538.patch, 0002-YARN-4538.patch, 
> 0003-YARN-4538.patch, 0004-YARN-4538.patch, 0005-YARN-4538.patch
>
>
> Submit 2 applications to the default queue 
> Check queue metrics for pending cores and memory
> {noformat}
> List<QueueInfo> allQueues = client.getChildQueueInfos("root");
> for (QueueInfo queueInfo : allQueues) {
>   QueueStatistics quastats = queueInfo.getQueueStatistics();
>   System.out.println(quastats.getPendingVCores());
>   System.out.println(quastats.getPendingMemoryMB());
> }
> {noformat}
> *Output :*
> -20
> -20480



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4584) RM startup failure when AM attempts greater than max-attempts

2016-01-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101222#comment-15101222
 ] 

Rohith Sharma K S commented on YARN-4584:
-

I had an offline discussion with [~jianhe] regarding this issue. The fix 
summary is as follows (see the sketch after this list):
# Do not remove any attempts if attemptFailuresValidityInterval <= 0.
# Remove only the excess attempts that are beyond the 
attemptFailuresValidityInterval.
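
Sketch (AttemptInfo/finishTime are hypothetical stand-ins for the recovered 
attempt data, not actual YARN classes):

{code}
import java.util.List;

class AttemptPruner {
  static class AttemptInfo {
    long finishTime;
  }

  static List<AttemptInfo> pruneAttempts(List<AttemptInfo> attempts,
      long attemptFailuresValidityInterval) {
    // 1) Interval disabled: never remove attempts.
    if (attemptFailuresValidityInterval <= 0) {
      return attempts;
    }
    // 2) Remove only the excess attempts that finished outside the interval.
    long cutoff = System.currentTimeMillis() - attemptFailuresValidityInterval;
    attempts.removeIf(a -> a.finishTime < cutoff);
    return attempts;
  }
}
{code}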

> RM startup failure when AM attempts greater than max-attempts
> -
>
> Key: YARN-4584
> URL: https://issues.apache.org/jira/browse/YARN-4584
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-4584.patch
>
>
> Configure 3 queues in a cluster with 8 GB:
> # queue 40%
> # queue 50%
> # default 10%
> * Submit applications to all 3 queues with container size 1024 MB (a sleep 
> job with 50 containers on each queue)
> * The AM assigned to the default queue gets preempted immediately; after 20 
> preemptions, kill all applications
> Due to the resource limit in the default queue, the AM got preempted about 
> 20 times. On restart, the RM fails to come up:
> {noformat}
> 2016-01-12 10:49:04,081 DEBUG org.apache.hadoop.service.AbstractService: 
> noteFailure java.lang.NullPointerException
> 2016-01-12 10:49:04,081 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:887)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:826)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:953)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:946)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:786)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:464)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1232)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1022)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1062)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1058)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1058)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:323)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:877)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: RMActiveServices entered state STOPPED
> 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.CompositeService: 
> RMActiveServices: stopping services, size=16
> {noformat}



-

[jira] [Updated] (YARN-4502) Sometimes Two AM containers get launched

2016-01-14 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-4502:
--
Attachment: YARN-4502-20160114.txt

Tx for the review, [~leftnoteasy]

bq. Do you think it is better to rename PREEMPT_CONTAINER/preemptContainer in 
CapacityScheduler to something like add_preemption_candidate or 
mark-container-for-preemption? Preempt-container looks very similar to 
kill-preempted-container. Similarly, FiCaSchedulerApp#preemptContainer.
Makes sense; that is what I was thinking before too, but I didn't make the 
change.

Uploading a new patch with that comment addressed. Also fixed whitespace and 
checkstyle issues. Cannot address the too-long-file warning.

> Sometimes Two AM containers get launched
> 
>
> Key: YARN-4502
> URL: https://issues.apache.org/jira/browse/YARN-4502
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
>     Attachments: YARN-4502-20160114.txt, YARN-4502-20160212.txt
>
>
> Scenario : 
> * set yarn.resourcemanager.am.max-attempts = 2
> * start dshell application
> {code}
>  yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> hadoop-yarn-applications-distributedshell-*.jar 
> -attempt_failures_validity_interval 6 -shell_command "sleep 150" 
> -num_containers 16
> {code}
> * Kill AM pid
> * Print container list for 2nd attempt
> {code}
> yarn container -list appattempt_1450825622869_0001_02
> INFO impl.TimelineClientImpl: Timeline service address: 
> http://xxx:port/ws/v1/timeline/
> INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10:
> Total number of containers :2
> Container-Id Start Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa
> container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa
> {code}
> * look for new AM pid 
> Here, the 2nd AM container was supposed to be started on 
> container_e12_1450825622869_0001_02_01. But the AM was not launched on 
> container_e12_1450825622869_0001_02_01; it was in the ACQUIRED state. 
> On the other hand, container_e12_1450825622869_0001_02_02 got the AM running. 
> Expected behavior: the RM should not start 2 containers for starting the AM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2016-01-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101166#comment-15101166
 ] 

Wangda Tan commented on YARN-4108:
--

Thanks for looking at this, [~eepayne].

bq. In the lazy preemption case, PCPP will send an event to the scheduler to 
mark a container killable. Can PCPP check if it's already been marked before 
sending, so that maybe event traffic will be less in the RM?
Agree, we can create a killable map similar to the preempted-map in PCPP.

bq. Currently, if both queueA and queueB are over their guaranteed capacity, 
preemption will still occur if queueA is more over capacity than queueB. I 
think it is probably important to preserve this behavior (YARN-2592).
Thanks for pointing me to this patch; I took a quick read of the comments on 
YARN-2592. I think we can still keep the same behavior in the new proposal: 
currently I assume only a queue with usage less than its guaranteed capacity 
can preempt containers from others, but we can relax this limit to: a queue 
that has no to-be-preempted containers could preempt from others.
However, I think allowing two over-satisfied queues to shoot at each other may 
not be reasonable. If we have 3 queues configured to a=10, b=20, c=70, and c 
uses nothing, we cannot simply interpret a's new capacity as 33 and b's new 
capacity as 66 (a:b = 10:20). Since the admin only configured the capacities 
of a/b to 10/20, we should strictly follow what the admin configured.

bq. I don't see anyplace where ResourceLimits#isAllowPreemption is called. 
But, if it is, will the following code in LeafQueue change preemption 
behavior?...
Yes, LeafQueue decides whether an app may kill containers or not, and the app 
uses it in 
{{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#assignContainer}}
 when deciding {{toKillContainers}}.

bq. I'm just trying to understand how things will be affected when headroom for 
a parent queue is (limit - used) + killable. Doesn't that say that a parent 
queue has more headroom than it's already actually using? Is it relying on this 
behavior so that the assignment code will determine that it has more headroom 
when there are killable containers, and then rely on the LeafQueue to kill 
those containers?
I'm not sure if I understand your question properly; let me try to explain 
this behavior:
ParentQueue will add its own killable containers to the headroom 
(getTotalKillableResource is a bad name; it should be 
{{getTotalKillableResourceForThisQueue}}). Since these containers all belong 
to the parent queue, it has the right to kill all of them to satisfy 
max-queue-capacity.
Killable containers will actually be killed in two cases:
- An under-satisfied leaf queue tries to allocate on a node, but the node 
doesn't have enough resources, so it kills containers *on the node* to 
allocate the new container.
- A queue is using more than its max-capacity and it has killable containers; 
we try to kill containers for such queues to make sure they don't violate 
max-capacity. You can check the following code in ParentQueue#allocateResource:
{code}
// check if we need to kill (killable) containers if maximum resource violated.
if (getQueueCapacities().getAbsoluteMaximumCapacity(nodePartition)
    < getQueueCapacities().getAbsoluteUsedCapacity(nodePartition)) {
  killContainersToEnforceMaxQueueCapacity(nodePartition, clusterResource);
}
{code}
 
bq. NPE if getChildQueues() returns null
Nice catch; updated locally

bq. CSAssignment#toKillContainers: I would call them containersToKill
Agree, updated locally 

bq. It would be interesting to know what your thoughts are on making further 
modifications to PCPP to make more informed choices about which containers to 
kill.
I don't have clear ideas for this. A rough idea in my mind is: we could add 
some field to the scheduler to indicate that some special request (e.g. large, 
hard-locality, etc.) is starving and head-of-line (HOL). Then, with a 
background scan in PCPP, after PCPP marks containers to be preempted, we can 
leverage the marked starving-and-HOL requests to modify the existing marked 
to-be-preempted containers.
Again, this is rough thinking; I'm not sure if it is doable.

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4108-design-doc-V3.pdf, 
> YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, 
> YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch
>
>
> This is sibling JIRA for YARN-

[jira] [Commented] (YARN-4559) Make leader elector and zk store share the same curator client

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101162#comment-15101162
 ] 

Hadoop QA commented on YARN-4559:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 5 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 108, now 112). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 19s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 45s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 40s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 140m 29s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Inconsistent synchronization of 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.resourceManager;
 locked 60% of time  Unsynchronized access at RMStateStore.java:60% of time  
Unsynchronized access at RMStateStore.java:[line 1183] |
| JDK v1.8.0_66 F

[jira] [Commented] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not avaialable

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101151#comment-15101151
 ] 

Hadoop QA commented on YARN-4428:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 36s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91
 with JDK v1.7.0_91 generated 1 new issues (was 2, now 2). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 7 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 132, now 138). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 8s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 10s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 138m 39s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed 

[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-01-14 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101149#comment-15101149
 ] 

Chris Douglas commented on YARN-4597:
-

The {{ContainerLaunchContext}} (CLC) specifies the prerequisites for starting a 
container on a node. These include setting up user/application directories and 
downloading dependencies to the NM cache (localization). The NM assumes that an 
authenticated {{startContainer}} request has not overbooked resources on the 
node, so resources are only reserved/enforced during the container launch and 
execution.

This JIRA proposes to add a phase between localization and container launch to 
manage a collection of runnable containers. Similar to the localizer stage, a 
container will launch only after all the resources from its CLC are assigned by 
a _local scheduler_. The local scheduler will select containers to run based on 
priority, declared requirements, and by monitoring utilization on the node 
(YARN-1011).

A few future and in-progress features motivate this change.

*Preemption* Instead of sending a kill when the RM selects a victim container, 
it could instead convert it from a {{GUARANTEED}} to an {{OPTIMISTIC}} 
container (YARN-4335). This has two benefits. First, the downgraded container 
can continue to run until a guaranteed container arrives _and_ finishes 
localizing its dependencies, so the downgraded container has an opportunity to 
complete or checkpoint. When the guaranteed container moves from {{LOCALIZED}} 
to {{SCHEDULING}}, the local scheduler may select the victim (formerly 
guaranteed) container to be killed. \[1\] Second, the NM may elect to kill the 
victim container to run _different_ optimistic containers, particularly 
short-running tasks.

*Optimistic scheduling and overprovisioning* To support distributed scheduling 
(YARN-2877) and resource-aware scheduling (YARN-1011), the NM needs a component 
to select containers that are ready to run. The local scheduler can not only 
select tasks to run based on monitoring, it can also make offers to running 
containers using durations attached to leases \[2\]. Based on recent 
observations, it may start containers that oversubscribe the node, or delay 
starting containers if a lease is close to expiring (i.e., the container is 
likely to complete).

*Long-running services*. Note that by separating the local scheduler, both that 
module _and_ the localizer could be opened up as services provided by the NM. 
The localizer could also be extended to prioritize downloads among 
{{OPTIMISTIC}} containers (possibly preemptable by {{GUARANTEED}} ones), and 
to group containers based on their dependencies (e.g., avoid downloading a 
large dep for fewer than N optimistic containers). By exposing these services, 
the NM can 
assist with the following:

# Resource spikes. If a service container needs to spike temporarily, it may 
not need guaranteed resources (YARN-1197). Containers requiring low-latency 
elasticity could request optimistic resources instead of peak provisioning, 
resizing, or using workarounds like [Llama|http://cloudera.github.io/llama/]. 
If the local scheduler is addressable by local containers, then the lease could 
be logical (i.e., not start a process). Resources assigned to a {{RUNNING}} 
container could be published rather than triggering a launch. One could also 
imagine service workers marking some resources as unused, while retaining the 
authority to spike into them ("subleasing" them to opportunistic containers) by 
reclaiming them through the local scheduler.
# Upgrades. If the container needs to pull new dependencies, it could use the 
NM Localizer rather than coordinating the download itself.
# Maintenance tasks. Services often need to clean up, compact, scrub, and 
checkpoint local data. Right now, each service needs to independently monitor 
resource utilization to back off saturated resources (particularly disks). 
Coordination between services is difficult. In contrast, one could schedule 
tasks like block scrubbing as optimistic tasks in the NM to avoid interrupting 
services that are spiking. This is similar in spirit to distributed scheduling 
insofar as it does not involve the RM and targets a single host (i.e., the host 
the container is running on).

\[1\] Though it was selected as a victim by the RM, the local scheduler may 
decide to kill a different {{OPTIMISTIC}} container when the guaranteed 
container requests resources. For example, if a container completes on the node 
after the RM selected the victim, then the NM may elect to kill a smaller 
optimistic process if it is sufficient to satisfy the guarantee.
\[2\] Discussion on duration in YARN-1039 was part of a broader conversation on 
support for long-running services (YARN-896).
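
To make the proposed SCHEDULE stage concrete, here is a rough sketch of a 
node-local scheduler (all names are hypothetical; a real version would track 
CPU and disk as well as memory, and consult measured utilization per 
YARN-1011):

{code}
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch only, not NM code: containers wait in SCHEDULING until their
// declared resources fit on the node.
class LocalScheduler {
  static class Container {
    final long memoryMb;
    final boolean guaranteed; // GUARANTEED vs. OPTIMISTIC
    Container(long memoryMb, boolean guaranteed) {
      this.memoryMb = memoryMb;
      this.guaranteed = guaranteed;
    }
  }

  private final Queue<Container> runnable = new ArrayDeque<>();
  private final long nodeCapacityMb;
  private long committedMb; // resources held by launched containers

  LocalScheduler(long nodeCapacityMb) {
    this.nodeCapacityMb = nodeCapacityMb;
  }

  // LOCALIZED -> SCHEDULING: queue the container until resources fit.
  synchronized void onLocalized(Container c) {
    runnable.add(c);
    trySchedule();
  }

  // A completed container frees its resources; retry queued containers.
  synchronized void onCompleted(Container c) {
    committedMb -= c.memoryMb;
    trySchedule();
  }

  private void trySchedule() {
    while (!runnable.isEmpty()
        && committedMb + runnable.peek().memoryMb <= nodeCapacityMb) {
      Container c = runnable.poll();
      committedMb += c.memoryMb;
      launch(c); // SCHEDULING -> RUNNING
    }
    // A fuller version would kill an OPTIMISTIC container here to admit a
    // waiting GUARANTEED one, per the preemption discussion above.
  }

  private void launch(Container c) { /* start the container process */ }
}
{code}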


> Add SCHEDULE to NM container lifecycle
> --
>
> Key: YARN-4597
>   

[jira] [Created] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-01-14 Thread Chris Douglas (JIRA)
Chris Douglas created YARN-4597:
---

 Summary: Add SCHEDULE to NM container lifecycle
 Key: YARN-4597
 URL: https://issues.apache.org/jira/browse/YARN-4597
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Chris Douglas


Currently, the NM immediately launches containers after resource localization. 
Several features could be more cleanly implemented if the NM included a 
separate stage for reserving resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2016-01-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101143#comment-15101143
 ] 

Wangda Tan commented on YARN-4108:
--

Hi [~sunilg]

bq. 1. In current PCPP, we first preempt all reserved containers from all 
applications in a queue if it's overallocated. 
I would prefer following the existing PCPP logic: drop the container 
reservation first and let the scheduler continue to decide who can 
allocate/reserve a container on it.

bq. 2. If I understood correctly, "killable containers" will be triggered with 
a preempt event only if a proper allocation can happen for the target 
application (from an underserved queue).
For the preempt event sent to the AM, my current thinking is that we will send 
it when we add the container to the preempt list (not the killable list). The 
AM could save state or return resources. I understand this could lead to 
unnecessary add-container-to-preempt-list events sent to the AM, but I think 
it's better than excessively killing containers.

bq. 3. To cancel "killable container", I think PCPP will take the call by 
waiting for some interval. So some new configuration is needed for this?
Maybe not; we don't need an extra config for cancelling killable containers.

bq. 4. I would like to have some freedom in selecting containers (marking) for 
preemption. A simple sorting based on submission time or priority seems a 
limited approach. Could we have some interface here so that we can plug in 
user-specific comparison cases.
Agree, we can discuss this in a separate JIRA if we agree with the overall 
approach (separating container selection and preemption). Will file a new JIRA 
when we reach a consensus about the roadmap; a rough sketch of such an 
interface follows. [~mding]
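
(All names below are hypothetical; the real interface would be designed in the 
follow-up JIRA.)

{code}
import java.util.Comparator;

// Sketch only: a pluggable ordering for preemption candidates; the head of
// the sorted list is marked for preemption first.
interface PreemptionCandidateOrdering {
  Comparator<CandidateContainer> comparator();
}

// Example policy: preempt the most recently started containers first.
class LatestStartFirst implements PreemptionCandidateOrdering {
  public Comparator<CandidateContainer> comparator() {
    return Comparator.comparingLong((CandidateContainer c) -> c.startTime)
        .reversed();
  }
}

// Stand-in for the real container representation.
class CandidateContainer {
  long startTime;
}
{code}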


> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4108-design-doc-V3.pdf, 
> YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, 
> YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch
>
>
> This is sibling JIRA for YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*
> 1) Can handle the case of user-limit preemption
> 2) Can handle the case of resource placement requirements, such as: 
> hard-locality (I only want to use rack-1) / node-constraints (YARN-3409) / 
> black-list (I don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross-user preemption (YARN-2113), 
> cross-application preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom

2016-01-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101117#comment-15101117
 ] 

Wangda Tan commented on YARN-3215:
--

Thanks [~sunilg]/[~Naganarasimha].

bq. min (total unused resource limit for a given label,  ), so that headroom 
doesn't exceed what's actually available! right?

Sounds like a good plan to me :)

> Respect labels in CapacityScheduler when computing headroom
> ---
>
> Key: YARN-3215
> URL: https://issues.apache.org/jira/browse/YARN-3215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-3215.v1.001.patch
>
>
> In existing CapacityScheduler, when computing headroom of an application, it 
> will only consider "non-labeled" nodes of this application.
> But it is possible the application is asking for labeled resources, so 
> headroom-by-label (like 5G resource available under node-label=red) is 
> required to get better resource allocation and avoid deadlocks such as 
> MAPREDUCE-5928.
> This JIRA could involve both API changes (such as adding a 
> label-to-available-resource map in AllocateResponse) and also internal 
> changes in CapacityScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101115#comment-15101115
 ] 

Wangda Tan commented on YARN-4538:
--

+1 to latest patch, [~bibinchundatt], could you take a look at [javadocs 
warning|https://builds.apache.org/job/PreCommit-YARN-Build/10279/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91.txt]?

> QueueMetrics pending  cores and memory metrics wrong
> 
>
> Key: YARN-4538
> URL: https://issues.apache.org/jira/browse/YARN-4538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4538.patch, 0002-YARN-4538.patch, 
> 0003-YARN-4538.patch, 0004-YARN-4538.patch
>
>
> Submit 2 applications to the default queue 
> Check queue metrics for pending cores and memory
> {noformat}
> List<QueueInfo> allQueues = client.getChildQueueInfos("root");
> for (QueueInfo queueInfo : allQueues) {
>   QueueStatistics quastats = queueInfo.getQueueStatistics();
>   System.out.println(quastats.getPendingVCores());
>   System.out.println(quastats.getPendingMemoryMB());
> }
> {noformat}
> *Output :*
> -20
> -20480



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4512) Provide a knob to turn on over-allocation

2016-01-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101110#comment-15101110
 ] 

Wangda Tan commented on YARN-4512:
--

Thanks [~elgoiri], I don't have a strong opinion regarding using 
ResourceOption/node-heartbeat to update the thresholds. 

> Provide a knob to turn on over-allocation
> -
>
> Key: YARN-4512
> URL: https://issues.apache.org/jira/browse/YARN-4512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: YARN-4512-YARN-1011.001.patch, 
> yarn-4512-yarn-1011.002.patch, yarn-4512-yarn-1011.003.patch
>
>
> We need two configs for overallocation - one to specify the threshold up to 
> which it is okay to over-allocate, another to specify the threshold after 
> which OPPORTUNISTIC containers should be preempted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4596) SystemMetricPublisher should not swallow error messages from TimelineClient#putEntities

2016-01-14 Thread Li Lu (JIRA)
Li Lu created YARN-4596:
---

 Summary: SystemMetricPublisher should not swallow error messages 
from TimelineClient#putEntities
 Key: YARN-4596
 URL: https://issues.apache.org/jira/browse/YARN-4596
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu


We should report error messages from the returned TimelinePutResponse when 
posting timeline entities through the system metric publisher. 
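
A minimal sketch of the intended check (assuming the TimelinePutResponse API 
in hadoop-yarn-api; the logging policy shown is illustrative):

{code}
// Sketch: surface per-entity errors instead of dropping the response.
// (timelineClient, entity, and LOG are assumed to exist in the publisher.)
TimelinePutResponse response = timelineClient.putEntities(entity);
if (response != null && !response.getErrors().isEmpty()) {
  for (TimelinePutResponse.TimelinePutError err : response.getErrors()) {
    LOG.error("Error putting entity " + err.getEntityId() + " (type "
        + err.getEntityType() + "): error code " + err.getErrorCode());
  }
}
{code}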



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4565) When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, Sometimes lead to situation where all queue resources consumed by AMs only

2016-01-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101108#comment-15101108
 ] 

Wangda Tan commented on YARN-4565:
--

Thanks [~Naganarasimha]!

> When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, 
> Sometimes lead to situation where all queue resources consumed by AMs only
> 
>
> Key: YARN-4565
> URL: https://issues.apache.org/jira/browse/YARN-4565
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Karam Singh
>Assignee: Wangda Tan
> Attachments: YARN-4565.1.patch, YARN-4565.2.patch, YARN-4565.3.patch
>
>
> When sizeBasedWeight is enabled for FairOrderingPolicy in CapacityScheduler, 
> it can sometimes lead to a situation where all queue resources are consumed 
> by AMs only. From the user's perspective it appears that all applications in 
> the queue are stuck, since the whole queue capacity is consumed by AMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101106#comment-15101106
 ] 

Wangda Tan commented on YARN-1011:
--

bq. Welcome any thoughts/suggestions on handling promotion if we allow 
applications to ask for only guaranteed containers. I'll continue 
brainstorming. We want to have a simple mechanism, if possible; complex 
protocols seem to find a way to hoard bugs.
Agree :)

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf, 
> yarn-1011-design-v2.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4496) Improve HA ResourceManager Failover detection on the client

2016-01-14 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-4496:
-

Assignee: Jian He  (was: Arun Suresh)

> Improve HA ResourceManager Failover detection on the client
> ---
>
> Key: YARN-4496
> URL: https://issues.apache.org/jira/browse/YARN-4496
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, resourcemanager
>Reporter: Arun Suresh
>Assignee: Jian He
>
> HDFS deployments can currently use the {{RequestHedgingProxyProvider}} to 
> improve Namenode failover detection in the client. It does this by 
> concurrently trying all namenodes and picks the namenode that returns the 
> fastest with a successful response as the active node.
> It would be useful to have a similar ProxyProvider for the Yarn RM (it can 
> possibly be done by converging some of the class hierarchies to use the same 
> ProxyProvider).
> This would especially be useful for large YARN deployments with multiple 
> standby RMs where clients will be able to pick the active RM without having 
> to traverse a list of configured RMs. 
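
As a rough illustration of the hedging idea (a sketch only, not the actual 
RequestHedgingProxyProvider code; all names here are invented):
{code}
import java.util.List;
import java.util.concurrent.*;

public class HedgedInvoker {
  // Invoke all candidate proxies concurrently and return the first successful
  // response; failures (e.g. standby RMs rejecting the call) are tolerated as
  // long as one proxy succeeds.
  static <T> T invokeHedged(List<Callable<T>> proxies) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(proxies.size());
    try {
      CompletionService<T> done = new ExecutorCompletionService<T>(pool);
      for (Callable<T> proxy : proxies) {
        done.submit(proxy);
      }
      ExecutionException last = null;
      for (int i = 0; i < proxies.size(); i++) {
        try {
          return done.take().get();   // first successful proxy wins
        } catch (ExecutionException e) {
          last = e;                   // keep waiting for the others
        }
      }
      throw last;
    } finally {
      pool.shutdownNow();
    }
  }
}
{code}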



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4496) Improve HA ResourceManager Failover detection on the client

2016-01-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101053#comment-15101053
 ] 

Jian He commented on YARN-4496:
---

Taking over, thanks!

> Improve HA ResourceManager Failover detection on the client
> ---
>
> Key: YARN-4496
> URL: https://issues.apache.org/jira/browse/YARN-4496
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, resourcemanager
>Reporter: Arun Suresh
>Assignee: Jian He
>
> HDFS deployments can currently use the {{RequestHedgingProxyProvider}} to 
> improve Namenode failover detection in the client. It does this by 
> concurrently trying all namenodes and picks the namenode that returns the 
> fastest with a successful response as the active node.
> It would be useful to have a similar ProxyProvider for the Yarn RM (it can 
> possibly be done by converging some of the class hierarchies to use the same 
> ProxyProvider).
> This would especially be useful for large YARN deployments with multiple 
> standby RMs where clients will be able to pick the active RM without having 
> to traverse a list of configured RMs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4584) RM startup failure when AM attempts greater than max-attempts

2016-01-14 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099209#comment-15099209
 ] 

Bibin A Chundatt commented on YARN-4584:


[~jianhe]
The AM is getting pre-empted and exceeds the max limit in my case.

> RM startup failure when AM attempts greater than max-attempts
> -
>
> Key: YARN-4584
> URL: https://issues.apache.org/jira/browse/YARN-4584
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-4584.patch
>
>
> Configure 3 queues in a cluster with 8 GB:
> # queue 40%
> # queue 50%
> # default 10%
> * Submit applications to all 3 queues with container size 1024MB (sleep job 
> with 50 containers on all queues)
> * The AM assigned to the default queue gets preempted immediately; after 
> about 20 preemptions all applications are killed
> Due to the resource limit in the default queue, the AM got preempted about 
> 20 times. On restart, the RM fails to come up:
> {noformat}
> 2016-01-12 10:49:04,081 DEBUG org.apache.hadoop.service.AbstractService: 
> noteFailure java.lang.NullPointerException
> 2016-01-12 10:49:04,081 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:887)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:826)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:953)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:946)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:786)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:464)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1232)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1022)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1062)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1058)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1058)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:323)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:877)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: RMActiveServices entered state STOPPED
> 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.CompositeService: 
> RMActiveServices: stopping services, size=16
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not avaialable

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099207#comment-15099207
 ] 

Hadoop QA commented on YARN-4428:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 39s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91
 with JDK v1.7.0_91 generated 1 new issues (was 2, now 2). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} Patch generated 7 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 132, now 138). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 47s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 8s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 140m 23s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yar

[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099204#comment-15099204
 ] 

Hadoop QA commented on YARN-4311:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 48s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
3s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 40s 
{color} | {color:red} root in trunk failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 11s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
2s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 6s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 34s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 8m 17s {color} 
| {color:red} root-jdk1.8.0_66 with JDK v1.8.0_66 generated 622 new issues (was 
111, now 733). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 3s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 43s 
{color} | {color:red} root in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 43s {color} 
| {color:red} root in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 2s 
{color} | {color:red} Patch generated 2 new checkstyle issues in root (total 
was 397, now 398). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 4s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 57s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 20s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:gre

[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-01-14 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099191#comment-15099191
 ] 

Sangjin Lee commented on YARN-4577:
---

{quote}
The use case here is simple: if we specify the aux-services classpath, either 
from local fs or from hdfs, we will load this service from the specified 
classpath (no matter whether we set the classpath in the NM path or not). 
Otherwise, we 
load the service from the NM path.
{quote}

Hmm, is one of the goals to isolate the aux service's dependencies from 
hadoop's dependencies (as I see in the linked ticket SPARK-12807)? If so, I 
don't think the current approach in the patch does that. Note that 
URLClassLoader (or any simple extension of ClassLoader) always *delegates 
classloading to the parent classloader first*, and loads the class *only if* 
the parent classloader doesn't load/have it. In other words, any classpath the 
URLClassLoader owns is effectively *appended*, not prepended. That's precisely 
why ApplicationClassLoader inverts that order to create isolation.

Could you write a simple test program to verify this behavior? I'm pretty sure 
you'll find that your classpath will still be shadowed by the system classpath.
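
Something along these lines would do as a quick check (a sketch; plugin.jar 
and com.example.Shared are hypothetical, with the class present both in the 
jar and on the system classpath):
{code}
import java.net.URL;
import java.net.URLClassLoader;

public class DelegationTest {
  public static void main(String[] args) throws Exception {
    // A plain URLClassLoader consults its parent (the system classloader) first.
    ClassLoader plain = new URLClassLoader(
        new URL[] { new URL("file:///tmp/plugin.jar") });
    Class<?> clazz = plain.loadClass("com.example.Shared");
    // If com.example.Shared is also on the system classpath, this prints the
    // system classloader, not the URLClassLoader -- the jar copy is shadowed.
    System.out.println(clazz.getClassLoader());
  }
}
{code}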

Also, as for using the ApplicationClassLoader, it shouldn't be too difficult. 
You pass in {{URL[]}} to the URLClassLoader too, so that's common. You can 
simply pass in the classloader of the calling class as the parent classloader. 
Also, you can simply pass null for the system classes, in which case the 
sensible default will be used.

If it helps anyway, we could introduce a simpler constructor like the following:
{code}
public ApplicationClassLoader(URL[] classpath) {
  // getClass() cannot be called before this(...) completes; use the class
  // literal to obtain the calling class's classloader as the parent.
  this(classpath, ApplicationClassLoader.class.getClassLoader(), null);
}
{code}

> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> put them on the system classloader. But if multiple versions of the plugin 
> are present on the classpath, there is no control over which version actually 
> gets loaded. Or if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the NM, 
> the auxiliary service, or both.
> The solution could be: to instantiate aux services using a classloader that 
> is different from the system classloader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4594) Fix test-container-executor.c to pass

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099190#comment-15099190
 ] 

Hadoop QA commented on YARN-4594:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 4s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 15s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 31m 48s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12782374/YARN-4594.001.patch |
| JIRA Issue | YARN-4594 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux e66c5cac93b8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / cdf8895 |
| Default Java | 1.7.0_91 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_66 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 |
| JDK v1.7.0_91  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/10292/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Max memory used | 76MB |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10292/console |


This message was automatically generated.



> Fix test-container-executor.c to pass
> -
>
> Key: YARN-4594
> URL: https://issues.apache.org/jira/browse/YARN-4594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Re

[jira] [Commented] (YARN-4559) Make leader elector and zk store share the same curator client

2016-01-14 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099185#comment-15099185
 ] 

Karthik Kambatla commented on YARN-4559:


There "could" be other stores that provide the same guarantees as zk-store. 

> Make leader elector and zk store share the same curator client
> --
>
> Key: YARN-4559
> URL: https://issues.apache.org/jira/browse/YARN-4559
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4559.1.patch, YARN-4559.2.patch
>
>
> After YARN-4438, we can reuse the same curator client for leader elector and 
> zk store



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4559) Make leader elector and zk store share the same curator client

2016-01-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099163#comment-15099163
 ] 

Jian He commented on YARN-4559:
---

bq.  ZK-leader election with a different store.
I think currently only the zk-store supports fencing, which means users must 
use the zk-store for failover?

> Make leader elector and zk store share the same curator client
> --
>
> Key: YARN-4559
> URL: https://issues.apache.org/jira/browse/YARN-4559
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4559.1.patch, YARN-4559.2.patch
>
>
> After YARN-4438, we can reuse the same curator client for leader elector and 
> zk store



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4559) Make leader elector and zk store share the same curator client

2016-01-14 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099154#comment-15099154
 ] 

Karthik Kambatla commented on YARN-4559:


bq. I think the best way is just to merge the LeaderElectorService logic into 
ZKRMStateStore itself. 
I am not sure that is a good idea. Users might want to use ZK-leader election 
with a different store. 

> Make leader elector and zk store share the same curator client
> --
>
> Key: YARN-4559
> URL: https://issues.apache.org/jira/browse/YARN-4559
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4559.1.patch, YARN-4559.2.patch
>
>
> After YARN-4438, we can reuse the same curator client for leader elector and 
> zk store



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not avaialable

2016-01-14 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4428:
---
Attachment: YARN-4428.3.patch

> Redirect RM page to AHS page when AHS turned on and RM page is not avaialable
> -
>
> Key: YARN-4428
> URL: https://issues.apache.org/jira/browse/YARN-4428
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4428.1.2.patch, YARN-4428.1.patch, 
> YARN-4428.2.2.patch, YARN-4428.2.patch, YARN-4428.3.patch, YARN-4428.3.patch
>
>
> When AHS is turned on, if we can't view an application in the RM page, the 
> RM page should redirect us to the AHS page. For example, when you go to 
> cluster/app/application_1 and the RM no longer remembers the application, we 
> simply get "Failed to read the application application_1"; it would be 
> better for the RM UI to try to redirect to the AHS UI at 
> /applicationhistory/app/application_1 to see if it's there. Such a redirect 
> already exists for logs in the nodemanager UI.
> Also, when AHS is enabled, WebAppProxyServlet should redirect to the AHS 
> page as a fallback when the RM does not remember the app. YARN-3975 tried to 
> do this only when the original tracking URL is not set. But in many cases, 
> such as when the app failed at launch, the original tracking URL will be set 
> to point to the RM page, so the redirect to the AHS page won't work.
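
A minimal sketch of the fallback described above (helper names like rmKnowsApp 
and ahsWebAddress are illustrative assumptions, not the attached patch):
{code}
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

public class AhsRedirectSketch {
  // If the RM has forgotten the app and AHS is enabled, send the browser to
  // the application history UI instead of rendering "Failed to read ...".
  static void renderOrRedirect(HttpServletResponse resp, String appId,
      boolean rmKnowsApp, boolean ahsEnabled, String ahsWebAddress)
      throws IOException {
    if (!rmKnowsApp && ahsEnabled) {
      resp.sendRedirect("http://" + ahsWebAddress
          + "/applicationhistory/app/" + appId);
      return;
    }
    // ... existing RM rendering path ...
  }
}
{code}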



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4589) Diagnostics for localization timeouts is lacking

2016-01-14 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4589:
---
Attachment: YARN-4589.2.patch

> Diagnostics for localization timeouts is lacking
> 
>
> Key: YARN-4589
> URL: https://issues.apache.org/jira/browse/YARN-4589
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4589.2.patch, YARN-4589.patch
>
>
> When a container takes too long to localize it manifests as a timeout, and 
> there's no indication that localization was the issue. We need diagnostics 
> for timeouts to indicate the container was still localizing when the timeout 
> occurred.
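
For illustration only, a sketch of the idea (the container object and its 
addDiagnostics call are stand-ins for the NM-internal types, not the attached 
patch):
{code}
// Fold the container's current state into the kill diagnostics so a slow
// localization is distinguishable from a hung process.
String diagnostics = "Container killed after " + timeoutMs + " ms";
if (container.getContainerState() == ContainerState.LOCALIZING) {
  diagnostics += "; container was still localizing when the timeout occurred";
}
container.addDiagnostics(diagnostics);
{code}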



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4593) Deadlock in AbstractService.getConfig()

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099091#comment-15099091
 ] 

Hadoop QA commented on YARN-4593:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
42s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 42s 
{color} | {color:red} root in trunk failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 49s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
40s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 37s 
{color} | {color:red} root in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 37s {color} 
| {color:red} root in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 49s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 27s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 34s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 45m 44s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12782366/YARN-4593-001.patch |
| JIRA Issue | YARN-4593 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 2c4964c87234 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/person

[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2016-01-14 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099088#comment-15099088
 ] 

Eric Payne commented on YARN-4108:
--

[~leftnoteasy], great job! This approach looks like it has potential to vastly 
improve preemption. I just have a few comments and questions.

- In the lazy preemption case, PCPP will send an event to the scheduler to mark 
a container killable. Can PCPP check whether the container has already been 
marked before sending, so that event traffic in the RM is reduced?
- Currently, if both queueA and queueB are over their guaranteed capacity, 
preemption will still occur if queueA is more over capacity than queueB. I 
think it is probably important to preserve this behavior (YARN-2592).
-- I don't see any place where {{ResourceLimits#isAllowPreemption}} is called. 
But if it is, will the following code in {{LeafQueue}} change preemption 
behavior?
{noformat}
  private void setPreemptionAllowed(ResourceLimits limits, String 
nodePartition) {
// Set preemption-allowed:
// For leaf queue, only under-utilized queue is allowed to preempt 
resources from other queues
float usedCapacity = queueCapacities.getAbsoluteUsedCapacity(nodePartition);
float guaranteedCapacity = 
queueCapacities.getAbsoluteCapacity(nodePartition);
limits.setIsAllowPreemption(usedCapacity < guaranteedCapacity);
  }
{noformat}
-- Also, in {{ParentQueue#canAssign}}, does the following code have the same 
effect?
{noformat}
  if (this.getQueueCapacities().getUsedCapacity(node.getPartition())
  < 1.0f) {
{noformat}
- In {{AbstractCSQueue#canAssignToThisQueue}}:
-- I'm just trying to understand how things will be affected when headroom for 
a parent queue is (limit - used) + killable. Doesn't that say that a parent 
queue has more headroom than it is actually using? Is it relying on this 
behavior so that the {{assignment}} code will determine that it has more 
headroom when there are killable containers, and then rely on the leafqueue to 
kill those containers?
-- NPE if {{getChildQueues()}} returns null
{noformat}
if (null != getChildQueues() || !getChildQueues().isEmpty()) {
{noformat}
- {{CSAssignment#toKillContainers}}: I would call them {{containersToKill}}
{quote}
4. I would like to have some freedom in selecting containers (marking) for 
preemption. A simple sort based on submission time or priority seems a limited 
approach. Could we have some interface here so that we can plug in 
user-specific comparison cases, e.g.:

submission time
priority
demand-based, etc.
{quote}
- To [~sunilg]'s point: Currently PCPP doesn't take into consideration things 
like locality or container size. If a queue is over its capacity by 8GB, and 
there are one 8GB container plus eight 1GB containers, PCPP may decide to kill 
the one 8GB container or the eight 1GB containers, depending on 
properties like 'time since submission' and 'ignore-partition-exclusivity'. So, 
with the current, lazy preemption proposal, if the underserved queue needs an 
8GB container and the 8 1GB containers are marked as killable, at least now 
those containers don't get killed. It's a step in the right direction, but the 
underserved queue still has to wait. Same kind of thing with locality and other 
properties. It would be interesting to know what your thoughts are on making 
further modifications to PCPP to make more informed choices about which 
containers to kill. There may not be a "right" choice in PCPP, though, since 
the requirements of the underserved queue may change by the time the scheduler 
gets around to allocating resources.

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4108-design-doc-V3.pdf, 
> YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, 
> YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch
>
>
> This is sibling JIRA for YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*:
> 1) Can handle case of user-limit preemption
> 2) Can handle case of resource placement requirements, such as: hard-locality 
> (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I 
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross user preemption (YARN-2113), 
> cross applicaiton preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-4559) Make leader elector and zk store share the same curator client

2016-01-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099068#comment-15099068
 ] 

Jian He commented on YARN-4559:
---

The new patch moves curator#close into RM#serviceStop.

I cannot move the curator client creation into LeaderElectorService because 
LeaderElectorService will not be created if HA is not enabled.
I think the best way is just to merge the LeaderElectorService logic into 
ZKRMStateStore itself. But that requires more refactoring on ZKRMStateStore 
because ZKRMStateStore is not an alwaysOn service. 
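
A minimal sketch of the sharing, assuming the Curator API; zkHostPort and the 
wiring comments are illustrative, not the patch itself:
{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

// The RM creates a single client ...
CuratorFramework curator = CuratorFrameworkFactory.newClient(
    zkHostPort, new ExponentialBackoffRetry(1000, 3));
curator.start();
// ... both LeaderElectorService and ZKRMStateStore are handed this instance,
// and the RM closes it exactly once, in serviceStop():
curator.close();
{code}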

> Make leader elector and zk store share the same curator client
> --
>
> Key: YARN-4559
> URL: https://issues.apache.org/jira/browse/YARN-4559
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4559.1.patch, YARN-4559.2.patch
>
>
> After YARN-4438, we can reuse the same curator client for leader elector and 
> zk store



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4559) Make leader elector and zk store share the same curator client

2016-01-14 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4559:
--
Attachment: YARN-4559.2.patch

> Make leader elector and zk store share the same curator client
> --
>
> Key: YARN-4559
> URL: https://issues.apache.org/jira/browse/YARN-4559
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4559.1.patch, YARN-4559.2.patch
>
>
> After YARN-4438, we can reuse the same curator client for leader elector and 
> zk store



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4595) Add support for configurable read-only mounts

2016-01-14 Thread Billie Rinaldi (JIRA)
Billie Rinaldi created YARN-4595:


 Summary: Add support for configurable read-only mounts
 Key: YARN-4595
 URL: https://issues.apache.org/jira/browse/YARN-4595
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: yarn
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi


Mounting files or directories from the host is one way of passing configuration 
and other information into a docker container.  We could allow the user to set 
a list of mounts in the environment of ContainerLaunchContext (e.g. 
/dir1:/targetdir1,/dir2:/targetdir2).  These would be mounted read-only to the 
specified target locations.
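
For illustration, a sketch of how an AM might pass such a list, assuming a 
made-up environment key (the JIRA does not fix a name):
{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

Map<String, String> env = new HashMap<String, String>();
// Hypothetical key for this sketch only.
env.put("YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS",
    "/dir1:/targetdir1,/dir2:/targetdir2");
ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
    null /* localResources */, env, null /* commands */,
    null /* serviceData */, null /* tokens */, null /* acls */);
{code}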



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4592) Remove unused GetContainerStatus proto

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099017#comment-15099017
 ] 

Hadoop QA commented on YARN-4592:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 11m 55s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12782358/YARN-4592.patch |
| JIRA Issue | YARN-4592 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux bd68db2f13f6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 817cc1f |
| Default Java | 1.7.0_91 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_66 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 |
| JDK v1.7.0_91  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/10289/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api |
| Max memory used | 76MB |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10289/console |


This message was automatically generated.



> Remove unused GetContainerStatus proto
> -
>
> Key: YARN-4592
> URL: https://issues.apache.org/jira/browse/YARN-4592
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang 

[jira] [Commented] (YARN-4584) RM startup failure when AM attempts greater than max-attempts

2016-01-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098996#comment-15098996
 ] 

Jian He commented on YARN-4584:
---

I have one question: normally, if FailuresValidityInterval <= 0, no attempts 
will be removed from the store, because the number of attempts should always 
be <= max-attempts.
But if attempts failed for reasons like preemption or disk failure (see 
RMAppImpl#shouldCountTowardsMaxAttemptRetry), then even if 
FailuresValidityInterval is <= 0, the number of attempts could go beyond 
max-attempts.
Is this the scenario you are running into?
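
Roughly, the distinction looks like this (a sketch, not the RMAppImpl code):
{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;

// Framework faults (preemption, aborts, disk failures) do not burn a retry,
// so such attempts can accumulate in the state store beyond max-attempts.
static boolean countsTowardsMaxAttempts(int amExitStatus) {
  switch (amExitStatus) {
    case ContainerExitStatus.PREEMPTED:
    case ContainerExitStatus.ABORTED:
    case ContainerExitStatus.DISKS_FAILED:
      return false;
    default:
      return true;
  }
}
{code}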


> RM startup failure when AM attempts greater than max-attempts
> -
>
> Key: YARN-4584
> URL: https://issues.apache.org/jira/browse/YARN-4584
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-4584.patch
>
>
> Configure 3 queues in a cluster with 8 GB:
> # queue 40%
> # queue 50%
> # default 10%
> * Submit applications to all 3 queues with container size 1024MB (sleep job 
> with 50 containers on all queues)
> * The AM assigned to the default queue gets preempted immediately; after 
> about 20 preemptions all applications are killed
> Due to the resource limit in the default queue, the AM got preempted about 
> 20 times. On restart, the RM fails to come up:
> {noformat}
> 2016-01-12 10:49:04,081 DEBUG org.apache.hadoop.service.AbstractService: 
> noteFailure java.lang.NullPointerException
> 2016-01-12 10:49:04,081 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:887)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:826)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:953)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:946)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:786)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:464)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1232)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1022)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1062)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1058)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1058)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:323)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:877)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: RMActiveServ

[jira] [Updated] (YARN-4594) Fix test-container-executor.c to pass

2016-01-14 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated YARN-4594:
---
Attachment: YARN-4594.001.patch

> Fix test-container-executor.c to pass
> -
>
> Key: YARN-4594
> URL: https://issues.apache.org/jira/browse/YARN-4594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: YARN-4594.001.patch
>
>
> test-container-executor.c doesn't work:
> * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually 
> /usr/bin/ls on many systems.
> * The recursive delete logic in container-executor.c fails -- nftw does the 
> wrong thing when confronted with directories with the wrong mode (permission 
> bits), leading to an attempt to run rmdir on a non-empty directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2016-01-14 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098974#comment-15098974
 ] 

MENG DING commented on YARN-4108:
-

Hi, [~leftnoteasy], will there be a separate ticket to track the issue of 
selecting to-be-preempted containers based on pending new/increase resource 
request?

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4108-design-doc-V3.pdf, 
> YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, 
> YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch
>
>
> This is sibling JIRA for YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*:
> 1) Can handle case of user-limit preemption
> 2) Can handle case of resource placement requirements, such as: hard-locality 
> (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I 
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross user preemption (YARN-2113), 
> cross applicaiton preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4594) Fix test-container-executor.c to pass

2016-01-14 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created YARN-4594:
--

 Summary: Fix test-container-executor.c to pass
 Key: YARN-4594
 URL: https://issues.apache.org/jira/browse/YARN-4594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe


test-container-executor.c doesn't work:
* It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually 
/usr/bin/ls on many systems.
* The recursive delete logic in container-executor.c fails -- nftw does the 
wrong thing when confronted with directories with the wrong mode (permission 
bits), leading to an attempt to run rmdir on a non-empty directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098972#comment-15098972
 ] 

Hadoop QA commented on YARN-4371:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 9s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client (total was 15, now 16). 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
1s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 16s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 26s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 143m 8s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.8.0_66 Timed out junit tests | 
org.apache.hadoop.yarn.client.cli.TestYarnCLI |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
| JDK v1.7.0_91 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.7.0_91 Timed out junit tests | 
org.apache.hadoop.yarn.client.cli.TestYarnCLI |
|   | org.apache.hadoop.yarn.client.api.impl.Test

[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2016-01-14 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098966#comment-15098966
 ] 

Li Lu commented on YARN-4265:
-

The maven dependency check uses a separate maven repository 
({{-Dmaven.repo.local=/home/jenkins/yetus-m2/hadoop-trunk-0}}). I tested the 
whole workflow of building from scratch and it worked. The AHS-test jar has 
not been published to the Apache SNAPSHOT repository yet (the timeline plugin 
is the first module to depend on it). Therefore the mvn dependency failure 
appears to be unrelated here. 

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, 
> YARN-4265-trunk.003.patch, YARN-4265-trunk.004.patch, 
> YARN-4265-trunk.005.patch, YARN-4265-trunk.006.patch, 
> YARN-4265-trunk.007.patch, YARN-4265-trunk.008.patch, 
> YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior as the 
> EntityFileTimelineStore proposed in YARN-3942, but cache date in cache id 
> granularity, instead of application id granularity. Let's have this storage 
> as a standalone one, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098951#comment-15098951
 ] 

Hadoop QA commented on YARN-4265:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 38s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 11s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
59s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped branch modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 2s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} mvndep {color} | {color:red} 0m 27s 
{color} | {color:red} patch's 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server dependency:list failed 
{color} |
| {color:red}-1{color} | {color:red} mvndep {color} | {color:red} 0m 44s 
{color} | {color:red} patch's 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 dependency:list failed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 44s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 23s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 34s 
{color} | {color:red} Patch generated 8 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 292, now 299). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
7s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patch modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
34s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 39s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server-jdk1.8.0_66 with JDK 
v1.8.0_66 generated 7 new issues (was 544, now 551). {color} |

[jira] [Updated] (YARN-4593) Deadlock in AbstractService.getConfig()

2016-01-14 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4593:
-
Attachment: YARN-4593-001.patch

Patch 001: removes the {{synchronized}} attribute. The field is still volatile 
(and only written in {{Service.init()}}).
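
For illustration, a minimal sketch of the shape of the change (not the exact 
patch; it assumes the field is written once, during init):

{code}
// Before: public synchronized Configuration getConfig() { ... }
// After (sketch): no monitor is taken, so this getter can no longer
// participate in a lock cycle; the volatile read still gives safe
// publication because the field is only written during init().
private volatile Configuration config;

public Configuration getConfig() {
  return config;
}
{code}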

> Deadlock in AbstractService.getConfig()
> ---
>
> Key: YARN-4593
> URL: https://issues.apache.org/jira/browse/YARN-4593
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.2
> Environment: AM restarting on kerberized cluster
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-4593-001.patch
>
>
> SLIDER-1052 has found a deadlock which can arise during AM restart. 
> Looking at the thread trace, one of the blockages is actually 
> {{AbstractService.getConfig()}}, which is synchronized and so was blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4593) Deadlock in AbstractService.getConfig()

2016-01-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098826#comment-15098826
 ] 

Steve Loughran commented on YARN-4593:
--

SLIDER-1052 shows this. I'm not going to blame this JIRA for that, but it's 
certainly where the problem arises.

> Deadlock in AbstractService.getConfig()
> ---
>
> Key: YARN-4593
> URL: https://issues.apache.org/jira/browse/YARN-4593
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.2
> Environment: AM restarting on kerberized cluster
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> SLIDER-1052 has found a deadlock which can arise during AM restart. 
> Looking at the thread trace, one of the blockages is actually 
> {{AbstractService.getConfig()}}, which is synchronized and so was blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4593) Deadlock in AbstractService.getConfig()

2016-01-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098825#comment-15098825
 ] 

Steve Loughran commented on YARN-4593:
--

I don't see why we need this to be synchronized (it's that way in branch-205); 
the field is {{volatile}}, so access is thread-safe anyway.


> Deadlock in AbstractService.getConfig()
> ---
>
> Key: YARN-4593
> URL: https://issues.apache.org/jira/browse/YARN-4593
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.2
> Environment: AM restarting on kerberized cluster
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> SLIDER-1052 has found a deadlock which can arise during AM restart. 
> Looking at the thread trace, one of the blockages is actually 
> {{AbstractService.getConfig()}}, which is synchronized and so was blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4593) Deadlock in AbstractService.getConfig()

2016-01-14 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-4593:


 Summary: Deadlock in AbstractService.getConfig()
 Key: YARN-4593
 URL: https://issues.apache.org/jira/browse/YARN-4593
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.2
 Environment: AM restarting on kerberized cluster
Reporter: Steve Loughran
Assignee: Steve Loughran


SLIDER-1052 has found a deadlock which can arise during AM restart. 
Looking at the thread trace, one of the blockages is actually 
{{AbstractService.getConfig()}}, which is synchronized and so was blocked.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098785#comment-15098785
 ] 

Hadoop QA commented on YARN-4389:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 58s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 43s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 36s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 6s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 32s 
{color} | {color:red} Patch generated 5 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 161, now 166). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 11s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 31s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 14s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 12s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed

[jira] [Created] (YARN-4592) Remove unused GetContainerStatus proto

2016-01-14 Thread Chang Li (JIRA)
Chang Li created YARN-4592:
--

 Summary: Remove unused GetContainerStatus proto
 Key: YARN-4592
 URL: https://issues.apache.org/jira/browse/YARN-4592
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
Priority: Minor
 Attachments: YARN-4592.patch

GetContainerStatus protos have been left unused since YARN-926



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4592) Remove unused GetContainerStatus proto

2016-01-14 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4592:
---
Attachment: YARN-4592.patch

> Remove unused GetContainerStatus proto
> -
>
> Key: YARN-4592
> URL: https://issues.apache.org/jira/browse/YARN-4592
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Minor
> Attachments: YARN-4592.patch
>
>
> GetContainerStatus protos have been left unused since YARN-926



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4558) Yarn client retries on some non-retriable exceptions

2016-01-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098722#comment-15098722
 ] 

Sergey Shelukhin commented on YARN-4558:


In this case, the retry policy is built in YARN code. I don't know if there's 
more to it on the Hadoop side than what YARN sets up, but from a cursory 
examination it doesn't look like that is the case.
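
For reference, a hedged sketch of how a fail-fast mapping could be added where 
the policy is built (the exception mapping and default policy here are 
illustrative, not the actual RMProxy code):

{code}
import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.SaslException;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

// Illustrative only: map auth failures to fail-fast instead of retrying them.
class FailFastPolicySketch {
  static RetryPolicy build() {
    Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicy =
        new HashMap<>();
    exceptionToPolicy.put(SaslException.class,
        RetryPolicies.TRY_ONCE_THEN_FAIL);
    return RetryPolicies.retryByException(
        RetryPolicies.RETRY_FOREVER, exceptionToPolicy);
  }
}
{code}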

> Yarn client retries on some non-retriable exceptions
> 
>
> Key: YARN-4558
> URL: https://issues.apache.org/jira/browse/YARN-4558
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Sergey Shelukhin
>Priority: Minor
>
> Seems the problem is in RMProxy where the policy is built.
> {noformat}
> Thread 23594: (state = BLOCKED)
> - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
> - org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(java.lang.Object, 
> java.lang.reflect.Method, java.lang.Object[]) @bci=603, line=155 (Interpreted 
> frame)
> - 
> com.sun.proxy.$Proxy32.getClusterNodes(org.apache.hadoop.yarn.api.protocolrecords.GetClusterNodesRequest)
>  @bci=16 (Interpreted frame)
> - 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(org.apache.hadoop.yarn.api.records.NodeState[])
>  @bci=66, line=515 (Interpreted frame)
> {noformat}
> produces
> {noformat}
> 2016-01-07 02:50:45,111 [main] WARN  ipc.Client - Exception encountered while 
> connecting to the server : javax.security.sasl.SaslException: GSS initiate 
> failed [Caused by GSSException: No valid credentials provided (Mechanism 
> level: Failed to find any Kerberos tgt)]
> 2016-01-07 02:51:15,126 [main] WARN  ipc.Client - Exception encountered while 
> connecting to the server : javax.security.sasl.SaslException: GSS initiate 
> failed [Caused by GSSException: No valid credentials provided (Mechanism 
> level: Failed to find any Kerberos tgt)]
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3102) Decommissioned Nodes not listed in Web UI

2016-01-14 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098682#comment-15098682
 ] 

Kuhu Shukla commented on YARN-3102:
---

[~templedf], [~jlowe], requesting review comments. Thanks a lot!

> Decommissioned Nodes not listed in Web UI
> 
>
> Key: YARN-3102
> URL: https://issues.apache.org/jira/browse/YARN-3102
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
> Environment: 2 Node Manager and 1 Resource Manager 
>Reporter: Bibin A Chundatt
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-3102-v1.patch, YARN-3102-v2.patch, 
> YARN-3102-v3.patch, YARN-3102-v4.patch, YARN-3102-v5.patch
>
>
> Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to point 
> to a yarn.exclude file on the RM1 machine.
> Add NM1's hostname to yarn.exclude.
> Start the nodes as listed below: NM1, NM2, and the Resource Manager.
> Now check the decommissioned nodes in /cluster/nodes.
> The number of decommissioned nodes is listed as 1, but the table is empty in 
> /cluster/nodes/decommissioned (details of the decommissioned node are not 
> shown).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2016-01-14 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-4311:
--
Attachment: YARN-4311-v8.patch

Thank you [~jlowe] for the review comments. I have updated the patch. The 
interval is 1/2 the value of the timeout config field (or should it be 1/3, 
like the NM expiry interval, although that could be excessive in my opinion). 
The default timeout is 1 minute. The interval is capped at 10 minutes.
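
In code form, the arithmetic described above is just (a sketch; the method 
name is hypothetical):

{code}
// Hypothetical sketch of the interval calculation described above.
static long checkIntervalMs(long timeoutMs) {        // default timeoutMs = 60000 (1 minute)
  return Math.min(timeoutMs / 2, 10 * 60 * 1000L);   // 1/2 of timeout, capped at 10 minutes
}
{code}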

> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -
>
> Key: YARN-4311
> URL: https://issues.apache.org/jira/browse/YARN-4311
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-4311-v1.patch, YARN-4311-v2.patch, 
> YARN-4311-v3.patch, YARN-4311-v4.patch, YARN-4311-v5.patch, 
> YARN-4311-v6.patch, YARN-4311-v7.patch, YARN-4311-v8.patch
>
>
> In order to fully forget about a node, removing the node from include and 
> exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The 
> tricky part that [~jlowe] pointed out was the case when include lists are not 
> used, in that case we don't want the nodes to fall off if they are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers

2016-01-14 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098641#comment-15098641
 ] 

Giovanni Matteo Fumarola commented on YARN-2885:


Agree with [~kishorch]. If it remains this way, we will have to add more 
checks in FederationInterceptor. 

> Create AMRMProxy request interceptor for distributed scheduling decisions for 
> queueable containers
> --
>
> Key: YARN-2885
> URL: https://issues.apache.org/jira/browse/YARN-2885
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2885-yarn-2877.001.patch, 
> YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch, 
> YARN-2885-yarn-2877.full-3.patch, YARN-2885-yarn-2877.full.patch, 
> YARN-2885-yarn-2877.v4.patch, YARN-2885-yarn-2877.v5.patch, 
> YARN-2885_api_changes.patch
>
>
> We propose to add a Local ResourceManager (LocalRM) to the NM in order to 
> support distributed scheduling decisions. 
> Architecturally we leverage the RMProxy, introduced in YARN-2884. 
> The LocalRM makes distributed decisions for queueable container requests. 
> Guaranteed-start requests are still handled by the central RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3446) FairScheduler headroom calculation should exclude nodes in the blacklist

2016-01-14 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098617#comment-15098617
 ] 

zhihai xu commented on YARN-3446:
-

[~kasha], thanks for the review and for committing the patch!

> FairScheduler headroom calculation should exclude nodes in the blacklist
> 
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.9.0
>
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch, YARN-3446.004.patch, 
> YARN-3446.005.patch
>
>
> FairScheduler headroom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, the headroom takes blacklisted nodes into account. This makes 
> jobs hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM includes 
> the available resources of blacklisted nodes).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4371) "yarn application -kill" should take multiple application ids

2016-01-14 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4371:
--
Attachment: 0006-YARN-4371.patch

Fixing javadoc issue.

> "yarn application -kill" should take multiple application ids
> -
>
> Key: YARN-4371
> URL: https://issues.apache.org/jira/browse/YARN-4371
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>Assignee: Sunil G
> Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch, 
> 0003-YARN-4371.patch, 0004-YARN-4371.patch, 0005-YARN-4371.patch, 
> 0006-YARN-4371.patch
>
>
> Currently we cannot pass multiple applications to "yarn application -kill" 
> command. The command should take multiple application ids at the same time. 
> Each entry should be separated with whitespace, like:
> {code}
> yarn application -kill application_1234_0001 application_1234_0007 
> application_1234_0012
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098583#comment-15098583
 ] 

Hadoop QA commented on YARN-4389:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 48s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s 
{color} | {color:red} Patch generated 5 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 161, now 166). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 45s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 54s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 58s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed

[jira] [Commented] (YARN-4551) Address the duplication between StatusUpdateWhenHealthy and StatusUpdateWhenUnhealthy transitions

2016-01-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098574#comment-15098574
 ] 

Sunil G commented on YARN-4551:
---

Thank you [~kasha] for the review and commit!

> Address the duplication between StatusUpdateWhenHealthy and 
> StatusUpdateWhenUnhealthy transitions
> -
>
> Key: YARN-4551
> URL: https://issues.apache.org/jira/browse/YARN-4551
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Sunil G
>Priority: Minor
>  Labels: newbie
> Fix For: 2.9.0
>
> Attachments: 0001-YARN-4551.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-01-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098556#comment-15098556
 ] 

Steve Loughran commented on YARN-4577:
--

Test-wise:

* {{testLoadAuxServiceLocally}} should be calling aux.close() in finally{} 
clauses. It's idempotent, so you could do a close() on the main path (and so 
test it), but still clean up after.
* It'd be nice for the asserts to include some text about why they are 
failing, especially simple {{assertTrue()}} calls. The goal is that enough 
information is printed to enable someone who sees the Jenkins log to diagnose 
the problem. An "assert failed line 315" doesn't do much; it leads to "add 
more test diagnostics" patches and more iterations.

> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> putting them on the system classloader. But if multiple versions of the plugin 
> are present on the classpath, there is no control over which version actually 
> gets loaded. Or if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the NM, 
> the auxiliary service, or both.
> The solution could be: to instantiate aux services using a classloader that 
> is different from the system classloader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not avaialable

2016-01-14 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4428:
---
Attachment: YARN-4428.3.patch

Thanks [~jlowe] for the review and for pointing me to the related issue! 
Updated the .3 patch, which computes the redirect inside RMWebAppFilter.
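
Roughly, the filter now does something of this shape (an illustrative sketch, 
not the patch; all names here are hypothetical wiring):

{code}
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

// Illustrative sketch only: fall back to the AHS UI when the RM no longer
// knows the application.
class AhsRedirectSketch {
  static void maybeRedirectToAhs(HttpServletResponse resp,
      boolean rmAppNotFound, boolean ahsEnabled, String ahsBaseUrl,
      String appId) throws IOException {
    if (rmAppNotFound && ahsEnabled) {
      resp.sendRedirect(ahsBaseUrl + "/applicationhistory/app/" + appId);
    }
  }
}
{code}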

> Redirect RM page to AHS page when AHS turned on and RM page is not avaialable
> -
>
> Key: YARN-4428
> URL: https://issues.apache.org/jira/browse/YARN-4428
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4428.1.2.patch, YARN-4428.1.patch, 
> YARN-4428.2.2.patch, YARN-4428.2.patch, YARN-4428.3.patch
>
>
> When AHS is turned on, if we can't view an application on the RM page, the RM 
> page should redirect us to the AHS page. For example, when you go to 
> cluster/app/application_1, if the RM no longer remembers the application, we 
> will simply get "Failed to read the application application_1", but it would 
> be good for the RM UI to smartly try to redirect to the AHS UI at 
> /applicationhistory/app/application_1 to see if it's there. This redirect 
> usage already exists for logs in the NodeManager UI.
> Also, when AHS is enabled, WebAppProxyServlet should redirect to the AHS page 
> as a fallback when the RM does not remember the app. YARN-3975 tried to do 
> this only when the original tracking URL is not set. But there are many 
> cases, such as when an app fails at launch, where the original tracking URL 
> will be set to point to the RM page, so the redirect to the AHS page won't 
> work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2016-01-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098526#comment-15098526
 ] 

Sunil G commented on YARN-4108:
---

Thank you [~leftnoteasy] for the detailed updated design doc. A few comments 
and queries:

1. In the current PCPP, we first preempt all reserved containers from all 
applications in a queue if it is over-allocated. After introducing "killable 
containers", we will have some containers which are reserved or running. Am I 
correct? Or are reserved containers planned to be handled differently?

Possible cases: 
- queueA's appA has made a reservation on node1. Now queueB has demand, 
and we can unreserve appA's container requests from node1, and queueB's app can 
allocate some containers there.
- Or queueB's app can now reserve a container there. This case, I think, 
is explained by the design doc.
Basically, we can still try to preempt the reserved container first.

2. If I understood correctly, "killable containers" will be triggered with a 
preempt event only if a proper allocation can happen for the target application 
(from an underserved queue). 
- So do we send a preempt_container event to the AM at this point? This 
can introduce a delay of 15 seconds, so if by some chance the scheduling 
footprint has changed (some other NMs freed up space), we may see some 
overkill. (I guess this can happen now also.)
- 15 seconds later, the RM force-kills these containers. Is there any 
change to this approach in the new design?

Because, as per the doc, it is mentioned that we need to preempt only if we 
can reserve, I am slightly confused here.

3. To cancel a "killable container", I think PCPP will take the call by waiting 
for some interval. So is some new configuration needed for this?

4. I would like to have some freedom in selecting (marking) containers for 
preemption. A simple sort based on submission time or priority seems a limited 
approach. Could we have some interface here so that we can plug in 
user-specific comparison criteria (see the sketch after this list), such as:
- submission time
- priority
- demand-based, etc.
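
For example, a purely hypothetical sketch (none of these types exist in YARN 
today):

{code}
import java.util.Comparator;

// Hypothetical: a pluggable ordering over preemption candidates.
interface PreemptionCandidate {
  long getSubmissionTime();
  int getPriority();
}

interface PreemptionOrderingPolicy {
  /** Earlier entries in the resulting order are preempted first. */
  Comparator<PreemptionCandidate> getComparator();
}

// Example plugin: prefer preempting the most recently submitted work.
class SubmissionTimePolicy implements PreemptionOrderingPolicy {
  @Override
  public Comparator<PreemptionCandidate> getComparator() {
    return Comparator
        .comparingLong(PreemptionCandidate::getSubmissionTime)
        .reversed();
  }
}
{code}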

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4108-design-doc-V3.pdf, 
> YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, 
> YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch
>
>
> This is sibling JIRA for YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*
> 1) Can handle case of user-limit preemption
> 2) Can handle case of resource placement requirements, such as: hard-locality 
> (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I 
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross user preemption (YARN-2113), 
> cross application preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers

2016-01-14 Thread Kishore Chaliparambil (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098525#comment-15098525
 ] 

Kishore Chaliparambil commented on YARN-2885:
-

Hi [~asuresh],

I am reviewing the latest patch. I noticed that we assume that the last 
interceptor in the chain will be the LocalScheduler. This might break the model 
when we support YARN federation (YARN-3666). The federation interceptor will 
have to be the last interceptor, since it hides from the application and 
clients the fact that there are multiple clusters. So I think that instead of 
talking to the RM directly from the LocalScheduler, we can forward the request 
to the next interceptor in the chain. And until federation is implemented, we 
can have another interceptor implementation (e.g. DefaultRequestInterceptor) 
that talks to the RM and use that as the last interceptor in the chain.
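
As a sketch of that shape (class and method names here are illustrative, not 
the actual patch):

{code}
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Illustrative sketch: interceptors forward down the chain by default, so a
// DefaultRequestInterceptor (or later a FederationInterceptor) can sit last
// and be the only component that actually talks to the RM.
abstract class ChainedInterceptorSketch {
  private ChainedInterceptorSketch next;

  void setNextInterceptor(ChainedInterceptorSketch next) {
    this.next = next;
  }

  AllocateResponse allocate(AllocateRequest request) throws YarnException {
    return next.allocate(request);  // default: delegate, never contact the RM
  }
}
{code}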

Thanks,
Kishore


> Create AMRMProxy request interceptor for distributed scheduling decisions for 
> queueable containers
> --
>
> Key: YARN-2885
> URL: https://issues.apache.org/jira/browse/YARN-2885
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2885-yarn-2877.001.patch, 
> YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch, 
> YARN-2885-yarn-2877.full-3.patch, YARN-2885-yarn-2877.full.patch, 
> YARN-2885-yarn-2877.v4.patch, YARN-2885-yarn-2877.v5.patch, 
> YARN-2885_api_changes.patch
>
>
> We propose to add a Local ResourceManager (LocalRM) to the NM in order to 
> support distributed scheduling decisions. 
> Architecturally we leverage the RMProxy, introduced in YARN-2884. 
> The LocalRM makes distributed decisions for queueable container requests. 
> Guaranteed-start requests are still handled by the central RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-14 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098511#comment-15098511
 ] 

Karthik Kambatla commented on YARN-1011:


[~nroberts], [~leftnoteasy] - reasonable concerns. I am looking into allowing 
an app to ask only for guaranteed containers. Scheduling will likely remain 
simple: in our loop, we just skip an application if it is not interested in 
opportunistic containers. Promotion, though, becomes tricky: we should hold off 
on promoting a container until all higher-"priority" applications that want 
only guaranteed containers get them.

I welcome any thoughts/suggestions on handling promotion if we allow 
applications to ask for only guaranteed containers. I'll continue 
brainstorming. We want to have a simple mechanism, if possible; complex 
protocols seem to find a way to harbor bugs. 
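
A rough sketch of the scheduling part (all types and flags here are 
hypothetical):

{code}
import java.util.List;

// Hypothetical sketch: when handing out opportunistic containers, skip apps
// that asked exclusively for guaranteed ones.
class OpportunisticLoopSketch {
  interface App {
    boolean wantsOpportunisticContainers();
  }

  void assignOpportunistic(List<App> sortedApps) {
    for (App app : sortedApps) {
      if (!app.wantsOpportunisticContainers()) {
        continue;  // this app only wants guaranteed containers
      }
      // ... offer the app an opportunistic container here ...
    }
  }
}
{code}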

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf, 
> yarn-1011-design-v2.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2016-01-14 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4265:

Attachment: YARN-4265-trunk.008.patch

New patch to address the checkstyle hidden-field warning. I also tested 
locally for the maven dependency issue: I cleaned my local m2 repository, 
rebuilt Hadoop, and ran the maven dependency check. I cannot reproduce the 
problem. Also, the problem appears to have been intermittent in past Jenkins 
runs. 

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, 
> YARN-4265-trunk.003.patch, YARN-4265-trunk.004.patch, 
> YARN-4265-trunk.005.patch, YARN-4265-trunk.006.patch, 
> YARN-4265-trunk.007.patch, YARN-4265-trunk.008.patch, 
> YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior to the 
> EntityFileTimelineStore proposed in YARN-3942, but cache data at cache-id 
> granularity instead of application-id granularity. Let's have this storage 
> as a standalone one, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom

2016-01-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098502#comment-15098502
 ] 

Naganarasimha G R commented on YARN-3215:
-

Thanks for detailing it out, [~sunilg]. A small correction to the solution you 
mentioned, as per our discussion: it should be
min(total *unused* resource limit for a given label,  ), so that the headroom 
doesn't exceed what's actually available. Right?
cc [~wangda], thoughts?

> Respect labels in CapacityScheduler when computing headroom
> ---
>
> Key: YARN-3215
> URL: https://issues.apache.org/jira/browse/YARN-3215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-3215.v1.001.patch
>
>
> In existing CapacityScheduler, when computing headroom of an application, it 
> will only consider "non-labeled" nodes of this application.
> But it is possible the application is asking for labeled resources, so 
> headroom-by-label (like 5G resource available under node-label=red) is 
> required to get better resource allocation and avoid deadlocks such as 
> MAPREDUCE-5928.
> This JIRA could involve both API changes (such as adding a 
> label-to-available-resource map in AllocateResponse) and also internal 
> changes in CapacityScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3102) Decommissioned Nodes not listed in Web UI

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098477#comment-15098477
 ] 

Hadoop QA commented on YARN-3102:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 50s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 12s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 148m 46s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12782289/YARN-3102-v5.patch |
| JIRA Issue | YARN-3102 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  c

[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids

2016-01-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098447#comment-15098447
 ] 

Sunil G commented on YARN-4371:
---

The javadoc warning is valid. I will update with a new patch.

> "yarn application -kill" should take multiple application ids
> -
>
> Key: YARN-4371
> URL: https://issues.apache.org/jira/browse/YARN-4371
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>Assignee: Sunil G
> Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch, 
> 0003-YARN-4371.patch, 0004-YARN-4371.patch, 0005-YARN-4371.patch
>
>
> Currently we cannot pass multiple applications to "yarn application -kill" 
> command. The command should take multiple application ids at the same time. 
> Each entry should be separated with whitespace, like:
> {code}
> yarn application -kill application_1234_0001 application_1234_0007 
> application_1234_0012
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-01-14 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098461#comment-15098461
 ] 

Xuan Gong commented on YARN-4577:
-

Thanks [~sjlee0] for the comments and suggestions.

+1 for the suggestion to have a single generic solution that can address all 
the needs for isolated classloading, but I think it still needs some 
improvement.

The use case here is simple: if we specify the aux-services classpath, either 
on the local fs or on hdfs, we load the service from that classpath (regardless 
of whether the class is also on the NM classpath). Otherwise, we load the 
service from the NM classpath.

For ApplicationClassLoader,
{code}
public ApplicationClassLoader(String classpath, ClassLoader parent,
    List<String> systemClasses)
{code}
it looks like we have to specify a classpath (we cannot set it to null). It 
also requires specifying systemClasses, which is not needed for this use case, 
and there are some unnecessary checks, such as isSystemClass() on every 
loadClass call. Overall, I think ApplicationClassLoader is too complicated for 
this use case.
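
As a rough sketch of the simpler loader being argued for here (illustrative names only, not the attached patch; note that a plain URLClassLoader is parent-first, so NM classes would still win on conflicts, and true isolation of conflicting dependencies would need child-first semantics like ApplicationClassLoader provides):
{code}
// Sketch: load an aux service from its own set of jars, delegating to the
// NM classloader as the parent.
static AuxiliaryService loadFromCustomClasspath(String className,
    URL[] serviceJars, ClassLoader nmLoader) throws Exception {
  URLClassLoader auxLoader = new URLClassLoader(serviceJars, nmLoader);
  Class<?> clazz = Class.forName(className, true, auxLoader);
  return (AuxiliaryService) clazz.newInstance();
}
{code}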

> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> put them on the system classloader. But if multiple versions of the plugin 
> are present on the classpath, there is no control over which version actually 
> gets loaded. Or if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the NM, 
> the auxiliary service, or both.
> The solution could be: to instantiate aux services using a classloader that 
> is different from the system classloader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2016-01-14 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4389:
--
Attachment: 0009-YARN-4389.patch

Updating the patch with validation for disableThreshold. Also added a few test 
cases to verify all the conditions.
[~djp]/[~rohithsharma], please help to review the same.

> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be app specific 
> rather than a setting for whole YARN cluster
> ---
>
> Key: YARN-4389
> URL: https://issues.apache.org/jira/browse/YARN-4389
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Reporter: Junping Du
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4389.patch, 0002-YARN-4389.patch, 
> 0003-YARN-4389.patch, 0004-YARN-4389.patch, 0005-YARN-4389.patch, 
> 0006-YARN-4389.patch, 0007-YARN-4389.patch, 0008-YARN-4389.patch, 
> 0009-YARN-4389.patch
>
>
> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be application 
> specific rather than a setting at the cluster level, or we shouldn't maintain 
> amBlacklistingEnabled and blacklistDisableThreshold in per rmApp level. We 
> should allow each am to override this config, i.e. via submissionContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098422#comment-15098422
 ] 

Hadoop QA commented on YARN-4371:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 9s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client (total was 15, now 16). 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 36s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_91 with JDK 
v1.7.0_91 generated 1 new issues (was 0, now 1). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 16s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 29s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 45s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.8.0_66 Timed out junit tests | 
org.apache.hadoop.yarn.client.cli.TestYarnCLI |
|   | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.ap

[jira] [Created] (YARN-4591) YARN Web UIs should provide a robots.txt

2016-01-14 Thread Lars Francke (JIRA)
Lars Francke created YARN-4591:
--

 Summary: YARN Web UIs should provide a robots.txt
 Key: YARN-4591
 URL: https://issues.apache.org/jira/browse/YARN-4591
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Lars Francke
Priority: Trivial


To prevent well-behaved crawlers from indexing public YARN UIs.

Similar to HDFS-330 / HDFS-9651.

I took a quick look at the Webapp stuff in YARN and it looks complicated, so I 
can't provide a quick patch. If anyone can point me in the right direction I 
might take a look.
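
For reference, the conventional robots.txt that tells all well-behaved crawlers to stay out is just:
{noformat}
User-agent: *
Disallow: /
{noformat}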



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3446) FairScheduler headroom calculation should exclude nodes in the blacklist

2016-01-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098360#comment-15098360
 ] 

Hudson commented on YARN-3446:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9112 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9112/])
YARN-3446. FairScheduler headroom calculation should exclude nodes in (kasha: 
rev 9d04f26d4c42170ee3dab2f6fb09a94bbf72fc65)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAppSchedulingInfo.java


> FairScheduler headroom calculation should exclude nodes in the blacklist
> 
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch, YARN-3446.004.patch, 
> YARN-3446.005.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because the headroom used for the 
> reducer-preemption calculation includes blacklisted nodes. This makes jobs 
> hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM includes 
> the blacklisted nodes' available resources).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.
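
Conceptually (a hedged sketch with illustrative names, not the committed change), the fix amounts to subtracting the blacklisted nodes' capacity before reporting headroom:
{code}
// Sketch: exclude resources on blacklisted nodes from the headroom,
// clamping at zero so the AM never sees negative headroom.
static Resource headroomExcludingBlacklist(Resource headroom,
    Resource availableOnBlacklistedNodes) {
  return Resources.componentwiseMax(
      Resources.subtract(headroom, availableOnBlacklistedNodes),
      Resources.none());
}
{code}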



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3446) FairScheduler headroom calculation should exclude nodes in the blacklist

2016-01-14 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3446:
---
Summary: FairScheduler headroom calculation should exclude nodes in the 
blacklist  (was: FairScheduler headroom calculation should exclude nodes in the 
blacklist.)

> FairScheduler headroom calculation should exclude nodes in the blacklist
> 
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch, YARN-3446.004.patch, 
> YARN-3446.005.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because the headroom used for the 
> reducer-preemption calculation includes blacklisted nodes. This makes jobs 
> hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM includes 
> the blacklisted nodes' available resources).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3446) FairScheduler headroom calculation should exclude nodes in the blacklist.

2016-01-14 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3446:
---
Summary: FairScheduler headroom calculation should exclude nodes in the 
blacklist.  (was: FairScheduler HeadRoom calculation should exclude nodes in 
the blacklist.)

> FairScheduler headroom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch, YARN-3446.004.patch, 
> YARN-3446.005.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because the headroom used for the 
> reducer-preemption calculation includes blacklisted nodes. This makes jobs 
> hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM includes 
> the blacklisted nodes' available resources).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2016-01-14 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098347#comment-15098347
 ] 

zhihai xu commented on YARN-3446:
-

The test failures for TestClientRMTokens and TestAMAuthorization are not 
related to the patch. Both tests pass in my local build.

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch, YARN-3446.004.patch, 
> YARN-3446.005.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because the headroom used for the 
> reducer-preemption calculation includes blacklisted nodes. This makes jobs 
> hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM includes 
> the blacklisted nodes' available resources).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2016-01-14 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098341#comment-15098341
 ] 

Eric Payne commented on YARN-4108:
--

bq. Do you have any comments/suggestions about the latest proposal? Does it 
make sense to you?
I'm still contemplating it. I will let you know. Thanks.

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4108-design-doc-V3.pdf, 
> YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, 
> YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch
>
>
> This is sibling JIRA for YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*:
> 1) Can handle case of user-limit preemption
> 2) Can handle case of resource placement requirements, such as: hard-locality 
> (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I 
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross user preemption (YARN-2113), 
> cross applicaiton preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2016-01-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098292#comment-15098292
 ] 

Sunil G commented on YARN-4389:
---

Yes [~djp], that's perfectly fine. We could do that, and I will make the 
necessary changes.

> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be app specific 
> rather than a setting for whole YARN cluster
> ---
>
> Key: YARN-4389
> URL: https://issues.apache.org/jira/browse/YARN-4389
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Reporter: Junping Du
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4389.patch, 0002-YARN-4389.patch, 
> 0003-YARN-4389.patch, 0004-YARN-4389.patch, 0005-YARN-4389.patch, 
> 0006-YARN-4389.patch, 0007-YARN-4389.patch, 0008-YARN-4389.patch
>
>
> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be application 
> specific rather than a setting at the cluster level, or we shouldn't maintain 
> amBlacklistingEnabled and blacklistDisableThreshold in per rmApp level. We 
> should allow each am to override this config, i.e. via submissionContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2016-01-14 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098285#comment-15098285
 ] 

Junping Du commented on YARN-4389:
--

Thanks [~sunilg] for updating the patch. The patch looks pretty good now except 
one NIT:
Shall we check that the disable threshold set by the app is a valid value (not 
negative, and less than or equal to 1.0f)? If the app's value is not valid, 
maybe we should log a warning message and use the global setting instead.
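
Roughly something like this (a sketch of the suggested check; the variable names are illustrative, not from the patch):
{code}
// Sketch: ignore an invalid app-supplied threshold and fall back to the
// cluster-level setting, logging a warning as suggested above.
float effectiveThreshold = appThreshold;
if (appThreshold < 0.0f || appThreshold > 1.0f) {
  LOG.warn("Invalid am-blacklisting disable-failure-threshold " + appThreshold
      + " in submission context; using cluster default " + clusterThreshold);
  effectiveThreshold = clusterThreshold;
}
{code}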

> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be app specific 
> rather than a setting for whole YARN cluster
> ---
>
> Key: YARN-4389
> URL: https://issues.apache.org/jira/browse/YARN-4389
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Reporter: Junping Du
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4389.patch, 0002-YARN-4389.patch, 
> 0003-YARN-4389.patch, 0004-YARN-4389.patch, 0005-YARN-4389.patch, 
> 0006-YARN-4389.patch, 0007-YARN-4389.patch, 0008-YARN-4389.patch
>
>
> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be application 
> specific rather than a setting at the cluster level, or we shouldn't maintain 
> amBlacklistingEnabled and blacklistDisableThreshold in per rmApp level. We 
> should allow each am to override this config, i.e. via submissionContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2016-01-14 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4389:
--
Attachment: 0008-YARN-4389.patch

Thanks [~rohithsharma]

This suggestion makes sense to me. With this, we give the application the 
option to enable/disable this feature regardless of the cluster-level 
configuration in YARN. Uploading a new patch; I have also documented all the 
cases as comments in the code.

> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be app specific 
> rather than a setting for whole YARN cluster
> ---
>
> Key: YARN-4389
> URL: https://issues.apache.org/jira/browse/YARN-4389
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Reporter: Junping Du
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4389.patch, 0002-YARN-4389.patch, 
> 0003-YARN-4389.patch, 0004-YARN-4389.patch, 0005-YARN-4389.patch, 
> 0006-YARN-4389.patch, 0007-YARN-4389.patch, 0008-YARN-4389.patch
>
>
> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be application 
> specific rather than a setting at the cluster level, or we shouldn't maintain 
> amBlacklistingEnabled and blacklistDisableThreshold in per rmApp level. We 
> should allow each am to override this config, i.e. via submissionContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2016-01-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098246#comment-15098246
 ] 

Sunil G commented on YARN-4389:
---

[~djp] and [~rohithsharma], could you please help to review the patch?

> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be app specific 
> rather than a setting for whole YARN cluster
> ---
>
> Key: YARN-4389
> URL: https://issues.apache.org/jira/browse/YARN-4389
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Reporter: Junping Du
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4389.patch, 0002-YARN-4389.patch, 
> 0003-YARN-4389.patch, 0004-YARN-4389.patch, 0005-YARN-4389.patch, 
> 0006-YARN-4389.patch, 0007-YARN-4389.patch, 0008-YARN-4389.patch
>
>
> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be application 
> specific rather than a setting at the cluster level, or we shouldn't maintain 
> amBlacklistingEnabled and blacklistDisableThreshold in per rmApp level. We 
> should allow each am to override this config, i.e. via submissionContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2016-01-14 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098231#comment-15098231
 ] 

Junping Du commented on YARN-4265:
--

The checkstyle complaint:
{noformat}
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/src/main/java/org/apache/hadoop/yarn/server/timeline/LogInfo.java:90:
String filename = getFilename();:12: 'filename' hides a field
{noformat}
should be taken care of. Also, it sounds like the maven dependency check failed?
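
The usual fix for that warning is simply renaming the local so it no longer shadows the field, e.g. (sketch):
{code}
// Before: checkstyle flags the local because it hides the 'filename' field.
String filename = getFilename();
// After (sketch): a distinct local name clears the warning.
String parsedFilename = getFilename();
{code}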

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, 
> YARN-4265-trunk.003.patch, YARN-4265-trunk.004.patch, 
> YARN-4265-trunk.005.patch, YARN-4265-trunk.006.patch, 
> YARN-4265-trunk.007.patch, YARN-4265.YARN-4234.001.patch, 
> YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have behavior similar to the 
> EntityFileTimelineStore proposed in YARN-3942, but it caches data at cache-id 
> granularity instead of application-id granularity. Let's have this storage 
> as a standalone one, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3102) Decommisioned Nodes not listed in Web UI

2016-01-14 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-3102:
--
Attachment: YARN-3102-v5.patch

Addressed findbugs warning.

> Decommisioned Nodes not listed in Web UI
> 
>
> Key: YARN-3102
> URL: https://issues.apache.org/jira/browse/YARN-3102
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
> Environment: 2 Node Manager and 1 Resource Manager 
>Reporter: Bibin A Chundatt
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-3102-v1.patch, YARN-3102-v2.patch, 
> YARN-3102-v3.patch, YARN-3102-v4.patch, YARN-3102-v5.patch
>
>
> Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to point to 
> the yarn.exclude file on the RM1 machine.
> Add NM1's hostname to yarn.exclude.
> Start the nodes listed below: NM1, NM2, Resource Manager.
> Now check the decommissioned nodes in /cluster/nodes.
> The number of decommissioned nodes is listed as 1, but the table is empty in 
> /cluster/nodes/decommissioned (details of the decommissioned node are not shown).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom

2016-01-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098219#comment-15098219
 ] 

Sunil G commented on YARN-3215:
---

Hi [~Naganarasimha Garla]
As discussed offline, I will try explaining the problem more clearly.

{{getQueueMaxResource}} uses {{queueCapacities.getAbsoluteMaximumCapacity}} to 
calculate the maximum possible resource limit per label. As per the approach, 
we take the difference between this limit and the actual resource used for 
that label (in a queue) to get the headroom. Because we consider absolute max 
capacity, the summed limits may exceed 100% (one possibility is that the same 
label is used in another queue).
For example:
label X is configured in queueA.
 - capacity is 30%
 - max capacity is 60%

label X is configured in queueB.
 - capacity is 70%
 - max capacity is 80%

In this case, the resource limit we compute for label X may exceed 100%, so 
the headroom calculated per queue may be more than what is actually available. 
I think this is not a very major problem, but it is better solved by capping 
the per-label headroom at the label's total resource minus its used resource. 
Thoughts?
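
A hedged sketch of that cap (illustrative names, not a patch):
{code}
// Sketch: cap the per-queue, per-label headroom by what is actually left
// for the label cluster-wide, so max capacities summing over 100% cannot
// inflate the headroom.
static Resource cappedLabelHeadroom(Resource queueLimit, Resource queueUsed,
    Resource labelTotal, Resource labelUsed) {
  Resource queueHeadroom = Resources.subtract(queueLimit, queueUsed);
  Resource labelRemaining = Resources.subtract(labelTotal, labelUsed);
  return Resources.componentwiseMin(queueHeadroom, labelRemaining);
}
{code}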



> Respect labels in CapacityScheduler when computing headroom
> ---
>
> Key: YARN-3215
> URL: https://issues.apache.org/jira/browse/YARN-3215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-3215.v1.001.patch
>
>
> In existing CapacityScheduler, when computing headroom of an application, it 
> will only consider "non-labeled" nodes of this application.
> But it is possible the application is asking for labeled resources, so 
> headroom-by-label (like 5G resource available under node-label=red) is 
> required to get better resource allocation and avoid deadlocks such as 
> MAPREDUCE-5928.
> This JIRA could involve both API changes (such as adding a 
> label-to-available-resource map in AllocateResponse) and also internal 
> changes in CapacityScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism

2016-01-14 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098214#comment-15098214
 ] 

Varun Vasudev commented on YARN-3542:
-

{quote}
This effectively means the old code is not used anymore, and that the new 
code is stable.

And it seems like we are stating that

[..], there should be no issue hooking into the new handler using the old 
configuration mechanism.?

Given this and the fact that we are internally overriding to use the new 
handlers, is there a reason for keeping the old code at all? Also if we are 
using the new handler code internally anyways, we can proceed with the 
deprecation (or better deletion) of LCEResourcesHandler interface, 
DefaultLCEResourcesHandler etc?
{quote}

I think there are two issues here - 
1) should we keep the CgroupsLCEResourcesHandler code and
2) should we do away with the resource-handler code altogether

The answer to (1) is that it is not required, but I think it's useful to keep 
around for some time for two reasons:
1. As documentation/reference, in case some discrepancy is seen in the new 
implementation.
2. On the off chance that someone has inherited from it. Unfortunately the 
resource-handler classes are public via config but are not annotated as such.

The answer to (2) is no - the resource handler interface is public and we 
shouldn't break it. Deprecating it is fine once we provide an alternative 
(which we currently don't have) or decide we will not support it any longer. 
Either way, that decision should be a separate JIRA, not this one.

> Re-factor support for CPU as a resource using the new ResourceHandler 
> mechanism
> ---
>
> Key: YARN-3542
> URL: https://issues.apache.org/jira/browse/YARN-3542
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Varun Vasudev
>Priority: Critical
> Attachments: YARN-3542.001.patch, YARN-3542.002.patch, 
> YARN-3542.003.patch, YARN-3542.004.patch, YARN-3542.005.patch, 
> YARN-3542.006.patch, YARN-3542.007.patch
>
>
> In YARN-3443 , a new ResourceHandler mechanism was added which enabled easier 
> addition of new resource types in the nodemanager (this was used for network 
> as a resource - See YARN-2140 ). We should refactor the existing CPU 
> implementation ( LinuxContainerExecutor/CgroupsLCEResourcesHandler ) using 
> the new ResourceHandler mechanism. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4371) "yarn application -kill" should take multiple application ids

2016-01-14 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4371:
--
Attachment: 0005-YARN-4371.patch

Thank you [~jlowe] for the comments. I have updated the patch accordingly.

> "yarn application -kill" should take multiple application ids
> -
>
> Key: YARN-4371
> URL: https://issues.apache.org/jira/browse/YARN-4371
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>Assignee: Sunil G
> Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch, 
> 0003-YARN-4371.patch, 0004-YARN-4371.patch, 0005-YARN-4371.patch
>
>
> Currently we cannot pass multiple applications to "yarn application -kill" 
> command. The command should take multiple application ids at the same time. 
> Each entry should be separated by whitespace, like:
> {code}
> yarn application -kill application_1234_0001 application_1234_0007 
> application_1234_0012
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4590) SLS(Scheduler Load Simulator) web pages can't load css and js resource

2016-01-14 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098168#comment-15098168
 ] 

Bibin A Chundatt commented on YARN-4590:


[~xupeng]

Can you try from {{hadoop/tools/sls}} as below?
{noformat}
bin/slsrun.sh --input-rumen=./sample-data/2jobs2min-rumen-jh.json --output-dir=./sample-data/
{noformat}

For trunk, I think we can update the classpath setup as below:

{noformat}
function calculate_classpath
{
  hadoop_add_to_classpath_toolspath
  if [[ -n "${HADOOP_PREFIX}" ]]; then
    SLS_HTML_DIR="${HADOOP_PREFIX}/share/hadoop/tools/sls/html"
  else
    this="${BASH_SOURCE-$0}"
    bin=$(cd -P -- "$(dirname -- "${this}")" >/dev/null && pwd -P)
    SLS_HTML_DIR="${bin}/../html"
  fi

  hadoop_debug "Injecting ${SLS_HTML_DIR} into classpath"
  hadoop_add_classpath "${SLS_HTML_DIR}"
  if [[ ! -d html ]]; then
    ln -s "${SLS_HTML_DIR}" html
  fi
}
{noformat}

Any thoughts on this?

> SLS(Scheduler Load Simulator) web pages can't load css and js resource 
> ---
>
> Key: YARN-4590
> URL: https://issues.apache.org/jira/browse/YARN-4590
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: xupeng
>Priority: Minor
>
> HadoopVersion : 2.6.0 / with patch YARN-4367-branch-2
> 1. run command "./slsrun.sh 
> --input-rumen=../sample-data/2jobs2min-rumen-jh.json 
> --output-dir=../sample-data/"
> success
> 2. Open the web page "http://10.6.128.88:10001/track";
> It cannot load css and js resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy

2016-01-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098085#comment-15098085
 ] 

Sunil G commented on YARN-2009:
---

Thanks [~leftnoteasy] for sharing the updates. I will also take a look at 
YARN-4108.

bq. I think how to choose to-be-preempted containers should be handled inside 
PCPP by different policies
As a first step, this will be really helpful and will make the code easier to 
understand. For in-queue preemption, I will work on an approach similar to the 
one taken in PCPP and will post a doc here. I think it can be another policy 
that is independent of PCPP but shares as much common code as possible.

> Priority support for preemption in ProportionalCapacityPreemptionPolicy
> ---
>
> Key: YARN-2009
> URL: https://issues.apache.org/jira/browse/YARN-2009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Devaraj K
>Assignee: Sunil G
>
> While preempting containers based on the queue ideal assignment, we may need 
> to consider preempting the low priority application containers first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4584) RM startup failure when AM attempts greater than max-attempts

2016-01-14 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098076#comment-15098076
 ] 

Bibin A Chundatt commented on YARN-4584:


[~rohithsharma]
The test case failures are not related to the attached patch.

> RM startup failure when AM attempts greater than max-attempts
> -
>
> Key: YARN-4584
> URL: https://issues.apache.org/jira/browse/YARN-4584
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-4584.patch
>
>
> Configure 3 queues in a cluster with 8 GB:
> # queue 40%
> # queue 50% 
> # default 10%
> * Submit applications to all 3 queues with container size 1024MB (sleep jobs 
> with 50 containers on all queues)
> * The AM assigned to the default queue gets preempted immediately; after 
> about 20 preemptions, all applications get killed
> Due to the resource limit in the default queue, the AM got preempted about 20 
> times. On restart, the RM fails to come back up:
> {noformat}
> 2016-01-12 10:49:04,081 DEBUG org.apache.hadoop.service.AbstractService: 
> noteFailure java.lang.NullPointerException
> 2016-01-12 10:49:04,081 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:887)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:826)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:953)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:946)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:786)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:464)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1232)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1022)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1062)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1058)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1058)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:323)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:877)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: RMActiveServices entered state STOPPED
> 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.CompositeService: 
> RMActiveServices: stopping services, size=16
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4462) FairScheduler: Disallow preemption from a queue

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098072#comment-15098072
 ] 

Hadoop QA commented on YARN-4462:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 3 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 76, now 76). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
1s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 14s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 36s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 148m 23s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12782254/YARN-4462.004.

[jira] [Commented] (YARN-4584) RM startup failure when AM attempts greater than max-attempts

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098067#comment-15098067
 ] 

Hadoop QA commented on YARN-4584:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 241, now 241). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 25s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 35s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 139m 29s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12782252/0001-YARN-45

[jira] [Updated] (YARN-3568) TestAMRMTokens should use some random port

2016-01-14 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3568:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4478

> TestAMRMTokens should use some random port
> --
>
> Key: YARN-3568
> URL: https://issues.apache.org/jira/browse/YARN-3568
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Gera Shegalov
>
> Since the default port is used for yarn.resourcemanager.scheduler.address, if 
> we already run a pseudo-distributed cluster on the same development machine, 
> the test fails like this:
> {code}
> testMasterKeyRollOver[0](org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens)
>   Time elapsed: 1.511 sec  <<< ERROR!
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [0.0.0.0:8030] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:444)
>   at sun.nio.ch.Net.bind(Net.java:436)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:413)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:590)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2340)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:945)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:534)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:509)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:787)
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>   at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:140)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:586)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:996)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1037)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1033)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1033)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1073)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens.testMasterKeyRollOver(TestAMRMTokens.java:235)
> {code}
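
One common way to fix such tests (a sketch, assuming the test can override the scheduler address in its Configuration before starting the RM) is to probe for a free ephemeral port:
{code}
// Sketch: let the OS pick a free port, then point the scheduler address at it.
// There is a small race between closing the probe socket and the RM binding,
// but it is far less likely to collide than hard-coding 8030.
int port;
try (ServerSocket probe = new ServerSocket(0)) {
  port = probe.getLocalPort();
}
conf.set(YarnConfiguration.RM_SCHEDULER_ADDRESS, "0.0.0.0:" + port);
{code}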



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098042#comment-15098042
 ] 

Hadoop QA commented on YARN-4538:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 20s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 34s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 2, now 3). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 17s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 26s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 147m 22s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |

[jira] [Commented] (YARN-4565) When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, Sometimes lead to situation where all queue resources consumed by AMs only

2016-01-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098034#comment-15098034
 ] 

Naganarasimha G R commented on YARN-4565:
-

+1,  Patch LGTM 

> When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, 
> Sometimes lead to situation where all queue resources consumed by AMs only
> 
>
> Key: YARN-4565
> URL: https://issues.apache.org/jira/browse/YARN-4565
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Karam Singh
>Assignee: Wangda Tan
> Attachments: YARN-4565.1.patch, YARN-4565.2.patch, YARN-4565.3.patch
>
>
> When sizeBasedWeight is enabled for FairOrderingPolicy in CapacityScheduler, 
> it sometimes leads to a situation where all queue resources are consumed by 
> AMs only. From the user's perspective it appears that all applications in 
> the queue are stuck, because the whole queue capacity is consumed by AMs.
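
For reference, a minimal capacity-scheduler.xml sketch (not taken from the report itself) of the setup that enables this code path. The ordering-policy property names follow the CapacityScheduler convention; the queue path {{root.default}} and the AM-percent value are illustrative assumptions.

{code:xml}
<!-- capacity-scheduler.xml: illustrative sketch; "root.default" is an example queue -->
<property>
  <name>yarn.scheduler.capacity.root.default.ordering-policy</name>
  <value>fair</value>
</property>
<property>
  <!-- the sizeBasedWeight switch discussed in this issue -->
  <name>yarn.scheduler.capacity.root.default.ordering-policy.fair.enable-size-based-weight</name>
  <value>true</value>
</property>
<property>
  <!-- caps the share of resources usable by ApplicationMasters; the symptom
       reported here is that, with sizeBasedWeight on, AMs can end up holding
       the whole queue capacity anyway -->
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.1</value>
</property>
{code}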



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4317) Test failure: TestResourceTrackerService

2016-01-14 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4317:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4478

> Test failure: TestResourceTrackerService 
> -
>
> Key: YARN-4317
> URL: https://issues.apache.org/jira/browse/YARN-4317
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tsuyoshi Ozawa
>
> {quote}
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.438 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
> testReconnectNode(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
>   Time elapsed: 0.114 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<15360> but was:<10240>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testReconnectNode(TestResourceTrackerService.java:624)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4453) TestMiniYarnClusterNodeUtilization occasionally times out in trunk

2016-01-14 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4453:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4478

> TestMiniYarnClusterNodeUtilization occasionally times out in trunk
> --
>
> Key: YARN-4453
> URL: https://issues.apache.org/jira/browse/YARN-4453
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Reporter: Sunil G
>
> TestMiniYarnClusterNodeUtilization failures have been observed in a few test 
> runs in YARN-4293. 
> Locally, the same test case is also timing out.
> {noformat}
> java.lang.Exception: test timed out after 60000 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:158)
>   at com.sun.proxy.$Proxy85.nodeHeartbeat(Unknown Source)
>   at 
> org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization(TestMiniYarnClusterNodeUtilization.java:113)
> {noformat}
> YARN-3980, where this test was added, reported a few timed-out cases. I 
> think this should be investigated, because it does not look good to simply 
> increase the timeout when tests fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4462) FairScheduler: Disallow preemption from a queue

2016-01-14 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated YARN-4462:
--
Attachment: YARN-4462.004.patch

> FairScheduler: Disallow preemption from a queue
> ---
>
> Key: YARN-4462
> URL: https://issues.apache.org/jira/browse/YARN-4462
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Tao Jie
> Attachments: YARN-4462.001.patch, YARN-4462.002.patch, 
> YARN-4462.003.patch, YARN-4462.004.patch
>
>
> When scheduler preemption is enabled, applications can be preempted if they 
> hold more resources than they are entitled to. 
> When a MapReduce application has some of its resources preempted, it just 
> runs slower. 
> However, when the preempted application is a long-running service, such as 
> Tomcat running in Slider, the service would fail.
> So we should have a flag that lets an application indicate to the scheduler 
> that it should not be preempted. 
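
A minimal sketch of what such a per-queue flag could look like in the FairScheduler allocation file. The element name {{allowPreemptionFrom}} and the queue names are illustrative assumptions here, not confirmed against the attached patches.

{code:xml}
<?xml version="1.0"?>
<!-- fair-scheduler.xml (allocation file): illustrative sketch only -->
<allocations>
  <queue name="services">
    <!-- hypothetical flag from this proposal: ask the scheduler not to
         preempt containers from this queue, so long-running services
         (e.g. Tomcat on Slider) are not killed under cluster pressure -->
    <allowPreemptionFrom>false</allowPreemptionFrom>
  </queue>
  <queue name="batch">
    <!-- MapReduce-style jobs stay preemptable and simply run slower -->
  </queue>
</allocations>
{code}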



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

