[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-10-28 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978364#comment-14978364
 ] 

Varun Saxena commented on YARN-2902:


[~jlowe],
For some reason, the QA report was not published to JIRA, seemingly since the 
integration with Yetus.
It is successful though. Kindly find the report here:
https://builds.apache.org/job/PreCommit-YARN-Build/9593/console

{panel}
+1 overall

|| Vote ||  Subsystem ||  Runtime   || Comment||
|   0  |reexec  |  0m 8s | docker + precommit patch detected. 
|  +1  |   @author  |  0m 0s | The patch does not contain any @author tags. 
|  +1  |test4tests  |  0m 0s | The patch appears to include 3 new or modified test files. 
|  +1  |mvninstall  |  3m 22s| trunk passed 
|  +1  |   compile  |  0m 24s| trunk passed with JDK v1.8.0_60 
|  +1  |   compile  |  0m 25s| trunk passed with JDK v1.7.0_79 
|  +1  |checkstyle  |  0m 12s| trunk passed 
|  +1  |mvneclipse  |  0m 15s| trunk passed 
|  +1  |  findbugs  |  1m 3s | trunk passed 
|  +1  |   javadoc  |  0m 24s| trunk passed with JDK v1.8.0_60 
|  +1  |   javadoc  |  0m 27s| trunk passed with JDK v1.7.0_79 
|  +1  |mvninstall  |  0m 27s| the patch passed 
|  +1  |   compile  |  0m 26s| the patch passed with JDK v1.8.0_60 
|  +1  |cc  |  0m 26s| the patch passed 
|  +1  | javac  |  0m 26s| the patch passed 
|  +1  |   compile  |  0m 25s| the patch passed with JDK v1.7.0_79 
|  +1  |cc  |  0m 25s| the patch passed 
|  +1  | javac  |  0m 25s| the patch passed 
|  +1  |checkstyle  |  0m 13s| the patch passed 
|  +1  |mvneclipse  |  0m 14s| the patch passed 
|  +1  |whitespace  |  0m 0s | Patch has no whitespace issues. 
|  +1  |  findbugs  |  1m 12s| the patch passed 
|  +1  |   javadoc  |  0m 22s| the patch passed with JDK v1.8.0_60 
|  +1  |   javadoc  |  0m 26s| the patch passed with JDK v1.7.0_79 
|  +1  |  unit  |  9m 18s| hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_60. 
|  +1  |  unit  |  9m 29s| hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_79. 
|  +1  |asflicense  |  0m 25s| Patch does not generate ASF License warnings. 
|  ||  30m 42s   | 

|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-10-28 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12769215/YARN-2902.10.patch |
| JIRA Issue | YARN-2902 |
| Optional Tests | asflicense  javac  javadoc  mvninstall  unit  findbugs  checkstyle  compile  cc |
| uname | Linux 0bcb1b5c92b2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/apache-yetus-06655ab/dev-support/personality/hadoop.sh |
| git revision | trunk / a04b169 |
| Default Java | 1.7.0_79 |
| Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_60 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_79 |
| findbugs | v3.0.0 |
| JDK v1.7.0_79  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9593/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Max memory used | 227MB |
| Powered by | Apache Yetus   http://yetus.apache.org |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9593/console |




  Finished build.


{panel}

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.08.patch, 

[jira] [Commented] (YARN-4251) TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is failing

2015-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978309#comment-14978309
 ] 

Hudson commented on YARN-4251:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8718 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8718/])
YARN-4251. (ozawa: rev 9f4dfdf4eb60cc6b13da586dabcd95bd77fc783c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java
* hadoop-yarn-project/CHANGES.txt


> TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is 
> failing
> -
>
> Key: YARN-4251
> URL: https://issues.apache.org/jira/browse/YARN-4251
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: YARN-4251.patch
>
>
>  *Trace* 
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>   at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Unknown Source)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
> Source)
>   at java.lang.reflect.Constructor.newInstance(Unknown Source)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:486)
>   at org.apache.hadoop.ipc.Server$Listener.(Server.java:646)
>   at 

[jira] [Commented] (YARN-4251) TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is failing

2015-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978394#comment-14978394
 ] 

Hudson commented on YARN-4251:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2537 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2537/])
YARN-4251. (ozawa: rev 9f4dfdf4eb60cc6b13da586dabcd95bd77fc783c)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java


> TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is 
> failing
> -
>
> Key: YARN-4251
> URL: https://issues.apache.org/jira/browse/YARN-4251
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: YARN-4251.patch
>
>
>  *Trace* 
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>   at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Unknown Source)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
> Source)
>   at java.lang.reflect.Constructor.newInstance(Unknown Source)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:486)
>   at org.apache.hadoop.ipc.Server$Listener.(Server.java:646)
>   at 

[jira] [Updated] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader

2015-10-28 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3862:
---
Attachment: YARN-3862-YARN-2928.wip.02.patch

> Decide which contents to retrieve and send back in response in TimelineReader
> -
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3862-YARN-2928.wip.01.patch, 
> YARN-3862-YARN-2928.wip.02.patch
>
>
> Currently, we will retrieve all the contents of a field if that field is 
> specified in the query API. In the case of configs and metrics, this can 
> become a lot of data even though the user doesn't need it. So we need to 
> provide a way to query only a subset of configs or metrics.
> As a comma-separated list of configs/metrics to be returned would be quite 
> cumbersome to specify, we have to support one of the following options:
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window so that metrics are 
> returned only for that window. This may be useful for plotting graphs.
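
To make the prefix-match and regex options above concrete, here is a minimal, 
self-contained Java sketch of the filtering semantics; the class and method 
names are invented for illustration and are not the TimelineReader 
implementation.

{code}
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper showing prefix vs. regex selection of config keys.
public class ConfigFilterSketch {

  // Keep only entries whose key starts with the given prefix.
  static Map<String, String> byPrefix(Map<String, String> configs, String prefix) {
    return configs.entrySet().stream()
        .filter(e -> e.getKey().startsWith(prefix))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }

  // Keep only entries whose key matches the given regex.
  static Map<String, String> byRegex(Map<String, String> configs, String regex) {
    Pattern p = Pattern.compile(regex);
    return configs.entrySet().stream()
        .filter(e -> p.matcher(e.getKey()).matches())
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }

  public static void main(String[] args) {
    Map<String, String> configs = new TreeMap<>();
    configs.put("yarn.nodemanager.vmem-check-enabled", "false");
    configs.put("yarn.scheduler.minimum-allocation-mb", "1024");
    configs.put("mapreduce.map.memory.mb", "2048");

    // Prefix match: only the yarn.scheduler.* configs.
    System.out.println(byPrefix(configs, "yarn.scheduler."));
    // Regex: any config key mentioning "memory".
    System.out.println(byRegex(configs, ".*memory.*"));
  }
}
{code}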



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4251) TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is failing

2015-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978240#comment-14978240
 ] 

Hudson commented on YARN-4251:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #593 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/593/])
YARN-4251. (ozawa: rev 9f4dfdf4eb60cc6b13da586dabcd95bd77fc783c)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java


> TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is 
> failing
> -
>
> Key: YARN-4251
> URL: https://issues.apache.org/jira/browse/YARN-4251
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: YARN-4251.patch
>
>
>  *Trace* 
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>   at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Unknown Source)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
> Source)
>   at java.lang.reflect.Constructor.newInstance(Unknown Source)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:486)
>   at org.apache.hadoop.ipc.Server$Listener.(Server.java:646)
>   

[jira] [Commented] (YARN-4175) Example of use YARN-1197

2015-10-28 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978444#comment-14978444
 ] 

MENG DING commented on YARN-4175:
-

Correcting a typo in the previous post: it should be {{app_id}} instead of 
{{application_id}}.

\\
* once the application has started, the user can start a new client and specify 
the *appmaster* option to put the client into appmaster mode. In this mode, the 
client talks directly to the appmaster, and the user can specify the *app_id*, 
*container_id*, *action*, *container_memory*, and *container_vcores* options to 
request container resizing. For example, to change a container's resources, the 
user can do:
{code}
hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -appmaster 
-app_id=<app_id> -container_id=<container_id> -action=CHANGE_CONTAINER 
-container_memory=2048 -container_vcores=1
{code}

> Example of use YARN-1197
> 
>
> Key: YARN-4175
> URL: https://issues.apache.org/jira/browse/YARN-4175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: MENG DING
> Attachments: YARN-4175.1.patch, YARN-4175.2.patch
>
>
> Like YARN-2609, we need an example program to demonstrate how to use YARN-1197 
> end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4251) TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is failing

2015-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978330#comment-14978330
 ] 

Hudson commented on YARN-4251:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2483 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2483/])
YARN-4251. (ozawa: rev 9f4dfdf4eb60cc6b13da586dabcd95bd77fc783c)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java


> TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is 
> failing
> -
>
> Key: YARN-4251
> URL: https://issues.apache.org/jira/browse/YARN-4251
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: YARN-4251.patch
>
>
>  *Trace* 
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>   at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Unknown Source)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
> Source)
>   at java.lang.reflect.Constructor.newInstance(Unknown Source)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:486)
>   at org.apache.hadoop.ipc.Server$Listener.(Server.java:646)
>   at 

[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-10-28 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978430#comment-14978430
 ] 

MENG DING commented on YARN-1509:
-

The failed tests are not related.

> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, 
> YARN-1509.4.patch, YARN-1509.5.patch, YARN-1509.6.patch, YARN-1509.7.patch, 
> YARN-1509.8.patch
>
>
> As described in YARN-1197, we need to add APIs in AMRMClient to support:
> 1) Adding a container increase request
> 2) Getting the successfully increased/decreased containers back from the RM
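
As a rough illustration of the two requirements above, the following Java 
sketch models a client that queues resize requests and hands back the ones the 
RM granted. All class and method names are hypothetical placeholders and do not 
reflect the actual AMRMClient API added by this patch.

{code}
import java.util.ArrayList;
import java.util.List;

// Placeholder sketch only; not the real AMRMClient changes from YARN-1509.
public class ResizingClientSketch {

  // A pending resize request: which container, and the target size.
  static class ResizeRequest {
    final String containerId;
    final int memoryMB;
    final int vcores;
    ResizeRequest(String containerId, int memoryMB, int vcores) {
      this.containerId = containerId;
      this.memoryMB = memoryMB;
      this.vcores = vcores;
    }
    @Override public String toString() {
      return containerId + " -> " + memoryMB + "MB/" + vcores + " vcores";
    }
  }

  private final List<ResizeRequest> pending = new ArrayList<>();
  private final List<ResizeRequest> granted = new ArrayList<>();

  // 1) Queue an increase/decrease request to be sent on the next heartbeat.
  public void requestContainerResize(String containerId, int memoryMB, int vcores) {
    pending.add(new ResizeRequest(containerId, memoryMB, vcores));
  }

  // Pretend heartbeat: a real RM would decide which requests to grant; here
  // every pending request is "granted" purely for illustration.
  public void heartbeat() {
    granted.addAll(pending);
    pending.clear();
  }

  // 2) Retrieve containers the RM has successfully resized since the last call.
  public List<ResizeRequest> pullResizedContainers() {
    List<ResizeRequest> result = new ArrayList<>(granted);
    granted.clear();
    return result;
  }

  public static void main(String[] args) {
    ResizingClientSketch client = new ResizingClientSketch();
    client.requestContainerResize("container_1_0001_01_000002", 2048, 1);
    client.heartbeat();
    System.out.println("Resized: " + client.pullResizedContainers());
  }
}
{code}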



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4307) Blacklisted nodes for AM container is not getting displayed in the Web UI

2015-10-28 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4307:

Attachment: yarn-capacity-scheduler-debug.log

> Blacklisted nodes for AM container is not getting displayed in the Web UI
> -
>
> Key: YARN-4307
> URL: https://issues.apache.org/jira/browse/YARN-4307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: webpage.png, yarn-capacity-scheduler-debug.log
>
>
> In a pseudo cluster with 2 NMs, launched an app with an incorrect 
> configuration: *./hadoop org.apache.hadoop.mapreduce.SleepJob 
> -Dmapreduce.job.node-label-expression=labelX  
> -Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
> The first attempt failed and a 2nd attempt was launched, but the application 
> hung. The scheduler logs showed that localhost was blacklisted, but in the 
> UI (app & apps listing pages) the count was shown as zero and no hosts were 
> listed on the app page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4307) Blacklisted nodes for AM container is not getting displayed in the Web UI

2015-10-28 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978579#comment-14978579
 ] 

Naganarasimha G R commented on YARN-4307:
-

Additionally, I feel {{Dump scheduler logs}} in the scheduler page should show a 
more informative alert, e.g. where the logs can be found, and for first-time 
users it would also help to state after what period of time the logs will be 
completely available. Thoughts?

> Blacklisted nodes for AM container is not getting displayed in the Web UI
> -
>
> Key: YARN-4307
> URL: https://issues.apache.org/jira/browse/YARN-4307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: webpage.png, yarn-capacity-scheduler-debug.log
>
>
> In a pseudo cluster with 2 NMs, launched an app with an incorrect 
> configuration: *./hadoop org.apache.hadoop.mapreduce.SleepJob 
> -Dmapreduce.job.node-label-expression=labelX  
> -Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
> The first attempt failed and a 2nd attempt was launched, but the application 
> hung. The scheduler logs showed that localhost was blacklisted, but in the 
> UI (app & apps listing pages) the count was shown as zero and no hosts were 
> listed on the app page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4307) Blacklisted nodes for AM container is not getting displayed in the Web UI

2015-10-28 Thread Naganarasimha G R (JIRA)
Naganarasimha G R created YARN-4307:
---

 Summary: Blacklisted nodes for AM container is not getting 
displayed in the Web UI
 Key: YARN-4307
 URL: https://issues.apache.org/jira/browse/YARN-4307
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R


In a pseudo cluster with 2 NMs, launched an app with an incorrect configuration: 
*./hadoop org.apache.hadoop.mapreduce.SleepJob 
-Dmapreduce.job.node-label-expression=labelX  
-Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
The first attempt failed and a 2nd attempt was launched, but the application 
hung. The scheduler logs showed that localhost was blacklisted, but in the UI 
(app & apps listing pages) the count was shown as zero and no hosts were listed 
on the app page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4251) TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is failing

2015-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978502#comment-14978502
 ] 

Hudson commented on YARN-4251:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1330 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1330/])
YARN-4251. (ozawa: rev 9f4dfdf4eb60cc6b13da586dabcd95bd77fc783c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java
* hadoop-yarn-project/CHANGES.txt


> TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is 
> failing
> -
>
> Key: YARN-4251
> URL: https://issues.apache.org/jira/browse/YARN-4251
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: YARN-4251.patch
>
>
>  *Trace* 
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>   at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Unknown Source)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
> Source)
>   at java.lang.reflect.Constructor.newInstance(Unknown Source)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:486)
>   at org.apache.hadoop.ipc.Server$Listener.(Server.java:646)
>   at 

[jira] [Updated] (YARN-4307) Blacklisted nodes for AM container is not getting displayed in the Web UI

2015-10-28 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4307:

Attachment: webpage.png

> Blacklisted nodes for AM container is not getting displayed in the Web UI
> -
>
> Key: YARN-4307
> URL: https://issues.apache.org/jira/browse/YARN-4307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: webpage.png, yarn-capacity-scheduler-debug.log
>
>
> In a pseudo cluster with 2 NMs, launched an app with an incorrect 
> configuration: *./hadoop org.apache.hadoop.mapreduce.SleepJob 
> -Dmapreduce.job.node-label-expression=labelX  
> -Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
> The first attempt failed and a 2nd attempt was launched, but the application 
> hung. The scheduler logs showed that localhost was blacklisted, but in the 
> UI (app & apps listing pages) the count was shown as zero and no hosts were 
> listed on the app page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.

2015-10-28 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4288:
-
Attachment: YARN-4288-v3.patch

Fixed the whitespace and findbugs issues in v3. The test failure is not related 
and is already tracked in HADOOP-11636.

> NodeManager restart should keep retrying to register to RM while connection 
> exception happens during RM failed over.
> 
>
> Key: YARN-4288
> URL: https://issues.apache.org/jira/browse/YARN-4288
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4288-v2.patch, YARN-4288-v3.patch, YARN-4288.patch
>
>
> When the NM gets restarted, NodeStatusUpdaterImpl will try to register with 
> the RM over RPC, which can throw exceptions like the following when the RM is 
> being restarted at the same time:
> {noformat}
> 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl 
> (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - 
> Unexpected error rebooting NodeStatusUpdater
> java.io.IOException: Failed on local exception: java.io.IOException: 
> Connection reset by peer; Host Details : local host is: "172.27.62.28"; 
> destination host is: "172.27.62.57":8025;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> at org.apache.hadoop.ipc.Client.call(Client.java:1473)
> at org.apache.hadoop.ipc.Client.call(Client.java:1400)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304)
> Caused by: java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)
> 2015-08-17 14:35:59,436 FATAL nodemanager.NodeManager 
> (NodeManager.java:run(307)) - Error while rebooting NodeStatusUpdater.
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: 
> Failed on local exception: java.io.IOException: Connection reset by peer; 
> Host Details : local host is: "172.27.62.28"; destination host is: 
> "172.27.62.57":8025;
> at 
> 
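
A minimal sketch of the retry behaviour the summary asks for: keep attempting 
registration while connection-level exceptions occur, instead of letting a 
single failure abort the NodeStatusUpdater reboot. The names below are 
placeholders; this is not the actual NodeStatusUpdaterImpl code or the attached 
patch.

{code}
import java.io.IOException;

// Placeholder sketch only; not NodeStatusUpdaterImpl or the YARN-4288 patch.
public class RegisterRetrySketch {

  /** Stand-in for the registerNodeManager RPC call. */
  interface Registrar {
    void register() throws IOException;
  }

  // Keep retrying registration while the RM is unreachable (e.g. during
  // failover), backing off between attempts. A real implementation would
  // retry only connection-level failures.
  static void registerWithRetries(Registrar registrar, long retryIntervalMs,
      int maxAttempts) throws IOException, InterruptedException {
    for (int attempt = 1; ; attempt++) {
      try {
        registrar.register();
        return;                        // registered successfully
      } catch (IOException e) {
        if (attempt >= maxAttempts) {
          throw e;                     // give up after the configured limit
        }
        Thread.sleep(retryIntervalMs); // RM may still be failing over; retry
      }
    }
  }

  public static void main(String[] args) throws Exception {
    final int[] calls = {0};
    // Simulated RM that refuses the first two attempts, then accepts.
    Registrar flaky = () -> {
      if (++calls[0] < 3) {
        throw new IOException("Connection reset by peer");
      }
    };
    registerWithRetries(flaky, 10, 5);
    System.out.println("Registered after " + calls[0] + " attempts");
  }
}
{code}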

[jira] [Commented] (YARN-4307) Blacklisted nodes for AM container is not getting displayed in the Web UI

2015-10-28 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978564#comment-14978564
 ] 

Naganarasimha G R commented on YARN-4307:
-

From the preliminary check, I presume it's because 
{{SchedulerApplicationAttempt.getBlacklistedNodes()}} => 
{{appSchedulingInfo.getBlackListCopy()}} => {{return new 
HashSet<>(this.userBlacklist)}} returns only the user blacklist but not 
*amBlacklist*.
Not sure whether we also need to update {{userBlacklist}} when 
{{appSchedulingInfo.amBlacklist}} gets updated. Thoughts?
cc/ [~vvasudev]
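
If the diagnosis above holds, a fix could go in the direction sketched below. 
This is only a hedged illustration reusing the identifiers quoted in the 
comment; it is not the real SchedulerApplicationAttempt/AppSchedulingInfo code.

{code}
import java.util.HashSet;
import java.util.Set;

// Hedged sketch only: the field and method names mirror the identifiers
// quoted above, but this is not the actual AppSchedulingInfo class.
class AppSchedulingInfoSketch {
  private final Set<String> userBlacklist = new HashSet<>();
  private final Set<String> amBlacklist = new HashSet<>();

  void addToAmBlacklist(String host)   { amBlacklist.add(host); }
  void addToUserBlacklist(String host) { userBlacklist.add(host); }

  // Current behaviour per the comment: only the user blacklist is copied,
  // so AM-blacklisted hosts never show up in the web UI.
  Set<String> getBlackListCopy() {
    return new HashSet<>(userBlacklist);
  }

  // One possible direction: expose the union so the UI can display both.
  Set<String> getCombinedBlackListCopy() {
    Set<String> all = new HashSet<>(userBlacklist);
    all.addAll(amBlacklist);
    return all;
  }

  public static void main(String[] args) {
    AppSchedulingInfoSketch info = new AppSchedulingInfoSketch();
    info.addToAmBlacklist("localhost");
    System.out.println("UI sees today: " + info.getBlackListCopy());         // []
    System.out.println("UI could see:  " + info.getCombinedBlackListCopy()); // [localhost]
  }
}
{code}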

> Blacklisted nodes for AM container is not getting displayed in the Web UI
> -
>
> Key: YARN-4307
> URL: https://issues.apache.org/jira/browse/YARN-4307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> In a pseudo cluster with 2 NMs, launched an app with an incorrect 
> configuration: *./hadoop org.apache.hadoop.mapreduce.SleepJob 
> -Dmapreduce.job.node-label-expression=labelX  
> -Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
> The first attempt failed and a 2nd attempt was launched, but the application 
> hung. The scheduler logs showed that localhost was blacklisted, but in the 
> UI (app & apps listing pages) the count was shown as zero and no hosts were 
> listed on the app page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4251) TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is failing

2015-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978478#comment-14978478
 ] 

Hudson commented on YARN-4251:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #607 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/607/])
YARN-4251. (ozawa: rev 9f4dfdf4eb60cc6b13da586dabcd95bd77fc783c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java
* hadoop-yarn-project/CHANGES.txt


> TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is 
> failing
> -
>
> Key: YARN-4251
> URL: https://issues.apache.org/jira/browse/YARN-4251
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: YARN-4251.patch
>
>
>  *Trace* 
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>   at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Unknown Source)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
> Source)
>   at java.lang.reflect.Constructor.newInstance(Unknown Source)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:486)
>   at org.apache.hadoop.ipc.Server$Listener.(Server.java:646)
>   at 

[jira] [Updated] (YARN-4251) TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is failing

2015-10-28 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4251:
-
Target Version/s: 2.8.0
Hadoop Flags: Reviewed
 Component/s: test

> TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is 
> failing
> -
>
> Key: YARN-4251
> URL: https://issues.apache.org/jira/browse/YARN-4251
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: YARN-4251.patch
>
>
>  *Trace* 
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>   at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Unknown Source)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
> Source)
>   at java.lang.reflect.Constructor.newInstance(Unknown Source)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:486)
>   at org.apache.hadoop.ipc.Server$Listener.(Server.java:646)
>   at org.apache.hadoop.ipc.Server.(Server.java:2399)
>   at org.apache.hadoop.ipc.RPC$Server.(RPC.java:946)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:537)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510)
>   at 

[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-10-28 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2902:
---
Attachment: YARN-2902.10.patch

The findbugs and test failures are related. Attaching another patch.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.08.patch, YARN-2902.09.patch, 
> YARN-2902.10.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed, then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources, they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans, since the 
> cleanup will never delete resources in the DOWNLOADING state even if their 
> reference count is zero.
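
A simplified illustration of why such resources survive cleanup, assuming a 
cleanup scan that only considers LOCALIZED resources; the class names are 
invented and this is not the NodeManager's actual cache-cleanup code.

{code}
import java.util.Arrays;
import java.util.List;

// Assumed names only; not LocalResourcesTrackerImpl or the localization service.
public class DownloadingOrphanSketch {

  enum State { DOWNLOADING, LOCALIZED }

  static class CachedResource {
    final String path;
    final State state;
    final int refCount;
    CachedResource(String path, State state, int refCount) {
      this.path = path; this.state = state; this.refCount = refCount;
    }
  }

  // Mirrors the behaviour described above: only LOCALIZED resources with no
  // references are candidates for deletion, so a killed container's
  // half-downloaded resource (DOWNLOADING, refCount == 0) is never reclaimed
  // by this scan.
  static boolean eligibleForCleanup(CachedResource r) {
    return r.state == State.LOCALIZED && r.refCount == 0;
  }

  public static void main(String[] args) {
    List<CachedResource> cache = Arrays.asList(
        new CachedResource("filecache/10/job.jar", State.LOCALIZED, 0),
        new CachedResource("filecache/11/dict.bin_tmp", State.DOWNLOADING, 0));
    for (CachedResource r : cache) {
      System.out.println(r.path + " cleanable=" + eligibleForCleanup(r));
    }
  }
}
{code}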



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4251) TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is failing

2015-10-28 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977953#comment-14977953
 ] 

Tsuyoshi Ozawa commented on YARN-4251:
--

+1 for this patch. Checking this in.

> TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is 
> failing
> -
>
> Key: YARN-4251
> URL: https://issues.apache.org/jira/browse/YARN-4251
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: YARN-4251.patch
>
>
>  *Trace* 
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>   at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Unknown Source)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
> Source)
>   at java.lang.reflect.Constructor.newInstance(Unknown Source)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:486)
>   at org.apache.hadoop.ipc.Server$Listener.(Server.java:646)
>   at org.apache.hadoop.ipc.Server.(Server.java:2399)
>   at org.apache.hadoop.ipc.RPC$Server.(RPC.java:946)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:537)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:787)
>   at 
> 

[jira] [Commented] (YARN-4251) TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is failing

2015-10-28 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978117#comment-14978117
 ] 

Brahma Reddy Battula commented on YARN-4251:


[~ozawa], thanks a lot for reviewing and committing this issue.

> TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is 
> failing
> -
>
> Key: YARN-4251
> URL: https://issues.apache.org/jira/browse/YARN-4251
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: YARN-4251.patch
>
>
>  *Trace* 
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>   at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Unknown Source)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
> Source)
>   at java.lang.reflect.Constructor.newInstance(Unknown Source)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:486)
>   at org.apache.hadoop.ipc.Server$Listener.(Server.java:646)
>   at org.apache.hadoop.ipc.Server.(Server.java:2399)
>   at org.apache.hadoop.ipc.RPC$Server.(RPC.java:946)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:537)
>   at 
> 

[jira] [Created] (YARN-4308) ContainersAggregated CPU resource utilization reports negative usage in first few heartbeats

2015-10-28 Thread Sunil G (JIRA)
Sunil G created YARN-4308:
-

 Summary: ContainersAggregated CPU resource utilization reports 
negative usage in first few heartbeats
 Key: YARN-4308
 URL: https://issues.apache.org/jira/browse/YARN-4308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G


NodeManager reports ContainerAggregated CPU resource utilization as a negative 
value in the first few heartbeat cycles. I added a new debug print and received 
the below values from heartbeats.
{noformat}
INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 ContainersResource Utilization : CpuTrackerUsagePercent : -1.0 
INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:ContainersResource
 Utilization :  CpuTrackerUsagePercent : 198.94598
{noformat}

It's better to send 0 as CPU usage rather than negative values in heartbeats, 
even though this happens only in the first few heartbeats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4308) ContainersAggregated CPU resource utilization reports negative usage in first few heartbeats

2015-10-28 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978678#comment-14978678
 ] 

Sunil G commented on YARN-4308:
---

In {{CpuTimeTracker}}, the below code snippet can return {{UNAVAILABLE}} the 
first time, as {{lastSampleTime}} will still be {{UNAVAILABLE}} for the first 
calculation.

This results in a negative return value and causes the NM to send negative CPU 
usage in its heartbeats for a brief time.

{code}
  public float getCpuTrackerUsagePercent() {
if (lastSampleTime == UNAVAILABLE ||
lastSampleTime > sampleTime) {
  // lastSampleTime > sampleTime may happen when the system time is changed
  lastSampleTime = sampleTime;
  lastCumulativeCpuTime = cumulativeCpuTime;
  return cpuUsage;
}


{code}
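
A minimal sketch of the kind of guard the patch aims for (hypothetical helper, not the actual ContainersMonitorImpl code): clamp the tracker output before it is reported in the heartbeat, since a negative value (e.g. the -1.0 in the log above) just means "no sample yet".

{code}
// Hypothetical sketch only: sanitize the tracker output before reporting it.
final class CpuUsageReporter {
  static float sanitize(float trackerUsagePercent) {
    // A negative value means the tracker has no sample yet; report 0 instead.
    return trackerUsagePercent < 0 ? 0f : trackerUsagePercent;
  }
}
{code}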

> ContainersAggregated CPU resource utilization reports negative usage in first 
> few heartbeats
> 
>
> Key: YARN-4308
> URL: https://issues.apache.org/jira/browse/YARN-4308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
>
> NodeManager reports ContainerAggregated CPU resource utilization as a negative 
> value in the first few heartbeat cycles. I added a new debug print and received 
> the below values from heartbeats.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  ContainersResource Utilization : CpuTrackerUsagePercent : -1.0 
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:ContainersResource
>  Utilization :  CpuTrackerUsagePercent : 198.94598
> {noformat}
> It's better to send 0 as CPU usage rather than negative values in heartbeats, 
> even though this happens only in the first few heartbeats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4308) ContainersAggregated CPU resource utilization reports negative usage in first few heartbeats

2015-10-28 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4308:
--
Attachment: 0001-YARN-4308.patch

Attaching a patch to handle this corner case.

> ContainersAggregated CPU resource utilization reports negative usage in first 
> few heartbeats
> 
>
> Key: YARN-4308
> URL: https://issues.apache.org/jira/browse/YARN-4308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4308.patch
>
>
> NodeManager reports ContainerAggregated CPU resource utilization as a negative 
> value in the first few heartbeat cycles. I added a new debug print and received 
> the below values from heartbeats.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  ContainersResource Utilization : CpuTrackerUsagePercent : -1.0 
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:ContainersResource
>  Utilization :  CpuTrackerUsagePercent : 198.94598
> {noformat}
> It's better to send 0 as CPU usage rather than negative values in heartbeats, 
> even though this happens only in the first few heartbeats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-10-28 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978818#comment-14978818
 ] 

Xuan Gong commented on YARN-2859:
-

+1 LGTM. Checking this in

> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: YARN-2859.txt
>
>
> In mini cluster, a random port should be used. 
> Also, the config is not updated to the host that the process got bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}
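
A minimal sketch of the ephemeral-port idea (plain JDK code, not the actual MiniYARNCluster change): bind to port 0 so the OS picks a free port, then write the actual bound address back into the configuration the tests read.

{code}
// Hypothetical sketch only: let the OS choose a free port for the mini cluster,
// then write the actual bound address back into the test configuration.
// Note: picking a port and binding the server later is slightly racy; binding
// the service itself to port 0 avoids that entirely.
import java.io.IOException;
import java.net.ServerSocket;

final class EphemeralPort {
  static int pickFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {   // 0 = ephemeral port
      return socket.getLocalPort();
    }
  }
}
{code}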



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails

2015-10-28 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978867#comment-14978867
 ] 

Varun Vasudev commented on YARN-4309:
-

Yeah, tar-ing up the local dir may not be the best idea. However, dumping the 
directory contents (file listings) and the contents of launch_container.sh 
should be OK?

> Add debug information to application logs when a container fails
> 
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979001#comment-14979001
 ] 

Hudson commented on YARN-2859:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #596 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/596/])
YARN-2859. ApplicationHistoryServer binds to default port 8188 in (xgong: rev 
27414dac66f278b61fc23762204b01a1c508178a)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0, 2.7.2, 2.6.2
>
> Attachments: YARN-2859.txt
>
>
> In mini cluster, a random port should be used. 
> Also, the config is not updated to the host that the process got bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails

2015-10-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978914#comment-14978914
 ] 

Jason Lowe commented on YARN-4309:
--

bq.  It's definitely private-to-the-user data though and full of information 
leaks.
Yeah, agree with Allen that it's dicey to publish everything there.  For 
example, the security tokens for the container are stored in one of the local 
container files, and we do not want that stored in HDFS and accessible by the 
jobhistoryserver user nor the ATS user.  The nodemanager goes out of its way, 
via the container-executor, to make sure user-private files are not visible 
even to the nodemanager user.

The launch script should be OK and is really the most valuable thing there for 
debugging startup failures.  Almost everything in that script is derived from 
what's in the configs, and the configs are already stored in HDFS or the ATS.

> Add debug information to application logs when a container fails
> 
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-10-28 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978794#comment-14978794
 ] 

Sunil G commented on YARN-4290:
---

Proposed CLI output:

{noformat}
root@sunil-Inspiron-3543:/opt/hadoop/trunk/hadoop-3.0.0-SNAPSHOT/bin# ./yarn 
node -list -showDetails
15/10/28 22:33:11 INFO client.RMProxy: Connecting to ResourceManager at 
/127.0.0.1:25001
Total Nodes:1
         Node-Id    Node-State    Node-Http-Address    Number-of-Running-Containers
 localhost:25006       RUNNING      localhost:25008                               0
 Detailed Node Information : 
Memory-Used(Capacity) : 0MB (8192MB)
CPU-Used(Capacity) : 0 vcores (8 vcores)
Node-ResourceUtilization : 
Containers-ResourceUtilization : 
Node-Labels :
{noformat}

> "yarn nodes -list" should print all nodes reports information
> -
>
> Key: YARN-4290
> URL: https://issues.apache.org/jira/browse/YARN-4290
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
>
> Currently, "yarn nodes -list" command only shows 
> - "Node-Id", 
> - "Node-State", 
> - "Node-Http-Address",
> - "Number-of-Running-Containers"
> I think we need to show more information such as used resource, just like 
> "yarn nodes -status" command.
> Maybe we can add a parameter to -list, such as "-show-details" to enable 
> printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4309) Add debug information to application logs when a container fails

2015-10-28 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-4309:
---

 Summary: Add debug information to application logs when a 
container fails
 Key: YARN-4309
 URL: https://issues.apache.org/jira/browse/YARN-4309
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev


Sometimes when a container fails, it can be pretty hard to figure out why it 
failed.

My proposal is that if a container fails, we collect information about the 
container local dir and dump it into the container log dir. Ideally, I'd like 
to tar up the directory entirely, but I'm not sure of the security and space 
implications of such an approach. At the very least, we can list all the files 
in the container local dir and dump the contents of launch_container.sh (into 
the container log dir).

When log aggregation occurs, all this information will automatically get 
collected and make debugging such failures much easier.
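
A minimal sketch of the proposed dump (hypothetical helper using plain java.nio.file; the real implementation would have to respect the NM/container-executor permission concerns raised elsewhere in this thread):

{code}
// Hypothetical sketch only: list the container local dir and copy the launch
// script into the container log dir so log aggregation picks both up.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ContainerDebugDump {
  static void dump(Path containerLocalDir, Path containerLogDir) throws IOException {
    // 1. Write a file listing of the container local dir.
    try (Stream<Path> files = Files.walk(containerLocalDir)) {
      String listing = files.map(Path::toString)
                            .collect(Collectors.joining(System.lineSeparator()));
      Files.write(containerLogDir.resolve("directory.info"),   // file name is an assumption
                  listing.getBytes(StandardCharsets.UTF_8));
    }
    // 2. Copy the launch script, if present.
    Path launchScript = containerLocalDir.resolve("launch_container.sh");
    if (Files.exists(launchScript)) {
      Files.copy(launchScript, containerLogDir.resolve("launch_container.sh"),
                 StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
{code}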



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-10-28 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978846#comment-14978846
 ] 

Naganarasimha G R commented on YARN-4290:
-

+1, good set of details!

> "yarn nodes -list" should print all nodes reports information
> -
>
> Key: YARN-4290
> URL: https://issues.apache.org/jira/browse/YARN-4290
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
>
> Currently, "yarn nodes -list" command only shows 
> - "Node-Id", 
> - "Node-State", 
> - "Node-Http-Address",
> - "Number-of-Running-Containers"
> I think we need to show more information such as used resource, just like 
> "yarn nodes -status" command.
> Maybe we can add a parameter to -list, such as "-show-details" to enable 
> printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails

2015-10-28 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978886#comment-14978886
 ] 

Allen Wittenauer commented on YARN-4309:


As long as JH and/or TS respects the job ACLs  when it comes to sharing the 
data and sets permissions on the files in the container log appropriately, 
nothing really sticks out.  It's definitely private-to-the-user data though and 
full of information leaks.

> Add debug information to application logs when a container fails
> 
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-10-28 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978899#comment-14978899
 ] 

Xuan Gong commented on YARN-2859:
-

Committed into trunk/branch-2/branch-2.7/branch-2.6.

Thanks, Vinod.

> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0, 2.7.2, 2.6.2
>
> Attachments: YARN-2859.txt
>
>
> In mini cluster, a random port should be used. 
> Also, the config is not updated to the host that the process got bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission

2015-10-28 Thread Brook Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978999#comment-14978999
 ] 

Brook Zhou commented on YARN-3223:
--

Thanks [~leftnoteasy],  [~djp] for review.

bq. Suggest to use CapacityScheduler#updateNodeAndQueueResource to update 
resources, we need to update queue's resource, cluster metrics as well.
That makes sense. I'm currently setting SchedulerNode's usedResource equal to 
totalResource and keeping totalResource the same. If we use that function, it 
means totalResource should be set equal to usedResource, and on recommission we 
should just revert back to the original totalResource? I like your way better.

bq. When async scheduling enabled, we need to make sure decommissioing node's 
total resource is updated so no new container will be allocated on these nodes.
Even if async scheduling is enabled, we will update the total resource on the 
NODE_UPDATE event to equal the current usedResource, so the async scheduling 
thread will not allocate containers to the node.

bq.  RMNode itself (RMNode.getState()) is already include the necessary info, 
so the boolean parameter sounds like redundant
Agreed. I will let the scheduler decide the current state directly using that 
function.

bq.  I think we need separated test case to cover resource update during NM 
decommissioning 
Yes, that is definitely going to be added. I just wanted to see if my general 
ideas were okay with the community. Thanks!
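
A rough sketch of the bookkeeping being discussed (hypothetical class; the real change would go through CapacityScheduler#updateNodeAndQueueResource so queue and cluster metrics stay consistent):

{code}
// Hypothetical sketch only: shrink the node's total resource to what is in use
// while decommissioning, and remember the original total for recommission.
final class DecommissioningNodeResource {
  private int originalTotalMb;
  private int totalMb;
  private int usedMb;

  void onDecommissioning() {
    originalTotalMb = totalMb;
    totalMb = usedMb;            // available becomes 0; nothing new is scheduled here
  }

  void onContainerFinished(int releasedMb) {
    usedMb -= releasedMb;
    totalMb = usedMb;            // keep shrinking as containers complete
  }

  void onRecommission() {
    totalMb = originalTotalMb;   // roll back to the original capacity
  }
}
{code}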


> Resource update during NM graceful decommission
> ---
>
> Key: YARN-3223
> URL: https://issues.apache.org/jira/browse/YARN-3223
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Junping Du
>Assignee: Brook Zhou
> Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch
>
>
> During NM graceful decommission, we should handle resource updates properly, 
> including: make RMNode keep track of the old resource for possible rollback, 
> keep the available resource at 0, and update the used resource when containers 
> finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4307) Blacklisted nodes for AM container is not getting displayed in the Web UI

2015-10-28 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978771#comment-14978771
 ] 

Varun Vasudev commented on YARN-4307:
-

So the blacklisted nodes in the UI refer to nodes blacklisted by the AM. My 
understanding is that you also want to display nodes blacklisted by the RM for 
the AM. Correct? If so, we should display them as two separate columns. We 
should rename the existing column to "App blacklisted nodes" or something that 
indicates it shows the nodes blacklisted by the app, and add a new column to 
display the information you want.

> Blacklisted nodes for AM container is not getting displayed in the Web UI
> -
>
> Key: YARN-4307
> URL: https://issues.apache.org/jira/browse/YARN-4307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: webpage.png, yarn-capacity-scheduler-debug.log
>
>
> In a pseudo cluster with 2 NMs, launched an app with an incorrect 
> configuration: *./hadoop org.apache.hadoop.mapreduce.SleepJob 
> -Dmapreduce.job.node-label-expression=labelX  
> -Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
> The first attempt failed and a 2nd attempt was launched, but the application 
> hung. The scheduler logs showed that localhost was blacklisted, but in the UI 
> (app & apps listing pages) the count was shown as zero and no hosts were 
> listed on the app page. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978925#comment-14978925
 ] 

Hudson commented on YARN-2859:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8720 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8720/])
YARN-2859. ApplicationHistoryServer binds to default port 8188 in (xgong: rev 
27414dac66f278b61fc23762204b01a1c508178a)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0, 2.7.2, 2.6.2
>
> Attachments: YARN-2859.txt
>
>
> In mini cluster, a random port should be used. 
> Also, the config is not updated to the host that the process got bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails

2015-10-28 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978860#comment-14978860
 ] 

Allen Wittenauer commented on YARN-4309:


We had a user who accidentally dist cached one of the base ETL data dirs... 
about 60GB worth... and then the job would fail.  So now you're looking at 60GB 
* Y containers of data to collect before even getting to the other stuff that's 
relevant.

> Add debug information to application logs when a container fails
> 
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-10-28 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979133#comment-14979133
 ] 

Jonathan Eagles commented on YARN-4183:
---

[~xgong], I haven't heard back from you regarding this patch. Unless you have a 
strong opinion against it, I think it's ready to go in. Please chime in if you 
have thoughts or alternatives.

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4251) TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is failing

2015-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979053#comment-14979053
 ] 

Hudson commented on YARN-4251:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #546 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/546/])
YARN-4251. (ozawa: rev 9f4dfdf4eb60cc6b13da586dabcd95bd77fc783c)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java


> TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is 
> failing
> -
>
> Key: YARN-4251
> URL: https://issues.apache.org/jira/browse/YARN-4251
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: YARN-4251.patch
>
>
>  *Trace* 
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>   at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Unknown Source)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
> Source)
>   at java.lang.reflect.Constructor.newInstance(Unknown Source)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:486)
>   at org.apache.hadoop.ipc.Server$Listener.(Server.java:646)
>   at 

[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979080#comment-14979080
 ] 

Hudson commented on YARN-2859:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #609 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/609/])
YARN-2859. ApplicationHistoryServer binds to default port 8188 in (xgong: rev 
27414dac66f278b61fc23762204b01a1c508178a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0, 2.7.2, 2.6.2
>
> Attachments: YARN-2859.txt
>
>
> In mini cluster, a random port should be used. 
> Also, the config is not updated to the host that the process got bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979099#comment-14979099
 ] 

Hudson commented on YARN-2859:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1332 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1332/])
YARN-2859. ApplicationHistoryServer binds to default port 8188 in (xgong: rev 
27414dac66f278b61fc23762204b01a1c508178a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* hadoop-yarn-project/CHANGES.txt


> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0, 2.7.2, 2.6.2
>
> Attachments: YARN-2859.txt
>
>
> In mini cluster, a random port should be used. 
> Also, the config is not updated to the host that the process got bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-10-28 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979130#comment-14979130
 ] 

Jonathan Eagles commented on YARN-4183:
---

+1. [~xgong], unless you have any strong feeling against this patch, I'll 
commit this tomorrow.

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-10-28 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2859:
--
Target Version/s: 2.8.0, 2.7.2  (was: 2.8.0, 2.7.2, 2.6.2)
   Fix Version/s: (was: 2.6.2)

This did not make 2.6.2 as its RC was already cut. I'll set it to 2.6.3 once 
that version is created. I'll also recreate the CHANGES.txt entry under 2.6.3 
at that time.

> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0, 2.7.2
>
> Attachments: YARN-2859.txt
>
>
> In mini cluster, a random port should be used. 
> Also, the config is not updated to the host that the process got bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4310) FairScheduler: Log skipping reservation messages at DEBUG level

2015-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979780#comment-14979780
 ] 

Hadoop QA commented on YARN-4310:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 6s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 9s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 22s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 130m 59s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.7.0_79 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-10-29 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12769398/YARN-4310.2.patch |
| JIRA Issue | YARN-4310 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  
checkstyle  compile  |
| uname | Linux d52d52a09c1a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 

[jira] [Commented] (YARN-4310) FairScheduler: Log skipping reservation messages at DEBUG level

2015-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979653#comment-14979653
 ] 

Hadoop QA commented on YARN-4310:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 6s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
6s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 6s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 18s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 128m 50s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.7.0_79 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-10-28 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12769398/YARN-4310.2.patch |
| JIRA Issue | YARN-4310 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  
checkstyle  compile  |
| uname | Linux d3cfa57cdf8e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 

[jira] [Commented] (YARN-4307) Blacklisted nodes for AM container is not getting displayed in the Web UI

2015-10-28 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979752#comment-14979752
 ] 

Naganarasimha G R commented on YARN-4307:
-

Thanks for the comments, [~varun vasudev].
Yes, my intention is to show the blacklisted nodes irrespective of who identified 
them. Also, the variable names are misleading: {{amBlacklist}} is actually the 
nodes blacklisted by the RM for the AM, and {{userBlacklist}} is the nodes 
blacklisted by the AM (user).
Currently the apps list, app and appattempt pages show a count of blacklisted 
nodes; in my opinion we need to sum up both lists and show that, right?
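
A minimal sketch of that combined count (hypothetical helper; field names follow the comment above):

{code}
// Hypothetical sketch only: the UI count would merge both lists, de-duplicated.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class BlacklistView {
  static int blacklistedNodeCount(List<String> amBlacklist, List<String> userBlacklist) {
    Set<String> all = new HashSet<>(amBlacklist);  // nodes blacklisted by the RM for the AM
    all.addAll(userBlacklist);                     // nodes blacklisted by the AM (user)
    return all.size();                             // total shown on the app/attempt pages
  }
}
{code}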


> Blacklisted nodes for AM container is not getting displayed in the Web UI
> -
>
> Key: YARN-4307
> URL: https://issues.apache.org/jira/browse/YARN-4307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: webpage.png, yarn-capacity-scheduler-debug.log
>
>
> In a pseudo cluster with 2 NMs, launched an app with an incorrect 
> configuration: *./hadoop org.apache.hadoop.mapreduce.SleepJob 
> -Dmapreduce.job.node-label-expression=labelX  
> -Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
> The first attempt failed and a 2nd attempt was launched, but the application 
> hung. The scheduler logs showed that localhost was blacklisted, but in the UI 
> (app & apps listing pages) the count was shown as zero and no hosts were 
> listed on the app page. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4313) Race condition in MiniMRYarnCluster when getting history server address

2015-10-28 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979640#comment-14979640
 ] 

Li Lu commented on YARN-4313:
-

Thanks [~jianhe] for the work. I have one quick question w.r.t. the atomic 
boolean in the patch. If we're just using atomic reads/writes for the boolean 
flag, I think a volatile boolean would suffice; we're not actually using the 
"fancy" features of AtomicBoolean in this fix. The rest of the patch LGTM. 
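As a minimal sketch of that point (hypothetical class, not the 
MiniMRYarnCluster code): when a flag is only ever read and written, and never 
updated with compare-and-set style operations, a volatile boolean gives the 
same visibility guarantees as an AtomicBoolean.
{code}
// Hypothetical sketch, not the YARN-4313 patch: a volatile boolean is enough
// when the flag is only read and written, never compareAndSet/getAndSet.
public class StartFlag {
  private volatile boolean started = false;   // writes are visible to all threads

  public void markStarted() {
    started = true;      // plain write, published by volatile semantics
  }

  public boolean isStarted() {
    return started;      // plain read, always sees the latest write
  }
}
{code}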

> Race condition in MiniMRYarnCluster when getting history server address
> ---
>
> Key: YARN-4313
> URL: https://issues.apache.org/jira/browse/YARN-4313
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4313.1.patch
>
>
> Problem in this place when waiting for JHS to be started
> {code}
> new Thread() {
>   public void run() {
> historyServer.start();
>   };
> }.start();
> while (historyServer.getServiceState() == STATE.INITED) {
>   LOG.info("Waiting for HistoryServer to start...");
>   Thread.sleep(1500);
> }
> {code}
> The service state is updated before the service is actually started; see 
> AbstractService#start. So it's possible that when the while loop breaks, the 
> service has not actually started yet. 
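A minimal sketch of the kind of fix being discussed, reusing the 
{{historyServer}} variable from the snippet above (an illustration only, not 
the attached patch): signal completion after {{start()}} has returned instead 
of polling the service state.
{code}
// Sketch only, not YARN-4313.1.patch: count down the latch after start()
// returns, so the waiter cannot proceed before startup has finished.
final java.util.concurrent.CountDownLatch startedLatch =
    new java.util.concurrent.CountDownLatch(1);
new Thread() {
  public void run() {
    historyServer.start();
    startedLatch.countDown();   // only reached once start() has returned
  }
}.start();
try {
  if (!startedLatch.await(60, java.util.concurrent.TimeUnit.SECONDS)) {
    throw new IllegalStateException("HistoryServer did not start in time");
  }
} catch (InterruptedException e) {
  Thread.currentThread().interrupt();
}
{code}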



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4301) NM disk health checker should have a timeout

2015-10-28 Thread Akihiro Suda (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979812#comment-14979812
 ] 

Akihiro Suda commented on YARN-4301:


Here is the reproduction script: 
https://github.com/osrg/earthquake/tree/1ceab663baec2b93ee7309b7369ba4f9dcf3a2c2/example/yarn/4301-reproduce


I'll submit a patch to fix the bug later.


> NM disk health checker should have a timeout
> 
>
> Key: YARN-4301
> URL: https://issues.apache.org/jira/browse/YARN-4301
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akihiro Suda
>
> The disk health checker [verifies a 
> disk|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java#L371-L385]
>  by executing {{mkdir}} and {{rmdir}} periodically.
> If these operations do not return within a moderate timeout, the disk should 
> be marked bad, and thus {{nodeInfo.nodeHealthy}} should flip to {{false}}.
> I confirmed that current YARN does not have an implicit timeout (on JDK7, 
> Linux 4.2, ext4) using [Earthquake|https://github.com/osrg/earthquake], our 
> fault injector for distributed systems.
> (I'll introduce the reproduction script in a while)
> I think we can fix this issue by making 
> [{{NodeHealthCheckerServer.isHealthy()}}|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java#L69-L73]
>  return {{false}} if the value of {{this.getLastHealthReportTime()}} is too 
> old.
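A minimal sketch of the staleness check proposed above (a standalone helper 
with assumed names and an assumed threshold, not a patch): the existing health 
result could be combined with a check that the last health report is recent 
enough, so a hung mkdir/rmdir eventually flips the node to unhealthy.
{code}
// Hypothetical helper illustrating the idea only; the threshold value and how
// it would be wired into NodeHealthCheckerService.isHealthy() are assumptions.
static boolean isHealthReportFresh(long lastHealthReportTimeMs, long maxAgeMs) {
  // If the checker thread is stuck in mkdir/rmdir, the report time stops
  // advancing and this returns false once the threshold is exceeded.
  return System.currentTimeMillis() - lastHealthReportTimeMs <= maxAgeMs;
}
{code}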



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4310) Reduce log level for certain messages to prevent overrunning log file

2015-10-28 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-4310:
-

 Summary: Reduce log level for certain messages to prevent 
overrunning log file
 Key: YARN-4310
 URL: https://issues.apache.org/jira/browse/YARN-4310
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Arun Suresh
Assignee: Arun Suresh
Priority: Minor


YARN-4270 introduced an additional log message that is currently at INFO level :
{noformat}
2015-10-28 11:13:21,692 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttemt:
Reservation Exceeds Allowed number of nodes:
app_id=application_1446045371769_0001 existingReservations=1
totalAvailableNodes=4 reservableNodesRatio=0.05
numAllowedReservations=1
{noformat}
It has been observed that the log message can totally swamp the RM log file. 
This needs to be reduced to DEBUG level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4184) Remove update reservation state api from state store as its not used by ReservationSystem

2015-10-28 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4184:
-
Parent Issue: YARN-2572  (was: YARN-2573)

> Remove update reservation state api from state store as its not used by 
> ReservationSystem
> -
>
> Key: YARN-4184
> URL: https://issues.apache.org/jira/browse/YARN-4184
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Subru Krishnan
>
> ReservationSystem uses remove/add for updates and thus update api in state 
> store is not needed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-10-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979288#comment-14979288
 ] 

Jason Lowe commented on YARN-2902:
--

Thanks for updating the patch, Varun!  Patch looks great except for one thing 
that I missed earlier.  In container-executor.c the code is assuming any error 
returned by stat must mean the file is missing, and that's not correct.  It 
should be checking errno for ENOENT, as anything else is an error not related 
to the file being already deleted.


> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.08.patch, YARN-2902.09.patch, 
> YARN-2902.10.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4310) FairScheduler: Log skipping reservation messages at DEBUG level

2015-10-28 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979195#comment-14979195
 ] 

Karthik Kambatla commented on YARN-4310:


Can we add a guard to check whether debug logging is enabled? 
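For illustration, the guard could look like the usual commons-logging pattern 
(the variable names below are taken from the log output and are assumptions):
{code}
// Sketch of the requested guard: skip building the message string entirely
// when DEBUG logging is disabled.
if (LOG.isDebugEnabled()) {
  LOG.debug("Reservation Exceeds Allowed number of nodes: app_id=" + applicationId
      + " existingReservations=" + existingReservations
      + " totalAvailableNodes=" + totalAvailableNodes
      + " reservableNodesRatio=" + reservableNodesRatio
      + " numAllowedReservations=" + numAllowedReservations);
}
{code}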

> FairScheduler: Log skipping reservation messages at DEBUG level
> ---
>
> Key: YARN-4310
> URL: https://issues.apache.org/jira/browse/YARN-4310
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Minor
> Attachments: YARN-4310.1.patch
>
>
> YARN-4270 introduced an additional log message that is currently at INFO 
> level :
> {noformat}
> 2015-10-28 11:13:21,692 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttemt:
> Reservation Exceeds Allowed number of nodes:
> app_id=application_1446045371769_0001 existingReservations=1
> totalAvailableNodes=4 reservableNodesRatio=0.05
> numAllowedReservations=1
> {noformat}
> It has been observed that the log message can totally swamp the RM log file. 
> This needs to be reduced to DEBUG level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4310) Reduce log level for certain messages to prevent overrunning log file

2015-10-28 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4310:
--
Attachment: YARN-4310.1.patch

Attaching trivial patch

> Reduce log level for certain messages to prevent overrunning log file
> -
>
> Key: YARN-4310
> URL: https://issues.apache.org/jira/browse/YARN-4310
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Minor
> Attachments: YARN-4310.1.patch
>
>
> YARN-4270 introduced an additional log message that is currently at INFO 
> level :
> {noformat}
> 2015-10-28 11:13:21,692 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttemt:
> Reservation Exceeds Allowed number of nodes:
> app_id=application_1446045371769_0001 existingReservations=1
> totalAvailableNodes=4 reservableNodesRatio=0.05
> numAllowedReservations=1
> {noformat}
> It has been observed that the log message can totally swamp the RM log file. 
> This needs to be reduced to DEBUG level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4310) FairScheduler: Log skipping reservation messages at DEBUG level

2015-10-28 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4310:
---
Summary: FairScheduler: Log skipping reservation messages at DEBUG level  
(was: Reduce log level for certain messages to prevent overrunning log file)

> FairScheduler: Log skipping reservation messages at DEBUG level
> ---
>
> Key: YARN-4310
> URL: https://issues.apache.org/jira/browse/YARN-4310
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Minor
> Attachments: YARN-4310.1.patch
>
>
> YARN-4270 introduced an additional log message that is currently at INFO 
> level :
> {noformat}
> 2015-10-28 11:13:21,692 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttemt:
> Reservation Exceeds Allowed number of nodes:
> app_id=application_1446045371769_0001 existingReservations=1
> totalAvailableNodes=4 reservableNodesRatio=0.05
> numAllowedReservations=1
> {noformat}
> It has been observed that the log message can totally swamp the RM log file. 
> This needs to be reduced to DEBUG level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.

2015-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979205#comment-14979205
 ] 

Hadoop QA commented on YARN-4288:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 35s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 23s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
0s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 14s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk cannot run convertXmlToText from findbugs {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 16s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 57s 
{color} | {color:red} Patch generated 1 new checkstyle issues in root (total 
was 44, now 45). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 57s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_60. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 47s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 18s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_79. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 7s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 22s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-10-28 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12769294/YARN-4288-v3.patch |
| JIRA Issue | YARN-4288 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  
checkstyle  compile  |
| uname | Linux a5588f4967ed 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 

[jira] [Updated] (YARN-4310) FairScheduler: Log skipping reservation messages at DEBUG level

2015-10-28 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4310:
--
Attachment: YARN-4310.2.patch

[~kasha] thanks for the review... updating patch

> FairScheduler: Log skipping reservation messages at DEBUG level
> ---
>
> Key: YARN-4310
> URL: https://issues.apache.org/jira/browse/YARN-4310
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Minor
> Attachments: YARN-4310.1.patch, YARN-4310.2.patch
>
>
> YARN-4270 introduced an additional log message that is currently at INFO 
> level :
> {noformat}
> 2015-10-28 11:13:21,692 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttemt:
> Reservation Exceeds Allowed number of nodes:
> app_id=application_1446045371769_0001 existingReservations=1
> totalAvailableNodes=4 reservableNodesRatio=0.05
> numAllowedReservations=1
> {noformat}
> It has been observed that the log message can totally swamp the RM log file. 
> This needs to be reduced to DEBUG level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2015-10-28 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created YARN-4311:
-

 Summary: Removing nodes from include and exclude lists will not 
remove them from decommissioned nodes list
 Key: YARN-4311
 URL: https://issues.apache.org/jira/browse/YARN-4311
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.1
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


In order to fully forget about a node, removing it from the include and exclude 
lists is not sufficient: the RM still lists it under decommissioned nodes. The 
tricky part that [~jlowe] pointed out is the case when include lists are not 
used; in that case we don't want the nodes to fall off if they are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2573) Integrate ReservationSystem with the RM failover mechanism

2015-10-28 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2573:
-
Hadoop Flags:   (was: Reviewed)

> Integrate ReservationSystem with the RM failover mechanism
> --
>
> Key: YARN-2573
> URL: https://issues.apache.org/jira/browse/YARN-2573
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Fix For: 2.8.0
>
> Attachments: Design for Reservation HA.pdf
>
>
> YARN-1051 introduces the ReservationSystem and the current implementation is 
> completely in-memory based. YARN-149 brings in the notion of RM HA with a 
> highly available state store. This JIRA proposes persisting the Plan into the 
> RMStateStore and recovering it post RM failover



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2573) Integrate ReservationSystem with the RM failover mechanism

2015-10-28 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan resolved YARN-2573.
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0

All sub-tasks have been reviewed and committed to trunk and branch-2.

> Integrate ReservationSystem with the RM failover mechanism
> --
>
> Key: YARN-2573
> URL: https://issues.apache.org/jira/browse/YARN-2573
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Fix For: 2.8.0
>
> Attachments: Design for Reservation HA.pdf
>
>
> YARN-1051 introduces the ReservationSystem and the current implementation is 
> completely in-memory based. YARN-149 brings in the notion of RM HA with a 
> highly available state store. This JIRA proposes persisting the Plan into the 
> RMStateStore and recovering it post RM failover



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979239#comment-14979239
 ] 

Hudson commented on YARN-2859:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2539 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2539/])
YARN-2859. ApplicationHistoryServer binds to default port 8188 in (xgong: rev 
27414dac66f278b61fc23762204b01a1c508178a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* hadoop-yarn-project/CHANGES.txt


> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0, 2.7.2, 2.6.2
>
> Attachments: YARN-2859.txt
>
>
> In the mini cluster, a random port should be used. 
> Also, the config is not updated with the address that the process actually 
> got bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}
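A minimal sketch of the usual way a mini cluster avoids a fixed port 
(hypothetical helper, not the attached YARN-2859.txt fix): bind to port 0 to 
get an ephemeral port, then write the resolved address back into the 
configuration before starting the server.
{code}
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper: ask the OS for a free ephemeral port so the mini
// cluster never collides with a real daemon on the default 8188.
public class FreePort {
  static int findFreePort() throws IOException {
    try (ServerSocket probe = new ServerSocket(0)) {  // 0 = ephemeral port
      return probe.getLocalPort();
    }
  }
}
{code}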



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-28 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4312:
---
Description: 
These timeouts happen because, after YARN-3798, we do a ZK sync operation on RM 
startup, which delays RM startup a bit and makes the 5 s timeout too small for 
a couple of tests in TestSubmitApplicationWithRMHA.

{noformat}
testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
  Time elapsed: 5.162 sec  <<< ERROR!
java.lang.Exception: test timed out after 5000 milliseconds
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)



testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
  Time elapsed: 5.146 sec  <<< ERROR!
java.lang.Exception: test timed out after 5000 milliseconds
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at 

[jira] [Commented] (YARN-4310) FairScheduler: Log skipping reservation messages at DEBUG level

2015-10-28 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979334#comment-14979334
 ] 

Karthik Kambatla commented on YARN-4310:


+1, pending Jenkins. 

> FairScheduler: Log skipping reservation messages at DEBUG level
> ---
>
> Key: YARN-4310
> URL: https://issues.apache.org/jira/browse/YARN-4310
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Minor
> Attachments: YARN-4310.1.patch, YARN-4310.2.patch
>
>
> YARN-4270 introduced an additional log message that is currently at INFO 
> level :
> {noformat}
> 2015-10-28 11:13:21,692 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttemt:
> Reservation Exceeds Allowed number of nodes:
> app_id=application_1446045371769_0001 existingReservations=1
> totalAvailableNodes=4 reservableNodesRatio=0.05
> numAllowedReservations=1
> {noformat}
> It has been observed that the log message can totally swamp the RM log file. 
> This needs to be reduced to DEBUG level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4130) Duplicate declaration of ApplicationId in RMAppManager

2015-10-28 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979317#comment-14979317
 ] 

Kuhu Shukla commented on YARN-4130:
---

+1 (non-binding). 

> Duplicate declaration of ApplicationId in RMAppManager
> --
>
> Key: YARN-4130
> URL: https://issues.apache.org/jira/browse/YARN-4130
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Trivial
>  Labels: resourcemanager
> Attachments: YARN-4130.00.patch
>
>
> ApplicationId is declared double in {{RMAppManager}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-28 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-4287:
-
Attachment: YARN-4287-minimal.patch

[~leftnoteasy], Another very simple approach is to just not reset 
schedulingOpportunities when we allocate a RACK_LOCAL container. This isn't 
quite as flexible, but might be fine for almost all use cases (Even today, we 
degrade into OFF_SWITCH at the same threshold or earlier, and will continue to 
schedule OFF_SWITCH without delay).
YARN-4287-minimal.patch does this.

Let me know your thoughts. 
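A minimal sketch of the decision the minimal patch makes (hypothetical names, 
not code from the patch or from CapacityScheduler): keep the accumulated 
scheduling opportunities when the assignment was only rack-local, so follow-up 
rack-local assignments are not delayed again.
{code}
// Hypothetical illustration of the reset rule described above.
enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

static boolean shouldResetSchedulingOpportunities(Locality assigned) {
  // NODE_LOCAL: the locality delay did its job, so start counting over.
  // OFF_SWITCH: we already waited out the off-switch delay, so start over.
  // RACK_LOCAL: keep the counter so the next rack-local request does not
  // have to wait out nodeLocalityDelay again.
  return assigned != Locality.RACK_LOCAL;
}
{code}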

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-28 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-4312:
--

 Summary: TestSubmitApplicationWithRMHA fails on branch-2.7 and 
branch-2.6 as some of the test cases time out 
 Key: YARN-4312
 URL: https://issues.apache.org/jira/browse/YARN-4312
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1, 2.6.1
Reporter: Varun Saxena
Assignee: Varun Saxena


{noformat}
testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
  Time elapsed: 5.162 sec  <<< ERROR!
java.lang.Exception: test timed out after 5000 milliseconds
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)



testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
  Time elapsed: 5.146 sec  <<< ERROR!
java.lang.Exception: test timed out after 5000 milliseconds
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 

[jira] [Updated] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-28 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4312:
---
Attachment: YARN-4312-branch-2.7.01.patch

> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4312-branch-2.7.01.patch
>
>
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> 

[jira] [Commented] (YARN-4308) ContainersAggregated CPU resource utilization reports negative usage in first few heartbeats

2015-10-28 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979382#comment-14979382
 ] 

Karthik Kambatla commented on YARN-4308:


Don't recall all the details, but that was intentional. 

[~adhoot]/[~djp] - there was a lot of back and forth, do you guys remember the 
details? 

> ContainersAggregated CPU resource utilization reports negative usage in first 
> few heartbeats
> 
>
> Key: YARN-4308
> URL: https://issues.apache.org/jira/browse/YARN-4308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4308.patch
>
>
> NodeManager reports ContainerAggregated CPU resource utilization as a negative 
> value in the first few heartbeat cycles. I added a new debug print and 
> received the below values from heartbeats.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  ContainersResource Utilization : CpuTrackerUsagePercent : -1.0 
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:ContainersResource
>  Utilization :  CpuTrackerUsagePercent : 198.94598
> {noformat}
> It's better to send 0 as the CPU usage rather than a negative value in 
> heartbeats, even though this happens only in the first few heartbeats.
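A minimal sketch of the behaviour suggested above (hypothetical helper, not the 
attached patch): clamp the tracker's early negative readings to 0 before 
putting them into the heartbeat.
{code}
// Hypothetical helper: the CPU tracker reports a negative sentinel until it
// has enough samples to compute a delta; report 0 instead of the raw value.
static float sanitizeCpuUsagePercent(float trackerValue) {
  return trackerValue < 0 ? 0f : trackerValue;
}
{code}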



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails

2015-10-28 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979396#comment-14979396
 ] 

Karthik Kambatla commented on YARN-4309:


Generally, I'm in favor of collecting debug information. launch_container.sh 
and directory listings/sizes should go a long way toward helping us debug 
launch failures. Maybe collecting logs is okay for non-launch failures, 
especially if the logs are not aggregated for some reason; it would be nice to 
specify a limit on the size of the logs collected.

> Add debug information to application logs when a container fails
> 
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir, and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-28 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979402#comment-14979402
 ] 

Varun Saxena commented on YARN-4312:


[~sjlee0], updated branch-2.7 and branch-2.6 patches.
Increased the timeout to 50 seconds just to guard against timeouts on slow machines.

> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because, after YARN-3798, we do a ZK sync operation on 
> RM startup, which delays RM startup a bit and makes the 5 s timeout too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>

[jira] [Updated] (YARN-4312) TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of the test cases time out

2015-10-28 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4312:
---
Attachment: YARN-4312-branch-2.6.01.patch

> TestSubmitApplicationWithRMHA fails on branch-2.7 and branch-2.6 as some of 
> the test cases time out 
> 
>
> Key: YARN-4312
> URL: https://issues.apache.org/jira/browse/YARN-4312
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1, 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4312-branch-2.6.01.patch, 
> YARN-4312-branch-2.7.01.patch
>
>
> These timeouts happen because, after YARN-3798, we do a ZK sync operation on 
> RM startup, which delays RM startup a bit and makes the 5 s timeout too small 
> for a couple of tests in TestSubmitApplicationWithRMHA.
> {noformat}
> testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.162 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:191)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMHATestBase.startRMs(RMHATestBase.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testHandleRMHADuringSubmitApplicationCallWithSavedApplicationState(TestSubmitApplicationWithRMHA.java:234)
> 
> testHandleRMHADuringSubmitApplicationCallWithoutSavedApplicationState(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
>   Time elapsed: 5.146 sec  <<< ERROR!
> java.lang.Exception: test timed out after 5000 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.syncInternal(ZKRMStateStore.java:944)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:562)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:559)
>  

[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.

2015-10-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979454#comment-14979454
 ] 

Jian He commented on YARN-4288:
---

lgtm

> NodeManager restart should keep retrying to register to RM while connection 
> exception happens during RM failed over.
> 
>
> Key: YARN-4288
> URL: https://issues.apache.org/jira/browse/YARN-4288
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4288-v2.patch, YARN-4288-v3.patch, YARN-4288.patch
>
>
> When the NM gets restarted, NodeStatusUpdaterImpl will try to register to the 
> RM over RPC, which can throw the following exception when the RM is restarted 
> at the same time:
> {noformat}
> 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl 
> (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - 
> Unexpected error rebooting NodeStatusUpdater
> java.io.IOException: Failed on local exception: java.io.IOException: 
> Connection reset by peer; Host Details : local host is: "172.27.62.28"; 
> destination host is: "172.27.62.57":8025;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> at org.apache.hadoop.ipc.Client.call(Client.java:1473)
> at org.apache.hadoop.ipc.Client.call(Client.java:1400)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304)
> Caused by: java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)
> 2015-08-17 14:35:59,436 FATAL nodemanager.NodeManager 
> (NodeManager.java:run(307)) - Error while rebooting NodeStatusUpdater.
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: 
> Failed on local exception: java.io.IOException: Connection reset by peer; 
> Host Details : local host is: "172.27.62.28"; destination host is: 
> "172.27.62.57":8025;
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:223)
> at 
> 

[jira] [Updated] (YARN-4271) Make the NodeManager's health checker service pluggable

2015-10-28 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4271:
-
Assignee: Raghav Mohan

> Make the NodeManager's health checker service pluggable
> ---
>
> Key: YARN-4271
> URL: https://issues.apache.org/jira/browse/YARN-4271
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Subru Krishnan
>Assignee: Raghav Mohan
>Priority: Minor
>
> This JIRA proposes making the NodeHealthCheckerService in the NM pluggable, since 
> in cloud environments like Azure we want to tap into the platform-provided health 
> checkers for disk and other service signals. The idea is to extend the existing 
> NodeHealthCheckerService and hook in a custom implementation to evaluate whether 
> the node is healthy.
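
As a rough illustration of the kind of hook this could enable (a sketch only; 
PluggableHealthChecker, AzureSignalHealthChecker and queryFabricHealth are 
hypothetical names, not part of the NodeManager API), a custom checker might 
delegate the health decision to a cloud-provided signal:
{code}
// Hypothetical sketch only: these types are illustrative and are not taken
// from the Hadoop code base.
public interface PluggableHealthChecker {
  boolean isNodeHealthy();   // overall health decision for this node
  String getHealthReport();  // human-readable reason, surfaced to the RM/UI
}

class AzureSignalHealthChecker implements PluggableHealthChecker {
  private volatile String lastReport = "";

  @Override
  public boolean isNodeHealthy() {
    // Delegate to the platform-provided disk/service health signal instead of
    // (or in addition to) the stock disk checker and health-check script.
    boolean healthy = queryFabricHealth();
    lastReport = healthy ? "" : "cloud fabric reported node unhealthy";
    return healthy;
  }

  @Override
  public String getHealthReport() {
    return lastReport;
  }

  private boolean queryFabricHealth() {
    // Placeholder for a call to the cloud provider's health endpoint.
    return true;
  }
}
{code}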



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979464#comment-14979464
 ] 

Hudson commented on YARN-2859:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2485 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2485/])
YARN-2859. ApplicationHistoryServer binds to default port 8188 in (xgong: rev 
27414dac66f278b61fc23762204b01a1c508178a)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0, 2.7.2, 2.6.2
>
> Attachments: YARN-2859.txt
>
>
> In the mini cluster, a random port should be used. 
> Also, the config is not updated with the host that the process actually bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}
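
For context, a minimal sketch of the ephemeral-port pattern the description asks 
for (a generic illustration, not the MiniYARNCluster change itself):
{code}
import java.net.ServerSocket;
import org.apache.hadoop.conf.Configuration;

// Generic illustration: bind to port 0 so the OS picks a free port, then
// record the address that was actually assigned back into the configuration.
// (Note the small race between closing the probe socket and reusing the port;
// a real server should bind to 0 itself and report its bound address.)
public class EphemeralPortExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (ServerSocket probe = new ServerSocket(0)) {
      int boundPort = probe.getLocalPort();        // the real, random port
      conf.set("yarn.timeline-service.webapp.address", "localhost:" + boundPort);
    }
    System.out.println(conf.get("yarn.timeline-service.webapp.address"));
  }
}
{code}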



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4313) Race condition in MiniMRYarnCluster when getting history server address

2015-10-28 Thread Jian He (JIRA)
Jian He created YARN-4313:
-

 Summary: Race condition in MiniMRYarnCluster when getting history 
server address
 Key: YARN-4313
 URL: https://issues.apache.org/jira/browse/YARN-4313
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He


The problem is in this code, which waits for the JHS to start:
{code}
new Thread() {
  public void run() {
historyServer.start();
  };
}.start();
while (historyServer.getServiceState() == STATE.INITED) {
  LOG.info("Waiting for HistoryServer to start...");
  Thread.sleep(1500);
}
{code}
The service state is updated before the service has actually started (see 
AbstractService#start), so it is possible that when the while loop breaks, the 
service is not yet started.
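
One way to close the race (a sketch only, assuming the surrounding 
MiniMRYarnCluster context; not necessarily the committed fix) is to have the 
starter thread signal completion explicitly instead of polling the service state:
{code}
// Sketch: AbstractService#start flips the state to STARTED before
// serviceStart() finishes, so wait on an explicit signal instead.
final java.util.concurrent.CountDownLatch started =
    new java.util.concurrent.CountDownLatch(1);
new Thread() {
  @Override
  public void run() {
    historyServer.start();   // returns only after serviceStart() completes
    started.countDown();
  }
}.start();
// Bound the wait so a failed start() does not hang the test forever.
if (!started.await(60, java.util.concurrent.TimeUnit.SECONDS)) {
  throw new IllegalStateException("HistoryServer did not start in time");
}
{code}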



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-10-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979548#comment-14979548
 ] 

Hudson commented on YARN-2859:
--

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #547 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/547/])
YARN-2859. ApplicationHistoryServer binds to default port 8188 in (xgong: rev 
27414dac66f278b61fc23762204b01a1c508178a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0, 2.7.2, 2.6.2
>
> Attachments: YARN-2859.txt
>
>
> In the mini cluster, a random port should be used. 
> Also, the config is not updated with the host that the process actually bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exceptions happen during RM failover.

2015-10-28 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979452#comment-14979452
 ] 

Junping Du commented on YARN-4288:
--

The findbugs warning and the checkstyle issue are not related to this patch.

> NodeManager restart should keep retrying to register to RM while connection 
> exceptions happen during RM failover.
> 
>
> Key: YARN-4288
> URL: https://issues.apache.org/jira/browse/YARN-4288
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4288-v2.patch, YARN-4288-v3.patch, YARN-4288.patch
>
>
> When the NM gets restarted, NodeStatusUpdaterImpl will try to register to the RM 
> over RPC, which can throw the following exception if the RM is restarted at the 
> same time:
> {noformat}
> 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl 
> (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - 
> Unexpected error rebooting NodeStatusUpdater
> java.io.IOException: Failed on local exception: java.io.IOException: 
> Connection reset by peer; Host Details : local host is: "172.27.62.28"; 
> destination host is: "172.27.62.57":8025;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> at org.apache.hadoop.ipc.Client.call(Client.java:1473)
> at org.apache.hadoop.ipc.Client.call(Client.java:1400)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304)
> Caused by: java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)
> 2015-08-17 14:35:59,436 FATAL nodemanager.NodeManager 
> (NodeManager.java:run(307)) - Error while rebooting NodeStatusUpdater.
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: 
> Failed on local exception: java.io.IOException: Connection reset by peer; 
> Host Details : local host is: "172.27.62.28"; destination host is: 
> "172.27.62.57":8025;
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:223)
> at 

[jira] [Updated] (YARN-4313) Race condition in MiniMRYarnCluster when getting history server address

2015-10-28 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4313:
--
Attachment: YARN-4313.1.patch

Uploaded a patch.

> Race condition in MiniMRYarnCluster when getting history server address
> ---
>
> Key: YARN-4313
> URL: https://issues.apache.org/jira/browse/YARN-4313
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4313.1.patch
>
>
> The problem is in this code, which waits for the JHS to start:
> {code}
> new Thread() {
>   public void run() {
> historyServer.start();
>   };
> }.start();
> while (historyServer.getServiceState() == STATE.INITED) {
>   LOG.info("Waiting for HistoryServer to start...");
>   Thread.sleep(1500);
> }
> {code}
> The service state is updated before the service has actually started (see 
> AbstractService#start), so it is possible that when the while loop breaks, the 
> service is not yet started.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition is not displayed properly in UI

2015-10-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979543#comment-14979543
 ] 

Wangda Tan commented on YARN-4304:
--

Thanks [~sunilg] for opening this, +1 for doing this and addressing cluster metrics 
as well.

I would appreciate it if you could take a look at the other cluster/queue metrics to 
see whether any other partition-related metrics need to be fixed. Could you also 
update the title of this JIRA?

> AM max resource configuration per partition is not displayed properly in 
> UI
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
>
> As we now support per-partition max AM resource percentage configuration, the UI 
> also needs to display the corresponding configuration correctly. The current UI 
> still shows the AM resource percentage at the queue level only; this should be 
> shown correctly when node-label configuration is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4246) NPE while listing app attempt

2015-10-28 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978231#comment-14978231
 ] 

nijel commented on YARN-4246:
-

thanks [~varun_saxena] and [~rohithsharma] for the review and commit.

> NPE while listing app attempt
> -
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
> Fix For: 2.8.0
>
> Attachments: YARN-4246_1.patch, YARN-4246_2.patch
>
>
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
> {noformat}
> This is because the AM container id can be null if the AM container hasn't been 
> allocated yet. In ApplicationCLI#listApplicationAttempts we should check whether 
> the AM container ID is null instead of calling toString on it directly.
> {code}
>   writer.printf(APPLICATION_ATTEMPTS_PATTERN, appAttemptReport
>   .getApplicationAttemptId(), appAttemptReport
>   .getYarnApplicationAttemptState(), appAttemptReport
>   .getAMContainerId().toString(), appAttemptReport.getTrackingUrl());
> {code}
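
A null-safe variant of that call might look like the following (a sketch of the 
suggestion above, not necessarily the exact committed change):
{code}
// Sketch: print a placeholder instead of dereferencing a null ContainerId
// when the AM container has not been allocated yet.
ContainerId amContainerId = appAttemptReport.getAMContainerId();
writer.printf(APPLICATION_ATTEMPTS_PATTERN,
    appAttemptReport.getApplicationAttemptId(),
    appAttemptReport.getYarnApplicationAttemptState(),
    amContainerId == null ? "N/A" : amContainerId.toString(),
    appAttemptReport.getTrackingUrl());
{code}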



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-10-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979560#comment-14979560
 ] 

Wangda Tan commented on YARN-4287:
--

Hi [~nroberts],
bq. One argument for sticking with the scaling approach is the fact that we 
basically do it today in a simpler fashion. If you specify node-locality-delay 
of 5000 on a 3000 node cluster, it gets automatically scaled down to 3000 
without informing the user. So I'd say scale it but don't try to explain it in 
user documentation.
I still think scaling down is not a straightforward way to address the problem you 
mentioned (the user not being clear about the size of the cluster). Instead, I think 
we can use a percentage: the user can say, "I want the node locality delay to be 300 
OR 10% of the cluster size", and the same for the rack locality delay. The scheduler 
will then compute the actual delay at runtime. With this, I think we can safely cap 
the delay by the cluster size. Does this make sense to you?
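
As a rough sketch (the method and parameter names are hypothetical, just to 
illustrate the resolution logic):
{code}
// Hypothetical sketch: resolve the effective locality delay from either an
// absolute value or a percentage of the cluster size, capped by cluster size.
static int resolveLocalityDelay(int absoluteDelay, float percentOfCluster,
    int clusterSize) {
  int delay = absoluteDelay >= 0
      ? absoluteDelay                              // absolute value configured
      : (int) (percentOfCluster * clusterSize);    // e.g. 0.10f => 10% of nodes
  return Math.min(delay, clusterSize);             // never exceed cluster size
}
{code}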

Thanks,

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4287-minimal.patch, YARN-4287-v2.patch, 
> YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This JIRA proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have a 
> significant impact on things like CombineFileInputFormat, which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality, so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal and OffSwitch (today there is only one)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 
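
For illustration only (the method and parameter names below are made up, not taken 
from the attached patches), the decision logic for (1) and (2) above could look 
roughly like this:
{code}
// Hypothetical sketch of the two ideas above: a dedicated rack-local delay,
// and no further delay once rack-local assignments have started flowing.
boolean canAssignRackLocal(int missedOpportunities,
    boolean lastAssignmentWasRackLocal, int rackLocalityDelay) {
  if (lastAssignmentWasRackLocal) {
    return true;               // (2) keep rack-local assignments coming
  }
  return missedOpportunities >= rackLocalityDelay;  // (1) separate rack-local delay
}
{code}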



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)