[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244034#comment-15244034
 ] 

Hadoop QA commented on YARN-4676:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 40s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 43s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 2m 
0s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped branch modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 53s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 6m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 42s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
21s {color} | {color:green} root: patch generated 0 new + 519 unchanged - 6 
fixed = 519 total (was 525) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 2m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patch modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 9m 
29s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 11m 24s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95
 with JDK v1.7.0_95 generated 1 new + 2 unchanged - 0 

[jira] [Updated] (YARN-4965) Distributed shell AM failed due to ClientHandlerException thrown by jersey

2016-04-15 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4965:
-
Attachment: YARN-4965.patch

> Distributed shell AM failed due to ClientHandlerException thrown by jersey
> --
>
> Key: YARN-4965
> URL: https://issues.apache.org/jira/browse/YARN-4965
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4965.patch
>
>
> Distributed shell AM failed with RuntimeException: ClientHandlerException
> {code:title=app logs}
> Exception in thread "AMRM Callback Handler Thread" 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: Stream 
> closed.
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:312)
> Caused by: com.sun.jersey.api.client.ClientHandlerException: 
> java.io.IOException: Stream closed.
>   at 
> com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:563)
>   at 
> com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:506)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:446)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishContainerEndEvent(ApplicationMaster.java:1144)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.access$400(ApplicationMaster.java:169)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster$RMCallbackHandler.onContainersCompleted(ApplicationMaster.java:779)
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300)
> Caused by: java.io.IOException: Stream closed.
>   at 
> java.net.AbstractPlainSocketImpl.available(AbstractPlainSocketImpl.java:458)
>   at java.net.SocketInputStream.available(SocketInputStream.java:245)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:342)
>   at 
> sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)
>   at 
> sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
>   at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>   at 
> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3067)
>   at 
> org.codehaus.jackson.impl.ByteSourceBootstrapper.ensureLoaded(ByteSourceBootstrapper.java:507)
>   at 
> org.codehaus.jackson.impl.ByteSourceBootstrapper.detectEncoding(ByteSourceBootstrapper.java:129)
>   at 
> org.codehaus.jackson.impl.ByteSourceBootstrapper.constructParser(ByteSourceBootstrapper.java:224)
>   at 
> org.codehaus.jackson.JsonFactory._createJsonParser(JsonFactory.java:785)
>   at 
> org.codehaus.jackson.JsonFactory.createJsonParser(JsonFactory.java:561)
>   at 
> org.codehaus.jackson.jaxrs.JacksonJsonProvider.readFrom(JacksonJsonProvider.java:414)
>   at 
> com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:553)
>   ... 6 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4965) Distributed shell AM failed due to ClientHandlerException thrown by jersey

2016-04-15 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4965:
-
Fix Version/s: (was: 2.8.0)

> Distributed shell AM failed due to ClientHandlerException thrown by jersey
> --
>
> Key: YARN-4965
> URL: https://issues.apache.org/jira/browse/YARN-4965
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4965.patch
>
>
> Distributed shell AM failed with RuntimeException: ClientHandlerException
> {code:title=app logs}
> Exception in thread "AMRM Callback Handler Thread" 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: Stream 
> closed.
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:312)
> Caused by: com.sun.jersey.api.client.ClientHandlerException: 
> java.io.IOException: Stream closed.
>   at 
> com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:563)
>   at 
> com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:506)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:446)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishContainerEndEvent(ApplicationMaster.java:1144)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.access$400(ApplicationMaster.java:169)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster$RMCallbackHandler.onContainersCompleted(ApplicationMaster.java:779)
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300)
> Caused by: java.io.IOException: Stream closed.
>   at 
> java.net.AbstractPlainSocketImpl.available(AbstractPlainSocketImpl.java:458)
>   at java.net.SocketInputStream.available(SocketInputStream.java:245)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:342)
>   at 
> sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)
>   at 
> sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
>   at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>   at 
> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3067)
>   at 
> org.codehaus.jackson.impl.ByteSourceBootstrapper.ensureLoaded(ByteSourceBootstrapper.java:507)
>   at 
> org.codehaus.jackson.impl.ByteSourceBootstrapper.detectEncoding(ByteSourceBootstrapper.java:129)
>   at 
> org.codehaus.jackson.impl.ByteSourceBootstrapper.constructParser(ByteSourceBootstrapper.java:224)
>   at 
> org.codehaus.jackson.JsonFactory._createJsonParser(JsonFactory.java:785)
>   at 
> org.codehaus.jackson.JsonFactory.createJsonParser(JsonFactory.java:561)
>   at 
> org.codehaus.jackson.jaxrs.JacksonJsonProvider.readFrom(JacksonJsonProvider.java:414)
>   at 
> com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:553)
>   ... 6 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4965) Distributed shell AM failed due to ClientHandlerException thrown by jersey

2016-04-15 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4965:
-
Target Version/s: 2.8.0

> Distributed shell AM failed due to ClientHandlerException thrown by jersey
> --
>
> Key: YARN-4965
> URL: https://issues.apache.org/jira/browse/YARN-4965
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4965.patch
>
>
> Distributed shell AM failed with RuntimeException: ClientHandlerException
> {code:title=app logs}
> Exception in thread "AMRM Callback Handler Thread" 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: Stream 
> closed.
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:312)
> Caused by: com.sun.jersey.api.client.ClientHandlerException: 
> java.io.IOException: Stream closed.
>   at 
> com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:563)
>   at 
> com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:506)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:446)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishContainerEndEvent(ApplicationMaster.java:1144)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.access$400(ApplicationMaster.java:169)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster$RMCallbackHandler.onContainersCompleted(ApplicationMaster.java:779)
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300)
> Caused by: java.io.IOException: Stream closed.
>   at 
> java.net.AbstractPlainSocketImpl.available(AbstractPlainSocketImpl.java:458)
>   at java.net.SocketInputStream.available(SocketInputStream.java:245)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:342)
>   at 
> sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)
>   at 
> sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
>   at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>   at 
> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3067)
>   at 
> org.codehaus.jackson.impl.ByteSourceBootstrapper.ensureLoaded(ByteSourceBootstrapper.java:507)
>   at 
> org.codehaus.jackson.impl.ByteSourceBootstrapper.detectEncoding(ByteSourceBootstrapper.java:129)
>   at 
> org.codehaus.jackson.impl.ByteSourceBootstrapper.constructParser(ByteSourceBootstrapper.java:224)
>   at 
> org.codehaus.jackson.JsonFactory._createJsonParser(JsonFactory.java:785)
>   at 
> org.codehaus.jackson.JsonFactory.createJsonParser(JsonFactory.java:561)
>   at 
> org.codehaus.jackson.jaxrs.JacksonJsonProvider.readFrom(JacksonJsonProvider.java:414)
>   at 
> com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:553)
>   ... 6 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4965) Distributed shell AM failed due to ClientHandlerException thrown by jersey

2016-04-15 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4965:
-
Description: 
Distributed shell AM failed with RuntimeException: ClientHandlerException
{code:title=app logs}
Exception in thread "AMRM Callback Handler Thread" 
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: Stream 
closed.
at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:312)
Caused by: com.sun.jersey.api.client.ClientHandlerException: 
java.io.IOException: Stream closed.
at 
com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:563)
at 
com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:506)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:446)
at 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishContainerEndEvent(ApplicationMaster.java:1144)
at 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.access$400(ApplicationMaster.java:169)
at 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster$RMCallbackHandler.onContainersCompleted(ApplicationMaster.java:779)
at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300)
Caused by: java.io.IOException: Stream closed.
at 
java.net.AbstractPlainSocketImpl.available(AbstractPlainSocketImpl.java:458)
at java.net.SocketInputStream.available(SocketInputStream.java:245)
at java.io.BufferedInputStream.read(BufferedInputStream.java:342)
at 
sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)
at 
sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at 
sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3067)
at 
org.codehaus.jackson.impl.ByteSourceBootstrapper.ensureLoaded(ByteSourceBootstrapper.java:507)
at 
org.codehaus.jackson.impl.ByteSourceBootstrapper.detectEncoding(ByteSourceBootstrapper.java:129)
at 
org.codehaus.jackson.impl.ByteSourceBootstrapper.constructParser(ByteSourceBootstrapper.java:224)
at 
org.codehaus.jackson.JsonFactory._createJsonParser(JsonFactory.java:785)
at 
org.codehaus.jackson.JsonFactory.createJsonParser(JsonFactory.java:561)
at 
org.codehaus.jackson.jaxrs.JacksonJsonProvider.readFrom(JacksonJsonProvider.java:414)
at 
com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:553)
... 6 more
{code}


  was:
Distributed shell AM failed with java.io.IOException
{code:title=app logs}
Exception in thread "AMRM Callback Handler Thread" 
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: Stream 
closed.
at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:312)
Caused by: com.sun.jersey.api.client.ClientHandlerException: 
java.io.IOException: Stream closed.
at 
com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:563)
at 
com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:506)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:446)
at 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishContainerEndEvent(ApplicationMaster.java:1144)
at 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.access$400(ApplicationMaster.java:169)
at 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster$RMCallbackHandler.onContainersCompleted(ApplicationMaster.java:779)
at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300)
Caused by: java.io.IOException: Stream closed.
at 
java.net.AbstractPlainSocketImpl.available(AbstractPlainSocketImpl.java:458)
at java.net.SocketInputStream.available(SocketInputStream.java:245)
at java.io.BufferedInputStream.read(BufferedInputStream.java:342)
at 
sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)
at 
sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at 

[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom

2016-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244015#comment-15244015
 ] 

Wangda Tan commented on YARN-3215:
--

[~Naganarasimha], it seems the failures are related to the changes; could you 
take a look at them?

> Respect labels in CapacityScheduler when computing headroom
> ---
>
> Key: YARN-3215
> URL: https://issues.apache.org/jira/browse/YARN-3215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-3215.v1.001.patch, YARN-3215.v2.001.patch, 
> YARN-3215.v2.002.patch, YARN-3215.v2.003.patch, YARN-3215.v2.branch-2.8.patch
>
>
> In the existing CapacityScheduler, when computing the headroom of an 
> application, only the "non-labeled" nodes of the application are considered.
> But it is possible the application is asking for labeled resources, so 
> headroom-by-label (like 5G of resource available under node-label=red) is 
> required to get better resource allocation and avoid deadlocks such as 
> MAPREDUCE-5928.
> This JIRA could involve both API changes (such as adding a 
> label-to-available-resource map in AllocateResponse) and also internal 
> changes in CapacityScheduler.
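
A toy illustration of headroom-by-label, with made-up numbers; the class and 
method names below are assumptions for illustration, not the patch's actual API:

{code:title=LabelHeadroom.java (illustrative sketch)}
import java.util.HashMap;
import java.util.Map;

public class LabelHeadroom {

  // Headroom per label = resource available under that label minus what the
  // application's queue already uses under that label (floored at zero).
  static Map<String, Long> headroomByLabel(Map<String, Long> availableByLabel,
      Map<String, Long> usedByLabel) {
    Map<String, Long> headroom = new HashMap<>();
    for (Map.Entry<String, Long> e : availableByLabel.entrySet()) {
      long used = usedByLabel.getOrDefault(e.getKey(), 0L);
      headroom.put(e.getKey(), Math.max(0L, e.getValue() - used));
    }
    return headroom;
  }

  public static void main(String[] args) {
    Map<String, Long> avail = new HashMap<>();
    avail.put("", 20L);    // non-labeled partition, in GB
    avail.put("red", 5L);  // e.g. 5G available under node-label=red
    Map<String, Long> used = new HashMap<>();
    used.put("", 12L);
    // Prints the label-to-available-resource map the description suggests
    // returning in AllocateResponse, e.g. {=8, red=5} (map order may vary).
    System.out.println(headroomByLabel(avail, used));
  }
}
{code}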



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4966) More improvement to get Container logs without specify nodeId

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244008#comment-15244008
 ] 

Hadoop QA commented on YARN-4966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 34s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 59s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 30s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 30 new + 
43 unchanged - 8 fixed = 73 total (was 51) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 17 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 22s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 53s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 9s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_77. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 8s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 25s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| 

[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243990#comment-15243990
 ] 

Hadoop QA commented on YARN-3215:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 30s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
0s {color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} branch-2.8 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
21s {color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} branch-2.8 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 1 new + 217 unchanged - 6 fixed = 218 total (was 223) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 90m 37s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 90m 34s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 212m 6s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel
 |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
|   | 

[jira] [Commented] (YARN-4934) Reserved Resource for QueueMetrics needs to be handled correctly in few cases

2016-04-15 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243971#comment-15243971
 ] 

Sunil G commented on YARN-4934:
---

YARN-4947 and YARN-4890 are tracking the test failures.

> Reserved Resource for QueueMetrics needs to be handled correctly in few cases 
> --
>
> Key: YARN-4934
> URL: https://issues.apache.org/jira/browse/YARN-4934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.9.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4934.patch
>
>
> Reserved Resource for QueueMetrics needs to be decremented correctly in cases 
> like the following (a toy sketch follows below):
> - when a reserved container is allocated
> - when a node is lost/disconnected.
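
A toy sketch of the accounting the description asks for; the class and method 
names are illustrative assumptions, not the actual QueueMetrics API:

{code:title=ToyReservedMetrics.java (illustrative sketch)}
public class ToyReservedMetrics {
  private long reservedMB;

  public synchronized void reserveResource(long mb) {
    reservedMB += mb;
  }

  // Case 1: a reserved container is finally allocated -> the reservation must
  // be released so the metric does not keep counting the resource as reserved.
  public synchronized void onReservedContainerAllocated(long mb) {
    unreserveResource(mb);
  }

  // Case 2: the node holding the reservation is lost/disconnected -> the
  // reservation must also be released.
  public synchronized void onNodeLost(long mb) {
    unreserveResource(mb);
  }

  private void unreserveResource(long mb) {
    reservedMB = Math.max(0L, reservedMB - mb);
  }

  public synchronized long getReservedMB() {
    return reservedMB;
  }
}
{code}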



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4934) Reserved Resource for QueueMetrics needs to be handled correctly in few cases

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243969#comment-15243969
 ] 

Hadoop QA commented on YARN-4934:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 50s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 13s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 169m 28s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.8.0_77 Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| 

[jira] [Commented] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243959#comment-15243959
 ] 

Junping Du commented on YARN-4955:
--

I don't think it is a good idea to add an inner/nested class just for unit-test 
purposes - it makes the code less readable, which is worse than having no UT for 
a simple exception catch. If we must have a unit test, I would suggest going 
with the v3 patch + a mocked exception - that keeps our main code cleaner.
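
A minimal sketch of the "v3 patch + mocked exception" direction, assuming JUnit 
4 and Mockito are on the classpath; RetryOp and the retry loop below are 
stand-ins for the real TimelineClientImpl types, not the actual code:

{code:title=mocked-exception test sketch (assumed types)}
import java.net.SocketTimeoutException;

import org.junit.Assert;
import org.junit.Test;
import org.mockito.Mockito;

public class TestRetryWithMockedException {

  // Hypothetical stand-in for the operation the timeline client retries.
  interface RetryOp {
    void run() throws Exception;
  }

  @Test
  public void testRetriesOnSocketTimeout() throws Exception {
    RetryOp op = Mockito.mock(RetryOp.class);
    // First invocation times out, second succeeds; no test-only inner class
    // is needed in the main code.
    Mockito.doThrow(new SocketTimeoutException("Read timed out"))
        .doNothing().when(op).run();

    Assert.assertEquals(2, runWithRetry(op, 3));
  }

  // Toy retry loop standing in for the client's retry logic.
  private int runWithRetry(RetryOp op, int maxRetries) throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        op.run();
        return attempt;
      } catch (SocketTimeoutException e) {
        if (attempt > maxRetries) {
          throw e;
        }
      }
    }
  }
}
{code}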

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch, 
> YARN-4955.4-1.patch, YARN-4955.4.patch, YARN-4955.5.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:569)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:234)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:582)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:479)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:291)
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:332)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:285)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:166)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:475)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:467)
>   at 

[jira] [Updated] (YARN-4957) Add getNewReservation in ApplicationClientProtocol

2016-04-15 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-4957:
--
Attachment: YARN-4957.v0.patch

> Add getNewReservation in ApplicationClientProtocol
> --
>
> Key: YARN-4957
> URL: https://issues.apache.org/jira/browse/YARN-4957
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client, resourcemanager
>Affects Versions: 2.8.0
>Reporter: Subru Krishnan
>Assignee: Sean Po
> Attachments: YARN-4957.v0.patch
>
>
> Currently submitReservation returns a ReservationId if successful. This JIRA 
> proposes adding a getNewReservation in ApplicationClientProtocol for the 
> following reasons (a toy sketch of the two-step flow follows below):
>   * Prevent zombie reservations in the face of client and/or network failures 
> post submitReservation
>   * Align reservation submission with application submission
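
A toy sketch of the resulting two-step flow, mirroring 
getNewApplication/submitApplication; all types below are stand-ins, not the 
actual protocol records in the patch:

{code:title=TwoStepReservationFlow.java (illustrative sketch)}
import java.util.concurrent.atomic.AtomicLong;

public class TwoStepReservationFlow {

  // Stand-ins for the protocol records.
  static class ReservationId {
    final long id;
    ReservationId(long id) { this.id = id; }
  }

  static class ReservationSubmissionRequest {
    ReservationId reservationId;
  }

  private static final AtomicLong NEXT_ID = new AtomicLong();

  // Step 1: the RM mints the id up front, so the client knows it even if the
  // later submit response is lost to a client or network failure.
  static ReservationId getNewReservation() {
    return new ReservationId(NEXT_ID.incrementAndGet());
  }

  // Step 2: submission carries the pre-issued id; a client that timed out can
  // resubmit with the same id instead of leaking a zombie reservation.
  static void submitReservation(ReservationSubmissionRequest req) {
    System.out.println("submitted reservation " + req.reservationId.id);
  }

  public static void main(String[] args) {
    ReservationSubmissionRequest req = new ReservationSubmissionRequest();
    req.reservationId = getNewReservation();
    submitReservation(req);
    submitReservation(req); // safe resubmit with the same id
  }
}
{code}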



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4468) Document the general ReservationSystem functionality, and the REST API

2016-04-15 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243933#comment-15243933
 ] 

Subru Krishnan commented on YARN-4468:
--

Thanks [~asuresh] for reviewing/committing.

> Document the general ReservationSystem functionality, and the REST API
> --
>
> Key: YARN-4468
> URL: https://issues.apache.org/jira/browse/YARN-4468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4468-v4-branch2.patch, YARN-4468.1.patch, 
> YARN-4468.rest-only.patch, YARN-4486-v3.patch, YARN-4486-v4.patch, 
> yarn_reservation_system.png
>
>
> This JIRA tracks effort to document the ReservationSystem functionality, and 
> the REST API access to it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4468) Document the general ReservationSystem functionality, and the REST API

2016-04-15 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4468:
-
Target Version/s:   (was: 2.8.0)

> Document the general ReservationSystem functionality, and the REST API
> --
>
> Key: YARN-4468
> URL: https://issues.apache.org/jira/browse/YARN-4468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4468-v4-branch2.patch, YARN-4468.1.patch, 
> YARN-4468.rest-only.patch, YARN-4486-v3.patch, YARN-4486-v4.patch, 
> yarn_reservation_system.png
>
>
> This JIRA tracks effort to document the ReservationSystem functionality, and 
> the REST API access to it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4468) Document the general ReservationSystem functionality, and the REST API

2016-04-15 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4468:
-
Fix Version/s: 2.8.0

> Document the general ReservationSystem functionality, and the REST API
> --
>
> Key: YARN-4468
> URL: https://issues.apache.org/jira/browse/YARN-4468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4468-v4-branch2.patch, YARN-4468.1.patch, 
> YARN-4468.rest-only.patch, YARN-4486-v3.patch, YARN-4486-v4.patch, 
> yarn_reservation_system.png
>
>
> This JIRA tracks effort to document the ReservationSystem functionality, and 
> the REST API access to it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243917#comment-15243917
 ] 

Vinod Kumar Vavilapalli commented on YARN-4955:
---

I think I understand what you guys are saying.

[~xgong], you can do this by creating an inner class inside 
TimelineClientRetryOp, say {{TimelineClientRetryOpForOperateDelegationToken}} 
(instead of {{createTimelineClientRetryOpForOperateDelegationToken()}}), and 
then override *only* the run() method of this class inside the test case to 
throw a SocketTimeoutException.
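
A sketch of what that suggestion could look like; the class below is a 
hypothetical shape, not the real TimelineClientImpl code:

{code:title=override-only-run sketch (hypothetical shape)}
import java.net.SocketTimeoutException;

public class RetryOpOverrideSketch {

  // Hypothetical named op class in the main code, replacing the factory
  // method createTimelineClientRetryOpForOperateDelegationToken().
  static class TimelineClientRetryOpForOperateDelegationToken {
    public void run() throws Exception {
      // real code would perform the delegation-token operation here
    }
  }

  public static void main(String[] args) {
    // In the test case, override *only* run() to inject the timeout:
    TimelineClientRetryOpForOperateDelegationToken op =
        new TimelineClientRetryOpForOperateDelegationToken() {
          @Override
          public void run() throws Exception {
            throw new SocketTimeoutException("injected for the test");
          }
        };
    try {
      op.run();
    } catch (Exception e) {
      System.out.println("retry logic would now kick in: " + e);
    }
  }
}
{code}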

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch, 
> YARN-4955.4-1.patch, YARN-4955.4.patch, YARN-4955.5.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:569)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:234)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:582)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:479)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:291)
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:332)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:285)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:166)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:475)
>   at 
> 

[jira] [Updated] (YARN-4966) More improvement to get Container logs without specify nodeId

2016-04-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4966:

Attachment: YARN-4966.1.patch

> More improvement to get Container logs without specify nodeId
> -
>
> Key: YARN-4966
> URL: https://issues.apache.org/jira/browse/YARN-4966
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4966.1.patch
>
>
> Currently, for a finished application, we can get the container logs 
> without specifying the node id, but we need to enable 
> yarn.timeline-service.generic-application-history.enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4966) More improvement to get Container logs without specify nodeId

2016-04-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243910#comment-15243910
 ] 

Xuan Gong commented on YARN-4966:
-

What we could improve: if 
yarn.timeline-service.generic-application-history.enabled is disabled, then for 
a finished application we could read through all the log files in HDFS and try 
to find the specific container's log.
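
A rough sketch of that scan over the application's aggregated-log directory, 
assuming the Hadoop FileSystem API; the directory layout and the containerId 
check are simplified placeholders, not the eventual implementation:

{code:title=container log scan sketch (simplified assumptions)}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ContainerLogScanSketch {

  // Walk every per-node aggregated log file under the application's log dir
  // and return the first one holding the wanted container's logs.
  static Path findLogFileForContainer(Configuration conf, Path appLogDir,
      String containerId) throws Exception {
    FileSystem fs = appLogDir.getFileSystem(conf);
    for (FileStatus nodeFile : fs.listStatus(appLogDir)) {
      if (nodeFile.isFile() && containsContainer(nodeFile, containerId)) {
        return nodeFile.getPath();
      }
    }
    return null; // container not found on any node
  }

  // Placeholder: real code would open the aggregated log with the log
  // aggregation reader and look up the container's entry.
  static boolean containsContainer(FileStatus nodeFile, String containerId) {
    return true;
  }
}
{code}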

> More improvement to get Container logs without specify nodeId
> -
>
> Key: YARN-4966
> URL: https://issues.apache.org/jira/browse/YARN-4966
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>
> Currently, for a finished application, we can get the container logs 
> without specifying the node id, but we need to enable 
> yarn.timeline-service.generic-application-history.enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4966) More improvement to get Container logs without specify nodeId

2016-04-15 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-4966:
---

 Summary: More improvement to get Container logs without specify 
nodeId
 Key: YARN-4966
 URL: https://issues.apache.org/jira/browse/YARN-4966
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong


Currently, for a finished application, we can get the container logs without 
specifying the node id, but we need to enable 
yarn.timeline-service.generic-application-history.enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243903#comment-15243903
 ] 

Xuan Gong commented on YARN-4955:
-

[~gtCarrera9]

In order to test the retry on SocketTimeoutException, I have to override 
TimelineClientRetryOp#run() to throw a SocketTimeoutException.

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch, 
> YARN-4955.4-1.patch, YARN-4955.4.patch, YARN-4955.5.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:569)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:234)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:582)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:479)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:291)
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:332)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:285)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:166)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:475)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:467)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> 

[jira] [Commented] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243894#comment-15243894
 ] 

Li Lu commented on YARN-4955:
-

Sorry I didn't get back earlier. One point of confusion: are we testing the 
behavior of {{clientFake}}'s retry instead of the client's retry here? 

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch, 
> YARN-4955.4-1.patch, YARN-4955.4.patch, YARN-4955.5.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:569)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:234)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:582)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:479)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:291)
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:332)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:285)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:166)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:475)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:467)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> 

[jira] [Commented] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243891#comment-15243891
 ] 

Vinod Kumar Vavilapalli commented on YARN-4955:
---

[~djp], I was reviewing the previous patch and gave some review comments, so 
I'll review this update and commit it if they are addressed; no need to rush.

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch, 
> YARN-4955.4-1.patch, YARN-4955.4.patch, YARN-4955.5.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:569)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:234)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:582)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:479)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:291)
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:332)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:285)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:166)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:475)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:467)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> 

[jira] [Commented] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243889#comment-15243889
 ] 

Hadoop QA commented on YARN-4514:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 49s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
44s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 4m 15s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:4bf84af |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12798975/YARN-4514-YARN-3368.8.patch
 |
| JIRA Issue | YARN-4514 |
| Optional Tests |  asflicense  |
| uname | Linux 3ccea15df82d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-3368 / 4bf84af |
| modules | C:  hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui   .  U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11108/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
> --
>
> Key: YARN-4514
> URL: https://issues.apache.org/jira/browse/YARN-4514
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: YARN-4514-YARN-3368.1.patch, 
> YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch, 
> YARN-4514-YARN-3368.4.patch, YARN-4514-YARN-3368.5.patch, 
> YARN-4514-YARN-3368.6.patch, YARN-4514-YARN-3368.7.patch, 
> YARN-4514-YARN-3368.8.patch
>
>
> We have several configurations that are hard-coded (for example, RM/ATS 
> addresses); we should make them configurable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4468) Document the general ReservationSystem functionality, and the REST API

2016-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243886#comment-15243886
 ] 

Hudson commented on YARN-4468:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9621 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9621/])
YARN-4468. Document the general ReservationSystem functionality, and the (arun 
suresh: rev cab9cbaa0a6d92dd6473545da0ea1e6a22fd09e1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ReservationSystem.md
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YARN.md
* hadoop-project/src/site/site.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/yarn_reservation_system.png


> Document the general ReservationSystem functionality, and the REST API
> --
>
> Key: YARN-4468
> URL: https://issues.apache.org/jira/browse/YARN-4468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4468-v4-branch2.patch, YARN-4468.1.patch, 
> YARN-4468.rest-only.patch, YARN-4486-v3.patch, YARN-4486-v4.patch, 
> yarn_reservation_system.png
>
>
> This JIRA tracks effort to document the ReservationSystem functionality, and 
> the REST API access to it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4468) Document the general ReservationSystem functionality, and the REST API

2016-04-15 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4468:
-
Attachment: YARN-4468-v4-branch2.patch

Attaching a patch file rebased against branch-2.

> Document the general ReservationSystem functionality, and the REST API
> --
>
> Key: YARN-4468
> URL: https://issues.apache.org/jira/browse/YARN-4468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4468-v4-branch2.patch, YARN-4468.1.patch, 
> YARN-4468.rest-only.patch, YARN-4486-v3.patch, YARN-4486-v4.patch, 
> yarn_reservation_system.png
>
>
> This JIRA tracks effort to document the ReservationSystem functionality, and 
> the REST API access to it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243880#comment-15243880
 ] 

Junping Du commented on YARN-4955:
--

The checkstyle issue seems to be noise only; the whitespace issue can be fixed 
at commit time. Patch LGTM.
[~vinodkv] and [~gtCarrera9], do you have more comments? If not, I will go 
ahead and commit this...

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch, 
> YARN-4955.4-1.patch, YARN-4955.4.patch, YARN-4955.5.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:569)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:234)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:582)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:479)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:291)
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:332)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:285)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:166)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:475)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:467)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>  

[jira] [Updated] (YARN-4849) [YARN-3368] cleanup code base, integrate web UI related build to mvn, and fix licenses.

2016-04-15 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4849:
-
Attachment: YARN-4849-YARN-3368.8.patch

> [YARN-3368] cleanup code base, integrate web UI related build to mvn, and fix 
> licenses.
> ---
>
> Key: YARN-4849
> URL: https://issues.apache.org/jira/browse/YARN-4849
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: YARN-3368
>
> Attachments: YARN-4849-YARN-3368.1.patch, 
> YARN-4849-YARN-3368.2.patch, YARN-4849-YARN-3368.3.patch, 
> YARN-4849-YARN-3368.4.patch, YARN-4849-YARN-3368.5.patch, 
> YARN-4849-YARN-3368.6.patch, YARN-4849-YARN-3368.7.patch, 
> YARN-4849-YARN-3368.8.patch, YARN-4849-YARN-3368.addendum.1.patch, 
> YARN-4849-YARN-3368.addendum.2.patch, YARN-4849-YARN-3368.addendum.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4965) Distributed shell AM failed due to ClientHandlerException thrown by jersey

2016-04-15 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-4965:


Assignee: Junping Du

> Distributed shell AM failed due to ClientHandlerException thrown by jersey
> --
>
> Key: YARN-4965
> URL: https://issues.apache.org/jira/browse/YARN-4965
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.8.0
>
>
> Distributed shell AM failed with java.io.IOException
> {code:title= app logs}
> Exception in thread "AMRM Callback Handler Thread" 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: Stream 
> closed.
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:312)
> Caused by: com.sun.jersey.api.client.ClientHandlerException: 
> java.io.IOException: Stream closed.
>   at 
> com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:563)
>   at 
> com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:506)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:446)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishContainerEndEvent(ApplicationMaster.java:1144)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.access$400(ApplicationMaster.java:169)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster$RMCallbackHandler.onContainersCompleted(ApplicationMaster.java:779)
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300)
> Caused by: java.io.IOException: Stream closed.
>   at 
> java.net.AbstractPlainSocketImpl.available(AbstractPlainSocketImpl.java:458)
>   at java.net.SocketInputStream.available(SocketInputStream.java:245)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:342)
>   at 
> sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)
>   at 
> sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
>   at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>   at 
> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3067)
>   at 
> org.codehaus.jackson.impl.ByteSourceBootstrapper.ensureLoaded(ByteSourceBootstrapper.java:507)
>   at 
> org.codehaus.jackson.impl.ByteSourceBootstrapper.detectEncoding(ByteSourceBootstrapper.java:129)
>   at 
> org.codehaus.jackson.impl.ByteSourceBootstrapper.constructParser(ByteSourceBootstrapper.java:224)
>   at 
> org.codehaus.jackson.JsonFactory._createJsonParser(JsonFactory.java:785)
>   at 
> org.codehaus.jackson.JsonFactory.createJsonParser(JsonFactory.java:561)
>   at 
> org.codehaus.jackson.jaxrs.JacksonJsonProvider.readFrom(JacksonJsonProvider.java:414)
>   at 
> com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:553)
>   ... 6 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4965) Distributed shell AM failed due to ClientHandlerException thrown by jersey

2016-04-15 Thread Sumana Sathish (JIRA)
Sumana Sathish created YARN-4965:


 Summary: Distributed shell AM failed due to ClientHandlerException 
thrown by jersey
 Key: YARN-4965
 URL: https://issues.apache.org/jira/browse/YARN-4965
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.2
Reporter: Sumana Sathish
Priority: Critical
 Fix For: 2.8.0


Distributed shell AM failed with java.io.IOException
{code:title= app logs}
Exception in thread "AMRM Callback Handler Thread" 
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: Stream 
closed.
at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:312)
Caused by: com.sun.jersey.api.client.ClientHandlerException: 
java.io.IOException: Stream closed.
at 
com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:563)
at 
com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:506)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:446)
at 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishContainerEndEvent(ApplicationMaster.java:1144)
at 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.access$400(ApplicationMaster.java:169)
at 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster$RMCallbackHandler.onContainersCompleted(ApplicationMaster.java:779)
at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300)
Caused by: java.io.IOException: Stream closed.
at 
java.net.AbstractPlainSocketImpl.available(AbstractPlainSocketImpl.java:458)
at java.net.SocketInputStream.available(SocketInputStream.java:245)
at java.io.BufferedInputStream.read(BufferedInputStream.java:342)
at 
sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)
at 
sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at 
sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3067)
at 
org.codehaus.jackson.impl.ByteSourceBootstrapper.ensureLoaded(ByteSourceBootstrapper.java:507)
at 
org.codehaus.jackson.impl.ByteSourceBootstrapper.detectEncoding(ByteSourceBootstrapper.java:129)
at 
org.codehaus.jackson.impl.ByteSourceBootstrapper.constructParser(ByteSourceBootstrapper.java:224)
at 
org.codehaus.jackson.JsonFactory._createJsonParser(JsonFactory.java:785)
at 
org.codehaus.jackson.JsonFactory.createJsonParser(JsonFactory.java:561)
at 
org.codehaus.jackson.jaxrs.JacksonJsonProvider.readFrom(JacksonJsonProvider.java:414)
at 
com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:553)
... 6 more
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4468) Document the general ReservationSystem functionality, and the REST API

2016-04-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243877#comment-15243877
 ] 

Arun Suresh commented on YARN-4468:
---

+1,
Committing this shortly

> Document the general ReservationSystem functionality, and the REST API
> --
>
> Key: YARN-4468
> URL: https://issues.apache.org/jira/browse/YARN-4468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4468.1.patch, YARN-4468.rest-only.patch, 
> YARN-4486-v3.patch, YARN-4486-v4.patch, yarn_reservation_system.png
>
>
> This JIRA tracks effort to document the ReservationSystem functionality, and 
> the REST API access to it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4468) Document the general ReservationSystem functionality, and the REST API

2016-04-15 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4468:
-
Attachment: yarn_reservation_system.png

Attaching the reservation system architecture diagram.

> Document the general ReservationSystem functionality, and the REST API
> --
>
> Key: YARN-4468
> URL: https://issues.apache.org/jira/browse/YARN-4468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4468.1.patch, YARN-4468.rest-only.patch, 
> YARN-4486-v3.patch, YARN-4486-v4.patch, yarn_reservation_system.png
>
>
> This JIRA tracks effort to document the ReservationSystem functionality, and 
> the REST API access to it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-04-15 Thread Daniel Zhi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Zhi updated YARN-4676:
-
Attachment: YARN-4676.012.patch

Fixed TestPBImplRecords and TestRMAdminCLI (related to patch 011).

I don't expect the other QA test failures to be related; I ran about a dozen 
of them locally, with 9 passing and 3 failing both with and without my patch.

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, 
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, 
> YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch, 
> YARN-4676.011.patch, YARN-4676.012.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks 
> DECOMMISSIONING nodes' status automatically and asynchronously after the 
> client/admin makes a graceful decommission request. It tracks each 
> DECOMMISSIONING node's status to decide when, after all running containers 
> on the node have completed, it will be transitioned into the DECOMMISSIONED 
> state. NodesListManager detects and handles include and exclude list changes 
> to kick off decommission or recommission as necessary.
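
As a rough illustration of the transition rule described above (all names 
here are hypothetical sketches, not the patch's actual classes):

{code:title=DecommissionSketch.java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a DECOMMISSIONING node becomes DECOMMISSIONED once
// its running-container count reaches zero. Names are not from the patch.
public class DecommissionSketch {
  enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

  private final Map<String, NodeState> states = new ConcurrentHashMap<>();

  void startDecommission(String nodeId) {
    states.put(nodeId, NodeState.DECOMMISSIONING);
  }

  // Called on each node heartbeat with the node's running-container count.
  void onHeartbeat(String nodeId, int runningContainers) {
    if (states.get(nodeId) == NodeState.DECOMMISSIONING
        && runningContainers == 0) {
      states.put(nodeId, NodeState.DECOMMISSIONED);
    }
  }
}
{code}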



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2883) Queuing of container requests in the NM

2016-04-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243722#comment-15243722
 ] 

Arun Suresh commented on YARN-2883:
---

Looks like you have to add an additional 
{noformat}
if (!(event instanceof ApplicationContainerFinishedEvent)) {
  throw new RuntimeException("...");
}
{noformat}
prior to casting it.
This is what's done in other parts of the code: 
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1204
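
Spelled out as compilable Java (the event types below are hypothetical 
stand-ins, mirroring the check-then-cast pattern at the link above):

{code:title=EventHandlerSketch.java}
// Hypothetical event types, to illustrate the check-then-cast pattern.
class SchedulerEvent {}

class ApplicationContainerFinishedEvent extends SchedulerEvent {
  String getContainerId() { return "container_0"; }
}

public class EventHandlerSketch {
  static void handle(SchedulerEvent event) {
    // Guard before the downcast, as done elsewhere in the code base.
    if (!(event instanceof ApplicationContainerFinishedEvent)) {
      throw new RuntimeException("Unexpected event type: " + event.getClass());
    }
    ApplicationContainerFinishedEvent finished =
        (ApplicationContainerFinishedEvent) event;
    System.out.println("finished: " + finished.getContainerId());
  }

  public static void main(String[] args) {
    handle(new ApplicationContainerFinishedEvent());
  }
}
{code}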

> Queuing of container requests in the NM
> ---
>
> Key: YARN-2883
> URL: https://issues.apache.org/jira/browse/YARN-2883
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2883-trunk.004.patch, YARN-2883-trunk.005.patch, 
> YARN-2883-trunk.006.patch, YARN-2883-trunk.007.patch, 
> YARN-2883-trunk.008.patch, YARN-2883-trunk.009.patch, 
> YARN-2883-trunk.010.patch, YARN-2883-trunk.011.patch, 
> YARN-2883-trunk.012.patch, YARN-2883-trunk.013.patch, 
> YARN-2883-yarn-2877.001.patch, YARN-2883-yarn-2877.002.patch, 
> YARN-2883-yarn-2877.003.patch, YARN-2883-yarn-2877.004.patch, 
> YARN-2883.013.patch
>
>
> We propose to add a queue in each NM, where queueable container requests can 
> be held.
> Based on the available resources in the node and the containers in the queue, 
> the NM will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, 
> the NM may decide to pre-empt/kill running queueable containers.
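
As a rough illustration of the proposal above (the single-dimension resource 
and all names here are simplifications invented for this sketch, not the 
patch's API):

{code:title=NMQueueSketch.java}
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified sketch: queueable container requests wait in a per-NM queue and
// start when the node has room. Names are hypothetical.
public class NMQueueSketch {
  private final Queue<Integer> queued = new ArrayDeque<>(); // resource demands
  private int available;

  NMQueueSketch(int capacity) { this.available = capacity; }

  void submitQueueable(int demand) {
    queued.add(demand);
    maybeStartQueued();
  }

  void onContainerFinished(int released) {
    available += released;
    maybeStartQueued();
  }

  // Start queued containers in FIFO order while the node has capacity.
  private void maybeStartQueued() {
    while (!queued.isEmpty() && queued.peek() <= available) {
      available -= queued.poll();
    }
  }
}
{code}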



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-04-15 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243662#comment-15243662
 ] 

Nathan Roberts commented on YARN-4963:
--

Thanks [~leftnoteasy] for the feedback. I agree that it would be a useful 
feature to be able to give some applications better spread, regardless of 
allocation type. Now we just have to figure out how to get there.

My concern is that I don't think we'd want to implement it using the same 
simple approach if it's going to apply to all container types. For example, in 
our case we almost always want NODE_LOCAL and RACK_LOCAL to get scheduled as 
quickly as possible so I'd want the limit to be high, as opposed to OFF_SWITCH 
where I want the limit to be 3-5 to keep a nice balance between scheduling 
performance and clustering. 

The reason this check was introduced in the first place (iirc) was to prevent 
network-heavy applications from loading up on specific nodes. The OFF_SWITCH 
check was a simple way of achieving this at a global level. The feature I think 
you're asking for (please correct me if I misunderstood) is that applications 
should be able to request that container spread be prioritized over timely 
scheduling (kind of like locality delay does today). I completely agree this 
would be a useful knob for applications to have. It is a trade-off though. An 
application that wants really good spread would be sacrificing scheduling 
opportunities that would probably be given to applications behind them in the 
queue (like locality delay).

So maybe there are two things to do:
1) Keep the global OFF_SWITCH check to handle the simple case of avoiding too 
many network-heavy applications on a node. 
2) Add a feature where applications can specify a 
max_containers_assigned_per_node_per_heartbeat. I think this would be checked 
down in LeafQueue.assignContainers().

Even with #2 in place, I don't think #1 could immediately go away because the 
network-heavy applications would need to start properly specifying this limit.

The other approach to get rid of #1 would be when network is a resource. Such 
applications could then request lots of network resource, which should prevent 
clustering.

Does that make any sort of sense?
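
For concreteness, a sketch of the kind of per-heartbeat counting the 
OFF_SWITCH check involves; the limit value, types, and method names are all 
invented for illustration and are not from the patch:

{code:title=OffSwitchLimitSketch.java}
// Invented illustration of capping OFF_SWITCH assignments per heartbeat.
public class OffSwitchLimitSketch {
  enum NodeType { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

  static int assignOnHeartbeat(NodeType[] asks, int offSwitchLimit) {
    int assigned = 0;
    int offSwitchAssigned = 0;
    for (NodeType type : asks) {
      if (type == NodeType.OFF_SWITCH) {
        if (offSwitchAssigned >= offSwitchLimit) {
          continue; // defer to a later heartbeat, as today's check does
        }
        offSwitchAssigned++;
      }
      assigned++;
    }
    return assigned;
  }

  public static void main(String[] args) {
    NodeType[] asks = {NodeType.OFF_SWITCH, NodeType.OFF_SWITCH,
        NodeType.NODE_LOCAL, NodeType.OFF_SWITCH};
    // With a limit of 1 (the current hardcoded behavior), only the first
    // OFF_SWITCH ask is granted this heartbeat: prints 2.
    System.out.println(assignOnHeartbeat(asks, 1));
  }
}
{code}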


> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat 
> configurable
> 
>
> Key: YARN-4963
> URL: https://issues.apache.org/jira/browse/YARN-4963
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4963.001.patch
>
>
> Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment 
> per heartbeat. With more and more non-MapReduce workloads coming along, the 
> degree of locality is declining, causing scheduling to be significantly 
> slower. It's still important to limit the number of OFF_SWITCH assignments to 
> avoid densely packing OFF_SWITCH containers onto nodes. 
> Proposal is to add a simple config that makes the number of OFF_SWITCH 
> assignments configurable.
> Will upload candidate patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)

2016-04-15 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243651#comment-15243651
 ] 

Yufei Gu commented on YARN-2306:


Hi [~zhiguohong], thank you for working on this. I am glad you are still 
working on it. Have you addressed all the comments and the test failures from 
Hadoop QA?


> leak of reservation metrics (fair scheduler)
> 
>
> Key: YARN-2306
> URL: https://issues.apache.org/jira/browse/YARN-2306
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
> Attachments: YARN-2306-2.patch, YARN-2306-3.patch, YARN-2306.patch
>
>
> This only applies to fair scheduler. Capacity scheduler is OK.
> When an appAttempt or node is removed, the reservation metrics 
> (reservedContainers, reservedMB, reservedVCores) are not reduced back.
> These are important metrics for administrators, and wrong values may 
> confuse them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243636#comment-15243636
 ] 

Hadoop QA commented on YARN-4955:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: patch 
generated 1 new + 19 unchanged - 0 fixed = 20 total (was 19) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 58s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 12s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 34s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12799008/YARN-4955.5.patch |
| JIRA Issue | YARN-4955 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 0553c18b369c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 

[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243627#comment-15243627
 ] 

Hudson commented on YARN-4940:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9620 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9620/])
YARN-4940. yarn node -list -all failed if RM start with decommissioned (jlowe: 
rev 69f3d428d5c3ab0c79cacffc22b1f59408622ae7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java


> yarn node -list -all failed if RM start with decommissioned node
> 
>
> Key: YARN-4940
> URL: https://issues.apache.org/jira/browse/YARN-4940
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Fix For: 2.8.0, 2.7.3
>
> Attachments: YARN-4940.01.patch, YARN-4940.02.patch, 
> YARN-4940.03.patch, YARN-4940.04.patch, YARN-4940.05.patch
>
>
> 1. add a node to the exclude file
> 2. start RM
> 3. run yarn node -list -all and see the following exception
> {quote}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> 

[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-04-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243588#comment-15243588
 ] 

Vinod Kumar Vavilapalli commented on YARN-4577:
---

bq. We simply do not pass in any configuration anywhere as part of the 
AuxService APIs - so this entire thread of reasoning about getClass() is no 
longer a problem?
[~xgong] reminded me offline that we do pass a shared configuration as part of 
serviceInit(). In that case, the solution is simply to pass a private cloned 
Configuration for each of the aux-services?
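
For what it's worth, the cloning itself is a one-liner via Configuration's 
copy constructor; a sketch follows (the per-service wiring around it is 
assumed for illustration, not quoted from the NM code):

{code:title=AuxConfCloneSketch.java}
import org.apache.hadoop.conf.Configuration;

// Sketch: hand each aux-service its own cloned Configuration so mutations by
// one service cannot leak into another. Only the copy constructor is real API.
public class AuxConfCloneSketch {
  public static void main(String[] args) {
    Configuration shared = new Configuration(false); // skip default resources
    shared.set("some.key", "shared-value");

    Configuration perService = new Configuration(shared); // private clone
    perService.set("some.key", "service-local-override");

    System.out.println(shared.get("some.key"));     // shared-value
    System.out.println(perService.get("some.key")); // service-local-override
  }
}
{code}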

> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch, 
> YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, YARN-4577.3.patch, 
> YARN-4577.3.rebase.patch, YARN-4577.4.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> putting them on the system classloader. But if multiple versions of the plugin 
> are present on the classpath, there is no control over which version actually 
> gets loaded. Or if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the NM, 
> the auxiliary service, or both.
> The solution could be: to instantiate aux services using a classloader that 
> is different from the system classloader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243545#comment-15243545
 ] 

Xuan Gong commented on YARN-4955:
-

Fixed the checkstyle and whitespace warnings.

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch, 
> YARN-4955.4-1.patch, YARN-4955.4.patch, YARN-4955.5.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:569)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:234)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:582)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:479)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:291)
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:332)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:285)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:166)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:475)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:467)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> 

[jira] [Updated] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4955:

Attachment: YARN-4955.5.patch

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch, 
> YARN-4955.4-1.patch, YARN-4955.4.patch, YARN-4955.5.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:569)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:234)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:582)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:479)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:291)
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:332)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:285)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:166)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:475)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:467)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:567)
>
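
The description shows a single read timeout during the SPNEGO handshake failing the whole job submission. As a minimal sketch of the retry behavior this patch is after (the helper below is hypothetical; the real change lives in TimelineClientImpl's connection-retry path), the idea is to retry only when the root cause is a read timeout and rethrow everything else:

{code:java}
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

/**
 * Hypothetical helper, not the actual TimelineClientImpl change: retry an
 * operation whose root cause is a read timeout, rethrow everything else.
 */
public final class SocketTimeoutRetry {

  public static <T> T retryOn(Callable<T> op, int maxRetries,
      long retryIntervalMs) throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return op.call();
      } catch (Exception e) {
        if (!causedBySocketTimeout(e)) {
          throw e;                       // not retriable, fail fast
        }
        last = e;                        // remember and retry
        Thread.sleep(retryIntervalMs);
      }
    }
    throw last;                          // retries exhausted
  }

  private static boolean causedBySocketTimeout(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof SocketTimeoutException) {
        return true;
      }
    }
    return false;
  }
}
{code}

A caller would wrap the delegation-token fetch, e.g. retryOn(() -> timelineClient.getDelegationToken(renewer), 30, 1000), with the retry count and interval presumably coming from configuration.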

[jira] [Commented] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243530#comment-15243530
 ] 

Hadoop QA commented on YARN-4955:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: patch 
generated 2 new + 19 unchanged - 0 fixed = 21 total (was 19) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 2 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 58s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 13s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 57s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12799005/YARN-4955.4-1.patch |
| JIRA Issue | YARN-4955 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux fe2eac08a01c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / fdbafbc 

[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243500#comment-15243500
 ] 

Jason Lowe commented on YARN-4940:
--

+1, lgtm. The test failures appear to be unrelated.

Committing this.

> yarn node -list -all failed if RM start with decommissioned node
> 
>
> Key: YARN-4940
> URL: https://issues.apache.org/jira/browse/YARN-4940
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4940.01.patch, YARN-4940.02.patch, 
> YARN-4940.03.patch, YARN-4940.04.patch, YARN-4940.05.patch
>
>
> 1. add a node to the exclude file
> 2. start the RM
> 3. run yarn node -list -all and see the following exception
> {quote}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> 
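
The crux is the cast in NodeReportPBImpl.mergeLocalToBuilder: it assumes every NodeId is a NodeIdPBImpl, which the RM's synthetic NodesListManager$UnknownNodeId (used for excluded nodes the RM has never heard from) is not. A sketch of the failing pattern next to one defensive alternative; the safe variant is illustrative, assumes the default record factory returns PB-backed records, and is not necessarily the committed fix:

{code:java}
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl;
import org.apache.hadoop.yarn.proto.YarnProtos.NodeIdProto;

final class NodeIdConversionSketch {

  // Failing pattern from the trace: blindly assumes a PB-backed NodeId.
  static NodeIdProto unsafe(NodeId nodeId) {
    return ((NodeIdPBImpl) nodeId).getProto();  // CCE for UnknownNodeId
  }

  // Illustrative alternative: rebuild a PB-backed NodeId from its host
  // and port, so any NodeId implementation can be serialized.
  static NodeIdProto safe(NodeId nodeId) {
    NodeId pbBacked = NodeId.newInstance(nodeId.getHost(), nodeId.getPort());
    return ((NodeIdPBImpl) pbBacked).getProto();
  }
}
{code}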

[jira] [Commented] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243494#comment-15243494
 ] 

Xuan Gong commented on YARN-4955:
-

Thanks for the review.

The new patch adds a test case for this.

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch, 
> YARN-4955.4-1.patch, YARN-4955.4.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.

[jira] [Updated] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4955:

Attachment: YARN-4955.4-1.patch

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch, 
> YARN-4955.4-1.patch, YARN-4955.4.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.

[jira] [Updated] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4955:

Attachment: YARN-4955.4.patch

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch, 
> YARN-4955.4.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.

[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243400#comment-15243400
 ] 

Hudson commented on YARN-4909:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9619 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9619/])
YARN-4909. Fix intermittent failures of TestRMWebServices And 
TestRMWithCSRFFilter (naganarasimha_gr: rev 
fdbafbc9e59314d9f9f75e615de9d2dfdced017b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/JerseyTestBase.java


> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch, 0005-YARN-4909.patch, 
> 0006-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.(TestRMWebServices.java:125)
> {noformat}
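
The BindException means the embedded Grizzly test containers are racing for the same fixed port. A sketch of the usual remedy, assuming JerseyTestBase can be pointed at an arbitrary port: ask the OS for an ephemeral port up front, and retry the container start if the small remaining race window is lost:

{code:java}
import java.io.IOException;
import java.net.ServerSocket;

final class FreePortSketch {
  /**
   * Ask the OS for a currently unused ephemeral port. A small race
   * window remains between closing this probe socket and the test
   * container binding the port, so callers typically retry the start
   * a few times on BindException.
   */
  static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      socket.setReuseAddress(true);
      return socket.getLocalPort();
    }
  }
}
{code}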



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4964) Allow ShuffleHandler readahead without drop-behind

2016-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243364#comment-15243364
 ] 

Wangda Tan commented on YARN-4964:
--

Should this be moved to the MAPREDUCE project?

> Allow ShuffleHandler readahead without drop-behind
> --
>
> Key: YARN-4964
> URL: https://issues.apache.org/jira/browse/YARN-4964
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4964.001.patch
>
>
> Currently mapreduce.shuffle.manage.os.cache enables/disables both the 
> readahead (POSIX_FADV_WILLNEED) and drop-behind (POSIX_FADV_DONTNEED) logic 
> within the ShuffleHandler.
> It would be beneficial if these were separately configurable.
> - Running without readahead can lead to significant seek storms, caused by 
> large numbers of sendfile() calls competing with one another.
> - However, running with drop-behind can also lead to seek storms, because 
> there are cases where the server can successfully write the shuffle bytes to 
> the network but the client doesn't want the bytes right now (for example, 
> when MergeManager decides to WAIT), so it ignores them and asks for them 
> again a bit later. This causes repeated reads of the same data from disk.
> I'll attach a simple patch that enables/disables readahead based on whether 
> mapreduce.shuffle.readahead.bytes is 0, leaving 
> mapreduce.shuffle.manage.os.cache controlling only the drop-behind.
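
A sketch of the proposed decoupling; the two property names come from the description above, while the defaults and the surrounding wiring are assumptions (the real ShuffleHandler logic is more involved):

{code:java}
import org.apache.hadoop.conf.Configuration;

final class ShuffleFadviseSketch {
  static void decide(Configuration conf) {
    // Defaults here are assumptions, not necessarily what the patch ships.
    int readaheadBytes =
        conf.getInt("mapreduce.shuffle.readahead.bytes", 4 * 1024 * 1024);
    boolean manageOsCache =
        conf.getBoolean("mapreduce.shuffle.manage.os.cache", true);

    // Proposed split: issue POSIX_FADV_WILLNEED only when the readahead
    // size is non-zero; POSIX_FADV_DONTNEED stays tied to manage.os.cache.
    boolean doReadahead = readaheadBytes > 0;
    boolean doDropBehind = manageOsCache;

    System.out.println("readahead=" + doReadahead
        + ", drop-behind=" + doDropBehind);
  }
}
{code}

With this split, an admin could keep drop-behind for cache hygiene while setting the readahead bytes to 0 to avoid the sendfile() seek storms, or vice versa.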



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2016-04-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243363#comment-15243363
 ] 

Naganarasimha G R commented on YARN-3362:
-

Thanks [~eepayne], [~wangda], and [~sunilg] for waiting; I will take a look at 
it shortly.

> Add node label usage in RM CapacityScheduler web UI
> ---
>
> Key: YARN-3362
> URL: https://issues.apache.org/jira/browse/YARN-3362
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, webapp
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue 
> Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, 
> 2015.05.10_3362_Queue_Hierarchy.png, 2015.05.12_3362_Queue_Hierarchy.png, 
> CSWithLabelsView.png, No-space-between-Active_user_info-and-next-queues.png, 
> Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362-branch-2.7.002.patch, 
> YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, 
> YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, 
> YARN-3362.20150510-1.patch, YARN-3362.20150511-1.patch, 
> YARN-3362.20150512-1.patch, capacity-scheduler.xml
>
>
> We don't show node label usage in the RM CapacityScheduler web UI now; 
> without this, it is hard for users to understand what is happening on nodes 
> that have labels assigned to them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4964) Allow ShuffleHandler readahead without drop-behind

2016-04-15 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-4964:
-
Attachment: YARN-4964.001.patch

> Allow ShuffleHandler readahead without drop-behind
> --
>
> Key: YARN-4964
> URL: https://issues.apache.org/jira/browse/YARN-4964
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4964.001.patch
>
>
> Currently mapreduce.shuffle.manage.os.cache enables/disables both the 
> readahead (POSIX_FADV_WILLNEED) and drop-behind (POSIX_FADV_DONTNEED) logic 
> within the ShuffleHandler.
> It would be beneficial if these were separately configurable.
> - Running without readahead can lead to significant seek storms, caused by 
> large numbers of sendfile() calls competing with one another.
> - However, running with drop-behind can also lead to seek storms, because 
> there are cases where the server can successfully write the shuffle bytes to 
> the network but the client doesn't want the bytes right now (for example, 
> when MergeManager decides to WAIT), so it ignores them and asks for them 
> again a bit later. This causes repeated reads of the same data from disk.
> I'll attach a simple patch that enables/disables readahead based on whether 
> mapreduce.shuffle.readahead.bytes is 0, leaving 
> mapreduce.shuffle.manage.os.cache controlling only the drop-behind.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4964) Allow ShuffleHandler readahead without drop-behind

2016-04-15 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-4964:


 Summary: Allow ShuffleHandler readahead without drop-behind
 Key: YARN-4964
 URL: https://issues.apache.org/jira/browse/YARN-4964
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.2, 3.0.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts


Currently mapreduce.shuffle.manage.os.cache enables/disables both the readahead 
(POSIX_FADV_WILLNEED) and drop-behind (POSIX_FADV_DONTNEED) logic within the 
ShuffleHandler.

It would be beneficial if these were separately configurable.
- Running without readahead can lead to significant seek storms, caused by 
large numbers of sendfile() calls competing with one another.
- However, running with drop-behind can also lead to seek storms, because there 
are cases where the server can successfully write the shuffle bytes to the 
network but the client doesn't want the bytes right now (for example, when 
MergeManager decides to WAIT), so it ignores them and asks for them again a bit 
later. This causes repeated reads of the same data from disk.

I'll attach a simple patch that enables/disables readahead based on whether 
mapreduce.shuffle.readahead.bytes is 0, leaving 
mapreduce.shuffle.manage.os.cache controlling only the drop-behind.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243343#comment-15243343
 ] 

Hadoop QA commented on YARN-4963:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
1s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 1 new + 111 unchanged - 0 fixed = 112 total (was 111) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 39s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 24s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 149m 4s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.webapp.TestRMWithCSRFFilter |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
|   | 

[jira] [Commented] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243280#comment-15243280
 ] 

Vinod Kumar Vavilapalli commented on YARN-4955:
---

This looks good to me too. [~xgong], can you please add a simple test which 
validates that the client retries on socket-timeout now?

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.

[jira] [Commented] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243237#comment-15243237
 ] 

Li Lu commented on YARN-4955:
-

The new fix looks fine. Thanks!

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.

[jira] [Commented] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243219#comment-15243219
 ] 

Wangda Tan commented on YARN-4963:
--

[~nroberts], +1 to this feature, this is going to be very useful.

Instead of only limiting the number of OFF_SWITCH containers allocated per 
heartbeat, can we limit the number of containers allocated regardless of 
locality? I can see some value in limiting the number of containers for 
rack/node locality as well, for example, if a user wants their containers 
spread across the whole cluster.

> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat 
> configurable
> 
>
> Key: YARN-4963
> URL: https://issues.apache.org/jira/browse/YARN-4963
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4963.001.patch
>
>
> Currently the capacity scheduler allows exactly 1 OFF_SWITCH assignment per 
> heartbeat. With more and more non-MapReduce workloads coming along, the 
> degree of locality is declining, causing scheduling to be significantly 
> slower. It's still important to limit the number of OFF_SWITCH assignments, 
> to avoid densely packing OFF_SWITCH containers onto nodes.
> The proposal is to add a simple config that makes the number of OFF_SWITCH 
> assignments per heartbeat configurable.
> I will upload a candidate patch shortly.
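
A minimal sketch of the mechanism; the class below and any config key feeding maxOffSwitchPerHeartbeat are hypothetical, and a limit of 1 reproduces today's hard-coded behavior:

{code:java}
/**
 * Hypothetical per-heartbeat OFF_SWITCH cap: the scheduler would reset
 * the counter at the start of each node heartbeat and consult
 * mayAssign() before handing out each container.
 */
final class OffSwitchAssignmentLimiter {
  enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

  private final int maxOffSwitchPerHeartbeat;
  private int offSwitchAssigned;

  OffSwitchAssignmentLimiter(int maxOffSwitchPerHeartbeat) {
    this.maxOffSwitchPerHeartbeat = maxOffSwitchPerHeartbeat;
  }

  void onHeartbeatStart() {
    offSwitchAssigned = 0;               // the cap is per heartbeat
  }

  boolean mayAssign(Locality locality) {
    if (locality != Locality.OFF_SWITCH) {
      return true;                       // local assignments stay unlimited
    }
    if (offSwitchAssigned >= maxOffSwitchPerHeartbeat) {
      return false;                      // cap reached for this heartbeat
    }
    offSwitchAssigned++;
    return true;
  }
}
{code}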



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-15 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243194#comment-15243194
 ] 

Kuhu Shukla commented on YARN-4940:
---

+1 lgtm (non-binding). 

> yarn node -list -all failed if RM start with decommissioned node
> 
>
> Key: YARN-4940
> URL: https://issues.apache.org/jira/browse/YARN-4940
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4940.01.patch, YARN-4940.02.patch, 
> YARN-4940.03.patch, YARN-4940.04.patch, YARN-4940.05.patch
>
>
> 1,   add a node to exclude file
> 2,   start RM
> 3,   run yarn  node -list -all , see the following exception
> {quote}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> 
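
The trace above shows the root cause: NodeReportPBImpl.mergeLocalToBuilder() 
casts the report's NodeId to NodeIdPBImpl, but the RM hands it 
NodesListManager$UnknownNodeId, a plain NodeId subclass with no protobuf 
backing. A minimal sketch of one possible fix direction, assuming the excluded 
node can simply be represented by a PB-backed NodeId (an illustration, not 
necessarily the approach taken in the attached patches):

{code:java}
import org.apache.hadoop.yarn.api.records.NodeId;

public class UnknownNodeIdExample {
  public static void main(String[] args) {
    // NodeId.newInstance() goes through the record factory, so the result
    // is a NodeIdPBImpl and survives the PB merge in NodeReportPBImpl.
    NodeId pbBacked = NodeId.newInstance("excluded-host", 0);
    System.out.println(pbBacked); // prints excluded-host:0

    // By contrast, a hand-rolled NodeId subclass (what UnknownNodeId is)
    // fails the NodeIdPBImpl cast inside mergeLocalToBuilder().
  }
}
{code}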

[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2016-04-15 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243214#comment-15243214
 ] 

Sunil G commented on YARN-3362:
---

Thanks [~eepayne] for sharing the patch here; much appreciated.
Overall the patch looks fine to me. I will wait for [~Naganarasimha Garla]'s 
review as well.

> Add node label usage in RM CapacityScheduler web UI
> ---
>
> Key: YARN-3362
> URL: https://issues.apache.org/jira/browse/YARN-3362
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, webapp
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue 
> Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, 
> 2015.05.10_3362_Queue_Hierarchy.png, 2015.05.12_3362_Queue_Hierarchy.png, 
> CSWithLabelsView.png, No-space-between-Active_user_info-and-next-queues.png, 
> Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362-branch-2.7.002.patch, 
> YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, 
> YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, 
> YARN-3362.20150510-1.patch, YARN-3362.20150511-1.patch, 
> YARN-3362.20150512-1.patch, capacity-scheduler.xml
>
>
> We don't show node label usage in the RM CapacityScheduler web UI now; 
> without this, it is hard for users to understand what is happening on nodes 
> that have labels assigned to them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4955:
-
Priority: Critical  (was: Major)

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:569)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:234)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:582)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:479)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:291)
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:332)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:285)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:166)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:475)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:467)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:567)
>   ... 24 more
> Caused by: 
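
The trace shows the SocketTimeoutException buried two causes deep, under an 
AuthenticationException inside an IOException, which is presumably why the 
existing retry logic misses it. A minimal sketch, assuming the fix amounts to 
walking the cause chain in the client's retry check (an illustration, not the 
attached patch):

{code:java}
import java.net.ConnectException;
import java.net.SocketTimeoutException;

public class RetryOnTimeoutSketch {
  /** Returns true if the failure should be retried. */
  static boolean shouldRetryOn(Throwable t) {
    // Walk the cause chain, since the timeout is wrapped in an
    // AuthenticationException wrapped in an IOException.
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof ConnectException
          || c instanceof SocketTimeoutException) {
        return true;
      }
    }
    return false;
  }
}
{code}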

[jira] [Commented] (YARN-4962) support filling up containers on node one by one

2016-04-15 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243136#comment-15243136
 ] 

Daniel Templeton commented on YARN-4962:


This is a common issue in mixed-node clusters.  The typical solution is to load 
the cluster "from different ends."  If the cluster has nodes of type A and type 
B (say, regular and big-memory) and jobs of type a and b (where type b jobs 
need type B machines), the scheduler should schedule type a jobs to type A 
nodes, spilling into type B nodes only when there are no more type A nodes 
available.  Type b jobs only run on type B nodes.

In Grid Engine, the way that's implemented is with node labels.  All type B 
nodes are labeled as such.  All type b jobs are submitted with a hard request 
for type B nodes (hard = requires).  All other jobs are submitted with a soft 
request for !type B nodes (soft = preference).
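
For reference, the "hard request" half of that scheme maps directly onto YARN 
node labels; a minimal sketch, assuming a cluster label named "typeB" (the 
label name is illustrative). YARN has no built-in "soft" !typeB preference, 
which is essentially the gap this JIRA is discussing.

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

public class HardLabelRequest {
  static ApplicationSubmissionContext newTypeBContext() {
    ApplicationSubmissionContext ctx =
        Records.newRecord(ApplicationSubmissionContext.class);
    // Hard constraint: containers may only be allocated on nodes carrying
    // the "typeB" label (the label must already exist on the cluster).
    ctx.setNodeLabelExpression("typeB");
    return ctx;
  }
}
{code}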

>  support filling up containers on node one by one
> -
>
> Key: YARN-4962
> URL: https://issues.apache.org/jira/browse/YARN-4962
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: sandflee
>
> We have a GPU cluster where jobs with bigger resource requests couldn't be 
> satisfied because nodes were already running jobs with smaller resource 
> requests. We didn't enable the reservation system because GPU jobs may run 
> for days or weeks. We expect the scheduler to allocate containers so as to 
> fill up a node first; then there will be resources left to run jobs with big 
> resource requests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243122#comment-15243122
 ] 

Hadoop QA commented on YARN-4514:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 50s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 2m 54s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:e35bf0f |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12798975/YARN-4514-YARN-3368.8.patch
 |
| JIRA Issue | YARN-4514 |
| Optional Tests |  asflicense  |
| uname | Linux ab19464933d2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-3368 / e35bf0f |
| modules | C:  hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui   .  U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11098/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
> --
>
> Key: YARN-4514
> URL: https://issues.apache.org/jira/browse/YARN-4514
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: YARN-4514-YARN-3368.1.patch, 
> YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch, 
> YARN-4514-YARN-3368.4.patch, YARN-4514-YARN-3368.5.patch, 
> YARN-4514-YARN-3368.6.patch, YARN-4514-YARN-3368.7.patch, 
> YARN-4514-YARN-3368.8.patch
>
>
> We have several configurations that are hard-coded, for example the RM/ATS 
> addresses; we should make them configurable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses

2016-04-15 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4514:
--
Attachment: YARN-4514-YARN-3368.8.patch

Thank you [~leftnoteasy]. Please help check the updated message in 
default-config.js. Attaching an updated patch.

> [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
> --
>
> Key: YARN-4514
> URL: https://issues.apache.org/jira/browse/YARN-4514
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: YARN-4514-YARN-3368.1.patch, 
> YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch, 
> YARN-4514-YARN-3368.4.patch, YARN-4514-YARN-3368.5.patch, 
> YARN-4514-YARN-3368.6.patch, YARN-4514-YARN-3368.7.patch, 
> YARN-4514-YARN-3368.8.patch
>
>
> We have several configurations that are hard-coded, for example the RM/ATS 
> addresses; we should make them configurable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-04-15 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-4963:
-
Attachment: YARN-4963.001.patch

> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat 
> configurable
> 
>
> Key: YARN-4963
> URL: https://issues.apache.org/jira/browse/YARN-4963
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4963.001.patch
>
>
> Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment 
> per heartbeat. With more and more non-MapReduce workloads coming along, the 
> degree of locality is declining, causing scheduling to be significantly 
> slower. It's still important to limit the number of OFF_SWITCH assignments to 
> avoid densely packing OFF_SWITCH containers onto nodes. 
> The proposal is to add a simple config that makes the number of OFF_SWITCH 
> assignments configurable.
> Will upload candidate patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-04-15 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-4963:


 Summary: capacity scheduler: Make number of OFF_SWITCH assignments 
per heartbeat configurable
 Key: YARN-4963
 URL: https://issues.apache.org/jira/browse/YARN-4963
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.7.2, 3.0.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts


Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment per 
heartbeat. With more and more non-MapReduce workloads coming along, the degree 
of locality is declining, causing scheduling to be significantly slower. It's 
still important to limit the number of OFF_SWITCH assignments to avoid densely 
packing OFF_SWITCH containers onto nodes. 

The proposal is to add a simple config that makes the number of OFF_SWITCH 
assignments configurable.

Will upload candidate patch shortly.
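
A hypothetical sketch of what such a knob could look like on the scheduler 
side; the property name and default below are illustrative placeholders, not 
taken from the yet-to-be-posted patch:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class OffSwitchLimitSketch {
  // Illustrative key; the default of 1 mirrors today's hard-coded limit.
  static final String MAX_OFFSWITCH_PER_HEARTBEAT =
      "yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments";
  static final int DEFAULT_MAX_OFFSWITCH_PER_HEARTBEAT = 1;

  static int maxOffSwitchAssignments(Configuration conf) {
    return conf.getInt(MAX_OFFSWITCH_PER_HEARTBEAT,
        DEFAULT_MAX_OFFSWITCH_PER_HEARTBEAT);
  }
}
{code}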



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2883) Queuing of container requests in the NM

2016-04-15 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243083#comment-15243083
 ] 

Konstantinos Karanasos commented on YARN-2883:
--

Yep, I will address the visibility comment -- I just wanted to do the 
refactoring quickly first, so that I could get your feedback.

Regarding the findbugs warning, I tried to fix the problem by adding 
@SuppressWarnings("unchecked"), but it does not seem to work.
Any ideas about what is wrong?
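
One likely explanation: java.lang.SuppressWarnings targets javac and has 
SOURCE retention, so it is gone from the bytecode that findbugs inspects. 
Findbugs ships its own annotation for this (or the module's findbugs exclude 
file can be used instead); a sketch with an illustrative bug-pattern id:

{code:java}
import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;

public class FindbugsSuppressionSketch {
  // Unlike java.lang.SuppressWarnings, this annotation has CLASS retention,
  // so findbugs can actually see it when scanning the compiled classes.
  @SuppressFBWarnings(
      value = "BC_UNCONFIRMED_CAST", // illustrative pattern id, not YARN-2883's
      justification = "cast is guarded by the caller")
  static Object example(Object o) {
    return o;
  }
}
{code}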

> Queuing of container requests in the NM
> ---
>
> Key: YARN-2883
> URL: https://issues.apache.org/jira/browse/YARN-2883
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2883-trunk.004.patch, YARN-2883-trunk.005.patch, 
> YARN-2883-trunk.006.patch, YARN-2883-trunk.007.patch, 
> YARN-2883-trunk.008.patch, YARN-2883-trunk.009.patch, 
> YARN-2883-trunk.010.patch, YARN-2883-trunk.011.patch, 
> YARN-2883-trunk.012.patch, YARN-2883-trunk.013.patch, 
> YARN-2883-yarn-2877.001.patch, YARN-2883-yarn-2877.002.patch, 
> YARN-2883-yarn-2877.003.patch, YARN-2883-yarn-2877.004.patch, 
> YARN-2883.013.patch
>
>
> We propose to add a queue in each NM, where queueable container requests can 
> be held.
> Based on the available resources on the node and the containers in the queue, 
> the NM will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, 
> the NM may decide to pre-empt/kill running queueable containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2883) Queuing of container requests in the NM

2016-04-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243066#comment-15243066
 ] 

Karthik Kambatla commented on YARN-2883:


The latest patch (YARN-2883.013.patch) looks mostly good to me, barring some 
unaddressed comments from my previous review -- the visibility of methods in 
ContainersMonitorImpl and the config itself. As I said, for the config, I am 
comfortable filing a follow-up. And, yeah, the findbugs warning. 

> Queuing of container requests in the NM
> ---
>
> Key: YARN-2883
> URL: https://issues.apache.org/jira/browse/YARN-2883
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2883-trunk.004.patch, YARN-2883-trunk.005.patch, 
> YARN-2883-trunk.006.patch, YARN-2883-trunk.007.patch, 
> YARN-2883-trunk.008.patch, YARN-2883-trunk.009.patch, 
> YARN-2883-trunk.010.patch, YARN-2883-trunk.011.patch, 
> YARN-2883-trunk.012.patch, YARN-2883-trunk.013.patch, 
> YARN-2883-yarn-2877.001.patch, YARN-2883-yarn-2877.002.patch, 
> YARN-2883-yarn-2877.003.patch, YARN-2883-yarn-2877.004.patch, 
> YARN-2883.013.patch
>
>
> We propose to add a queue in each NM, where queueable container requests can 
> be held.
> Based on the available resources on the node and the containers in the queue, 
> the NM will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, 
> the NM may decide to pre-empt/kill running queueable containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4955) Add retry for SocketTimeoutException in TimelineClient

2016-04-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243048#comment-15243048
 ] 

Junping Du commented on YARN-4955:
--

The 03 patch LGTM.
If there are no further comments from others, I will commit it shortly after 
HADOOP-13026 gets committed.

> Add retry for SocketTimeoutException in TimelineClient
> --
>
> Key: YARN-4955
> URL: https://issues.apache.org/jira/browse/YARN-4955
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4955.1.patch, YARN-4955.2.patch, YARN-4955.3.patch
>
>
> We saw this exception several times when we tried to getDelegationToken from 
> ATS.
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:569)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:234)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:582)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:479)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:291)
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:332)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:285)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:166)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:475)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:467)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> 

[jira] [Commented] (YARN-4751) In 2.7, Labeled queue usage not shown properly in capacity scheduler UI

2016-04-15 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242992#comment-15242992
 ] 

Eric Payne commented on YARN-4751:
--

bq. YARN-4751 does not apply to branch-2.7. Rebase required? Wrong Branch? See 
https://wiki.apache.org/hadoop/HowToContribute for help.
{{YARN-4751-branch-2.7.004.patch}} won't apply until the 2.7 patch for 
YARN-3362 is committed. 

> In 2.7, Labeled queue usage not shown properly in capacity scheduler UI
> ---
>
> Key: YARN-4751
> URL: https://issues.apache.org/jira/browse/YARN-4751
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.3
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: 2.7 CS UI No BarGraph.jpg, 
> YARH-4752-branch-2.7.001.patch, YARH-4752-branch-2.7.002.patch, 
> YARN-4751-branch-2.7.003.patch, YARN-4751-branch-2.7.004.patch
>
>
> In 2.6 and 2.7, the capacity scheduler UI does not have the queue graphs 
> separated by partition. When applications are running on a labeled queue, no 
> color is shown in the bar graph, and several of the "Used" metrics are zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242910#comment-15242910
 ] 

Hadoop QA commented on YARN-4940:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 13s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 36s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 132m 52s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.webapp.TestRMWithCSRFFilter |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesReservation |
| JDK v1.8.0_77 Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes |
| 

[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom

2016-04-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242874#comment-15242874
 ] 

Naganarasimha G R commented on YARN-3215:
-

Thanks [~wangda], I missed that you had uploaded a patch for 2.8. I will check 
why some test cases are failing.

> Respect labels in CapacityScheduler when computing headroom
> ---
>
> Key: YARN-3215
> URL: https://issues.apache.org/jira/browse/YARN-3215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-3215.v1.001.patch, YARN-3215.v2.001.patch, 
> YARN-3215.v2.002.patch, YARN-3215.v2.003.patch, YARN-3215.v2.branch-2.8.patch
>
>
> In the existing CapacityScheduler, when computing the headroom of an 
> application, only the "non-labeled" nodes of this application are considered.
> But it is possible the application is asking for labeled resources, so 
> headroom-by-label (like 5G of resources available under node-label=red) is 
> required to get better resource allocation and avoid deadlocks such as 
> MAPREDUCE-5928.
> This JIRA could involve both API changes (such as adding a 
> label-to-available-resource map in AllocateResponse) and internal changes in 
> CapacityScheduler.
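
A rough sketch of the shape such a computation could take, assuming headroom 
becomes a per-partition quantity of roughly headroom(label) = limit(label) - 
used(label) (names are illustrative; the real calculation also folds in user 
limits and reservations):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class HeadroomByLabelSketch {
  /** headroom(label) = limit(label) - used(label), floored at zero. */
  static Map<String, Long> headroomByLabel(Map<String, Long> limitByLabel,
                                           Map<String, Long> usedByLabel) {
    Map<String, Long> headroom = new HashMap<>();
    for (Map.Entry<String, Long> e : limitByLabel.entrySet()) {
      long used = usedByLabel.getOrDefault(e.getKey(), 0L);
      headroom.put(e.getKey(), Math.max(0L, e.getValue() - used));
    }
    return headroom;
  }
}
{code}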



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242785#comment-15242785
 ] 

Hadoop QA commented on YARN-4940:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 35s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 41s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 133m 6s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.webapp.TestRMWithCSRFFilter |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesReservation |
| JDK v1.8.0_77 Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes |
| JDK 

[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-15 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242768#comment-15242768
 ] 

sandflee commented on YARN-4940:


Thanks [~templedf] [~kshukla], I have updated the patch, and I don't think the 
test failures are related to this issue.

> yarn node -list -all failed if RM start with decommissioned node
> 
>
> Key: YARN-4940
> URL: https://issues.apache.org/jira/browse/YARN-4940
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4940.01.patch, YARN-4940.02.patch, 
> YARN-4940.03.patch, YARN-4940.04.patch, YARN-4940.05.patch
>
>
> 1. add a node to the exclude file
> 2. start the RM
> 3. run {{yarn node -list -all}} and see the following exception
> {quote}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>  

[jira] [Updated] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-15 Thread sandflee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-4940:
---
Attachment: YARN-4940.05.patch

> yarn node -list -all failed if RM start with decommissioned node
> 
>
> Key: YARN-4940
> URL: https://issues.apache.org/jira/browse/YARN-4940
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4940.01.patch, YARN-4940.02.patch, 
> YARN-4940.03.patch, YARN-4940.04.patch, YARN-4940.05.patch
>
>
> 1. add a node to the exclude file
> 2. start the RM
> 3. run {{yarn node -list -all}} and see the following exception
> {quote}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 

[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-15 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242756#comment-15242756
 ] 

Kuhu Shukla commented on YARN-4940:
---

The latest patch is looking good, with the minor nit of an extra line before 
{{createExcludeFile}}. Most test failures seem unrelated.

> yarn node -list -all failed if RM start with decommissioned node
> 
>
> Key: YARN-4940
> URL: https://issues.apache.org/jira/browse/YARN-4940
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4940.01.patch, YARN-4940.02.patch, 
> YARN-4940.03.patch, YARN-4940.04.patch
>
>
> 1. add a node to the exclude file
> 2. start the RM
> 3. run {{yarn node -list -all}} and see the following exception
> {quote}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   

[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242712#comment-15242712
 ] 

Naganarasimha G R commented on YARN-4909:
-

OK, committing the patch shortly.

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch, 0005-YARN-4909.patch, 
> 0006-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.(TestRMWebServices.java:125)
> {noformat}
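
Address-already-in-use failures in concurrent test runs are commonly avoided 
by asking the OS for an ephemeral port rather than binding a fixed one; a 
minimal sketch, not necessarily what the attached patches do:

{code:java}
import java.io.IOException;
import java.net.ServerSocket;

public final class FreePortFinder {
  /**
   * Binds port 0 so the OS picks a free ephemeral port. A small race remains
   * between closing this probe socket and the test server binding the port,
   * so retrying the bind on failure is still prudent.
   */
  static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }
}
{code}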



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-15 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242707#comment-15242707
 ] 

Sunil G commented on YARN-4909:
---

Looks fine to me too.

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch, 0005-YARN-4909.patch, 
> 0006-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.<init>(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.<init>(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.<init>(TestRMWebServices.java:125)
> {noformat}
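
The bind failure typically happens when two test JVMs race for the same fixed 
port. A minimal sketch of the usual remedy (assumed here, not necessarily the 
exact change in the attached patches): ask the OS for a free ephemeral port 
before starting the Grizzly test container.

{code:java}
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortDemo {
    // Bind port 0 so the kernel picks an unused port, then release it.
    // A small race remains between close() and the test container's own
    // bind, so flaky-port fixes typically also retry on BindException.
    static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("test container could bind port " + findFreePort());
    }
}
{code}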



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-15 Thread sandflee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-4940:
---
Attachment: YARN-4940.04.patch

> yarn node -list -all failed if RM start with decommissioned node
> 
>
> Key: YARN-4940
> URL: https://issues.apache.org/jira/browse/YARN-4940
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4940.01.patch, YARN-4940.02.patch, 
> YARN-4940.03.patch, YARN-4940.04.patch
>
>
> 1. Add a node to the exclude file.
> 2. Start the RM.
> 3. Run yarn node -list -all and see the following exception:
> {quote}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 
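
A simplified, self-contained reduction of this cast failure (hypothetical 
names, not the actual Hadoop classes): protobuf-backed records cast every 
nested record to their own PB implementation when merging into the builder, 
so a hand-rolled NodeId subclass such as NodesListManager$UnknownNodeId blows 
up the moment the report is serialized.

{code:java}
abstract class NodeId {
    abstract String getHost();
}

// The implementation the protobuf layer expects for every NodeId.
class NodeIdPBImpl extends NodeId {
    private final String host;
    NodeIdPBImpl(String host) { this.host = host; }
    String getHost() { return host; }
    String getProto() { return "proto:" + host; }
}

// Stand-in for the RM's placeholder type for unknown/decommissioned nodes.
class UnknownNodeId extends NodeId {
    String getHost() { return "unknown"; }
}

public class CastDemo {
    public static void main(String[] args) {
        NodeId id = new UnknownNodeId();
        // Mirrors NodeReportPBImpl.mergeLocalToBuilder(): throws
        // java.lang.ClassCastException, as in the trace above.
        String proto = ((NodeIdPBImpl) id).getProto();
        System.out.println(proto);
    }
}
{code}

A fix along these lines would construct the placeholder through the regular 
record factory (so it is PB-backed) instead of subclassing NodeId directly.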

[jira] [Commented] (YARN-4948) Support node labels store in zookeeper

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242591#comment-15242591
 ] 

Hadoop QA commented on YARN-4948:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 56s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 32s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 13 new + 
212 unchanged - 0 fixed = 225 total (was 212) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 14 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 26s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 25s {color} 
| {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 29s {color} 
| {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_77. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 24s {color} 
| {color:red} hadoop-yarn-api in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 38s {color} 
| {color:red} hadoop-yarn-common in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |

[jira] [Commented] (YARN-4962) support filling up containers on node one by one

2016-04-15 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242589#comment-15242589
 ] 

sandflee commented on YARN-4962:


One simple way is to enable continuous scheduling and allocate containers to 
the node with the least available resource rather than the most.
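
A minimal sketch of that best-fit ordering (hypothetical NodeInfo type and a 
single vcores dimension; the real scheduler compares multi-dimensional 
Resource objects):

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BestFitDemo {
    static final class NodeInfo {
        final String host;
        final int availableVcores;
        NodeInfo(String host, int availableVcores) {
            this.host = host;
            this.availableVcores = availableVcores;
        }
    }

    public static void main(String[] args) {
        List<NodeInfo> nodes = new ArrayList<>();
        nodes.add(new NodeInfo("n1", 6));
        nodes.add(new NodeInfo("n2", 2));
        nodes.add(new NodeInfo("n3", 4));

        int requested = 2;
        // Least available first: small requests land on the fullest node
        // that still fits, keeping large nodes whole for big GPU requests.
        nodes.sort(Comparator.comparingInt(n -> n.availableVcores));
        for (NodeInfo n : nodes) {
            if (n.availableVcores >= requested) {
                System.out.println("allocate on " + n.host); // prints n2
                break;
            }
        }
    }
}
{code}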

>  support filling up containers on node one by one
> -
>
> Key: YARN-4962
> URL: https://issues.apache.org/jira/browse/YARN-4962
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: sandflee
>
> We have a GPU cluster where jobs with larger resource requests cannot be 
> satisfied because the nodes are occupied by jobs with smaller requests. We 
> did not enable the reservation system because GPU jobs may run for days or 
> weeks. We would like the scheduler to allocate containers so that nodes fill 
> up one by one, leaving whole nodes free for jobs with large resource requests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4962) support filling up containers on node one by one

2016-04-15 Thread sandflee (JIRA)
sandflee created YARN-4962:
--

 Summary:  support filling up containers on node one by one
 Key: YARN-4962
 URL: https://issues.apache.org/jira/browse/YARN-4962
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: sandflee


We have a GPU cluster where jobs with larger resource requests cannot be 
satisfied because the nodes are occupied by jobs with smaller requests. We 
did not enable the reservation system because GPU jobs may run for days or 
weeks. We would like the scheduler to allocate containers so that nodes fill 
up one by one, leaving whole nodes free for jobs with large resource requests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4948) Support node labels store in zookeeper

2016-04-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242555#comment-15242555
 ] 

Naganarasimha G R commented on YARN-4948:
-

I have triggered the build in the backend. If you want to trigger it 
yourself, one way is to reattach the patch.

> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: jialei weng
>Assignee: jialei weng
> Attachments: YARN-4948.001.patch, YARN-4948.002.patch
>
>
> Support node labels store in zookeeper
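
A minimal sketch of what a ZooKeeper-backed labels store could look like, 
using Apache Curator (the znode path, class name, and recovery flow here are 
assumptions for illustration, not the YARN-4948 patch itself):

{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZKNodeLabelsStoreDemo {
    private static final String LABELS_ROOT = "/yarn/node-labels"; // assumed path

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();
        try {
            // Adding a cluster node label persists one child znode per label.
            String path = LABELS_ROOT + "/gpu";
            if (client.checkExists().forPath(path) == null) {
                client.create().creatingParentsIfNeeded().forPath(path, new byte[0]);
            }
            // On RM restart the label set is recovered by listing children.
            System.out.println(client.getChildren().forPath(LABELS_ROOT));
        } finally {
            client.close();
        }
    }
}
{code}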



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4948) Support node labels store in zookeeper

2016-04-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242556#comment-15242556
 ] 

Naganarasimha G R commented on YARN-4948:
-

Thanks [~wangda]

> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: jialei weng
>Assignee: jialei weng
> Attachments: YARN-4948.001.patch, YARN-4948.002.patch
>
>
> Support node labels store in zookeeper



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)