[jira] [Updated] (YARN-4113) RM should respect retry-interval when uses RetryPolicies.RETRY_FOREVER

2015-09-18 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4113:
--
Attachment: 0001-YARN-4113.patch

As HADOOP-12386 is committed, changing {{RetryProxy}} and {{ServerProxy}} to 
use the {{retryForeverWithFixedSleep}} policy instead of RETRY_FOREVER.
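
For reference, a minimal sketch of the intended policy selection (class and 
method names here are illustrative, not the exact patch):

{code}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: when the caller asks to wait forever, keep retrying but honor
// yarn.resourcemanager.connect.retry-interval.ms instead of the 0 ms interval
// implied by RetryPolicies.RETRY_FOREVER.
public class RetryPolicySketch {
  static RetryPolicy waitForeverPolicy(Configuration conf) {
    long retryIntervalMS = conf.getLong(
        YarnConfiguration.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_MS,
        YarnConfiguration.DEFAULT_RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_MS);
    // was: return RetryPolicies.RETRY_FOREVER;
    return RetryPolicies.retryForeverWithFixedSleep(
        retryIntervalMS, TimeUnit.MILLISECONDS);
  }
}
{code}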

> RM should respect retry-interval when uses RetryPolicies.RETRY_FOREVER
> --
>
> Key: YARN-4113
> URL: https://issues.apache.org/jira/browse/YARN-4113
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4113.patch
>
>
> Found one issue in how RMProxy initializes its RetryPolicy, in 
> RMProxy#createRetryPolicy: when rmConnectWaitMS is set to -1 (wait forever), 
> it uses RetryPolicies.RETRY_FOREVER, which doesn't respect the 
> {{yarn.resourcemanager.connect.retry-interval.ms}} setting.
> RetryPolicies.RETRY_FOREVER uses 0 as the interval. When I ran the test 
> {{TestYarnClient#testShouldNotRetryForeverForNonNetworkExceptions}} without a 
> properly set up localhost name, it wrote 14G of DEBUG exception messages to 
> the system before it died. This would be very bad if the same thing happened 
> on a production cluster.
> We should fix two places:
> - Make RETRY_FOREVER able to take a retry interval as a constructor parameter.
> - Respect the retry interval when we use the RETRY_FOREVER policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805062#comment-14805062
 ] 

Hudson commented on YARN-4135:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2352 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2352/])
YARN-4135. Improve the assertion message in MockRM while failing after waiting 
for the state.(Nijel S F via rohithsharmaks) (rohithsharmaks: rev 
723c31d45bc0f64b1d1a67350b108059d2a220a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/CHANGES.txt


> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
>  Labels: test
> Fix For: 2.8.0
>
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when a test fails after waiting for the given state, the 
> application id or the attempt id could be printed for easier debugging (see 
> the sketch below).
> As of now it is hard to trace a test failure in the log, since there is 
> nothing tying the test case to the application id.
> Any thoughts?
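
For illustration, a failure message along these lines would carry the missing 
context (helper and variable names below are hypothetical, not the committed 
patch):

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppState;
import org.junit.Assert;

// Hypothetical helper: include the application id and both states in the
// timeout message so the failing log line can be tied to a concrete test app.
class MockRMAssertSketch {
  static void failWaitingForState(ApplicationId appId,
      RMAppState expected, RMAppState actual) {
    Assert.fail("App state is not correct (timed out): expected: " + expected
        + ", actual: " + actual + " for the application " + appId);
  }
}
{code}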



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-09-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876101#comment-14876101
 ] 

Hadoop QA commented on YARN-4183:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 58s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   9m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 23s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 55s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 37s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  63m 12s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 108m 36s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761137/YARN-4183.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3f42753 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9211/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9211/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9211/console |


This message was automatically generated.

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish events to the timeline store, 
> because it checks whether the timeline server and the system metrics 
> publisher are enabled before creating a timeline client.
> To make it work, the timeline service flag has to be turned on, which forces 
> every YARN application to get a timeline delegation token.
> Instead of checking whether the timeline service is enabled, we should be 
> checking whether the application history server is enabled (see the sketch 
> below).
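
In configuration terms, the proposed condition is roughly the following (a 
sketch assuming the usual YarnConfiguration keys; the actual patch may wire 
this differently):

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch: publish to the timeline store when generic application history and
// the system metrics publisher are enabled, rather than keying off
// yarn.timeline-service.enabled.
class HistoryPublishCheckSketch {
  static boolean shouldPublish(YarnConfiguration conf) {
    boolean historyEnabled = conf.getBoolean(
        YarnConfiguration.APPLICATION_HISTORY_ENABLED,
        YarnConfiguration.DEFAULT_APPLICATION_HISTORY_ENABLED);
    boolean publisherEnabled = conf.getBoolean(
        YarnConfiguration.SYSTEM_METRICS_PUBLISHER_ENABLED,
        YarnConfiguration.DEFAULT_SYSTEM_METRICS_PUBLISHER_ENABLED);
    return historyEnabled && publisherEnabled;
  }
}
{code}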



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-09-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876222#comment-14876222
 ] 

Hadoop QA commented on YARN-3985:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 44s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 11 new or modified test files. |
| {color:green}+1{color} | javac |   8m  8s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 55s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 54s | The applied patch generated  1 
new checkstyle issues (total was 23, now 22). |
| {color:green}+1{color} | whitespace |   0m  7s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  54m 45s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  94m 37s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761157/YARN-3985.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 88d89267 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9213/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9213/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9213/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9213/console |


This message was automatically generated.

> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch, YARN-3985.003.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4187) Yarn Client uses local address instead RM address as token renewer in a secure cluster when HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Component/s: security
 resourcemanager

> Yarn Client uses local address instead RM address as token renewer in a 
> secure cluster when HA is enabled.
> --
>
> Key: YARN-4187
> URL: https://issues.apache.org/jira/browse/YARN-4187
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when HA is enabled. This will cause HDFS token 
> renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> DelegationTokenIdentifier.
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:438)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
> at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
> 

[jira] [Updated] (YARN-4187) Yarn Client uses local address instead RM address as token renewer in a secure cluster when HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Attachment: YARN-4187.000.patch

> Yarn Client uses local address instead RM address as token renewer in a 
> secure cluster when HA is enabled.
> --
>
> Key: YARN-4187
> URL: https://issues.apache.org/jira/browse/YARN-4187
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when HA is enabled. This will cause HDFS token 
> renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> DelegationTokenIdentifier.
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:438)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
> at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
> at 

[jira] [Updated] (YARN-4187) Yarn Client uses local address instead RM address as token renewer in a secure cluster when HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Attachment: YARN-4187.000.patch

> Yarn Client uses local address instead RM address as token renewer in a 
> secure cluster when HA is enabled.
> --
>
> Key: YARN-4187
> URL: https://issues.apache.org/jira/browse/YARN-4187
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when HA is enabled. This will cause HDFS token 
> renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> DelegationTokenIdentifier.
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:438)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
> at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
> at 

[jira] [Updated] (YARN-4187) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Description: 
Yarn Client uses the local address instead of the RM address as the token 
renewer in a secure cluster when RM HA is enabled. This will cause HDFS token 
renewal to fail for renewer "nobody" if the rules from 
{{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
{{DelegationTokenIdentifier}}.
The reason the local address is returned: when HA is enabled, 
"yarn.resourcemanager.address" may not be set. If {{HOSTNAME_PATTERN}} 
("_HOST") is used in "yarn.resourcemanager.principal", the default address 
"0.0.0.0:8032" will be used, and based on the following code in 
SecurityUtil.java, the local address will be substituted for "0.0.0.0" (a 
standalone illustration follows the snippet).
{code}
  private static String replacePattern(String[] components, String hostname)
  throws IOException {
String fqdn = hostname;
if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
  fqdn = getLocalHostName();
}
return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
components[2];
  }
  static String getLocalHostName() throws UnknownHostException {
return InetAddress.getLocalHost().getCanonicalHostName();
  }
  public static String getServerPrincipal(String principalConfig,
  InetAddress addr) throws IOException {
String[] components = getComponents(principalConfig);
if (components == null || components.length != 3
|| !components[1].equals(HOSTNAME_PATTERN)) {
  return principalConfig;
} else {
  if (addr == null) {
throw new IOException("Can't replace " + HOSTNAME_PATTERN
+ " pattern since client address is null");
  }
  return replacePattern(components, addr.getCanonicalHostName());
}
  }
{code}
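
For illustration, the resolution path can be reproduced with a small 
standalone snippet (the principal and address below are made up, not taken 
from the cluster in question):

{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.security.SecurityUtil;

// Illustration only: with yarn.resourcemanager.address left at its default,
// the "_HOST" pattern is resolved against 0.0.0.0, so SecurityUtil falls back
// to the client's own hostname and the renewer principal comes out wrong.
public class RenewerResolutionSketch {
  public static void main(String[] args) throws Exception {
    InetSocketAddress rmAddress = new InetSocketAddress("0.0.0.0", 8032);
    String renewer = SecurityUtil.getServerPrincipal(
        "yarn/_HOST@EXAMPLE.COM", rmAddress.getAddress());
    // Prints yarn/<client hostname>@EXAMPLE.COM rather than the RM's host.
    System.out.println(renewer);
  }
}
{code}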
The following is the exception which causes the job to fail:
{code}
15/09/12 16:27:24 WARN security.UserGroupInformation: 
PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
cause:java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at 

[jira] [Commented] (YARN-4187) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876724#comment-14876724
 ] 

zhihai xu commented on YARN-4187:
-

I attached a patch, YARN-4187.000.patch, which uses YarnConfiguration instead 
of JobConf(Configuration) to call {{getSocketAddr}}, because 
Configuration.getSocketAddr doesn't support RM HA. We also need to check 
whether RM_HA_ID is configured; if RM_HA_ID is not configured, the first one 
in RM_HA_IDS will be used, otherwise it will fall back to using the client's 
local address.
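
A rough sketch of that lookup path (placeholder principal; the real renewer 
comes from yarn.resourcemanager.principal, and the actual patch lives in the 
MapReduce client's renewer computation):

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch: wrap the job configuration in a YarnConfiguration so getSocketAddr
// resolves the HA RM address (rm1/rm2) instead of the 0.0.0.0 default, then
// derive the renewer principal from that host.
public class HaAwareRenewerSketch {
  static String renewerFor(Configuration jobConf) throws IOException {
    YarnConfiguration yarnConf = new YarnConfiguration(jobConf);
    InetSocketAddress rmAddress = yarnConf.getSocketAddr(
        YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_PORT);
    return SecurityUtil.getServerPrincipal(
        "yarn/_HOST@EXAMPLE.COM", rmAddress.getHostName());
  }
}
{code}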

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: YARN-4187
> URL: https://issues.apache.org/jira/browse/YARN-4187
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This will cause HDFS 
> token renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The reason the local address is returned: when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If {{HOSTNAME_PATTERN}} 
> ("_HOST") is used in "yarn.resourcemanager.principal", the default address 
> "0.0.0.0:8032" will be used, and based on the following code in 
> SecurityUtil.java, the local address will be substituted for "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> 

[jira] [Updated] (YARN-4187) Yarn Client uses local address instead RM address as token renewer in a secure cluster when HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Attachment: (was: YARN-4187.000.patch)

> Yarn Client uses local address instead RM address as token renewer in a 
> secure cluster when HA is enabled.
> --
>
> Key: YARN-4187
> URL: https://issues.apache.org/jira/browse/YARN-4187
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: zhihai xu
>Assignee: zhihai xu
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when HA is enabled. This will cause HDFS token 
> renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> DelegationTokenIdentifier.
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:438)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
> at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 

[jira] [Updated] (YARN-4187) Yarn Client uses local address instead RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Description: 
Yarn Client uses the local address instead of the RM address as the token 
renewer in a secure cluster when RM HA is enabled. This will cause HDFS token 
renewal to fail for renewer "nobody" if the rules from 
{{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
{{DelegationTokenIdentifier}}.
The reason the local address is returned: when HA is enabled, 
"yarn.resourcemanager.address" may not be set, so the default address 
"0.0.0.0:8032" will be used. Based on the following code in SecurityUtil.java, 
the local address will be substituted for "0.0.0.0".
{code}
  private static String replacePattern(String[] components, String hostname)
  throws IOException {
String fqdn = hostname;
if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
  fqdn = getLocalHostName();
}
return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
components[2];
  }
  static String getLocalHostName() throws UnknownHostException {
return InetAddress.getLocalHost().getCanonicalHostName();
  }
  public static String getServerPrincipal(String principalConfig,
  InetAddress addr) throws IOException {
String[] components = getComponents(principalConfig);
if (components == null || components.length != 3
|| !components[1].equals(HOSTNAME_PATTERN)) {
  return principalConfig;
} else {
  if (addr == null) {
throw new IOException("Can't replace " + HOSTNAME_PATTERN
+ " pattern since client address is null");
  }
  return replacePattern(components, addr.getCanonicalHostName());
}
  }
{code}
The following is the exception which causes the job to fail:
{code}
15/09/12 16:27:24 WARN security.UserGroupInformation: 
PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
cause:java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300)
at 

[jira] [Updated] (YARN-4187) Yarn Client uses local address instead RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Description: 
Yarn Client uses the local address instead of the RM address as the token 
renewer in a secure cluster when RM HA is enabled. This will cause HDFS token 
renewal to fail for renewer "nobody" if the rules from 
{{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
{{DelegationTokenIdentifier}}.
The reason the local address is returned: when HA is enabled, 
"yarn.resourcemanager.address" may not be set while {{HOSTNAME_PATTERN}} 
("_HOST") is used in "yarn.resourcemanager.principal", so the default address 
"0.0.0.0:8032" will be used. Based on the following code in SecurityUtil.java, 
the local address will be substituted for "0.0.0.0".
{code}
  private static String replacePattern(String[] components, String hostname)
  throws IOException {
String fqdn = hostname;
if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
  fqdn = getLocalHostName();
}
return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
components[2];
  }
  static String getLocalHostName() throws UnknownHostException {
return InetAddress.getLocalHost().getCanonicalHostName();
  }
  public static String getServerPrincipal(String principalConfig,
  InetAddress addr) throws IOException {
String[] components = getComponents(principalConfig);
if (components == null || components.length != 3
|| !components[1].equals(HOSTNAME_PATTERN)) {
  return principalConfig;
} else {
  if (addr == null) {
throw new IOException("Can't replace " + HOSTNAME_PATTERN
+ " pattern since client address is null");
  }
  return replacePattern(components, addr.getCanonicalHostName());
}
  }
{code}
The following is the exception which causes the job to fail:
{code}
15/09/12 16:27:24 WARN security.UserGroupInformation: 
PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
cause:java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at 

[jira] [Updated] (YARN-4187) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Description: 
Yarn Client uses the local address instead of the RM address as the token 
renewer in a secure cluster when RM HA is enabled. This will cause HDFS token 
renewal to fail for renewer "nobody" if the rules from 
{{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
{{DelegationTokenIdentifier}}.
The reason the local address is returned: when HA is enabled, 
"yarn.resourcemanager.address" may not be set while {{HOSTNAME_PATTERN}} 
("_HOST") is used in "yarn.resourcemanager.principal", so the default address 
"0.0.0.0:8032" will be used. Based on the following code in SecurityUtil.java, 
the local address will be substituted for "0.0.0.0".
{code}
  private static String replacePattern(String[] components, String hostname)
  throws IOException {
String fqdn = hostname;
if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
  fqdn = getLocalHostName();
}
return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
components[2];
  }
  static String getLocalHostName() throws UnknownHostException {
return InetAddress.getLocalHost().getCanonicalHostName();
  }
  public static String getServerPrincipal(String principalConfig,
  InetAddress addr) throws IOException {
String[] components = getComponents(principalConfig);
if (components == null || components.length != 3
|| !components[1].equals(HOSTNAME_PATTERN)) {
  return principalConfig;
} else {
  if (addr == null) {
throw new IOException("Can't replace " + HOSTNAME_PATTERN
+ " pattern since client address is null");
  }
  return replacePattern(components, addr.getCanonicalHostName());
}
  }
{code}
The following is the exception which causes the job to fail:
{code}
15/09/12 16:27:24 WARN security.UserGroupInformation: 
PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
cause:java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

[jira] [Updated] (YARN-4187) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Summary: Yarn Client uses local address instead of RM address as token 
renewer in a secure cluster when RM HA is enabled.  (was: Yarn Client uses 
local address instead RM address as token renewer in a secure cluster when RM 
HA is enabled.)

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: YARN-4187
> URL: https://issues.apache.org/jira/browse/YARN-4187
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This will cause HDFS 
> token renewal to fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The reason the local address is returned: when HA is enabled, 
> "yarn.resourcemanager.address" may not be set while {{HOSTNAME_PATTERN}} 
> ("_HOST") is used in "yarn.resourcemanager.principal", so the default 
> address "0.0.0.0:8032" will be used. Based on the following code in 
> SecurityUtil.java, the local address will be substituted for "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> 
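
To make the substitution described in this issue concrete, here is a minimal, self-contained sketch (plain JDK, no Hadoop dependency) that mimics the quoted replacePattern/getServerPrincipal logic and shows why an unset "yarn.resourcemanager.address" (i.e. "0.0.0.0") ends up producing a renewer principal built from the client's local hostname. The class, method name and principal string are illustrative only, not the actual Hadoop API.
{code}
import java.io.IOException;
import java.net.InetAddress;
import java.util.Locale;

// Illustrative only: mirrors the substitution quoted above.
public class RenewerPrincipalDemo {

  // Same idea as the quoted replacePattern: a null/empty/0.0.0.0 host
  // falls back to the local canonical hostname.
  static String replaceHostPattern(String[] components, String hostname)
      throws IOException {
    String fqdn = hostname;
    if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
      fqdn = InetAddress.getLocalHost().getCanonicalHostName();
    }
    return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + components[2];
  }

  public static void main(String[] args) throws IOException {
    // "rm/_HOST@EXAMPLE.COM" split into its three parts.
    String[] components = {"rm", "_HOST", "EXAMPLE.COM"};

    // When RM HA is on and yarn.resourcemanager.address is unset, the client
    // resolves the default "0.0.0.0:8032" and passes 0.0.0.0 as the host:
    System.out.println(replaceHostPattern(components, "0.0.0.0"));
    // -> rm/<this machine's fqdn>@EXAMPLE.COM, i.e. the local address becomes
    //    the token renewer instead of the real RM host.

    // With a concrete RM host the renewer is correct:
    System.out.println(replaceHostPattern(components, "rm1.example.com"));
    // -> rm/rm1.example.com@EXAMPLE.COM
  }
}
{code}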

[jira] [Updated] (YARN-4188) MoveApplicationAcrossQueuesResponse should be an abstract class

2015-09-18 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-4188:
---
Description: (was: In AppSchedulingInfo.java the method 
checkForDeactivation() has these 2 consecutive lines:
{code}
ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
if (request.getNumContainers() > 0) {
{code}
The first line calls getResourceRequest, and it can return null.
{code}
synchronized public ResourceRequest getResourceRequest(
    Priority priority, String resourceName) {
  Map<String, ResourceRequest> nodeRequests = requests.get(priority);
  return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
}
{code}
The second line dereferences the pointer directly without a check.
If the pointer is null, the RM dies. 

{quote}2015-03-17 14:14:04,757 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
at java.lang.Thread.run(Thread.java:722)
{color:red} *2015-03-17 14:14:04,758 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
bbye..*{color} {quote})

> MoveApplicationAcrossQueuesResponse should be an abstract class
> ---
>
> Key: YARN-4188
> URL: https://issues.apache.org/jira/browse/YARN-4188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4190) missing container information in FairScheduler preemption log.

2015-09-18 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876762#comment-14876762
 ] 

Xianyin Xin commented on YARN-4190:
---

Hi [~zxu], should we consider fixing the issue in YARN-4134 or YARN-3405, or just 
wait for the new preemption logic in YARN-2154? It blocks YARN-4090 and 
YARN-4120, which aim to decouple scheduling and preemption and get better 
performance.

> missing container information in FairScheduler preemption log.
> --
>
> Key: YARN-4190
> URL: https://issues.apache.org/jira/browse/YARN-4190
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
>
> Add container information in FairScheduler preemption log to help debug. 
> Currently the following log doesn't have container information
> {code}
> LOG.info("Preempting container (prio=" + 
> container.getContainer().getPriority() +
> "res=" + container.getContainer().getResource() +
> ") from queue " + queue.getName());
> {code}
> So it will be very difficult to debug preemption-related issues for 
> FairScheduler.
> Even though the container information is printed in the following code
> {code}
> LOG.info("Killing container" + container +
> " (after waiting for premption for " +
> (getClock().getTime() - time) + "ms)");
> {code}
> But we can't match these two logs based on the container ID.
> It will be very useful to add container information in the first log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3920) FairScheduler container reservation on a node should be configurable to limit it to large containers

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876759#comment-14876759
 ] 

Hudson commented on YARN-3920:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2359 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2359/])
YARN-3920. FairScheduler container reservation on a node should be configurable 
to limit it to large containers (adhoot via asuresh) (Arun Suresh: rev 
94dec5a9164cd9bc573fbf74e76bcff9e7c5c637)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


> FairScheduler container reservation on a node should be configurable to limit 
> it to large containers
> 
>
> Key: YARN-3920
> URL: https://issues.apache.org/jira/browse/YARN-3920
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
> YARN-3920.004.patch, YARN-3920.004.patch, YARN-3920.005.patch, 
> yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch
>
>
> Reserving a node for a container was designed to prevent large containers 
> from being starved by small requests that keep landing on a node. Today we 
> let this be used even for small container requests. This has a huge impact 
> on scheduling since we block other scheduling requests until that reservation 
> is fulfilled. We should make this configurable so its impact can be minimized 
> by limiting it to large container requests, as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-09-18 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876765#comment-14876765
 ] 

Sangjin Lee commented on YARN-4075:
---

The latest patch looks good to me. Once YARN-4074 is committed, we should kick 
off jenkins and go from there...

> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4075-YARN-2928.POC.1.patch, 
> YARN-4075-YARN-2928.POC.2.patch
>
>
> We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4190) missing container information in FairScheduler preemption log.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu resolved YARN-4190.
-
Resolution: Later

> missing container information in FairScheduler preemption log.
> --
>
> Key: YARN-4190
> URL: https://issues.apache.org/jira/browse/YARN-4190
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
>
> Add container information in FairScheduler preemption log to help debug. 
> Currently the following log doesn't have container information
> {code}
> LOG.info("Preempting container (prio=" + 
> container.getContainer().getPriority() +
> "res=" + container.getContainer().getResource() +
> ") from queue " + queue.getName());
> {code}
> So it will be very difficult to debug preemption-related issues for 
> FairScheduler.
> Even though the container information is printed in the following code
> {code}
> LOG.info("Killing container" + container +
> " (after waiting for premption for " +
> (getClock().getTime() - time) + "ms)");
> {code}
> But we can't match these two logs based on the container ID.
> It will be very useful to add container information in the first log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4190) missing container information in FairScheduler preemption log.

2015-09-18 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876772#comment-14876772
 ] 

zhihai xu commented on YARN-4190:
-

Thanks [~xinxianyin] for the information! Yes, good suggestion. I will resolve it 
as fix-later and make it depend on YARN-4134.

> missing container information in FairScheduler preemption log.
> --
>
> Key: YARN-4190
> URL: https://issues.apache.org/jira/browse/YARN-4190
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
>
> Add container information in FairScheduler preemption log to help debug. 
> Currently the following log doesn't have container information
> {code}
> LOG.info("Preempting container (prio=" + 
> container.getContainer().getPriority() +
> "res=" + container.getContainer().getResource() +
> ") from queue " + queue.getName());
> {code}
> So it will be very difficult to debug preemption related issue for 
> FairScheduler.
> Even the container information is printed in the following code
> {code}
> LOG.info("Killing container" + container +
> " (after waiting for premption for " +
> (getClock().getTime() - time) + "ms)");
> {code}
> But we can't match these two logs based on the container ID.
> It will be very useful to add container information in the first log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-18 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-4180:

Attachment: YARN-4180.001.patch

Reuse the same retry proxy used by the AM client for the RM client. Also opened 
YARN-4185 to improve this retry mechanism.

> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180.001.patch
>
>
> We see issues with the RM trying to launch a container while an NM is restarting, 
> and we get exceptions like NMNotReadyException. While YARN-3842 added retries 
> for other clients of the NM (mainly AMs), they are not used by the AMLauncher in 
> the RM, so these intermittent errors cause job failures. This can manifest during 
> a rolling restart of NMs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
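
As a rough illustration of the kind of retry proxy being reused here, the sketch below wraps an arbitrary client interface with Hadoop's RetryProxy and a fixed-sleep policy. The ContainerLauncher interface and the retry parameters are made up for the example; this is not the actual AMLauncher change.
{code}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.io.retry.RetryProxy;

// Hypothetical client interface standing in for the NM-facing protocol.
interface ContainerLauncher {
  void startContainer(String containerId) throws Exception;
}

public class RetryLauncherSketch {
  public static ContainerLauncher wrapWithRetries(ContainerLauncher rawClient) {
    // Retry up to 5 times, sleeping 1 second between attempts, so a transient
    // NMNotReadyException during an NM rolling restart does not fail the launch.
    RetryPolicy policy =
        RetryPolicies.retryUpToMaximumCountWithFixedSleep(5, 1, TimeUnit.SECONDS);
    return (ContainerLauncher) RetryProxy.create(ContainerLauncher.class, rawClient, policy);
  }
}
{code}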


[jira] [Commented] (YARN-4186) Make WebAppUtils a public API for yarn

2015-09-18 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876125#comment-14876125
 ] 

Sunil G commented on YARN-4186:
---

We could also get this information from the REST API 
{{/apps//appattempts}}:
{code}
public class AppAttemptInfo {
...
  protected String logsLink;

}
{code}

> Make WebAppUtils a public API for yarn
> --
>
> Key: YARN-4186
> URL: https://issues.apache.org/jira/browse/YARN-4186
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>
> Application types like Tez will want to expose AM container log location to 
> users without having to make a call to the RM.
> Exposing getRunningLogURL as public will provide this functionality.
> Jon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
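
For reference, a minimal client-side sketch of pulling that log link over REST, assuming the standard RM web endpoint /ws/v1/cluster/apps/{appId}/appattempts and a JSON response containing a logsLink field; the host, port and application id below are placeholders, and the field is located with a naive string scan purely for illustration.
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class AppAttemptLogsLink {
  public static void main(String[] args) throws Exception {
    // Placeholder RM web address and application id.
    String rmWebApp = "http://rm-host:8088";
    String appId = "application_1442536785239_0001";

    URL url = new URL(rmWebApp + "/ws/v1/cluster/apps/" + appId + "/appattempts");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");

    // Read the whole JSON response into a buffer.
    StringBuilder body = new StringBuilder();
    try (BufferedReader in =
             new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        body.append(line);
      }
    }

    // Naive scan for the logsLink field, to avoid pulling in a JSON parser.
    int idx = body.indexOf("\"logsLink\"");
    if (idx >= 0) {
      int start = body.indexOf("\"", body.indexOf(":", idx) + 1) + 1;
      int end = body.indexOf("\"", start);
      System.out.println("AM container logs: " + body.substring(start, end));
    }
  }
}
{code}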


[jira] [Updated] (YARN-4187) Yarn Client uses local address instead RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Description: 
Yarn Client uses the local address instead of the RM address as the token renewer in a secure 
cluster when RM HA is enabled. This will cause HDFS token renewal failure for 
renewer "nobody" if the rules from {{hadoop.security.auth_to_local}} exclude 
the client address in HDFS {{DelegationTokenIdentifier}}.
The reason why the local address is returned is: when HA is enabled, 
"yarn.resourcemanager.address" may not be set, so the default address 
"0.0.0.0:8032" will be used. Based on the following code in KerberosUtil.java, 
the local address will be used to replace "0.0.0.0".
{code}
  public static final String getServicePrincipal(String service, String 
hostname)
  throws UnknownHostException {
String fqdn = hostname;
if (null == fqdn || fqdn.equals("") || fqdn.equals("0.0.0.0")) {
  fqdn = getLocalHostName();
}
// convert hostname to lowercase as kerberos does not work with hostnames
// with uppercase characters.
return service + "/" + fqdn.toLowerCase(Locale.US);
  }
  /* Return fqdn of the current host */
  static String getLocalHostName() throws UnknownHostException {
return InetAddress.getLocalHost().getCanonicalHostName();
  }
{code}
The following is the exception which causes the job to fail:
{code}
15/09/12 16:27:24 WARN security.UserGroupInformation: 
PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
cause:java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:438)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at 

[jira] [Updated] (YARN-4189) Capacity Scheduler : Improve location preference waiting mechanism

2015-09-18 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4189:
-
Attachment: YARN-4189 design v1.pdf

> Capacity Scheduler : Improve location preference waiting mechanism
> --
>
> Key: YARN-4189
> URL: https://issues.apache.org/jira/browse/YARN-4189
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4189 design v1.pdf
>
>
> There are some issues with the current Capacity Scheduler implementation of delay 
> scheduling:
> *1) Waiting time to allocate each container highly depends on cluster 
> availability*
> Currently, an app can only increase its missed-opportunity count when a node has 
> available resources AND that node gets traversed by the scheduler. There are lots 
> of possibilities that an app doesn't get traversed by the scheduler, for example:
> A cluster has 2 racks (rack1/2), each rack has 40 nodes. 
> Node-locality-delay=40. An application prefers rack1. 
> Node-heartbeat-interval=1s.
> Assume there are 2 nodes available on rack1, delay to allocate one container 
> = 40 sec.
> If there are 20 nodes available on rack1, delay of allocating one container = 
> 2 sec.
> *2) It could violate scheduling policies (Fifo/Priority/Fair)*
> Assume a cluster is highly utilized, and an app (app1) has higher priority and 
> wants locality. There is another app (app2) with lower priority that does not 
> care about locality. When a node heartbeats with available resources, 
> app1 decides to wait, so app2 gets the available slot. This should be 
> considered a bug that we need to fix.
> The same problem could happen when we use FIFO/Fair queue policies.
> Another problem similar to this is related to preemption: when the preemption 
> policy preempts some resources from queue-A for queue-B (queue-A is 
> over-satisfied and queue-B is under-satisfied), but queue-B is waiting for 
> the node-locality-delay, queue-A will get the resources back. In the next round, 
> the preemption policy could preempt these resources again from queue-A.
> This JIRA targets solving these problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4189) Capacity Scheduler : Improve location preference waiting mechanism

2015-09-18 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4189:


 Summary: Capacity Scheduler : Improve location preference waiting 
mechanism
 Key: YARN-4189
 URL: https://issues.apache.org/jira/browse/YARN-4189
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-4189 design v1.pdf

There are some issues with the current Capacity Scheduler implementation of delay 
scheduling:

*1) Waiting time to allocate each container highly depends on cluster 
availability*
Currently, an app can only increase its missed-opportunity count when a node has 
available resources AND that node gets traversed by the scheduler. There are lots of 
possibilities that an app doesn't get traversed by the scheduler, for example:

A cluster has 2 racks (rack1/2), each rack has 40 nodes. 
Node-locality-delay=40. An application prefers rack1. 
Node-heartbeat-interval=1s.
Assume there are 2 nodes available on rack1, delay to allocate one container = 
40 sec.
If there are 20 nodes available on rack1, delay of allocating one container = 2 
sec.

*2) It could violate scheduling policies (Fifo/Priority/Fair)*

Assume a cluster is highly utilized, and an app (app1) has higher priority and wants 
locality. There is another app (app2) with lower priority that does not care about 
locality. When a node heartbeats with available resources, app1 
decides to wait, so app2 gets the available slot. This should be considered 
a bug that we need to fix.

The same problem could happen when we use FIFO/Fair queue policies.

Another problem similar to this is related to preemption: when the preemption 
policy preempts some resources from queue-A for queue-B (queue-A is 
over-satisfied and queue-B is under-satisfied), but queue-B is waiting for the 
node-locality-delay, queue-A will get the resources back. In the next round, the 
preemption policy could preempt these resources again from queue-A.

This JIRA targets solving these problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4190) Add container information in FairScheduler preemption log to help debug.

2015-09-18 Thread zhihai xu (JIRA)
zhihai xu created YARN-4190:
---

 Summary: Add container information in FairScheduler preemption log 
to help debug.
 Key: YARN-4190
 URL: https://issues.apache.org/jira/browse/YARN-4190
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.7.1
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial


Add container information in FairScheduler preemption log to help debug. 
Currently the following log doesn't have container information
{code}
LOG.info("Preempting container (prio=" + container.getContainer().getPriority() 
+
"res=" + container.getContainer().getResource() +
") from queue " + queue.getName());
{code}
So it will be very difficult to debug preemption-related issues for 
FairScheduler.
Even though the container information is printed in the following code
{code}
LOG.info("Killing container" + container +
" (after waiting for premption for " +
(getClock().getTime() - time) + "ms)");
{code}
But we can't match these two logs based on the container ID.
It will be very useful to add container information in the first log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
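
A minimal, self-contained sketch of what the request boils down to, using a fake container class purely for illustration (not the FairScheduler types): including the container itself in the first message makes the two log lines correlatable by container id.
{code}
// Illustrative stand-in for the RMContainer used in the quoted log statements.
public class PreemptionLogDemo {

  static class FakeContainer {
    final String id;
    final int priority;
    final String resource;

    FakeContainer(String id, int priority, String resource) {
      this.id = id;
      this.priority = priority;
      this.resource = resource;
    }

    @Override
    public String toString() {
      // The real container's toString carries its id as well.
      return "container_" + id;
    }
  }

  public static void main(String[] args) {
    FakeContainer container =
        new FakeContainer("1442536785239_0001_01_000002", 1, "<memory:1024, vCores:1>");
    String queueName = "root.default";

    // Before: no way to tell which container this line refers to.
    System.out.println("Preempting container (prio=" + container.priority
        + " res=" + container.resource + ") from queue " + queueName);

    // After (suggested): include the container itself so the id shows up.
    System.out.println("Preempting container " + container
        + " (prio=" + container.priority + " res=" + container.resource
        + ") from queue " + queueName);

    // The later "Killing container ..." line already prints the container,
    // so the two lines can now be matched on the container id.
    System.out.println("Killing container " + container
        + " (after waiting for preemption for 15000ms)");
  }
}
{code}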


[jira] [Updated] (YARN-4190) missing container information in FairScheduler preemption log.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4190:

Summary: missing container information in FairScheduler preemption log.  
(was: Add container information in FairScheduler preemption log to help debug.)

> missing container information in FairScheduler preemption log.
> --
>
> Key: YARN-4190
> URL: https://issues.apache.org/jira/browse/YARN-4190
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
>
> Add container information in FairScheduler preemption log to help debug. 
> Currently the following log doesn't have container information
> {code}
> LOG.info("Preempting container (prio=" + 
> container.getContainer().getPriority() +
> "res=" + container.getContainer().getResource() +
> ") from queue " + queue.getName());
> {code}
> So it will be very difficult to debug preemption-related issues for 
> FairScheduler.
> Even though the container information is printed in the following code
> {code}
> LOG.info("Killing container" + container +
> " (after waiting for premption for " +
> (getClock().getTime() - time) + "ms)");
> {code}
> But we can't match these two logs based on the container ID.
> It will be very useful to add container information in the first log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4187) Yarn Client uses local address instead RM address as token renewer in a secure cluster when HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)
zhihai xu created YARN-4187:
---

 Summary: Yarn Client uses local address instead RM address as 
token renewer in a secure cluster when HA is enabled.
 Key: YARN-4187
 URL: https://issues.apache.org/jira/browse/YARN-4187
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu


Yarn Client uses the local address instead of the RM address as the token renewer in a secure 
cluster when HA is enabled. This will cause HDFS token renewal failure for 
renewer "nobody" if the rules from {{hadoop.security.auth_to_local}} exclude 
the client address in HDFS DelegationTokenIdentifier.
The following is the exception which causes the job to fail:
{code}
15/09/12 16:27:24 WARN security.UserGroupInformation: 
PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
cause:java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:438)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
at 

[jira] [Updated] (YARN-4187) Yarn Client uses local address instead RM address as token renewer in a secure cluster when HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Description: 
Yarn Client uses the local address instead of the RM address as the token renewer in a secure 
cluster when HA is enabled. This will cause HDFS token renewal failure for 
renewer "nobody" if the rules from {{hadoop.security.auth_to_local}} exclude 
the client address in HDFS DelegationTokenIdentifier. The reason why the local 
address is returned is that, when HA is enabled, "yarn.resourcemanager.address" may not be set. 
The following is the exception which causes the job to fail:
{code}
15/09/12 16:27:24 WARN security.UserGroupInformation: 
PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
cause:java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:438)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 

[jira] [Updated] (YARN-4187) Yarn Client uses local address instead RM address as token renewer in a secure cluster when HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Description: 
Yarn Client uses the local address instead of the RM address as the token renewer in a secure 
cluster when RM HA is enabled. This will cause HDFS token renewal failure for 
renewer "nobody" if the rules from {{hadoop.security.auth_to_local}} exclude 
the client address in HDFS {{DelegationTokenIdentifier}}.
The reason why the local address is returned is: when HA is enabled, 
"yarn.resourcemanager.address" may not be set, so the default address 
"0.0.0.0:8032" will be used. Based on the following code in KerberosUtil, the 
local address will be used to replace "0.0.0.0".
{code}
  public static final String getServicePrincipal(String service, String 
hostname)
  throws UnknownHostException {
String fqdn = hostname;
if (null == fqdn || fqdn.equals("") || fqdn.equals("0.0.0.0")) {
  fqdn = getLocalHostName();
}
// convert hostname to lowercase as kerberos does not work with hostnames
// with uppercase characters.
return service + "/" + fqdn.toLowerCase(Locale.US);
  }
  /* Return fqdn of the current host */
  static String getLocalHostName() throws UnknownHostException {
return InetAddress.getLocalHost().getCanonicalHostName();
  }
{code}
The following is the exception which causes the job to fail:
{code}
15/09/12 16:27:24 WARN security.UserGroupInformation: 
PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
cause:java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
java.io.IOException: Failed to run job : yarn tries to renew a token with 
renewer nobody
at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:300)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:438)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at 

[jira] [Updated] (YARN-4187) Yarn Client uses local address instead RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Summary: Yarn Client uses local address instead RM address as token renewer 
in a secure cluster when RM HA is enabled.  (was: Yarn Client uses local 
address instead RM address as token renewer in a secure cluster when HA is 
enabled.)

> Yarn Client uses local address instead RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> -
>
> Key: YARN-4187
> URL: https://issues.apache.org/jira/browse/YARN-4187
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token renewer in a 
> secure cluster when RM HA is enabled. This will cause HDFS token renewal 
> failure for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in HDFS 
> {{DelegationTokenIdentifier}}.
> The reason why the local address is returned is: when HA is enabled, 
> "yarn.resourcemanager.address" may not be set, so the default address 
> "0.0.0.0:8032" will be used. Based on the following code in KerberosUtil, 
> the local address will be used to replace "0.0.0.0".
> {code}
>   public static final String getServicePrincipal(String service, String 
> hostname)
>   throws UnknownHostException {
> String fqdn = hostname;
> if (null == fqdn || fqdn.equals("") || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> // convert hostname to lowercase as kerberos does not work with hostnames
> // with uppercase characters.
> return service + "/" + fqdn.toLowerCase(Locale.US);
>   }
>   /* Return fqdn of the current host */
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at 

[jira] [Updated] (YARN-4188) MoveApplicationAcrossQueuesResponse should be an abstract class

2015-09-18 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-4188:
---
Priority: Minor  (was: Blocker)

> MoveApplicationAcrossQueuesResponse should be an abstract class
> ---
>
> Key: YARN-4188
> URL: https://issues.apache.org/jira/browse/YARN-4188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Minor
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> The first line calls getResourceRequest, and it can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
>     Priority priority, String resourceName) {
>   Map<String, ResourceRequest> nodeRequests = requests.get(priority);
>   return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4188) MoveApplicationAcrossQueuesResponse should be an abstract class

2015-09-18 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-4188:
--

 Summary: MoveApplicationAcrossQueuesResponse should be an abstract 
class
 Key: YARN-4188
 URL: https://issues.apache.org/jira/browse/YARN-4188
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Giovanni Matteo Fumarola
Assignee: Brahma Reddy Battula
Priority: Blocker
 Fix For: 2.7.0, 2.6.1


In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
consecutive lines:
{code}
ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
if (request.getNumContainers() > 0) {
{code}
The first line calls getResourceRequest, and it can return null.
{code}
synchronized public ResourceRequest getResourceRequest(
    Priority priority, String resourceName) {
  Map<String, ResourceRequest> nodeRequests = requests.get(priority);
  return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
}
{code}
The second line dereferences the pointer directly without a check.
If the pointer is null, the RM dies. 

{quote}2015-03-17 14:14:04,757 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
at java.lang.Thread.run(Thread.java:722)
{color:red} *2015-03-17 14:14:04,758 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
bbye..*{color} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
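
A minimal sketch of the null guard the description is asking for, under the assumption that a missing ANY-level request simply means there is nothing outstanding. The priority dimension is dropped and the classes are stubs for illustration; this is not the committed fix.
{code}
import java.util.HashMap;
import java.util.Map;

// Toy model of the AppSchedulingInfo lookup above, showing the guard that
// prevents the NPE: a missing ANY request is treated as "0 outstanding".
public class CheckForDeactivationSketch {

  static class ResourceRequest {
    final int numContainers;
    ResourceRequest(int numContainers) { this.numContainers = numContainers; }
    int getNumContainers() { return numContainers; }
  }

  static final String ANY = "*";
  static final Map<String, ResourceRequest> requests = new HashMap<>();

  static ResourceRequest getResourceRequest(String resourceName) {
    return requests.get(resourceName); // may be null, exactly as in the report
  }

  static boolean hasOutstandingAnyRequest() {
    ResourceRequest request = getResourceRequest(ANY);
    // Guarded dereference: null means nothing outstanding, so no NPE.
    return request != null && request.getNumContainers() > 0;
  }

  public static void main(String[] args) {
    System.out.println(hasOutstandingAnyRequest()); // false, no NPE with no ANY request
    requests.put(ANY, new ResourceRequest(3));
    System.out.println(hasOutstandingAnyRequest()); // true
  }
}
{code}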


[jira] [Updated] (YARN-4188) MoveApplicationAcrossQueuesResponse should be an abstract class

2015-09-18 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-4188:
---
Target Version/s:   (was: 2.7.0)

> MoveApplicationAcrossQueuesResponse should be an abstract class
> ---
>
> Key: YARN-4188
> URL: https://issues.apache.org/jira/browse/YARN-4188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Minor
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> the first line calls getResourceRequest and it can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
> Map<String, ResourceRequest> nodeRequests = requests.get(priority);
> return  (nodeRequests == null) ? {color:red} null : 
> nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4188) MoveApplicationAcrossQueuesResponse should be an abstract class

2015-09-18 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-4188:
---
Labels:   (was: 2.6.1-candidate)

> MoveApplicationAcrossQueuesResponse should be an abstract class
> ---
>
> Key: YARN-4188
> URL: https://issues.apache.org/jira/browse/YARN-4188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Minor
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> the first line calls getResourceRequest and it can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
> Map<String, ResourceRequest> nodeRequests = requests.get(priority);
> return  (nodeRequests == null) ? {color:red} null : 
> nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4188) MoveApplicationAcrossQueuesResponse should be an abstract class

2015-09-18 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-4188:
---
Fix Version/s: (was: 2.6.1)
   (was: 2.7.0)

> MoveApplicationAcrossQueuesResponse should be an abstract class
> ---
>
> Key: YARN-4188
> URL: https://issues.apache.org/jira/browse/YARN-4188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> the first line calls getResourceRequest and it can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
> Map<String, ResourceRequest> nodeRequests = requests.get(priority);
> return  (nodeRequests == null) ? {color:red} null : 
> nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4188) MoveApplicationAcrossQueuesResponse should be an abstract class

2015-09-18 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola reassigned YARN-4188:
--

Assignee: Giovanni Matteo Fumarola  (was: Brahma Reddy Battula)

> MoveApplicationAcrossQueuesResponse should be an abstract class
> ---
>
> Key: YARN-4188
> URL: https://issues.apache.org/jira/browse/YARN-4188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> the first line calls getResourceRequest and it can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
> Map<String, ResourceRequest> nodeRequests = requests.get(priority);
> return  (nodeRequests == null) ? {color:red} null : 
> nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4188) MoveApplicationAcrossQueuesResponse should be an abstract class

2015-09-18 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-4188:
---
Description: MoveApplicationAcrossQueuesResponse should be an abstract 
class. Additionally the new instance should have a static modifier. Currently 
we are not facing any issues because the response is an empty object on success. 
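
A rough sketch of the shape being proposed (hedged: the abstract-class-plus-static-factory pattern and the {{Records.newRecord}} call are assumed by analogy with the other response records, not taken from an attached patch):
{code}
import org.apache.hadoop.classification.InterfaceAudience.Public;
import org.apache.hadoop.classification.InterfaceStability.Unstable;
import org.apache.hadoop.yarn.util.Records;

@Public
@Unstable
public abstract class MoveApplicationAcrossQueuesResponse {
  // Static factory mirroring the other *Response records (illustrative sketch).
  public static MoveApplicationAcrossQueuesResponse newInstance() {
    return Records.newRecord(MoveApplicationAcrossQueuesResponse.class);
  }
}
{code}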

> MoveApplicationAcrossQueuesResponse should be an abstract class
> ---
>
> Key: YARN-4188
> URL: https://issues.apache.org/jira/browse/YARN-4188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
>
> MoveApplicationAcrossQueuesResponse should be an abstract class. Additionally 
> the new instance should have a static modifier. Currently we are not facing 
> any issues because the response is an empty object on success. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4187) Yarn Client uses local client address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Summary: Yarn Client uses local client address instead of RM address as 
token renewer in a secure cluster when RM HA is enabled.  (was: Yarn Client 
uses local address instead of RM address as token renewer in a secure cluster 
when RM HA is enabled.)

> Yarn Client uses local client address instead of RM address as token renewer 
> in a secure cluster when RM HA is enabled.
> ---
>
> Key: YARN-4187
> URL: https://issues.apache.org/jira/browse/YARN-4187
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled. This will cause HDFS token renew 
> failure for renewer "nobody"  if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in HDFS 
> {{DelegationTokenIdentifier}}.
> The reason the local address is returned: when HA is enabled, 
> "yarn.resourcemanager.address" may not be set and 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", so 
> the default address "0.0.0.0:8032" is used. Based on the following code in 
> SecurityUtil.java, the local address is then used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> 
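
To make the failure mode described above concrete, here is a minimal, self-contained sketch that exercises the quoted {{SecurityUtil.getServerPrincipal}} path; the principal pattern and the EXAMPLE.COM realm are placeholders, not values from an actual cluster:
{code}
import java.net.InetAddress;
import org.apache.hadoop.security.SecurityUtil;

public class RenewerPrincipalDemo {
  public static void main(String[] args) throws Exception {
    // With HA, yarn.resourcemanager.address can default to 0.0.0.0:8032, so the
    // _HOST pattern is substituted with the *local* canonical hostname.
    String principal = SecurityUtil.getServerPrincipal(
        "yarn/_HOST@EXAMPLE.COM", InetAddress.getByName("0.0.0.0"));
    // Typically prints yarn/<local-canonical-hostname>@EXAMPLE.COM rather than
    // the RM host, which is why the derived renewer can map to "nobody" after
    // the hadoop.security.auth_to_local rules are applied.
    System.out.println(principal);
  }
}
{code}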

[jira] [Updated] (YARN-4187) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4187:

Summary: Yarn Client uses local address instead of RM address as token 
renewer in a secure cluster when RM HA is enabled.  (was: Yarn Client uses 
local client address instead of RM address as token renewer in a secure cluster 
when RM HA is enabled.)

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: YARN-4187
> URL: https://issues.apache.org/jira/browse/YARN-4187
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled. This will cause HDFS token renew 
> failure for renewer "nobody"  if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in HDFS 
> {{DelegationTokenIdentifier}}.
> The reason the local address is returned: when HA is enabled, 
> "yarn.resourcemanager.address" may not be set and 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", so 
> the default address "0.0.0.0:8032" is used. Based on the following code in 
> SecurityUtil.java, the local address is then used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> 

[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805104#comment-14805104
 ] 

Hudson commented on YARN-4135:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #388 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/388/])
YARN-4135. Improve the assertion message in MockRM while failing after waiting 
for the state.(Nijel S F via rohithsharmaks) (rohithsharmaks: rev 
723c31d45bc0f64b1d1a67350b108059d2a220a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/CHANGES.txt


> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
>  Labels: test
> Fix For: 2.8.0
>
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when the test fails after waiting for the given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to track the test failure in the log since there is no 
> relation between the test case and the application id.
> Any thoughts ? 
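
A minimal sketch of the kind of assertion message being asked for (hypothetical helper; it is not the committed MockRM change and assumes org.junit.Assert plus the ApplicationId and RMAppState records are imported):
{code}
// Illustrative only: include the application id in the failure message so a
// failed wait can be matched back to its test case in the logs.
private static void assertAppState(ApplicationId appId,
    RMAppState expected, RMAppState actual) {
  Assert.assertEquals("App state mismatch for application " + appId,
      expected, actual);
}
{code}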



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4182) Killed containers disappear on app attempt page

2015-09-18 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang resolved YARN-4182.
--
Resolution: Duplicate

> Killed containers disappear on app attempt page
> ---
>
> Key: YARN-4182
> URL: https://issues.apache.org/jira/browse/YARN-4182
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Jeff Zhang
> Attachments: 2015-09-18_1601.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4182) Killed containers disappear on app attempt page

2015-09-18 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805223#comment-14805223
 ] 

Jeff Zhang commented on YARN-4182:
--

Thanks [~bibinchundatt]. Closing it as a duplicate and adding a comment in YARN-3603.

> Killed containers disappear on app attempt page
> ---
>
> Key: YARN-4182
> URL: https://issues.apache.org/jira/browse/YARN-4182
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Jeff Zhang
> Attachments: 2015-09-18_1601.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2597) MiniYARNCluster should propagate reason for AHS not starting

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805326#comment-14805326
 ] 

Hudson commented on YARN-2597:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #413 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/413/])
YARN-2597 MiniYARNCluster should propagate reason for AHS not starting (stevel: 
rev a7201d635fc45b169ca3326bad48a3f781efe604)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* hadoop-yarn-project/CHANGES.txt


> MiniYARNCluster should propagate reason for AHS not starting
> 
>
> Key: YARN-2597
> URL: https://issues.apache.org/jira/browse/YARN-2597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2597-001.patch
>
>
> If the AHS doesn't come up, your test run gets an exception telling you this 
> fact, but the underlying cause is not propagated.
> As YARN services do record their failure cause, extracting and propagating 
> this is trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4182) Killed containers disappear on app attempt page

2015-09-18 Thread Jeff Zhang (JIRA)
Jeff Zhang created YARN-4182:


 Summary: Killed containers disappear on app attempt page
 Key: YARN-4182
 URL: https://issues.apache.org/jira/browse/YARN-4182
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Jeff Zhang






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-09-18 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805279#comment-14805279
 ] 

Varun Saxena commented on YARN-2902:


[~jlowe], 
bq. As far as properly handling DIE so we actually stop downloading and 
problems canceling active transfers, can't we just have the localizer forcibly 
tear down the JVM? If we're being told to DIE then I assume we really don't 
care about pending transfers completing and just want to get out. If the NM is 
going to clean up after the localizer anyway, seems like we can drastically 
simplify DIE handling and just exit the JVM. That seems like a change that's 
targeted enough to be appropriate for 2.7 instead of adding localizer kill 
support, etc.
In the container localizer, when processing the HB DIE response, we send another 
localizer status to the NM. Is it really required? What do you think?
I think as soon as we get DIE, we can follow the current code of cancelling pending 
tasks, although not wait for them to complete (as is being done in the newly added 
code in the patch), and delete the paths reported in the last status. And then just 
return from the loop for a graceful shutdown (after stopping executors).
Or are you suggesting System exit?

From the NM side, we can have a deletion task after some configured delay (same 
as right now). We will never cancel this deletion task though, unlike the code in 
the patch now.

This way the localizer should quit quickly and the NM can clean up.
I will change the behavior of the executor on deletion as well, i.e. I will ignore 
missing paths by default. Won't add a flag.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4143) Optimize the check for AMContainer allocation needed by blacklisting and ContainerType

2015-09-18 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805098#comment-14805098
 ] 

Sunil G commented on YARN-4143:
---

Sorry, for point 1 I meant adding the snippet below.
{code}
// The working knowledge is that masterContainer for AM is null as it
// itself is the master container.
RMAppAttempt appAttempt =
rmContext.getRMApps().get(applicationId).getCurrentAppAttempt();
return (appAttempt != null && appAttempt.getMasterContainer() == null
&& appAttempt.getSubmissionContext().getUnmanagedAM() == false);
{code}

> Optimize the check for AMContainer allocation needed by blacklisting and 
> ContainerType
> --
>
> Key: YARN-4143
> URL: https://issues.apache.org/jira/browse/YARN-4143
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-4143.001.patch
>
>
> In YARN-2005 there are checks made to determine if the allocation is for an 
> AM container. This happens in every allocate call and should be optimized 
> away since it changes only once per SchedulerApplicationAttempt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4143) Optimize the check for AMContainer allocation needed by blacklisting and ContainerType

2015-09-18 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805095#comment-14805095
 ] 

Sunil G commented on YARN-4143:
---

Hi [~adhoot]
Thank you for sharing the patch. 
Yes, this overall helps to avoid extra calls to RMAppAttempt to see whether the AM 
container is launched.

I have a couple of minor reasons for the alternative suggestion earlier.
1. It can avoid the schedulers contacting RMApp or RMAppAttempt if possible, 
because the code needed to get the RMAppAttempt looks slightly trickier, though I 
agree that it is correct and used in many places. 
{code}
RMAppAttempt appAttempt =
rmContext.getRMApps().get(applicationId).getCurrentAppAttempt();
{code}

In my opinion it would be good if RMAppAttempt could update 
{{SchedulerApplicationAttempt}}; it would look cleaner. 
2. RMAppAttempt knows exactly when the AM Container is launched, so it is good to 
update the app at that time.

Yes, I know it is not a very strong reason; I mentioned it because it may look 
cleaner this way :). Also, it adds an additional API to the scheduler. We can 
definitely weigh the usefulness of this, and if it is fine, then it can be done. 
Thoughts?
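
A rough sketch of the flag-caching idea being discussed (field and method names here are hypothetical, not from the attached patch):
{code}
// Hypothetical members on SchedulerApplicationAttempt: cache whether the AM
// container is still pending instead of consulting RMAppAttempt on every allocate.
private volatile boolean waitingForAMContainer = true;

// Called once when the master container is allocated, or up front for an
// unmanaged AM, so subsequent allocate() calls only read a plain boolean.
public void setAMContainerAllocated() {
  waitingForAMContainer = false;
}

public boolean isWaitingForAMContainer() {
  return waitingForAMContainer;
}
{code}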


> Optimize the check for AMContainer allocation needed by blacklisting and 
> ContainerType
> --
>
> Key: YARN-4143
> URL: https://issues.apache.org/jira/browse/YARN-4143
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-4143.001.patch
>
>
> In YARN-2005 there are checks made to determine if the allocation is for an 
> AM container. This happens in every allocate call and should be optimized 
> away since it changes only once per SchedulerApplicationAttempt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-09-18 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4140:
---
Attachment: 0006-YARN-4140.patch

Attaching patch after update 

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch
>
>
> Trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5–10 min for 500 containers. 
> Out of 3 machines in total, 2 machines were in the same partition and the app was 
> submitted to that partition.
> After enabling debug logging I was able to find the following:
> # From the AM the container ask is for OFF-SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # So since I had about 500 containers, it took about 6 minutes 
> to allocate the 1st map after AM allocation.
> # Tested with about 1K maps using the PI job, it took 17 minutes to allocate the next 
> container after AM allocation.
> Once the 500 NODE_LOCAL container allocations are done, the next container 
> allocation is done on OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}
>  
> (Consumes about 6 minutes)
>  
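
For reference, a minimal sketch of an off-switch ask that carries the partition label, matching the ANY request shown in the debug log above (the label "3", the priority, and the container count are taken from the quoted log; the capability and the {{ResourceRequest.newInstance}} overload with a label expression are assumptions):
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class LabeledAnyRequestSketch {
  // Sketch of an ANY (off-switch) request tagged with partition label "3".
  static ResourceRequest buildAnyRequest() {
    Priority priority = Priority.newInstance(20);
    Resource capability = Resource.newInstance(512, 1);  // illustrative sizes
    return ResourceRequest.newInstance(
        priority, ResourceRequest.ANY, capability,
        500,    // number of containers, as in the quoted log
        true,   // relax locality
        "3");   // node label expression for the target partition
  }
}
{code}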



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-4113) RM should respect retry-interval when uses RetryPolicies.RETRY_FOREVER

2015-09-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805106#comment-14805106
 ] 

Hadoop QA commented on YARN-4113:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 27s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 11s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 33s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  0s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 38s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| | |  43m 27s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12757219/0001-YARN-4113.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 723c31d |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9204/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9204/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9204/console |


This message was automatically generated.

> RM should respect retry-interval when uses RetryPolicies.RETRY_FOREVER
> --
>
> Key: YARN-4113
> URL: https://issues.apache.org/jira/browse/YARN-4113
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4113.patch
>
>
> Found one issue in how RMProxy initializes the RetryPolicy, in 
> RMProxy#createRetryPolicy: when rmConnectWaitMS is set to -1 (wait forever), 
> it uses RetryPolicies.RETRY_FOREVER, which doesn't respect the 
> {{yarn.resourcemanager.connect.retry-interval.ms}} setting.
> RetryPolicies.RETRY_FOREVER uses 0 as the interval. When I ran the test 
> without a properly set up localhost name, 
> {{TestYarnClient#testShouldNotRetryForeverForNonNetworkExceptions}}, it wrote 
> 14G of DEBUG exception messages before it died. This would be very bad 
> if we did the same thing in a production cluster.
> We should fix two places:
> - Make RETRY_FOREVER able to take the retry interval as a constructor parameter.
> - Respect the retry interval when we use the RETRY_FOREVER policy.
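
A minimal sketch of a fix along the lines listed above (illustrative wiring, not the attached patch; the fixed-sleep retry-forever factory in {{RetryPolicies}}, the literal config key, and the 30s default used here are the assumptions):
{code}
// Illustrative sketch for RMProxy#createRetryPolicy: when waiting forever,
// keep honouring the configured retry interval instead of spinning with 0ms.
long retryIntervalMS = conf.getLong(
    "yarn.resourcemanager.connect.retry-interval.ms", 30 * 1000L);
if (waitForEver) {
  return RetryPolicies.retryForeverWithFixedSleep(
      retryIntervalMS, TimeUnit.MILLISECONDS);
}
{code}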



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4182) Killed containers disappear on app attempt page

2015-09-18 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated YARN-4182:
-
Attachment: 2015-09-18_1601.png

> Killed containers disappear on app attempt page
> ---
>
> Key: YARN-4182
> URL: https://issues.apache.org/jira/browse/YARN-4182
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Jeff Zhang
> Attachments: 2015-09-18_1601.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2597) MiniYARNCluster should propagate reason for AHS not starting

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805289#comment-14805289
 ] 

Hudson commented on YARN-2597:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #406 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/406/])
YARN-2597 MiniYARNCluster should propagate reason for AHS not starting (stevel: 
rev a7201d635fc45b169ca3326bad48a3f781efe604)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* hadoop-yarn-project/CHANGES.txt


> MiniYARNCluster should propagate reason for AHS not starting
> 
>
> Key: YARN-2597
> URL: https://issues.apache.org/jira/browse/YARN-2597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2597-001.patch
>
>
> If the AHS doesn't come up, your test run gets an exception telling you this 
> fact, but the underlying cause is not propagated.
> As YARN services do record their failure cause, extracting and propagating 
> this is trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can fail to shutdown

2015-09-18 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805084#comment-14805084
 ] 

zhihai xu commented on YARN-3697:
-

I also cherry-picked the change from trunk to branch-2.7, since YARN-4153 was 
committed to branch-2.7. TestAsyncDispatcher succeeds now.

> FairScheduler: ContinuousSchedulingThread can fail to shutdown
> --
>
> Key: YARN-3697
> URL: https://issues.apache.org/jira/browse/YARN-3697
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: YARN-3697.000.patch, YARN-3697.001.patch
>
>
> FairScheduler: the ContinuousSchedulingThread sometimes can't be shut down after 
> stop. 
> The reason is that the InterruptedException is swallowed in 
> continuousSchedulingAttempt:
> {code}
>   try {
> if (node != null && Resources.fitsIn(minimumAllocation,
> node.getAvailableResource())) {
>   attemptScheduling(node);
> }
>   } catch (Throwable ex) {
> LOG.error("Error while attempting scheduling for node " + node +
> ": " + ex.toString(), ex);
>   }
> {code}
> I saw the following exception after stop:
> {code}
> 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
> fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
> Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
> available= used=: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249)
>   at 
> 
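
A minimal sketch of the interruption handling under discussion (illustrative only, not the committed patch): let the interrupt escape the generic catch block so the scheduling thread can actually stop.
{code}
// Illustrative sketch: do not let catch (Throwable) swallow an interrupt that
// arrives (possibly wrapped in a YarnRuntimeException) during scheduling.
try {
  if (node != null && Resources.fitsIn(minimumAllocation,
      node.getAvailableResource())) {
    attemptScheduling(node);
  }
} catch (Throwable ex) {
  if (ex instanceof InterruptedException
      || ex.getCause() instanceof InterruptedException) {
    // Restore the interrupt status so the ContinuousSchedulingThread's run()
    // loop observes it and shuts down cleanly.
    Thread.currentThread().interrupt();
    return;
  }
  LOG.error("Error while attempting scheduling for node " + node +
      ": " + ex.toString(), ex);
}
{code}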

[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805190#comment-14805190
 ] 

Hudson commented on YARN-4135:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2327 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2327/])
YARN-4135. Improve the assertion message in MockRM while failing after waiting 
for the state.(Nijel S F via rohithsharmaks) (rohithsharmaks: rev 
723c31d45bc0f64b1d1a67350b108059d2a220a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/CHANGES.txt


> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
>  Labels: test
> Fix For: 2.8.0
>
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when the test fails after waiting for the given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to track the test failure in the log since there is no 
> relation between the test case and the application id.
> Any thoughts ? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3603) Application Attempts page confusing

2015-09-18 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805222#comment-14805222
 ] 

Jeff Zhang commented on YARN-3603:
--

[~sunilg] If it is "Running Container ID", is it still necessary to include 
"Container Exit Status"? And are there any reasons to exclude the killed containers 
here? I think it would be helpful for diagnosis if all the containers are 
included. 


> Application Attempts page confusing
> ---
>
> Key: YARN-3603
> URL: https://issues.apache.org/jira/browse/YARN-3603
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.8.0
>Reporter: Thomas Graves
>Assignee: Sunil G
> Attachments: 0001-YARN-3603.patch, 0002-YARN-3603.patch, 
> 0003-YARN-3603.patch, ahs1.png
>
>
> The application attempts page 
> (http://RM:8088/cluster/appattempt/appattempt_1431101480046_0003_01)
> is a bit confusing about what is going on. I think the table of containers 
> there is only for Running containers, and when the app is completed or killed 
> it is empty. The table should have a label on it stating so. 
> Also, the "AM Container" field is a link when running but not when it is killed. 
> That might be confusing.
> There is no link to the logs on this page, but there is one in the app attempt 
> table when looking at 
> http://rm:8088/cluster/app/application_1431101480046_0003



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-09-18 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805281#comment-14805281
 ] 

Varun Saxena commented on YARN-2902:


I should say "This way localizer should quit quickly after attempting some 
cleanup and NM can also attempt cleanup."

However we can also let localizer not do any cleanup at all and let NM delete 
paths. That is also an option.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4181) node blacklist for AM launching

2015-09-18 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4181:
--
Description: 
In some cases, a node goes problematic and most launching containers fail on 
this node, as well as the launching AM containers.
Then this node has more available resource than other nodes in the cluster. The 
Application whose AM is failing has zero minShareRatio. With fair scheduler, 
this node is always rated first, and the unfortunate Application is also likely 
rated first. The result is: attempts of this application are failing again 
and again on the same node.

We should avoid such a deadlock situation.

Solution 1: NM could detect the failure rate of containers. If the rate is 
high, the NM marks itself to unhealthy for a period. But we should be careful 
not to turn all nodes into unhealthy by a buggy Application. Maybe use failure 
rate of containers for different Applications.

Solution 2: To have Application level blacklist by AMLauncher, in addition to 
existing blacklist by AM.

  was:
In some cases, a node goes problematic and most launching containers fail on 
this node, as well as the launching AM containers.
Then this node has more available resource than other nodes in the cluster. The 
Application whose AM is failing has zero minShareRatio. With fair scheduler, 
this node is always rated first, and the unfortunate Application is also likely 
rated first. The result is: attempts of this application are failing again 
and again on the same node.

Solution 1: NM could detect the failure rate of containers. If the rate is 
high, the NM marks itself to unhealthy for a period. But we should be careful 
not to turn all nodes into unhealthy by a buggy Application. Maybe use failure 
rate of containers for different Applications.

Solution 2: To have Application level blacklist by AMLauncher, in addition to 
existing blacklist by AM.


> node blacklist for AM launching
> ---
>
> Key: YARN-4181
> URL: https://issues.apache.org/jira/browse/YARN-4181
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
>
> In some cases a node becomes problematic and most containers launched on it 
> fail, including the AM containers being launched there.
> This node then has more available resource than other nodes in the cluster. 
> The Application whose AM is failing has zero minShareRatio. With the fair 
> scheduler, this node is always rated first, and the unfortunate Application is 
> also likely rated first. The result is that attempts of this application fail 
> again and again on the same node.
> We should avoid such a deadlock situation.
> Solution 1: NM could detect the failure rate of containers. If the rate is 
> high, the NM marks itself to unhealthy for a period. But we should be careful 
> not to turn all nodes into unhealthy by a buggy Application. Maybe use 
> failure rate of containers for different Applications.
> Solution 2: To have Application level blacklist by AMLauncher, in addition to 
> existing blacklist by AM.
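
As a rough illustration of Solution 1 in the description above, a failure-rate check the NM could run before reporting its health. The class, method names, and thresholds below are assumptions for illustration, not NM code.

{code:java}
// Illustrative sketch of "Solution 1" only; thresholds and names are made up.
public class ContainerFailureHealthCheck {

  private int launched;
  private int failed;

  void onContainerLaunched() { launched++; }
  void onContainerFailed()   { failed++; }

  /**
   * Returns true when the recent container failure rate crosses a
   * configurable threshold, which an NM could use to report itself
   * unhealthy for a cool-down period.
   */
  boolean shouldReportUnhealthy(double failureRateThreshold, int minSample) {
    if (launched < minSample) {
      return false;            // avoid reacting to only a handful of launches
    }
    double rate = (double) failed / launched;
    return rate >= failureRateThreshold;
  }
}
{code}

A per-Application failure rate, as the description suggests, would use the same check keyed by application id so a single buggy application cannot mark every node unhealthy.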



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4182) Killed containers disappear on app attempt page

2015-09-18 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805183#comment-14805183
 ] 

Bibin A Chundatt commented on YARN-4182:


It's similar to YARN-3603, right?

> Killed containers disappear on app attempt page
> ---
>
> Key: YARN-4182
> URL: https://issues.apache.org/jira/browse/YARN-4182
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Jeff Zhang
> Attachments: 2015-09-18_1601.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4177) yarn.util.Clock should not be used to time a duration or time interval

2015-09-18 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin updated YARN-4177:
--
Attachment: YARN-4177.002.patch

Uploaded patch ver.2.

> yarn.util.Clock should not be used to time a duration or time interval
> --
>
> Key: YARN-4177
> URL: https://issues.apache.org/jira/browse/YARN-4177
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xianyin Xin
> Attachments: YARN-4177.001.patch, YARN-4177.002.patch
>
>
> There are many places that use Clock to time intervals, which is dangerous, as 
> commented by [~ste...@apache.org] in HADOOP-12409. Instead, we should use 
> hadoop.util.Timer#monotonicNow() to get monotonic time. Or we could provide a 
> MonotonicClock in yarn.util for consistency of the code.
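
To make the point concrete, a small self-contained example of timing an interval with a monotonic source; it uses plain System.nanoTime() rather than any YARN class, so no Hadoop types are assumed.

{code:java}
// Self-contained example: time an interval with a monotonic source (here
// System.nanoTime(); Hadoop offers hadoop.util.Timer#monotonicNow()),
// not with wall-clock time, which can jump when the system clock is adjusted.
import java.util.concurrent.TimeUnit;

public class MonotonicIntervalExample {
  public static void main(String[] args) throws InterruptedException {
    long start = System.nanoTime();                    // monotonic start point
    Thread.sleep(50);                                  // the work being timed
    long elapsedMs =
        TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    System.out.println("elapsed ~" + elapsedMs + " ms");

    // Anti-pattern from the description: a wall-clock difference can be
    // negative or wildly wrong across NTP or manual clock adjustments.
    long wallStart = System.currentTimeMillis();
    Thread.sleep(50);
    System.out.println("wall-clock diff (avoid for intervals): "
        + (System.currentTimeMillis() - wallStart) + " ms");
  }
}
{code}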



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876207#comment-14876207
 ] 

Hudson commented on YARN-3212:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1151 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1151/])
YARN-3212. RMNode State Transition Update with DECOMMISSIONING state. (Junping 
Du via wangda) (wangda: rev 9bc913a35c46e65d373c3ae3f01a377e16e8d0ca)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java


> RMNode State Transition Update with DECOMMISSIONING state
> -
>
> Key: YARN-3212
> URL: https://issues.apache.org/jira/browse/YARN-3212
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, 
> YARN-3212-v2.patch, YARN-3212-v3.patch, YARN-3212-v4.1.patch, 
> YARN-3212-v4.patch, YARN-3212-v5.1.patch, YARN-3212-v5.patch, 
> YARN-3212-v6.1.patch, YARN-3212-v6.2.patch, YARN-3212-v6.patch
>
>
> As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and 
> can be transitioned to from the “running” state, triggered by a new event - 
> “decommissioning”. 
> This new state can transition to “decommissioned” on Resource_Update if there 
> are no running apps on this NM or the NM reconnects after restart, or when it 
> receives a DECOMMISSIONED event (after the timeout from the CLI).
> In addition, it can go back to “running” if the user decides to cancel the 
> previous decommission by recommissioning the same node. The reaction to other 
> events is similar to the RUNNING state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3975) WebAppProxyServlet should not redirect to RM page if AHS is enabled

2015-09-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876238#comment-14876238
 ] 

Jason Lowe commented on YARN-3975:
--

Thanks for updating the patch, Mit!  Looks good overall with just two nits left.

The error message for an app that's not found now has an extraneous comma as it 
will now say, "Application application_1234_1 could not be found, in RM or 
history server".

We should probably throw an UnsupportedOperationException or something similar 
when the app report source isn't RM or AHS.  Shouldn't happen now but could if 
someone comes along and adds a new enum value and forgets to update this code.  
If it does happen the code will just do nothing as-is.
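
A small sketch of the suggestion above, using a stand-in enum and method rather than the servlet's real types; only the default-case pattern is the point.

{code:java}
// Sketch only: stand-in enum and method, not WebAppProxyServlet code.
public class AppReportSourceExample {

  enum AppReportSource { RM, AHS }   // hypothetical stand-in

  static String buildNotFoundMessage(String appId, AppReportSource source) {
    switch (source) {
      case RM:
        return "Application " + appId + " could not be found in RM";
      case AHS:
        return "Application " + appId
            + " could not be found in RM or history server";
      default:
        // Fail loudly if a new enum value is added and this code is not
        // updated, instead of silently doing nothing.
        throw new UnsupportedOperationException(
            "Unknown app report source: " + source);
    }
  }
}
{code}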

> WebAppProxyServlet should not redirect to RM page if AHS is enabled
> ---
>
> Key: YARN-3975
> URL: https://issues.apache.org/jira/browse/YARN-3975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-3975.2.b2.patch, YARN-3975.3.patch, 
> YARN-3975.4.patch, YARN-3975.5.patch, YARN-3975.6.patch, YARN-3975.7.patch
>
>
> WebAppProxyServlet should be updated to handle the case where the app report 
> doesn't have a tracking URL and the Application History Server is enabled.
> As we would have already tried the RM and got the 
> ApplicationNotFoundException, we should not direct the user to the RM app page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876268#comment-14876268
 ] 

Hadoop QA commented on YARN-4180:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 54s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  2s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 49s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  54m 26s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  93m 50s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761162/YARN-4180.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 88d89267 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9215/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9215/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9215/console |


This message was automatically generated.

> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180.001.patch
>
>
> We see issues with the RM trying to launch a container while an NM is 
> restarting, and we get exceptions like NMNotReadyException. While YARN-3842 
> added retries for other clients of the NM (mainly AMs), they are not used by 
> the AMLauncher in the RM, so these intermittent errors cause job failures. 
> This can manifest during rolling restarts of NMs.
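
For context, a rough sketch of the kind of retry wiring being discussed, mirroring the YARN-3842 approach: wrap the RM-side ContainerManagementProtocol proxy with a policy that retries NMNotReadyException and connection failures while an NM restarts. The retry count and sleep values are made up, and this is not the attached patch.

{code:java}
// Rough sketch only; example values, not the actual AMLauncher change.
import java.net.ConnectException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.io.retry.RetryProxy;
import org.apache.hadoop.yarn.api.ContainerManagementProtocol;
import org.apache.hadoop.yarn.exceptions.NMNotReadyException;

public class AmLauncherRetrySketch {

  static ContainerManagementProtocol wrapWithRetries(
      ContainerManagementProtocol rawProxy) {
    // Retry a bounded number of times with a fixed sleep (example values).
    RetryPolicy retryOnNmRestart =
        RetryPolicies.retryUpToMaximumCountWithFixedSleep(
            10, 1, TimeUnit.SECONDS);

    // Only the exceptions seen during NM (rolling) restart get the retry
    // treatment; everything else fails fast as before.
    Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicy =
        new HashMap<>();
    exceptionToPolicy.put(NMNotReadyException.class, retryOnNmRestart);
    exceptionToPolicy.put(ConnectException.class, retryOnNmRestart);

    RetryPolicy policy = RetryPolicies.retryByException(
        RetryPolicies.TRY_ONCE_THEN_FAIL, exceptionToPolicy);

    return (ContainerManagementProtocol) RetryProxy.create(
        ContainerManagementProtocol.class, rawProxy, policy);
  }
}
{code}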



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4163) Audit getQueueInfo and getApplications calls

2015-09-18 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876041#comment-14876041
 ] 

Chang Li commented on YARN-4163:


The broken unit tests are unrelated.

> Audit getQueueInfo and getApplications calls
> 
>
> Key: YARN-4163
> URL: https://issues.apache.org/jira/browse/YARN-4163
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4163.2.patch, YARN-4163.patch
>
>
> getQueueInfo and getApplications sometimes seem to cause spikes of load, but 
> we are not able to confirm this because they are not audit logged. This patch 
> proposes to add them to the audit log.
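
For illustration, a minimal sketch of auditing a read-only call; the AuditLogger interface, operation string, and component name below are hypothetical stand-ins rather than the RM's real RMAuditLogger API.

{code:java}
// Sketch only: stand-in audit interface, not the RM's RMAuditLogger.
public class ReadOnlyCallAuditSketch {

  interface AuditLogger {                       // hypothetical stand-in
    void logSuccess(String user, String operation, String component);
    void logFailure(String user, String operation, String component,
        String reason);
  }

  static final String GET_QUEUE_INFO = "Get Queue Info";   // example name

  static void getQueueInfo(String callerUser, AuditLogger audit) {
    try {
      // ... existing queue lookup logic would run here ...
      audit.logSuccess(callerUser, GET_QUEUE_INFO, "ClientRMService");
    } catch (RuntimeException e) {
      audit.logFailure(callerUser, GET_QUEUE_INFO, "ClientRMService",
          e.getMessage());
      throw e;
    }
  }
}
{code}

With this pattern, load spikes from getQueueInfo/getApplications can be correlated with the calling user in the audit log.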



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876042#comment-14876042
 ] 

Hudson commented on YARN-3212:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #418 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/418/])
YARN-3212. RMNode State Transition Update with DECOMMISSIONING state. (Junping 
Du via wangda) (wangda: rev 9bc913a35c46e65d373c3ae3f01a377e16e8d0ca)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* hadoop-yarn-project/CHANGES.txt


> RMNode State Transition Update with DECOMMISSIONING state
> -
>
> Key: YARN-3212
> URL: https://issues.apache.org/jira/browse/YARN-3212
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, 
> YARN-3212-v2.patch, YARN-3212-v3.patch, YARN-3212-v4.1.patch, 
> YARN-3212-v4.patch, YARN-3212-v5.1.patch, YARN-3212-v5.patch, 
> YARN-3212-v6.1.patch, YARN-3212-v6.2.patch, YARN-3212-v6.patch
>
>
> As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and 
> can be transitioned to from the “running” state, triggered by a new event - 
> “decommissioning”. 
> This new state can transition to “decommissioned” on Resource_Update if there 
> are no running apps on this NM or the NM reconnects after restart, or when it 
> receives a DECOMMISSIONED event (after the timeout from the CLI).
> In addition, it can go back to “running” if the user decides to cancel the 
> previous decommission by recommissioning the same node. The reaction to other 
> events is similar to the RUNNING state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4186) Make WebAppUtils a public API for yarn

2015-09-18 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-4186:
-

 Summary: Make WebAppUtils a public API for yarn
 Key: YARN-4186
 URL: https://issues.apache.org/jira/browse/YARN-4186
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles


Application types like Tez will want to expose AM container log location to 
users without having to make a call to the RM.

Exposing getRunningLogURL as public will provide this functionality.

Jon
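
For illustration, what a caller such as a Tez AM might do once such a helper is public. The method signature and exact URL layout below are assumptions, not the committed WebAppUtils API.

{code:java}
// Illustration only; signature and URL layout are assumptions.
public class AmLogUrlSketch {

  // Builds a link to the NM web UI's container log page without asking the RM.
  static String runningLogUrl(String nodeHttpAddress, String containerId,
      String user) {
    // e.g. http://nm-host:8042/node/containerlogs/container_.../user
    return "http://" + nodeHttpAddress + "/node/containerlogs/"
        + containerId + "/" + user;
  }

  public static void main(String[] args) {
    System.out.println(runningLogUrl("nm-host:8042",
        "container_1431101480046_0003_01_000001", "tez"));
  }
}
{code}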



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4185) Retry interval delay for NM client can be improved from the fixed static retry

2015-09-18 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-4185:
---

 Summary: Retry interval delay for NM client can be improved from 
the fixed static retry 
 Key: YARN-4185
 URL: https://issues.apache.org/jira/browse/YARN-4185
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot


Instead of having a fixed retry interval that starts off very high and stays 
there, we are better off using an exponential backoff capped at the same fixed 
maximum. Today the retry interval is fixed at 10 seconds, which can be 
unnecessarily high, especially when an NM can complete a rolling restart within 
a second.
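
A tiny worked example of the proposed capped exponential backoff; the base delay and cap below are illustrative values, not anything this jira settles on.

{code:java}
// Illustration only: exponential backoff capped at the current fixed interval.
public class CappedBackoffSketch {
  public static void main(String[] args) {
    long baseMillis = 100;        // first retry delay (example)
    long capMillis = 10_000;      // today's fixed 10 s becomes the cap
    for (int attempt = 0; attempt < 10; attempt++) {
      long delay = Math.min(capMillis, baseMillis << attempt); // 100, 200, 400, ...
      System.out.println("attempt " + attempt + " -> sleep " + delay + " ms");
    }
  }
}
{code}

Early attempts retry within a fraction of a second, which covers the quick NM rolling-restart case, while repeated failures still back off to the same 10-second ceiling.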



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-09-18 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3985:

Attachment: YARN-3985.003.patch

> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch, YARN-3985.003.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.
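
A purely hypothetical outline of the wiring this jira describes; the interface and method names below are stand-ins for illustration, not the RMStateStore reservation API added in YARN-3736.

{code:java}
// Hypothetical outline only; names are stand-ins, not the real API.
public class ReservationPersistenceSketch {

  interface ReservationStateStore {                    // stand-in interface
    void storeReservation(String planName, String reservationId,
        byte[] allocation);
    void removeReservation(String planName, String reservationId);
  }

  private final ReservationStateStore store;

  ReservationPersistenceSketch(ReservationStateStore store) {
    this.store = store;
  }

  // Called after the reservation system accepts a new reservation, so the
  // allocation survives an RM restart and can be reloaded on recovery.
  void onReservationAccepted(String planName, String reservationId,
      byte[] allocation) {
    store.storeReservation(planName, reservationId, allocation);
  }

  void onReservationDeleted(String planName, String reservationId) {
    store.removeReservation(planName, reservationId);
  }
}
{code}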



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876371#comment-14876371
 ] 

Hudson commented on YARN-3212:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #392 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/392/])
YARN-3212. RMNode State Transition Update with DECOMMISSIONING state. (Junping 
Du via wangda) (wangda: rev 9bc913a35c46e65d373c3ae3f01a377e16e8d0ca)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java


> RMNode State Transition Update with DECOMMISSIONING state
> -
>
> Key: YARN-3212
> URL: https://issues.apache.org/jira/browse/YARN-3212
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, 
> YARN-3212-v2.patch, YARN-3212-v3.patch, YARN-3212-v4.1.patch, 
> YARN-3212-v4.patch, YARN-3212-v5.1.patch, YARN-3212-v5.patch, 
> YARN-3212-v6.1.patch, YARN-3212-v6.2.patch, YARN-3212-v6.patch
>
>
> As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and 
> can be transitioned to from the “running” state, triggered by a new event - 
> “decommissioning”. 
> This new state can transition to “decommissioned” on Resource_Update if there 
> are no running apps on this NM or the NM reconnects after restart, or when it 
> receives a DECOMMISSIONED event (after the timeout from the CLI).
> In addition, it can go back to “running” if the user decides to cancel the 
> previous decommission by recommissioning the same node. The reaction to other 
> events is similar to the RUNNING state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876194#comment-14876194
 ] 

Hudson commented on YARN-3212:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2357 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2357/])
YARN-3212. RMNode State Transition Update with DECOMMISSIONING state. (Junping 
Du via wangda) (wangda: rev 9bc913a35c46e65d373c3ae3f01a377e16e8d0ca)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java


> RMNode State Transition Update with DECOMMISSIONING state
> -
>
> Key: YARN-3212
> URL: https://issues.apache.org/jira/browse/YARN-3212
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, 
> YARN-3212-v2.patch, YARN-3212-v3.patch, YARN-3212-v4.1.patch, 
> YARN-3212-v4.patch, YARN-3212-v5.1.patch, YARN-3212-v5.patch, 
> YARN-3212-v6.1.patch, YARN-3212-v6.2.patch, YARN-3212-v6.patch
>
>
> As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and 
> can be transitioned to from the “running” state, triggered by a new event - 
> “decommissioning”. 
> This new state can transition to “decommissioned” on Resource_Update if there 
> are no running apps on this NM or the NM reconnects after restart, or when it 
> receives a DECOMMISSIONED event (after the timeout from the CLI).
> In addition, it can go back to “running” if the user decides to cancel the 
> previous decommission by recommissioning the same node. The reaction to other 
> events is similar to the RUNNING state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs

2015-09-18 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876038#comment-14876038
 ] 

Anubhav Dhoot commented on YARN-2005:
-

Thanks [~jianhe], [~sunilg], [~jlowe] for the reviews and [~kasha] for the 
review and commit!

> Blacklisting support for scheduling AMs
> ---
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Jason Lowe
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-2005.001.patch, YARN-2005.002.patch, 
> YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, 
> YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, 
> YARN-2005.008.patch, YARN-2005.009.patch
>
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.
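
A minimal sketch of the idea in the description: count AM-launch failures per node and skip nodes that cross a configurable threshold when placing the next attempt. The counting scheme, names, and threshold handling are illustrative assumptions, not the committed RM implementation.

{code:java}
// Illustrative sketch only; not the committed RM blacklisting code.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AmBlacklistSketch {
  private final int failureThreshold;
  private final Map<String, Integer> amFailuresPerNode = new HashMap<>();

  AmBlacklistSketch(int failureThreshold) {
    this.failureThreshold = failureThreshold;
  }

  void recordAmFailure(String nodeId) {
    amFailuresPerNode.merge(nodeId, 1, Integer::sum);
  }

  /** Nodes the scheduler should avoid for this application's next AM attempt. */
  Set<String> blacklistedNodes() {
    Set<String> blacklisted = new HashSet<>();
    for (Map.Entry<String, Integer> e : amFailuresPerNode.entrySet()) {
      if (e.getValue() >= failureThreshold) {
        blacklisted.add(e.getKey());
      }
    }
    return blacklisted;
  }
}
{code}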



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4157) Merge YARN-1197 back to trunk

2015-09-18 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4157:
-
Attachment: YARN-1197.diff.5.patch

Updated to latest diff. (ver.5)

> Merge YARN-1197 back to trunk
> -
>
> Key: YARN-4157
> URL: https://issues.apache.org/jira/browse/YARN-4157
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1197.diff.1.patch, YARN-1197.diff.2.patch, 
> YARN-1197.diff.3.patch, YARN-1197.diff.4.patch, YARN-1197.diff.5.patch
>
>
> The purpose of this jira is to generate a uber patch from current YARN-1197 
> branch and run against trunk to fix any uncaught warnings and test failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-09-18 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876185#comment-14876185
 ] 

Vrushali C commented on YARN-4074:
--


Patch v8 looks good to me. Thanks for updating the test cases to use the 
reader objects.
+1

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.007.patch, 
> YARN-4074-YARN-2928.008.patch, YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch, YARN-4074-YARN-2928.POC.003.patch, 
> YARN-4074-YARN-2928.POC.004.patch, YARN-4074-YARN-2928.POC.005.patch, 
> YARN-4074-YARN-2928.POC.006.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4170) AM need to be notified with priority in AllocateResponse

2015-09-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876227#comment-14876227
 ] 

Hadoop QA commented on YARN-4170:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  21m 58s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 53s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 21s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 10s | The applied patch generated  1 
new checkstyle issues (total was 11, now 12). |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 42s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   5m 22s | The patch appears to introduce 3 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m 11s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  56m  2s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 111m 42s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-common |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12757128/0001-YARN-4170.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 88d89267 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9212/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9212/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9212/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9212/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9212/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9212/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9212/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9212/console |


This message was automatically generated.

> AM need to be notified with priority in AllocateResponse 
> -
>
> Key: YARN-4170
> URL: https://issues.apache.org/jira/browse/YARN-4170
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4170.patch
>
>
> As discussed in MAPREDUCE-5870, the Application Master needs to be notified of 
> the priority in the Allocate heartbeat.  This will help the AM know the 
> priority and update the JobStatus when the client asks.
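
For illustration, a sketch of how an AM might consume the priority once the allocate heartbeat carries it. The interface and accessor name below are assumptions for illustration; the attached patch defines the actual AllocateResponse field.

{code:java}
// Sketch only: stand-in types, not the patch's AllocateResponse API.
public class PriorityInHeartbeatSketch {

  interface AllocateResponseView {            // hypothetical stand-in
    Integer getApplicationPriority();
  }

  private Integer lastKnownPriority;

  // Called on every allocate() response; the AM caches the value so it can
  // answer client JobStatus queries without an extra RM round trip.
  void onAllocateResponse(AllocateResponseView response) {
    Integer p = response.getApplicationPriority();
    if (p != null) {
      lastKnownPriority = p;
    }
  }

  Integer currentPriority() {
    return lastKnownPriority;
  }
}
{code}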



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4157) Merge YARN-1197 back to trunk

2015-09-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876829#comment-14876829
 ] 

Hadoop QA commented on YARN-4157:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  28m  3s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 59 new or modified test files. |
| {color:green}+1{color} | javac |   9m 14s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 30s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   4m  8s | The applied patch generated  7 
new checkstyle issues (total was 29, now 27). |
| {color:red}-1{color} | whitespace | 294m 22s | The patch has 180  line(s) 
that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 56s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 39s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |  11m  8s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |  10m 17s | Tests passed in 
hadoop-mapreduce-client-app. |
| {color:green}+1{color} | tools/hadoop tests |   0m 58s | Tests passed in 
hadoop-sls. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   6m 57s | Tests failed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  1s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |   8m 35s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  61m 56s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 454m 18s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
| Timed out tests | 
org.apache.hadoop.yarn.client.TestApplicationMasterServiceProtocolOnHA |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761161/YARN-1197.diff.5.patch 
|
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / 88d89267 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9214/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9214/artifact/patchprocess/whitespace.txt
 |
| hadoop-mapreduce-client-app test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9214/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt
 |
| hadoop-sls test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9214/artifact/patchprocess/testrun_hadoop-sls.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9214/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9214/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9214/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9214/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9214/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9214/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9214/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9214/console |


This message was automatically generated.

> Merge YARN-1197 back to trunk
> -
>
> Key: YARN-4157
> URL: https://issues.apache.org/jira/browse/YARN-4157
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: 

[jira] [Commented] (YARN-3920) FairScheduler container reservation on a node should be configurable to limit it to large containers

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876859#comment-14876859
 ] 

Hudson commented on YARN-3920:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #394 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/394/])
YARN-3920. FairScheduler container reservation on a node should be configurable 
to limit it to large containers (adhoot via asuresh) (Arun Suresh: rev 
94dec5a9164cd9bc573fbf74e76bcff9e7c5c637)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/CHANGES.txt


> FairScheduler container reservation on a node should be configurable to limit 
> it to large containers
> 
>
> Key: YARN-3920
> URL: https://issues.apache.org/jira/browse/YARN-3920
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
> YARN-3920.004.patch, YARN-3920.004.patch, YARN-3920.005.patch, 
> yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch
>
>
> Reserving a node for a container was designed for preventing large containers 
> from starvation from small requests that keep getting into a node. Today we 
> let this be used even for a small container request. This has a huge impact 
> on scheduling since we block other scheduling requests until that reservation 
> is fulfilled. We should make this configurable so its impact can be minimized 
> by limiting it for large container requests as originally intended. 
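
A minimal sketch of the configurable behaviour described above: only fall back to reserving a node when the pending request is large relative to the node's capacity. The parameter name and values are illustrative, not the property the committed patch defines.

{code:java}
// Illustration only; threshold name and values are assumptions.
public class ReservationThresholdSketch {

  /**
   * @param requestMemoryMb  memory demanded by the pending container
   * @param nodeCapacityMb   total memory of the node being considered
   * @param reservationRatio e.g. 0.5: reserve only if the request needs at
   *                         least half of the node
   */
  static boolean shouldReserve(int requestMemoryMb, int nodeCapacityMb,
      double reservationRatio) {
    return requestMemoryMb >= nodeCapacityMb * reservationRatio;
  }

  public static void main(String[] args) {
    System.out.println(shouldReserve(1024, 16384, 0.5));  // small ask: false
    System.out.println(shouldReserve(12288, 16384, 0.5)); // large ask: true
  }
}
{code}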



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4189) Capacity Scheduler : Improve location preference waiting mechanism

2015-09-18 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876871#comment-14876871
 ] 

Xianyin Xin commented on YARN-4189:
---

Hi [~leftnoteasy], I just went through the doc. Please correct me if I am wrong. 
Can a container that is marked as ALLOCATING_WAITING be occupied by other 
requests? I'm afraid ALLOCATING_WAITING would reduce the utilization of the 
cluster. In a cluster with many nodes and many jobs, it is hard to get most jobs 
satisfied with their allocations, especially in the app-oriented allocation 
mechanism (which maps newly available resources to appropriate apps). A customer 
once asked us "why can we get 100% locality in MR1 but only up to 60%~70% after 
making various optimizations?". So we can guess that quite a large fraction of 
resource requests are not satisfied with their allocations in a cluster; thus 
many containers would go through the ALLOCATING_WAITING phase, which leaves a 
lot of resources idle for a period of time.

> Capacity Scheduler : Improve location preference waiting mechanism
> --
>
> Key: YARN-4189
> URL: https://issues.apache.org/jira/browse/YARN-4189
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4189 design v1.pdf
>
>
> There're some issues with current Capacity Scheduler implementation of delay 
> scheduling:
> *1) Waiting time to allocate each container highly depends on cluster 
> availability*
> Currently, app can only increase missed-opportunity when a node has available 
> resource AND it gets traversed by a scheduler. There’re lots of possibilities 
> that an app doesn’t get traversed by a scheduler, for example:
> A cluster has 2 racks (rack1/2), each rack has 40 nodes. 
> Node-locality-delay=40. An application prefers rack1. 
> Node-heartbeat-interval=1s.
> Assume there are 2 nodes available on rack1, delay to allocate one container 
> = 40 sec.
> If there are 20 nodes available on rack1, delay of allocating one container = 
> 2 sec.
> *2) It could violate scheduling policies (Fifo/Priority/Fair)*
> Assume a cluster is highly utilized; an app (app1) has higher priority and 
> wants locality, and another app (app2) has lower priority but doesn’t care 
> about locality. When a node heartbeats with available resource, app1 decides 
> to wait, so app2 gets the available slot. This should be considered a bug 
> that we need to fix.
> The same problem could happen when we use FIFO/Fair queue policies.
> Another problem similar to this is related to preemption: the preemption 
> policy preempts some resources from queue-A for queue-B (queue-A is 
> over-satisfied and queue-B is under-satisfied), but queue-B is waiting out 
> the node-locality-delay, so queue-A gets the resources back. In the next 
> round, the preemption policy could preempt these resources from queue-A again.
> This JIRA targets solving these problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3920) FairScheduler container reservation on a node should be configurable to limit it to large containers

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876834#comment-14876834
 ] 

Hudson commented on YARN-3920:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2332 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2332/])
YARN-3920. FairScheduler container reservation on a node should be configurable 
to limit it to large containers (adhoot via asuresh) (Arun Suresh: rev 
94dec5a9164cd9bc573fbf74e76bcff9e7c5c637)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


> FairScheduler container reservation on a node should be configurable to limit 
> it to large containers
> 
>
> Key: YARN-3920
> URL: https://issues.apache.org/jira/browse/YARN-3920
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
> YARN-3920.004.patch, YARN-3920.004.patch, YARN-3920.005.patch, 
> yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch
>
>
> Reserving a node for a container was designed for preventing large containers 
> from starvation from small requests that keep getting into a node. Today we 
> let this be used even for a small container request. This has a huge impact 
> on scheduling since we block other scheduling requests until that reservation 
> is fulfilled. We should make this configurable so its impact can be minimized 
> by limiting it for large container requests as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3920) FairScheduler container reservation on a node should be configurable to limit it to large containers

2015-09-18 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-3920:
--
Summary: FairScheduler container reservation on a node should be 
configurable to limit it to large containers  (was: FairScheduler Reserving a 
node for a container should be configurable to allow it used only for large 
containers)

> FairScheduler container reservation on a node should be configurable to limit 
> it to large containers
> 
>
> Key: YARN-3920
> URL: https://issues.apache.org/jira/browse/YARN-3920
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
> YARN-3920.004.patch, YARN-3920.004.patch, YARN-3920.005.patch, 
> yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch
>
>
> Reserving a node for a container was designed for preventing large containers 
> from starvation from small requests that keep getting into a node. Today we 
> let this be used even for a small container request. This has a huge impact 
> on scheduling since we block other scheduling requests until that reservation 
> is fulfilled. We should make this configurable so its impact can be minimized 
> by limiting it for large container requests as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3920) FairScheduler container reservation on a node should be configurable to limit it to large containers

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876439#comment-14876439
 ] 

Hudson commented on YARN-3920:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8489 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8489/])
YARN-3920. FairScheduler container reservation on a node should be configurable 
to limit it to large containers (adhoot via asuresh) (Arun Suresh: rev 
94dec5a9164cd9bc573fbf74e76bcff9e7c5c637)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


> FairScheduler container reservation on a node should be configurable to limit 
> it to large containers
> 
>
> Key: YARN-3920
> URL: https://issues.apache.org/jira/browse/YARN-3920
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
> YARN-3920.004.patch, YARN-3920.004.patch, YARN-3920.005.patch, 
> yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch
>
>
> Reserving a node for a container was designed for preventing large containers 
> from starvation from small requests that keep getting into a node. Today we 
> let this be used even for a small container request. This has a huge impact 
> on scheduling since we block other scheduling requests until that reservation 
> is fulfilled. We should make this configurable so its impact can be minimized 
> by limiting it for large container requests as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3920) FairScheduler container reservation on a node should be configurable to limit it to large containers

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876621#comment-14876621
 ] 

Hudson commented on YARN-3920:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1153 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1153/])
YARN-3920. FairScheduler container reservation on a node should be configurable 
to limit it to large containers (adhoot via asuresh) (Arun Suresh: rev 
94dec5a9164cd9bc573fbf74e76bcff9e7c5c637)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java


> FairScheduler container reservation on a node should be configurable to limit 
> it to large containers
> 
>
> Key: YARN-3920
> URL: https://issues.apache.org/jira/browse/YARN-3920
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
> YARN-3920.004.patch, YARN-3920.004.patch, YARN-3920.005.patch, 
> yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch
>
>
> Reserving a node for a container was designed for preventing large containers 
> from starvation from small requests that keep getting into a node. Today we 
> let this be used even for a small container request. This has a huge impact 
> on scheduling since we block other scheduling requests until that reservation 
> is fulfilled. We should make this configurable so its impact can be minimized 
> by limiting it for large container requests as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876661#comment-14876661
 ] 

Hudson commented on YARN-3212:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2331 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2331/])
YARN-3212. RMNode State Transition Update with DECOMMISSIONING state. (Junping 
Du via wangda) (wangda: rev 9bc913a35c46e65d373c3ae3f01a377e16e8d0ca)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* hadoop-yarn-project/CHANGES.txt


> RMNode State Transition Update with DECOMMISSIONING state
> -
>
> Key: YARN-3212
> URL: https://issues.apache.org/jira/browse/YARN-3212
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, 
> YARN-3212-v2.patch, YARN-3212-v3.patch, YARN-3212-v4.1.patch, 
> YARN-3212-v4.patch, YARN-3212-v5.1.patch, YARN-3212-v5.patch, 
> YARN-3212-v6.1.patch, YARN-3212-v6.2.patch, YARN-3212-v6.patch
>
>
> As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and 
> can be transitioned to from the “running” state, triggered by a new event - 
> “decommissioning”. 
> This new state can transition to “decommissioned” on Resource_Update if there 
> are no running apps on this NM or the NM reconnects after restart, or when it 
> receives a DECOMMISSIONED event (after the timeout from the CLI).
> In addition, it can go back to “running” if the user decides to cancel the 
> previous decommission by recommissioning the same node. The reaction to other 
> events is similar to the RUNNING state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-18 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876669#comment-14876669
 ] 

Anubhav Dhoot commented on YARN-4131:
-

Can you please add [~kasha] and me to this? We are interested in this effort.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, 
> YARN-4131-v1.1.patch, YARN-4131-v1.2.patch, YARN-4131-v1.patch
>
>
> Per YARN-3337, we need a handy tools to kill container in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3920) FairScheduler container reservation on a node should be configurable to limit it to large containers

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876658#comment-14876658
 ] 

Hudson commented on YARN-3920:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #421 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/421/])
YARN-3920. FairScheduler container reservation on a node should be configurable 
to limit it to large containers (adhoot via asuresh) (Arun Suresh: rev 
94dec5a9164cd9bc573fbf74e76bcff9e7c5c637)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* hadoop-yarn-project/CHANGES.txt


> FairScheduler container reservation on a node should be configurable to limit 
> it to large containers
> 
>
> Key: YARN-3920
> URL: https://issues.apache.org/jira/browse/YARN-3920
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
> YARN-3920.004.patch, YARN-3920.004.patch, YARN-3920.005.patch, 
> yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch
>
>
> Reserving a node for a container was designed for preventing large containers 
> from starvation from small requests that keep getting into a node. Today we 
> let this be used even for a small container request. This has a huge impact 
> on scheduling since we block other scheduling requests until that reservation 
> is fulfilled. We should make this configurable so its impact can be minimized 
> by limiting it for large container requests as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-18 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876583#comment-14876583
 ] 

Robert Kanter commented on YARN-4180:
-

Looks good.  Two minor things:
- Can you look into the test failure to see if it's related to the patch?
- Instead of the {{// Exposed for testing}} comment, you can use the 
{{@VisibleForTesting}} annotation (see the sketch below).
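
For reference, a minimal sketch of that suggestion, assuming a hypothetical 
helper whose visibility was widened only for tests (the class and method names 
below are illustrative, not taken from the actual patch):

{code:java}
// Minimal sketch only; AMLauncherExample and getRetryCount are hypothetical
// names, not part of the YARN-4180 patch.
import com.google.common.annotations.VisibleForTesting;

public class AMLauncherExample {

  // Widened from private purely so tests can call it; the annotation
  // documents that intent more clearly than a free-form comment.
  @VisibleForTesting
  protected int getRetryCount() {
    return 3;
  }
}
{code}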


> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180.001.patch
>
>
> We see issues with the RM trying to launch a container while an NM is 
> restarting, and we get exceptions like NMNotReadyException. While YARN-3842 
> added retry for other clients of the NM (mainly AMs), it is not used by the 
> AMLauncher in the RM, so these intermittent errors cause job failures. This 
> can manifest during a rolling restart of NMs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3920) FairScheduler container reservation on a node should be configurable to limit it to large containers

2015-09-18 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876408#comment-14876408
 ] 

Arun Suresh commented on YARN-3920:
---

Committed to trunk and branch-2.

> FairScheduler container reservation on a node should be configurable to limit 
> it to large containers
> 
>
> Key: YARN-3920
> URL: https://issues.apache.org/jira/browse/YARN-3920
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
> YARN-3920.004.patch, YARN-3920.004.patch, YARN-3920.005.patch, 
> yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch
>
>
> Reserving a node for a container was designed to prevent large containers 
> from being starved by small requests that keep landing on a node. Today we 
> allow this even for a small container request. This has a huge impact on 
> scheduling, since we block other scheduling requests until that reservation 
> is fulfilled. We should make this configurable so its impact can be minimized 
> by limiting it to large container requests, as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3920) FairScheduler container reservation on a node should be configurable to limit it to large containers

2015-09-18 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876473#comment-14876473
 ] 

Anubhav Dhoot commented on YARN-3920:
-

Thx [~asuresh] for review and commit!

> FairScheduler container reservation on a node should be configurable to limit 
> it to large containers
> 
>
> Key: YARN-3920
> URL: https://issues.apache.org/jira/browse/YARN-3920
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
> YARN-3920.004.patch, YARN-3920.004.patch, YARN-3920.005.patch, 
> yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch
>
>
> Reserving a node for a container was designed to prevent large containers 
> from being starved by small requests that keep landing on a node. Today we 
> allow this even for a small container request. This has a huge impact on 
> scheduling, since we block other scheduling requests until that reservation 
> is fulfilled. We should make this configurable so its impact can be minimized 
> by limiting it to large container requests, as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4143) Optimize the check for AMContainer allocation needed by blacklisting and ContainerType

2015-09-18 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876494#comment-14876494
 ] 

Anubhav Dhoot commented on YARN-4143:
-

I cannot think of an API to add to the scheduler that would be called by 
RMAppAttempt. We could add an event to SchedulerEventType, such as 
AM_CONTAINER_ALLOCATED, that is fired in the 2 places you mentioned, but that 
seems like overkill to me. Let me know if you have any alternate ways of doing 
this (a rough sketch of the event approach is below).
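
A rough sketch of that (probably overkill) event approach; none of these names 
exist in the codebase, and a real implementation would use 
ApplicationAttemptId/ContainerId rather than Strings:

{code:java}
// Hypothetical sketch of an AM_CONTAINER_ALLOCATED notification; the enum,
// class, and field types below are illustrative only.
public class AmContainerAllocatedSketch {

  enum ExampleSchedulerEventType {
    AM_CONTAINER_ALLOCATED
  }

  static final class AmContainerAllocatedEvent {
    private final String applicationAttemptId;  // ApplicationAttemptId in YARN
    private final String containerId;           // ContainerId in YARN

    AmContainerAllocatedEvent(String applicationAttemptId, String containerId) {
      this.applicationAttemptId = applicationAttemptId;
      this.containerId = containerId;
    }

    ExampleSchedulerEventType getType() {
      return ExampleSchedulerEventType.AM_CONTAINER_ALLOCATED;
    }

    String getApplicationAttemptId() {
      return applicationAttemptId;
    }

    String getContainerId() {
      return containerId;
    }
  }
}
{code}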

> Optimize the check for AMContainer allocation needed by blacklisting and 
> ContainerType
> --
>
> Key: YARN-4143
> URL: https://issues.apache.org/jira/browse/YARN-4143
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-4143.001.patch
>
>
> In YARN-2005 there are checks made to determine if the allocation is for an 
> AM container. This happens on every allocate call and should be optimized 
> away, since it changes only once per SchedulerApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3920) FairScheduler container reservation on a node should be configurable to limit it to large containers

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876487#comment-14876487
 ] 

Hudson commented on YARN-3920:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #413 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/413/])
YARN-3920. FairScheduler container reservation on a node should be configurable 
to limit it to large containers (adhoot via asuresh) (Arun Suresh: rev 
94dec5a9164cd9bc573fbf74e76bcff9e7c5c637)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* hadoop-yarn-project/CHANGES.txt


> FairScheduler container reservation on a node should be configurable to limit 
> it to large containers
> 
>
> Key: YARN-3920
> URL: https://issues.apache.org/jira/browse/YARN-3920
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
> YARN-3920.004.patch, YARN-3920.004.patch, YARN-3920.005.patch, 
> yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch
>
>
> Reserving a node for a container was designed to prevent large containers 
> from being starved by small requests that keep landing on a node. Today we 
> allow this even for a small container request. This has a huge impact on 
> scheduling, since we block other scheduling requests until that reservation 
> is fulfilled. We should make this configurable so its impact can be minimized 
> by limiting it to large container requests, as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server

2015-09-18 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4044:
--
Attachment: 0005-YARN-4044.patch

Thank you. 
Uploading a new patch addressing test comments.

> Running applications information changes such as movequeue is not published 
> to TimeLine server
> --
>
> Key: YARN-4044
> URL: https://issues.apache.org/jira/browse/YARN-4044
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4044.patch, 0002-YARN-4044.patch, 
> 0003-YARN-4044.patch, 0004-YARN-4044.patch, 0005-YARN-4044.patch
>
>
> SystemMetricsPublisher needs to expose an appUpdated API to publish any 
> change to a running application.
> Events can be:
> - change of queue for a running application.
> - change of application priority for a running application.
> This ticket intends to handle both RM and timeline side changes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3603) Application Attempts page confusing

2015-09-18 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14847294#comment-14847294
 ] 

Sunil G commented on YARN-3603:
---

Hi [~jeffzhang]
Thank you for pointing this out. {{Container Exit Status}} should not be 
present on the AppAttempt page. I will remove it and upload a patch.

For the RM WebUI, we use {{ClientRMService}} alone, and ATS/AHS is not 
contacted here to get container info. The RM only keeps track of running 
containers, and the idea is to keep information about completed containers in 
the ATS/AHS UI. Whether an application is running or finished, the ATS UI can 
give detailed information about it, including container info. 
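
To illustrate that split, a minimal client-side sketch (assuming a configured 
cluster): {{YarnClient#getContainers}} returns container reports from the RM 
for running attempts, and from the AHS/ATS for completed ones when generic 
application history is enabled. The IDs below are placeholders.

{code:java}
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListAttemptContainers {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      // Placeholder IDs; substitute a real cluster timestamp, app id, attempt.
      ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(
          ApplicationId.newInstance(1431101480046L, 3), 1);
      List<ContainerReport> reports = client.getContainers(attemptId);
      for (ContainerReport report : reports) {
        System.out.println(report.getContainerId() + " -> "
            + report.getContainerState());
      }
    } finally {
      client.stop();
    }
  }
}
{code}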

> Application Attempts page confusing
> ---
>
> Key: YARN-3603
> URL: https://issues.apache.org/jira/browse/YARN-3603
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.8.0
>Reporter: Thomas Graves
>Assignee: Sunil G
> Attachments: 0001-YARN-3603.patch, 0002-YARN-3603.patch, 
> 0003-YARN-3603.patch, ahs1.png
>
>
> The application attempts page 
> (http://RM:8088/cluster/appattempt/appattempt_1431101480046_0003_01)
> is a bit confusing about what is going on.  I think the table of containers 
> there is only for running containers, and when the app is completed or killed 
> it's empty.  The table should have a label on it stating so.  
> Also the "AM Container" field is a link when running but not when it's killed. 
>  That might be confusing.
> There is no link to the logs on this page, but there is in the app attempt 
> table when looking at http://
> rm:8088/cluster/app/application_1431101480046_0003



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server

2015-09-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14852731#comment-14852731
 ] 

Hadoop QA commented on YARN-4044:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  21m 27s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 27s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 48s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 45s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 56s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   4m  5s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 27s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |  60m 29s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 115m 35s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12758372/0005-YARN-4044.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a7201d6 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9207/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9207/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9207/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9207/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9207/console |


This message was automatically generated.

> Running applications information changes such as movequeue is not published 
> to TimeLine server
> --
>
> Key: YARN-4044
> URL: https://issues.apache.org/jira/browse/YARN-4044
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4044.patch, 0002-YARN-4044.patch, 
> 0003-YARN-4044.patch, 0004-YARN-4044.patch, 0005-YARN-4044.patch
>
>
> SystemMetricsPublisher needs to expose an appUpdated API to publish any 
> change to a running application.
> Events can be:
> - change of queue for a running application.
> - change of application priority for a running application.
> This ticket intends to handle both RM and timeline side changes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2597) MiniYARNCluster should propagate reason for AHS not starting

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805365#comment-14805365
 ] 

Hudson commented on YARN-2597:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1147 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1147/])
YARN-2597 MiniYARNCluster should propagate reason for AHS not starting (stevel: 
rev a7201d635fc45b169ca3326bad48a3f781efe604)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


> MiniYARNCluster should propagate reason for AHS not starting
> 
>
> Key: YARN-2597
> URL: https://issues.apache.org/jira/browse/YARN-2597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2597-001.patch
>
>
> If the AHS doesn't come up, your test run gets an exception telling you this 
> fact, but the underlying cause is not propagated.
> As YARN services do record their failure cause, extracting and propagating 
> this is trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2597) MiniYARNCluster should propagate reason for AHS not starting

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14847320#comment-14847320
 ] 

Hudson commented on YARN-2597:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2353 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2353/])
YARN-2597 MiniYARNCluster should propagate reason for AHS not starting (stevel: 
rev a7201d635fc45b169ca3326bad48a3f781efe604)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* hadoop-yarn-project/CHANGES.txt


> MiniYARNCluster should propagate reason for AHS not starting
> 
>
> Key: YARN-2597
> URL: https://issues.apache.org/jira/browse/YARN-2597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2597-001.patch
>
>
> If the AHS doesn't come up, your test run gets an exception telling you this 
> fact, but the underlying cause is not propagated.
> As YARN services do record their failure cause, extracting and propagating 
> this is trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2597) MiniYARNCluster should propagate reason for AHS not starting

2015-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14852719#comment-14852719
 ] 

Hudson commented on YARN-2597:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #389 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/389/])
YARN-2597 MiniYARNCluster should propagate reason for AHS not starting (stevel: 
rev a7201d635fc45b169ca3326bad48a3f781efe604)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


> MiniYARNCluster should propagate reason for AHS not starting
> 
>
> Key: YARN-2597
> URL: https://issues.apache.org/jira/browse/YARN-2597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2597-001.patch
>
>
> If the AHS doesn't come up, your test run gets an exception telling you this 
> fact, but the underlying cause is not propagated.
> As YARN services do record their failure cause, extracting and propagating 
> this is trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4181) node blacklist for AM launching

2015-09-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14875727#comment-14875727
 ] 

Jason Lowe commented on YARN-4181:
--

Dup of YARN-2005?

> node blacklist for AM launching
> ---
>
> Key: YARN-4181
> URL: https://issues.apache.org/jira/browse/YARN-4181
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
>
> In some cases, a node becomes problematic and most containers launched on it 
> fail, including the AM containers.
> This node then has more available resource than other nodes in the cluster. 
> The application whose AM is failing has zero minShareRatio. With the fair 
> scheduler, this node is always rated first, and the unfortunate application is 
> also likely rated first. The result is that attempts of this application fail 
> again and again on the same node.
> We should avoid such a deadlock situation.
> Solution 1: The NM could detect the failure rate of containers. If the rate is 
> high, the NM marks itself unhealthy for a period. But we should be careful not 
> to turn all nodes unhealthy because of a single buggy application; maybe use 
> the per-application failure rate of containers.
> Solution 2: Have an application-level blacklist in the AMLauncher, in addition 
> to the existing blacklist maintained by the AM.
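
A rough sketch of the failure-rate tracking behind Solution 1; the class name, 
window size, and threshold are illustrative assumptions, not from any YARN 
patch, and a real version would track the rate per application as noted above:

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical per-node tracker: remember the outcome of the last N container
// completions and flag the node when almost all of them failed.
public class ContainerFailureTracker {
  private static final int WINDOW = 20;          // last N completions
  private static final double THRESHOLD = 0.8;   // failure ratio to trip on

  private final Deque<Boolean> recentOutcomes = new ArrayDeque<>();

  public synchronized void recordCompletion(boolean failed) {
    recentOutcomes.addLast(failed);
    if (recentOutcomes.size() > WINDOW) {
      recentOutcomes.removeFirst();
    }
  }

  /** True when most recent containers failed, suggesting a bad node. */
  public synchronized boolean looksUnhealthy() {
    if (recentOutcomes.size() < WINDOW) {
      return false;                              // not enough data yet
    }
    long failures = recentOutcomes.stream().filter(f -> f).count();
    return (double) failures / recentOutcomes.size() >= THRESHOLD;
  }
}
{code}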



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4177) yarn.util.Clock should not be used to time a duration or time interval

2015-09-18 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin reassigned YARN-4177:
-

Assignee: Xianyin Xin

> yarn.util.Clock should not be used to time a duration or time interval
> --
>
> Key: YARN-4177
> URL: https://issues.apache.org/jira/browse/YARN-4177
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-4177.001.patch, YARN-4177.002.patch
>
>
> There are many places that use Clock to time intervals, which is dangerous, as 
> commented by [~ste...@apache.org] in HADOOP-12409. Instead, we should use 
> hadoop.util.Timer#monotonicNow() to get monotonic time, or we could provide a 
> MonotonicClock in yarn.util for consistency of the code.
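
A minimal sketch of the difference, assuming org.apache.hadoop.util.Time is on 
the classpath; the monotonic source only ever moves forward, while the wall 
clock (which yarn.util.SystemClock wraps) can jump if the system time is 
adjusted:

{code:java}
import org.apache.hadoop.util.Time;

public class DurationExample {
  public static void main(String[] args) throws InterruptedException {
    // Risky: wall-clock time can be adjusted (NTP, manual changes),
    // so the measured interval can be wrong or even negative.
    long wallStart = System.currentTimeMillis();

    // Safer: a monotonic source is unaffected by wall-clock adjustments.
    long monoStart = Time.monotonicNow();

    Thread.sleep(100);

    System.out.println("wall-clock elapsed ms: "
        + (System.currentTimeMillis() - wallStart));
    System.out.println("monotonic elapsed ms: "
        + (Time.monotonicNow() - monoStart));
  }
}
{code}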



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2015-09-18 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3816:
-
Attachment: YARN-3816-YARN-2928-v4.patch

Updated the patch according to the latest code checked in to the YARN-2928 branch.

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application-level aggregation of Timeline data:
> - To present end users with aggregated states for each application, including 
> resource (CPU, memory) consumption across all containers, the number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be aggregated 
> to show details of state at the framework level.
> - Aggregation at other levels (flow/user/queue) can be more efficient when 
> based on application-level aggregations rather than raw entity-level data, as 
> far fewer rows need to be scanned (after filtering out non-aggregated entities 
> such as events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-09-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14875721#comment-14875721
 ] 

Jason Lowe commented on YARN-2902:
--

bq. In container localizer, when processing HB DIE response, we send another 
localizer status to NM. Is it really required ? What do you think ?

I don't think this is required.  If the NM is telling the localizer to DIE then 
I don't think the NM cares after that point what the localizer is doing.  The 
NM is totally done with it at that point due to failure or lack of knowledge of 
that localizer.

bq. Or are you suggesting System exit ?
I was basically suggesting System.exit if we aren't convinced that the 
localizer can actually tear down in a timely manner.  For example, if the 
graceful shutdown could involve waiting for active transfers to complete 
because we can't reliably interrupt them, then yes I think System.exit is 
appropriate.  A good compromise would be to put a timeout on shutdown -- if we 
can't get down within so many seconds then have something (e.g.: a watchdog 
thread if necessary) call System.exit to get out.  Otherwise the localizer 
could still be running and messing with the filesystem after the NM tries to 
cleanup afterwards.

Worst-case scenario is this could still happen even with these fixes, but it 
should resolve the leaking issue for the vast majority of cases.  We can make 
it more bulletproof in a followup JIRA for 2.8 or later that actually has the 
NM tracking localizer pids and proactively killing them if they don't respond 
in a timely manner to commands.

bq. However we can also let localizer not do any cleanup at all and let NM 
delete paths.
I would still like the localizer to try to perform some cleanup if possible, as 
the NM doesn't track localizers in the state store.  Therefore if the NM 
restarts we may not cleanup everything properly if the localizer doesn't do it 
on its own.
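
A minimal sketch of the watchdog-style timeout described above, assuming the 
localizer's graceful cleanup can be wrapped in a Runnable; the class name and 
timeout value are illustrative only:

{code:java}
// Hypothetical shutdown watchdog: give graceful cleanup a bounded amount of
// time, then force the JVM down so a stuck localizer cannot keep touching
// the filesystem after the NM has moved on.
public class ShutdownWatchdog {
  private static final long SHUTDOWN_TIMEOUT_MS = 10_000L;  // illustrative

  public static void cleanupThenExit(Runnable gracefulCleanup) {
    Thread watchdog = new Thread(() -> {
      try {
        Thread.sleep(SHUTDOWN_TIMEOUT_MS);
      } catch (InterruptedException ie) {
        return;                     // cleanup finished in time; stand down
      }
      System.exit(1);               // cleanup is stuck; bail out hard
    }, "localizer-shutdown-watchdog");
    watchdog.setDaemon(true);
    watchdog.start();

    gracefulCleanup.run();          // best-effort cleanup of local dirs
    watchdog.interrupt();           // finished in time: cancel the forced exit
  }
}
{code}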


> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed, then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources, they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans, since the scan 
> never deletes resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS

2015-09-18 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14875732#comment-14875732
 ] 

Jonathan Eagles commented on YARN-2513:
---

[~xgong], [~zjshen], will you have time to review this patch?

> Host framework UIs in YARN for use with the ATS
> ---
>
> Key: YARN-2513
> URL: https://issues.apache.org/jira/browse/YARN-2513
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>  Labels: 2.6.1-candidate
> Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, 
> YARN-2513.v3.patch, YARN-2513.v4.patch, YARN-2513.v5.patch
>
>
> Allow for pluggable UIs as described by TEZ-8. YARN can provide the 
> infrastructure to host JavaScript and possibly Java UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-09-18 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-4138:

Attachment: YARN-4138-YARN-1197.2.patch

Submitted an updated patch that includes extensive test cases.

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch
>
>
> In YARN-1651, after a container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted to its value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

