[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14002937#comment-14002937
 ] 

Hudson commented on YARN-2053:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5606 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5606/])
YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from 
previous attempts for work-preserving AM restart. Contributed by Wangda Tan 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1595116)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Fix For: 2.4.1

 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, 
 YARN-2053.patch, YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003257#comment-14003257
 ] 

Hudson commented on YARN-2053:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #562 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/562/])
YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from 
previous attempts for work-preserving AM restart. Contributed by Wangda Tan 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1595116)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Fix For: 2.4.1

 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, 
 YARN-2053.patch, YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003292#comment-14003292
 ] 

Hudson commented on YARN-2053:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1754 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1754/])
YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from 
previous attempts for work-preserving AM restart. Contributed by Wangda Tan 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1595116)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Fix For: 2.4.1

 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, 
 YARN-2053.patch, YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003469#comment-14003469
 ] 

Hudson commented on YARN-2053:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1780 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1780/])
YARN-2053. Fixed a bug in AMS to not add null NMToken into NMTokens list from 
previous attempts for work-preserving AM restart. Contributed by Wangda Tan 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1595116)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Fix For: 2.4.1

 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, 
 YARN-2053.patch, YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-16 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998948#comment-13998948
 ] 

Jian He commented on YARN-2053:
---

LGTM, +1, submit the same patch to kick jenkins

 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Fix For: 2.4.1

 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, 
 YARN-2053.patch, YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13999533#comment-13999533
 ] 

Hadoop QA commented on YARN-2053:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12645047/YARN-2053.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3751//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3751//console

This message is automatically generated.

 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Fix For: 2.4.1

 Attachments: YARN-2053.patch, YARN-2053.patch, YARN-2053.patch, 
 YARN-2053.patch, YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998324#comment-13998324
 ] 

Wangda Tan commented on YARN-2053:
--

Sure, I'll do that, thanks for review!

 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Attachments: YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997623#comment-13997623
 ] 

Wangda Tan commented on YARN-2053:
--

And I think this should be marked as critical or blocker bug, agree?

 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Attachments: YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2053) Slider AM fails to restart

2014-05-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997461#comment-13997461
 ] 

Steve Loughran commented on YARN-2053:
--

{code}
{noformat}
14/05/10 17:02:17 INFO appmaster.SliderAppMaster: Connecting to RM at 
48058,address tracking URL=http://c6403.ambari.apache.org:48705
14/05/10 17:02:17 ERROR main.ServiceLauncher: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.convertToProtoFormat(RegisterApplicationMasterResponsePBImpl.java:384)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.access$100(RegisterApplicationMasterResponsePBImpl.java:53)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:355)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:344)
at 
com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
at 
org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.mergeLocalToBuilder(RegisterApplicationMasterResponsePBImpl.java:123)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.mergeLocalToProto(RegisterApplicationMasterResponsePBImpl.java:104)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.getProto(RegisterApplicationMasterResponsePBImpl.java:75)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:91)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

Exception: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.convertToProtoFormat(RegisterApplicationMasterResponsePBImpl.java:384)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.access$100(RegisterApplicationMasterResponsePBImpl.java:53)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:355)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:344)
at 
com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
at 
org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.mergeLocalToBuilder(RegisterApplicationMasterResponsePBImpl.java:123)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.mergeLocalToProto(RegisterApplicationMasterResponsePBImpl.java:104)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.getProto(RegisterApplicationMasterResponsePBImpl.java:75)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:91)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at 

[jira] [Commented] (YARN-2053) Slider AM fails to restart

2014-05-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997459#comment-13997459
 ] 

Steve Loughran commented on YARN-2053:
--

Note that this AM requests container retention over AM restarts -so is testing 
code paths that not much (anything?) else is testing

 Slider AM fails to restart
 --

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
 Attachments: yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {noformat}
 14/05/10 17:02:17 INFO appmaster.SliderAppMaster: Connecting to RM at 
 48058,address tracking URL=http://c6403.ambari.apache.org:48705
 14/05/10 17:02:17 ERROR main.ServiceLauncher: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.convertToProtoFormat(RegisterApplicationMasterResponsePBImpl.java:384)
 at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.access$100(RegisterApplicationMasterResponsePBImpl.java:53)
 at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:355)
 at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:344)
 at 
 com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
 at 
 com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
 at 
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.mergeLocalToBuilder(RegisterApplicationMasterResponsePBImpl.java:123)
 at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.mergeLocalToProto(RegisterApplicationMasterResponsePBImpl.java:104)
 at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.getProto(RegisterApplicationMasterResponsePBImpl.java:75)
 at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:91)
 at 
 org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 Exception: java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.convertToProtoFormat(RegisterApplicationMasterResponsePBImpl.java:384)
 at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.access$100(RegisterApplicationMasterResponsePBImpl.java:53)
 at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:355)
 at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl$2$1.next(RegisterApplicationMasterResponsePBImpl.java:344)
 at 
 com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
 at 
 com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
 at 
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterResponsePBImpl.mergeLocalToBuilder(RegisterApplicationMasterResponsePBImpl.java:123)
 at 
 

[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997621#comment-13997621
 ] 

Wangda Tan commented on YARN-2053:
--

Took a look at related code, I think this problem is caused by,

In ApplicationMasterService.registerApplicationMaster(), it will add nmTokens 
from previous attempt's container via a loop.
{code}
  ListContainer transferredContainers =
  ((AbstractYarnScheduler) rScheduler)
.getTransferredContainers(applicationAttemptId);
  if (!transferredContainers.isEmpty()) {
response.setContainersFromPreviousAttempts(transferredContainers);
ListNMToken nmTokens = new ArrayListNMToken();
for (Container container : transferredContainers) {
  try {
nmTokens.add(rmContext.getNMTokenSecretManager()
.createAndGetNMToken(app.getUser(), applicationAttemptId,
container););
  }
{code}

And NMTokenSecretManager.createAndGetNMToken()
{code}
  NMToken nmToken = null;
  if (nodeSet != null) {
if (!nodeSet.contains(container.getNodeId())) {
   ...
   // set nmToken
   ...
}
  }
  return nmToken
{code}

So if multiple container come from same NM (with same NodeId), null nmToken 
will be added to NMToken list. And in 
RegisterApplicationMasterResponsePBImpl.getTokenProtoIterable, it tried to 
convert a null NMToken to proto
{code}
  @Override
  public NMTokenProto next() {
return convertToProtoFormat(iter.next());
  }
{code}

I think this should be the root cause of this problem, uploaded a patch.

 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
 Attachments: yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997804#comment-13997804
 ] 

Jian He commented on YARN-2053:
---

Thanks for catching this !   Patch looks good to me, Wangda, can you add a test 
case ?
 Basically, allocate 2 containers on the same node in 
TestAMRestart#testNMTokensRebindOnAMRestart should be enough.

 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Attachments: YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)