[ https://issues.apache.org/jira/browse/YARN-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617373#comment-13617373 ]
Robert Joseph Evans commented on YARN-515:
------------------------------------------
This issue appears to be caused by a bug in RegisterNodeManagerResponsePBImpl.
Specifically, I think it was introduced by YARN-440. I have a unit test that
reproduces it. Sid reviewed YARN-440, and he is a really smart guy. I looked at
it myself, sure that it must be the cause of the issue, and I didn't see
anything in there that was off.
I just think all this extra code to wrap the protocol buffers is a bad idea. It
makes it difficult to change a .proto file, and it slows things down. But
changing it is a lot of work, so I am done with my rant now. I'll go find a fix
for the issue.
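To make the wrapper pattern concrete, here is a hypothetical, heavily simplified sketch in Java. It is not the real RegisterNodeManagerResponsePBImpl and does not claim to show what YARN-440 changed; RegisterResponseProto, RegisterResponsePBImpl, and the mergeLocalFields flag are invented for illustration. It shows one general way such a wrapper can silently lose a field: setters write to a local cache, and if that cache is never copied into the protobuf builder before build(), the field does not reach the other side of the RPC.

```java
import java.util.Arrays;

// Hypothetical stand-in for a protoc-generated message (NOT the real YARN proto):
// immutable, built through a Builder, with "has" semantics for an optional field.
final class RegisterResponseProto {
    private final byte[] masterKey; // null means the optional field is unset

    private RegisterResponseProto(byte[] masterKey) {
        this.masterKey = masterKey;
    }

    boolean hasMasterKey() { return masterKey != null; }
    byte[] getMasterKey() { return masterKey; }
    static Builder newBuilder() { return new Builder(); }

    static final class Builder {
        private byte[] masterKey;

        Builder setMasterKey(byte[] key) { this.masterKey = key; return this; }
        RegisterResponseProto build() { return new RegisterResponseProto(masterKey); }
    }
}

// Hypothetical PBImpl-style wrapper: setters write to a local cache, and getProto()
// has to copy that cache into the builder before building the message.
final class RegisterResponsePBImpl {
    private RegisterResponseProto proto;     // set when wrapping a received message
    private final RegisterResponseProto.Builder builder = RegisterResponseProto.newBuilder();
    private byte[] masterKey;                // local cache of the wrapped field
    private final boolean mergeLocalFields;  // false simulates a forgotten merge step

    // Sending side: start from an empty builder.
    RegisterResponsePBImpl(boolean mergeLocalFields) {
        this.mergeLocalFields = mergeLocalFields;
    }

    // Receiving side: wrap a message that came off the wire.
    RegisterResponsePBImpl(RegisterResponseProto proto) {
        this.proto = proto;
        this.mergeLocalFields = true;
    }

    void setMasterKey(byte[] key) { this.masterKey = key; }

    byte[] getMasterKey() {
        if (masterKey != null) return masterKey;
        // Fall back to the wrapped message; null if the field was never set.
        return (proto != null && proto.hasMasterKey()) ? proto.getMasterKey() : null;
    }

    RegisterResponseProto getProto() {
        if (proto != null && masterKey == null) {
            return proto; // nothing changed locally; reuse the received message
        }
        if (mergeLocalFields && masterKey != null) {
            builder.setMasterKey(masterKey); // the "merge local fields into builder" step
        }
        return builder.build();
    }
}

final class WrapperDemo {
    public static void main(String[] args) {
        byte[] key = {1, 2, 3};
        for (boolean merge : new boolean[] {true, false}) {
            RegisterResponsePBImpl sent = new RegisterResponsePBImpl(merge);
            sent.setMasterKey(key);
            // Simulate the RPC boundary: only the built proto reaches the other side.
            RegisterResponsePBImpl received = new RegisterResponsePBImpl(sent.getProto());
            System.out.println("merge=" + merge + " masterKey="
                + Arrays.toString(received.getMasterKey()));
        }
        // prints:
        // merge=true masterKey=[1, 2, 3]
        // merge=false masterKey=null
    }
}
```

With the merge step skipped, the receiving side sees a null key even though the sender called setMasterKey. By analogy only, that matches the symptom in the quoted report, where the NM appears never to get the secret key and later hits a NullPointerException in getCurrentKey. Because every field needs this extra bookkeeping, it is also an example of why the wrapper layer makes .proto changes error-prone.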
> Node Manager not getting the master key
> ---------------------------------------
>
> Key: YARN-515
> URL: https://issues.apache.org/jira/browse/YARN-515
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.0.4-alpha
> Reporter: Robert Joseph Evans
> Priority: Blocker
>
> On the latest version of branch-2, I see the following on a secure cluster.
> {noformat}
> 2013-03-28 19:21:06,243 [main] INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security
> enabled - updating secret keys now
> 2013-03-28 19:21:06,243 [main] INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered
> with ResourceManager as RM:PORT with total resource of <memory:12288, vCores:16>
> 2013-03-28 19:21:06,244 [main] INFO
> org.apache.hadoop.yarn.service.AbstractService:
> Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is
> started.
> 2013-03-28 19:21:06,245 [main] INFO
> org.apache.hadoop.yarn.service.AbstractService:
> Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
> 2013-03-28 19:21:07,257 [Node Status Updater] ERROR
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught
> exception in status-updater
> java.lang.NullPointerException
> at
> org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121)
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407)
> {noformat}
> The NullPointerException just keeps repeating, and all of the nodes end up
> being lost. It looks like it never gets the secret key when it registers.