Wangda Tan commented on YARN-2053:

I took a look at the related code. I think this problem is caused by the following.

In ApplicationMasterService.registerApplicationMaster(), NMTokens for the previous attempt's containers are added in a loop:
{code}
      List<Container> transferredContainers =
          ((AbstractYarnScheduler) rScheduler)
            ...
      if (!transferredContainers.isEmpty()) {
        List<NMToken> nmTokens = new ArrayList<NMToken>();
        for (Container container : transferredContainers) {
          try {
            ...
                .createAndGetNMToken(app.getUser(), applicationAttemptId,
            ...
{code}

And in NMTokenSecretManager.createAndGetNMToken():
{code}
      NMToken nmToken = null;
      if (nodeSet != null) {
        if (!nodeSet.contains(container.getNodeId())) {
           // set nmToken
           ...
        }
      }
      return nmToken;
{code}

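So if multiple containers come from the same NM (i.e. they share the same NodeId), only the first of them gets a token. For every later container that NodeId is already in nodeSet, so createAndGetNMToken() returns null and a null NMToken is added to the NMToken list. Then RegisterApplicationMasterResponsePBImpl.getTokenProtoIterable() tries to convert that null NMToken to a proto: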
{code}
          public NMTokenProto next() {
            return convertToProtoFormat(iter.next());
          }
{code}
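The failure path above can be reproduced outside YARN with a small model. This is a hypothetical, simplified sketch: Container, NmToken, createAndGetNmToken and convertToProto are illustrative stand-ins, not the real YARN classes, and it assumes (as the analysis above implies) that nodeSet records each NodeId once a token has been issued for it. Two transferred containers on the same node yield one real token and one null, and converting the null entry throws a NullPointerException.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the YARN-2053 failure path.
// All types and methods here are illustrative stand-ins, not YARN APIs.
public class NmTokenNullDemo {

    record Container(String id, String nodeId) {}

    record NmToken(String nodeId) {}

    // Stand-in for createAndGetNMToken(): issues a token only the first
    // time a NodeId is seen for this attempt, otherwise returns null.
    static NmToken createAndGetNmToken(Container c, Set<String> nodeSet) {
        NmToken token = null;
        if (!nodeSet.contains(c.nodeId())) {
            token = new NmToken(c.nodeId());
            nodeSet.add(c.nodeId()); // remember that this node already has a token
        }
        return token;
    }

    // Stand-in for the registration loop: adds every result, including null.
    static List<NmToken> collectTokens(List<Container> transferred, Set<String> nodeSet) {
        List<NmToken> tokens = new ArrayList<>();
        for (Container c : transferred) {
            tokens.add(createAndGetNmToken(c, nodeSet));
        }
        return tokens;
    }

    // Stand-in for convertToProtoFormat(): dereferences the token,
    // so a null element triggers a NullPointerException.
    static String convertToProto(NmToken t) {
        return "proto:" + t.nodeId();
    }

    public static void main(String[] args) {
        // Two containers from the previous attempt on the same (hypothetical) NM.
        List<Container> transferred = List.of(
                new Container("c1", "host1:45454"),
                new Container("c2", "host1:45454"));
        List<NmToken> tokens = collectTokens(transferred, new HashSet<>());
        System.out.println(tokens); // [NmToken[nodeId=host1:45454], null]
        try {
            for (NmToken t : tokens) {
                System.out.println(convertToProto(t));
            }
        } catch (NullPointerException e) {
            System.out.println("NullPointerException converting a null token");
        }
    }
}
```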

I think this is the root cause of the problem; I have uploaded a patch.
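One way to avoid the NPE, sketched here as an assumption rather than a description of the uploaded patch, is to skip null results in the registration loop. Because the token creation deduplicates on NodeId, the first container on each node already carries that node's token, so dropping the nulls should lose nothing. The names below are the same kind of hypothetical stand-ins as in a minimal model, not real YARN APIs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a null-guarded registration loop.
// This is an assumed fix, not necessarily what the YARN-2053 patch does.
public class NmTokenNullCheckDemo {

    record Container(String id, String nodeId) {}

    record NmToken(String nodeId) {}

    // Stand-in for createAndGetNMToken(): at most one token per NodeId.
    static NmToken createAndGetNmToken(Container c, Set<String> nodeSet) {
        NmToken token = null;
        if (!nodeSet.contains(c.nodeId())) {
            token = new NmToken(c.nodeId());
            nodeSet.add(c.nodeId());
        }
        return token;
    }

    // Guarded loop: null means "this node already has a token", so skip it.
    static List<NmToken> collectTokens(List<Container> transferred, Set<String> nodeSet) {
        List<NmToken> tokens = new ArrayList<>();
        for (Container c : transferred) {
            NmToken token = createAndGetNmToken(c, nodeSet);
            if (token != null) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Container> transferred = List.of(
                new Container("c1", "host1:45454"),
                new Container("c2", "host1:45454"));
        // Only one token for the shared node; no null entries remain.
        System.out.println(collectTokens(transferred, new HashSet<>())); // [NmToken[nodeId=host1:45454]]
    }
}
```

With the guard in place, the list handed to the response contains exactly one token per distinct NodeId, so the proto conversion never sees a null element.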

> Slider AM fails to restart: NPE in 
> RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
> --------------------------------------------------------------------------------------------------------------------
>                 Key: YARN-2053
>                 URL: https://issues.apache.org/jira/browse/YARN-2053
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Sumit Mohanty
>         Attachments: yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
> yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak
> Slider AppMaster restart fails with the following:
> {code}
> org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
> {code}

This message was sent by Atlassian JIRA
