[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997621#comment-13997621
 ] 

Wangda Tan commented on YARN-2053:
----------------------------------

I took a look at the related code, and I think the problem is caused by the 
following.

ApplicationMasterService.registerApplicationMaster() adds NMTokens for the 
containers transferred from previous attempts, one container at a time, in a 
loop:
{code}
      List<Container> transferredContainers =
          ((AbstractYarnScheduler) rScheduler)
            .getTransferredContainers(applicationAttemptId);
      if (!transferredContainers.isEmpty()) {
        response.setContainersFromPreviousAttempts(transferredContainers);
        List<NMToken> nmTokens = new ArrayList<NMToken>();
        for (Container container : transferredContainers) {
          try {
            nmTokens.add(rmContext.getNMTokenSecretManager()
                .createAndGetNMToken(app.getUser(), applicationAttemptId,
                    container));
          }
{code}

And in NMTokenSecretManager.createAndGetNMToken():
{code}
      NMToken nmToken = null;
      if (nodeSet != null) {
        if (!nodeSet.contains(container.getNodeId())) {
           ...
           // set nmToken
           ...
        }
      }
      return nmToken;
{code}

So if multiple containers come from the same NM (with the same NodeId), a null 
nmToken is returned for every container after the first one on that node, and 
that null is added to the NMToken list. Then 
RegisterApplicationMasterResponsePBImpl.getTokenProtoIterable() tries to 
convert the null NMToken to proto:
{code}
          @Override
          public NMTokenProto next() {
            return convertToProtoFormat(iter.next());
          }
{code}

I think this is the root cause of the problem. I have uploaded a patch.
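
To make the failure mode concrete, here is a minimal, self-contained sketch. 
It does not use the real YARN classes: NmTokenSketch, Token, 
createAndGetToken, collectUnchecked and collectNonNull are all hypothetical 
stand-ins invented for illustration. It only mimics the behaviour described 
above (a token is created the first time a node is seen, null afterwards) and 
shows one possible guard, which is not necessarily what the uploaded patch 
does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration only; NOT the real YARN classes.
class NmTokenSketch {

    /** Stand-in for NMToken: only records which node it was issued for. */
    static final class Token {
        final String nodeId;

        Token(String nodeId) {
            this.nodeId = nodeId;
        }
    }

    /** Nodes for which a token has already been issued to this attempt. */
    private final Set<String> nodeSet = new HashSet<>();

    /**
     * Mimics the described behaviour: a token is created only the first
     * time a node is seen; later calls for the same node return null.
     */
    Token createAndGetToken(String nodeId) {
        Token token = null;
        if (!nodeSet.contains(nodeId)) {
            nodeSet.add(nodeId);
            token = new Token(nodeId);
        }
        return token;
    }

    /** Mirrors the loop above: adds whatever comes back, including null. */
    static List<Token> collectUnchecked(NmTokenSketch mgr, List<String> nodes) {
        List<Token> tokens = new ArrayList<>();
        for (String nodeId : nodes) {
            tokens.add(mgr.createAndGetToken(nodeId));
        }
        return tokens;
    }

    /** One possible guard (an assumption, not the actual patch): skip nulls. */
    static List<Token> collectNonNull(NmTokenSketch mgr, List<String> nodes) {
        List<Token> tokens = new ArrayList<>();
        for (String nodeId : nodes) {
            Token token = mgr.createAndGetToken(nodeId);
            if (token != null) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two transferred containers on the same (made-up) node, one on another.
        List<String> nodes = Arrays.asList("nm1:45454", "nm1:45454", "nm2:45454");

        List<Token> unchecked = collectUnchecked(new NmTokenSketch(), nodes);
        System.out.println("unchecked contains null: " + unchecked.contains(null));

        List<Token> guarded = collectNonNull(new NmTokenSketch(), nodes);
        System.out.println("guarded contains null: " + guarded.contains(null));
    }
}
```

In the unchecked list the second entry is null, which is the kind of element 
that would later be handed to convertToProtoFormat(). With the guard, only one 
token per distinct node ends up in the list.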

> Slider AM fails to restart: NPE in 
> RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2053
>                 URL: https://issues.apache.org/jira/browse/YARN-2053
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Sumit Mohanty
>         Attachments: yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
> yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak
>
>
> Slider AppMaster restart fails with the following:
> {code}
> org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
