[ 
https://issues.apache.org/jira/browse/YARN-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911149#comment-13911149
 ] 

Jian He commented on YARN-1588:
-------------------------------

bq. Not sure why we are undoing the locking in NMToken PBImpl
That was to fix the inconsistent synchronization issue; I moved equals and 
hashCode to the NMToken base class.

bq. we are not handling DNS failure when generating NMTokens.
Thanks for pointing that out. UnknownHostException turns out to be a kind of 
IOException, so it can be retried by RMProxy in the RPC layer. The patch now 
throws UnknownHostException directly and expects it to be retried within the 
RPC layer, so the AM itself never sees the exception.
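
The retry argument above rests on the Java class hierarchy: java.net.UnknownHostException is a subclass of java.io.IOException, so retry logic keyed on IOException also covers a failed DNS lookup. The sketch below illustrates this with a generic retry loop. The class RetrySketch, its method callWithRetry, and the host name are hypothetical and only for illustration; this is not the RMProxy implementation, which relies on Hadoop's own retry policies.

```java
import java.io.IOException;
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

// Hypothetical illustration only; not RMProxy code.
final class RetrySketch {

  /** Calls op up to maxAttempts times, retrying whenever it throws an IOException. */
  static <T> T callWithRetry(Callable<T> op, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (IOException e) {
        // UnknownHostException is caught here because it extends IOException.
        last = e;
      }
    }
    // Every attempt failed with an IOException; surface the last one.
    throw last;
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulate a DNS lookup that fails twice before succeeding.
    String result = callWithRetry(() -> {
      if (++calls[0] < 3) {
        throw new UnknownHostException("nm-host.example");
      }
      return "resolved";
    }, 5);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

Because the caller only receives the final result, or the last exception once the attempts are exhausted, code above the retry layer does not need to handle transient lookup failures itself.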

Changed AMRMClient to populate the NMTokens from previous attempts into 
NMTokenCache so that it works in a secure cluster.
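
As a rough model of that change, the sketch below seeds a per-node token cache with tokens handed over from earlier attempts, so that a later lookup by node address finds them. All types here (NMTokenCacheSketch, NodeToken) and the node addresses are hypothetical stand-ins; the actual patch works with YARN's NMToken and NMTokenCache classes, whose APIs are not reproduced here.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins; the real patch uses YARN's NMToken and NMTokenCache.
final class NMTokenCacheSketch {

  /** Simplified token: the node address it is valid for plus an opaque secret. */
  static final class NodeToken {
    final String nodeAddress;
    final String secret;

    NodeToken(String nodeAddress, String secret) {
      this.nodeAddress = nodeAddress;
      this.secret = secret;
    }
  }

  // One token per node address; a newer token replaces an older one.
  private final Map<String, NodeToken> tokensByNode = new ConcurrentHashMap<>();

  /** Seeds the cache with tokens transferred from previous attempts. */
  void populateFromPreviousAttempts(List<NodeToken> previousTokens) {
    for (NodeToken token : previousTokens) {
      tokensByNode.put(token.nodeAddress, token);
    }
  }

  /** Returns the cached token for a node, or null if none is cached. */
  NodeToken tokenFor(String nodeAddress) {
    return tokensByNode.get(nodeAddress);
  }
}
```

With the cache seeded at registration time, a client contacting a node that hosts a transferred container can look up the token by node address instead of waiting for a fresh one.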


Testing on a secure cluster:
- Code change: Changed distributed shell to immediately call 
nmClientAsync.getContainerStatusAsync for each transferred container after it 
gets the containers from previous attempts.
{code}
List<Container> previousAMRunningContainers =
    response.getContainersFromPreviousAttempts();
LOG.info("Received " + previousAMRunningContainers.size()
    + " previous AM's running containers on AM registration.");
numAllocatedContainers.addAndGet(previousAMRunningContainers.size());
// Query each transferred container so that the NMToken from the
// previous attempt is used to talk to its NM.
for (Container container : previousAMRunningContainers) {
  nmClientAsync.getContainerStatusAsync(container.getId(),
      container.getNodeId());
}
{code}
- Started the distributed shell with a long sleep command.
- Killed the ApplicationMaster.
- A new AM started. After it retrieves the containers via 
getContainersFromPreviousAttempts, it calls getContainerStatusAsync for each 
transferred container, so the transferred NMToken should be used to talk to 
the corresponding NM.


> Rebind NM tokens for previous attempt's running containers to the new attempt
> -----------------------------------------------------------------------------
>
>                 Key: YARN-1588
>                 URL: https://issues.apache.org/jira/browse/YARN-1588
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-1588.1.patch, YARN-1588.1.patch, YARN-1588.2.patch, 
> YARN-1588.3.patch, YARN-1588.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
