[
https://issues.apache.org/jira/browse/YARN-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Templeton updated YARN-6019:
-----------------------------------
Assignee: Aleksandr Balitsky
> MR application fails with "No NMToken sent" exception after MRAppMaster
> recovery
> --------------------------------------------------------------------------------
>
> Key: YARN-6019
> URL: https://issues.apache.org/jira/browse/YARN-6019
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager, yarn
> Affects Versions: 2.7.0
> Environment: Centos 7
> Reporter: Aleksandr Balitsky
> Assignee: Aleksandr Balitsky
> Priority: Critical
> Attachments: YARN-6019.001.patch
>
>
> *Steps to reproduce:*
> 1) Submit MR application (for example PI app with 50 containers)
> 2) Find MRAppMaster process id for the application
> 3) Kill MRAppMaster by kill -9 command
> *Expected:* ResourceManager launches a new MRAppMaster container and
> MRAppAttempt, and the application finishes successfully.
> *Actual:* After the new MRAppMaster and MRAppAttempt are launched, the
> application fails with the following exception:
> {noformat}
> 2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1482408247195_0002_02_000011 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for node1:43037
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:244)
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> *Problem*:
> When RMCommunicator sends the "registerApplicationMaster" request to the RM,
> the RM generates NMTokens for the new RMAppAttempt. Those new NMTokens are
> returned to RMCommunicator in RegisterApplicationMasterResponse
> (getNMTokensFromPreviousAttempts method), but the RMCommunicator.register
> method does not handle them. The RM does not send these tokens again for
> later allocation requests, so they never reach NMTokenCache.
> As a result we get the "No NMToken sent for node" exception.
> I have found that this issue appears after the changes from commit
> https://github.com/apache/hadoop/commit/9b272ccae78918e7d756d84920a9322187d61eed
>
> I ran the same scenario without that commit, and the application completed
> successfully after MRAppMaster recovery.
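
The failure mode described under *Problem* can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions, not the attached patch and not the Hadoop API: NodeToken, TokenCache, register and launch are hypothetical stand-ins for NMToken, NMTokenCache, RMCommunicator.register and the container launch path. It shows that a launch fails while the token cache is empty and succeeds once the tokens delivered at registration are copied into the cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NMTokenRecoverySketch {

    /** Hypothetical stand-in for an NMToken: a node address plus an opaque token value. */
    static final class NodeToken {
        final String nodeAddr;
        final String token;

        NodeToken(String nodeAddr, String token) {
            this.nodeAddr = nodeAddr;
            this.token = token;
        }
    }

    /** Hypothetical stand-in for NMTokenCache: maps a node address to its token. */
    static final class TokenCache {
        private final Map<String, String> tokens = new HashMap<>();

        void put(String nodeAddr, String token) {
            tokens.put(nodeAddr, token);
        }

        boolean contains(String nodeAddr) {
            return tokens.containsKey(nodeAddr);
        }
    }

    /**
     * Models the registration step with the handling the report says is missing:
     * every token received in the registration response is stored in the cache.
     */
    static void register(TokenCache cache, List<NodeToken> tokensFromPreviousAttempts) {
        for (NodeToken t : tokensFromPreviousAttempts) {
            cache.put(t.nodeAddr, t.token);
        }
    }

    /** Models a container launch: it is rejected when no token is cached for the node. */
    static String launch(TokenCache cache, String nodeAddr) {
        if (!cache.contains(nodeAddr)) {
            throw new IllegalStateException("No NMToken sent for " + nodeAddr);
        }
        return "launched on " + nodeAddr;
    }

    public static void main(String[] args) {
        List<NodeToken> fromRegistration = new ArrayList<>();
        fromRegistration.add(new NodeToken("node1:43037", "opaque-token"));

        // Case 1: tokens from the registration response are ignored, so the cache stays empty.
        TokenCache ignored = new TokenCache();
        try {
            launch(ignored, "node1:43037");
        } catch (IllegalStateException e) {
            System.out.println("without handling: " + e.getMessage());
        }

        // Case 2: tokens are copied into the cache during registration.
        TokenCache handled = new TokenCache();
        register(handled, fromRegistration);
        System.out.println("with handling: " + launch(handled, "node1:43037"));
    }
}
```

In the real code base the equivalent step would presumably iterate over the result of getNMTokensFromPreviousAttempts and store each token in NMTokenCache; the exact calls used for that are not stated in this report and are left as an assumption.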