[jira] [Assigned] (MAPREDUCE-6848) MRApps.setMRFrameworkClasspath() unnecessarily declares that it throws IOException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen reassigned MAPREDUCE-6848:
------------------------------------

    Assignee:     (was: Haibo Chen)

> MRApps.setMRFrameworkClasspath() unnecessarily declares that it throws IOException
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6848
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6848
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 2.8.0
>            Reporter: Daniel Templeton
>            Priority: Trivial
>              Labels: newbie
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-6853) NMTokenSecretManagerInRM.createAndGetNMToken is not thread safe
Haibo Chen created MAPREDUCE-6853:
-------------------------------------

             Summary: NMTokenSecretManagerInRM.createAndGetNMToken is not thread safe
                 Key: MAPREDUCE-6853
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6853
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: yarn
    Affects Versions: 3.0.0-alpha2
            Reporter: Haibo Chen
            Assignee: Haibo Chen

NMTokenSecretManagerInRM.createAndGetNMToken modifies the values of a ConcurrentHashMap, which are of type HashTable, but it acquires only the read lock.
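The hazard described in this report can be sketched in isolation. The class below is a hypothetical stand-in, not the Hadoop class: attempt IDs and node IDs are plain strings, and an unsynchronized HashSet is assumed as the per-attempt value. A ConcurrentHashMap protects only the map itself, so a check-then-add on a value stored in it still needs exclusive access. The sketch takes the write lock for that step and leaves the read lock for pure reads.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for illustration only; names are not from Hadoop.
class NodeKeyTracker {
    // The map is concurrent, but the Set values inside it are not.
    private final ConcurrentHashMap<String, Set<String>> attemptToNodes =
        new ConcurrentHashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Starts tracking an application attempt.
    void registerAttempt(String attemptId) {
        lock.writeLock().lock();
        try {
            attemptToNodes.putIfAbsent(attemptId, new HashSet<>());
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Returns true when the node is new for this attempt, i.e. a token
    // would have to be issued. The set is mutated, so the write lock is held.
    boolean recordNodeIfNew(String attemptId, String nodeId) {
        lock.writeLock().lock();
        try {
            Set<String> nodes = attemptToNodes.get(attemptId);
            return nodes != null && nodes.add(nodeId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // A pure read, so the shared read lock is sufficient here.
    int knownNodeCount(String attemptId) {
        lock.readLock().lock();
        try {
            Set<String> nodes = attemptToNodes.get(attemptId);
            return nodes == null ? 0 : nodes.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Taking the write lock serializes token creation. Whether that cost is acceptable in the ResourceManager, or whether a concurrent set would be a better fit, is a design question this sketch does not settle.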
[jira] [Comment Edited] (MAPREDUCE-6834) MR application fails with "No NMToken sent" exception after MRAppMaster recovery
[ https://issues.apache.org/jira/browse/MAPREDUCE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891327#comment-15891327 ]

Haibo Chen edited comment on MAPREDUCE-6834 at 3/1/17 11:55 PM:
----------------------------------------------------------------

Thanks for the clarification, [~jlowe]. We have not made changes to preserve containers in MR. Chasing the code in more detail, I came to a similar conclusion as https://issues.apache.org/jira/browse/YARN-3112?focusedCommentId=14299003=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299003

MR relies on YARN RM to get the NMTokens needed to launch containers with NMs. Given the code today, it is possible that a null NMToken is sent to MR, which contradicts the javadoc in SchedulerApplicationAttempt.java here
{code:java}
// Create container token and NMToken altogether, if either of them fails for
// some reason like DNS unavailable, do not return this container and keep it
// in the newlyAllocatedContainers waiting to be refetched.
public synchronized ContainersAndNMTokensAllocation {...}
{code}
This could be a duplicate of YARN-3112.

was (Author: haibochen):
Thanks for the clarification, [~jlowe]. We have not made changes to preserve containers in MR. Chasing the code in more detail, I came to a similar conclusion as https://issues.apache.org/jira/browse/YARN-3112?focusedCommentId=14299003=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299003

MR relies on YARN RM to get the NMTokens needed to launch containers with NMs. Given the code today, it is possible that a null NMToken is sent to MR, which contradicts the javadoc in SchedulerApplicationAttempt.java here
{code:java}
// Create container token and NMToken altogether, if either of them fails for
// some reason like DNS unavailable, do not return this container and keep it
// in the newlyAllocatedContainers waiting to be refetched.
public synchronized ContainersAndNMTokensAllocation {...}
{code}
I believe this is a duplicate of YARN-3112.
[jira] [Comment Edited] (MAPREDUCE-6834) MR application fails with "No NMToken sent" exception after MRAppMaster recovery
[ https://issues.apache.org/jira/browse/MAPREDUCE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891327#comment-15891327 ]

Haibo Chen edited comment on MAPREDUCE-6834 at 3/1/17 11:54 PM:
----------------------------------------------------------------

Thanks for the clarification, [~jlowe]. We have not made changes to preserve containers in MR. Chasing the code in more detail, I came to a similar conclusion as https://issues.apache.org/jira/browse/YARN-3112?focusedCommentId=14299003=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299003

MR relies on YARN RM to get the NMTokens needed to launch containers with NMs. Given the code today, it is possible that a null NMToken is sent to MR, which contradicts the javadoc in SchedulerApplicationAttempt.java here
{code:java}
// Create container token and NMToken altogether, if either of them fails for
// some reason like DNS unavailable, do not return this container and keep it
// in the newlyAllocatedContainers waiting to be refetched.
public synchronized ContainersAndNMTokensAllocation {...}
{code}
I believe this is a duplicate of YARN-3112.

was (Author: haibochen):
Thanks for the clarification, [~jlowe]. We have not made changes to preserve containers in MR. Chasing the code in more detail, I came to a similar conclusion as https://issues.apache.org/jira/browse/YARN-3112?focusedCommentId=14299003=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299003

MR relies on YARN RM to get the NMTokens needed to launch containers with NMs. Given the code today, it is possible that a null NMToken is sent to MR, which contradicts the javadoc in SchedulerApplicationAttempt.java here
{code:java}
// Create container token and NMToken altogether, if either of them fails for
// some reason like DNS unavailable, do not return this container and keep it
// in the newlyAllocatedContainers waiting to be refetched.
public synchronized ContainersAndNMTokensAllocation {...}
{code}
I believe this is a duplicate of YARN-3112, so I am going to close this jira as a duplicate. Feel free to reopen it if you disagree.
[jira] [Comment Edited] (MAPREDUCE-6834) MR application fails with "No NMToken sent" exception after MRAppMaster recovery
[ https://issues.apache.org/jira/browse/MAPREDUCE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891327#comment-15891327 ]

Haibo Chen edited comment on MAPREDUCE-6834 at 3/1/17 11:52 PM:
----------------------------------------------------------------

Thanks for the clarification, [~jlowe]. We have not made changes to preserve containers in MR. Chasing the code in more detail, I came to a similar conclusion as https://issues.apache.org/jira/browse/YARN-3112?focusedCommentId=14299003=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299003

MR relies on YARN RM to get the NMTokens needed to launch containers with NMs. Given the code today, it is possible that a null NMToken is sent to MR, which contradicts the javadoc in SchedulerApplicationAttempt.java here
{code:java}
// Create container token and NMToken altogether, if either of them fails for
// some reason like DNS unavailable, do not return this container and keep it
// in the newlyAllocatedContainers waiting to be refetched.
public synchronized ContainersAndNMTokensAllocation {...}
{code}
I believe this is a duplicate of YARN-3112, so I am going to close this jira as a duplicate. Feel free to reopen it if you disagree.

was (Author: haibochen):
Thanks for the clarification, [~jlowe]. We have not made changes to preserve containers in MR. Chasing the code in more detail, I came to a similar conclusion as https://issues.apache.org/jira/browse/YARN-3112?focusedCommentId=14299003=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299003

MR relies on YARN RM to get the NMTokens needed to launch containers with NMs. Given the code today, it is possible that a null NMToken is sent to MR, which contradicts the javadoc here
bq. // Create container token and NMToken altogether, if either of them fails for
// some reason like DNS unavailable, do not return this container and keep it
// in the newlyAllocatedContainers waiting to be refetched.
public synchronized ContainersAndNMTokensAllocation {...}

I believe this is a duplicate of YARN-3112, so I am going to close this jira as a duplicate. Feel free to reopen it if you disagree.
[jira] [Commented] (MAPREDUCE-6834) MR application fails with "No NMToken sent" exception after MRAppMaster recovery
[ https://issues.apache.org/jira/browse/MAPREDUCE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891327#comment-15891327 ]

Haibo Chen commented on MAPREDUCE-6834:
---------------------------------------

Thanks for the clarification, [~jlowe]. We have not made changes to preserve containers in MR. Chasing the code in more detail, I came to a similar conclusion as https://issues.apache.org/jira/browse/YARN-3112?focusedCommentId=14299003=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299003

MR relies on YARN RM to get the NMTokens needed to launch containers with NMs. Given the code today, it is possible that a null NMToken is sent to MR, which contradicts the javadoc here
bq. // Create container token and NMToken altogether, if either of them fails for
// some reason like DNS unavailable, do not return this container and keep it
// in the newlyAllocatedContainers waiting to be refetched.
public synchronized ContainersAndNMTokensAllocation {...}

I believe this is a duplicate of YARN-3112, so I am going to close this jira as a duplicate. Feel free to reopen it if you disagree.

> MR application fails with "No NMToken sent" exception after MRAppMaster recovery
> ---------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6834
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6834
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: resourcemanager, yarn
>    Affects Versions: 2.7.0
>         Environment: CentOS 7
>            Reporter: Aleksandr Balitsky
>            Assignee: Aleksandr Balitsky
>            Priority: Critical
>         Attachments: YARN-6019.001.patch
>
>
> *Steps to reproduce:*
> 1) Submit an MR application (for example, the PI app with 50 containers)
> 2) Find the MRAppMaster process id for the application
> 3) Kill the MRAppMaster with kill -9
> *Expected:* The ResourceManager launches a new MRAppMaster container and MRAppAttempt, and the application finishes correctly.
> *Actual:* After the new MRAppMaster and MRAppAttempt are launched, the application fails with the following exception:
> {noformat}
> 2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1482408247195_0002_02_11 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for node1:43037
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:244)
>     at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> *Problem*:
> When RMCommunicator sends the "registerApplicationMaster" request to the RM, the RM generates NMTokens for the new RMAppAttempt. Those new NMTokens are transmitted to RMCommunicator in RegisterApplicationMasterResponse (getNMTokensFromPreviousAttempts method), but we don't handle these tokens in the RMCommunicator.register method. The RM doesn't transmit these tokens again for other allocated requests, so we don't have these tokens in NMTokenCache. Accordingly, we get the "No NMToken sent for node" exception.
> I have found that this issue appears after the changes from https://github.com/apache/hadoop/commit/9b272ccae78918e7d756d84920a9322187d61eed
> I tried the same scenario without the commit, and the application completed successfully after MRAppMaster recovery.
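The failure mode in the reporter's *Problem* paragraph can be modeled without Hadoop on the classpath. Every type below is a hypothetical stand-in rather than the YARN API: NodeToken loosely plays the role of an NMToken, TokenCache the role of NMTokenCache, and cacheTokensFromRegistration the step the reporter says is missing from registration. This only illustrates the reporter's hypothesis. The comments in this thread point to YARN-3112 as the more likely cause for stock MapReduce.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model for illustration only; none of these types are Hadoop classes.
class NmTokenRecoverySketch {

    // Stand-in for a node-scoped token; real tokens carry more state.
    static final class NodeToken {
        final String nodeAddress;
        final String secret;

        NodeToken(String nodeAddress, String secret) {
            this.nodeAddress = nodeAddress;
            this.secret = secret;
        }
    }

    // Stand-in for the client-side cache consulted before contacting a node.
    static final class TokenCache {
        private final Map<String, NodeToken> byNode = new ConcurrentHashMap<>();

        void put(NodeToken token) {
            byNode.put(token.nodeAddress, token);
        }

        // Mirrors the reported failure: a lookup miss is an error.
        NodeToken require(String nodeAddress) {
            NodeToken token = byNode.get(nodeAddress);
            if (token == null) {
                throw new IllegalStateException("No NMToken sent for " + nodeAddress);
            }
            return token;
        }
    }

    // Copies tokens delivered with the registration response into the cache,
    // so that later container launches on those nodes can find them.
    static void cacheTokensFromRegistration(List<NodeToken> fromPreviousAttempts,
                                            TokenCache cache) {
        for (NodeToken token : fromPreviousAttempts) {
            cache.put(token);
        }
    }
}
```

In this model, a lookup for node1:43037 fails until the registration-time tokens have been copied into the cache, which is the sequence the issue description attributes to RMCommunicator.register.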
[jira] [Commented] (MAPREDUCE-6850) Shuffle Handler keep-alive connections are closed from the server side
[ https://issues.apache.org/jira/browse/MAPREDUCE-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890346#comment-15890346 ]

Jason Lowe commented on MAPREDUCE-6850:
---------------------------------------

Thanks for the patch! I'm wondering about handling the idle timeout. I'm worried about the case where disks are busy or slow. If it takes us longer than the idle timeout to respond to a request then I think the idle handler will close the connection mid-request which doesn't seem appropriate. Unless I'm missing something, the idle handler needs to check if we're in the middle of servicing a request or if we're waiting for the next request and only close for the latter.

> Shuffle Handler keep-alive connections are closed from the server side
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6850
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6850
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: MAPREDUCE-6850.1.patch, MAPREDUCE-6850.2.patch, MAPREDUCE-6850.3.patch, MAPREDUCE-6850.4.patch, With_Issue.png, With_Patch.png, With_Patch_withData.png
>
>
> When performance testing tez shuffle handler (TEZ-3334), it was noticed the keep-alive connections are closed from the server-side. The client silently recovers and logs the connection as keep-alive, despite reestablishing a connection. This jira aims to remove the close from the server side, fixing the bug preventing keep-alive connections.
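The review comment on this issue asks the idle handler to tell "still servicing a request" apart from "waiting for the next request". One possible shape of that check is sketched below. KeepAliveIdleGuard and its methods are hypothetical names, not part of the patch or of the ShuffleHandler, and timestamps are passed in explicitly to keep the sketch deterministic.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-connection guard; not taken from the actual patch.
class KeepAliveIdleGuard {
    private final long idleTimeoutMillis;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicLong lastCompletedMillis;

    KeepAliveIdleGuard(long idleTimeoutMillis, long nowMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.lastCompletedMillis = new AtomicLong(nowMillis);
    }

    // Called when a request has been received on the connection.
    void requestStarted() {
        inFlight.incrementAndGet();
    }

    // Called when the response has been fully written.
    void requestFinished(long nowMillis) {
        lastCompletedMillis.set(nowMillis);
        inFlight.decrementAndGet();
    }

    // Close only when no request is being serviced and the connection has
    // been waiting for the next request for at least the idle timeout.
    boolean shouldClose(long nowMillis) {
        if (inFlight.get() > 0) {
            return false; // a slow disk must not be mistaken for an idle client
        }
        return nowMillis - lastCompletedMillis.get() >= idleTimeoutMillis;
    }
}
```

With a 1000 ms timeout, a request that takes 5000 ms to serve never triggers a close while it is in flight. The idle clock starts only once the response has been written.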
[jira] [Commented] (MAPREDUCE-6834) MR application fails with "No NMToken sent" exception after MRAppMaster recovery
[ https://issues.apache.org/jira/browse/MAPREDUCE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890310#comment-15890310 ]

Jason Lowe commented on MAPREDUCE-6834:
---------------------------------------

Yes, which is why I've been asking all along if this is a case where MapReduce has been modified to preserve containers across AM attempts. If that is indeed the case then this is essentially a bug report against an internal patch that is not part of Apache Hadoop.