[jira] [Assigned] (MAPREDUCE-6848) MRApps.setMRFrameworkClasspath() unnecessarily declares that it throws IOException

2017-03-01 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned MAPREDUCE-6848:
-

Assignee: (was: Haibo Chen)

> MRApps.setMRFrameworkClasspath() unnecessarily declares that it throws 
> IOException
> --
>
> Key: MAPREDUCE-6848
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6848
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Priority: Trivial
>  Labels: newbie
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6853) NMTokenSecretManagerInRM.createAndGetNMToken is not thread safe

2017-03-01 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6853:
-

 Summary: NMTokenSecretManagerInRM.createAndGetNMToken is not 
thread safe
 Key: MAPREDUCE-6853
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6853
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.0.0-alpha2
Reporter: Haibo Chen
Assignee: Haibo Chen


NMTokenSecretManagerInRM.createAndGetNMToken modifies the values of a 
ConcurrentHashMap, which are of type HashTable, but it only acquires the read lock.
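The pattern being reported can be sketched as follows. This is an illustrative stand-in, not the actual Hadoop code: the class name, fields, and methods are hypothetical. The point is that mutating a value stored in the map is a write, so it must happen under the write lock; holding only the read lock (as the bug describes) permits concurrent, unsynchronized mutation of the shared collection.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: track, per app attempt, the set of nodes an NMToken
// has already been issued for. A read lock only protects against concurrent
// structural changes it can observe; it does NOT make mutation of the value
// sets safe, which is why the write lock is taken below.
class NMTokenTrackerSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Set<String>> nodesWithIssuedToken = new HashMap<>();

  void registerAttempt(String attemptId) {
    lock.writeLock().lock();
    try {
      nodesWithIssuedToken.put(attemptId, new HashSet<>());
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Returns true if a new token should be created for (attemptId, nodeId). */
  boolean createTokenIfAbsent(String attemptId, String nodeId) {
    lock.writeLock().lock(); // write lock: we mutate the value set below
    try {
      Set<String> nodes = nodesWithIssuedToken.get(attemptId);
      if (nodes == null || nodes.contains(nodeId)) {
        return false; // unknown attempt, or token already issued for this node
      }
      nodes.add(nodeId); // the mutation that is unsafe under only a read lock
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Under a read lock, two threads could pass the contains() check for the same node and both mutate the HashSet concurrently; the write lock serializes the check-then-act sequence.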






[jira] [Comment Edited] (MAPREDUCE-6834) MR application fails with "No NMToken sent" exception after MRAppMaster recovery

2017-03-01 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891327#comment-15891327
 ] 

Haibo Chen edited comment on MAPREDUCE-6834 at 3/1/17 11:55 PM:


Thanks for the clarification, [~jlowe]. We have not made changes to preserve 
containers in MR. Tracing the code in more detail, I came to a similar 
conclusion as 
https://issues.apache.org/jira/browse/YARN-3112?focusedCommentId=14299003=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299003
MR relies on the YARN RM to get the NMTokens needed to launch containers on 
the NMs. Given the code today, it is possible that a null NMToken is sent to 
MR, which contradicts the javadoc in SchedulerApplicationAttempt.java here:
{code:java}
  // Create container token and NMToken altogether, if either of them fails for
  // some reason like DNS unavailable, do not return this container and keep it
  // in the newlyAllocatedContainers waiting to be refetched.
  public synchronized ContainersAndNMTokensAllocation {...}
{code}
This could be a duplicate of YARN-3112.


was (Author: haibochen):
Thanks for the clarification, [~jlowe]. We have not made changes to preserve 
containers in MR. Tracing the code in more detail, I came to a similar 
conclusion as 
https://issues.apache.org/jira/browse/YARN-3112?focusedCommentId=14299003=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299003
MR relies on the YARN RM to get the NMTokens needed to launch containers on 
the NMs. Given the code today, it is possible that a null NMToken is sent to 
MR, which contradicts the javadoc in SchedulerApplicationAttempt.java here:
{code:java}
  // Create container token and NMToken altogether, if either of them fails for
  // some reason like DNS unavailable, do not return this container and keep it
  // in the newlyAllocatedContainers waiting to be refetched.
  public synchronized ContainersAndNMTokensAllocation {...}
{code}
I believe this is a duplicate of YARN-3112.

> MR application fails with "No NMToken sent" exception after MRAppMaster 
> recovery
> 
>
> Key: MAPREDUCE-6834
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6834
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 2.7.0
> Environment: Centos 7
>Reporter: Aleksandr Balitsky
>Assignee: Aleksandr Balitsky
>Priority: Critical
> Attachments: YARN-6019.001.patch
>
>
> *Steps to reproduce:*
> 1) Submit an MR application (for example, the Pi app with 50 containers)
> 2) Find the MRAppMaster process id for the application
> 3) Kill the MRAppMaster with kill -9
> *Expected:* The ResourceManager launches a new MRAppMaster container and 
> MRAppAttempt, and the application finishes correctly.
> *Actual:* After the new MRAppMaster and MRAppAttempt are launched, the 
> application fails with the following exception:
> {noformat}
> 2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container 
> launch failed for container_1482408247195_0002_02_11 : 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
> for node1:43037
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.(ContainerManagementProtocolProxy.java:244)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> *Problem*:
> When RMCommunicator sends the "registerApplicationMaster" request to the RM, 
> the RM generates NMTokens for the new RMAppAttempt. Those new NMTokens are 
> transmitted to RMCommunicator in the RegisterApplicationMasterResponse 
> (getNMTokensFromPreviousAttempts method). But we don't handle these tokens 
> in the RMCommunicator.register method. The RM doesn't transmit these tokens 
> again for other allocate requests, but we don't 

[jira] [Comment Edited] (MAPREDUCE-6834) MR application fails with "No NMToken sent" exception after MRAppMaster recovery

2017-03-01 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891327#comment-15891327
 ] 

Haibo Chen edited comment on MAPREDUCE-6834 at 3/1/17 11:54 PM:


Thanks for the clarification, [~jlowe]. We have not made changes to preserve 
containers in MR. Tracing the code in more detail, I came to a similar 
conclusion as 
https://issues.apache.org/jira/browse/YARN-3112?focusedCommentId=14299003=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299003
MR relies on the YARN RM to get the NMTokens needed to launch containers on 
the NMs. Given the code today, it is possible that a null NMToken is sent to 
MR, which contradicts the javadoc in SchedulerApplicationAttempt.java here:
{code:java}
  // Create container token and NMToken altogether, if either of them fails for
  // some reason like DNS unavailable, do not return this container and keep it
  // in the newlyAllocatedContainers waiting to be refetched.
  public synchronized ContainersAndNMTokensAllocation {...}
{code}
I believe this is a duplicate of YARN-3112.


was (Author: haibochen):
Thanks for the clarification, [~jlowe]. We have not made changes to preserve 
containers in MR. Tracing the code in more detail, I came to a similar 
conclusion as 
https://issues.apache.org/jira/browse/YARN-3112?focusedCommentId=14299003=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299003
MR relies on the YARN RM to get the NMTokens needed to launch containers on 
the NMs. Given the code today, it is possible that a null NMToken is sent to 
MR, which contradicts the javadoc in SchedulerApplicationAttempt.java here:
{code:java}
  // Create container token and NMToken altogether, if either of them fails for
  // some reason like DNS unavailable, do not return this container and keep it
  // in the newlyAllocatedContainers waiting to be refetched.
  public synchronized ContainersAndNMTokensAllocation {...}
{code}
I believe this is a duplicate of YARN-3112, so I am going to close this jira as 
a duplicate. Feel free to reopen it if you disagree.


> MR application fails with "No NMToken sent" exception after MRAppMaster 
> recovery
> 
>
> Key: MAPREDUCE-6834
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6834
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 2.7.0
> Environment: Centos 7
>Reporter: Aleksandr Balitsky
>Assignee: Aleksandr Balitsky
>Priority: Critical
> Attachments: YARN-6019.001.patch
>
>
> *Steps to reproduce:*
> 1) Submit an MR application (for example, the Pi app with 50 containers)
> 2) Find the MRAppMaster process id for the application
> 3) Kill the MRAppMaster with kill -9
> *Expected:* The ResourceManager launches a new MRAppMaster container and 
> MRAppAttempt, and the application finishes correctly.
> *Actual:* After the new MRAppMaster and MRAppAttempt are launched, the 
> application fails with the following exception:
> {noformat}
> 2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container 
> launch failed for container_1482408247195_0002_02_11 : 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
> for node1:43037
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.(ContainerManagementProtocolProxy.java:244)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> *Problem*:
> When RMCommunicator sends the "registerApplicationMaster" request to the RM, 
> the RM generates NMTokens for the new RMAppAttempt. Those new NMTokens are 
> transmitted to RMCommunicator in the RegisterApplicationMasterResponse 
> (getNMTokensFromPreviousAttempts method). But we don't handle these tokens in 
> 

[jira] [Comment Edited] (MAPREDUCE-6834) MR application fails with "No NMToken sent" exception after MRAppMaster recovery

2017-03-01 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891327#comment-15891327
 ] 

Haibo Chen edited comment on MAPREDUCE-6834 at 3/1/17 11:52 PM:


Thanks for the clarification, [~jlowe]. We have not made changes to preserve 
containers in MR. Tracing the code in more detail, I came to a similar 
conclusion as 
https://issues.apache.org/jira/browse/YARN-3112?focusedCommentId=14299003=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299003
MR relies on the YARN RM to get the NMTokens needed to launch containers on 
the NMs. Given the code today, it is possible that a null NMToken is sent to 
MR, which contradicts the javadoc in SchedulerApplicationAttempt.java here:
{code:java}
  // Create container token and NMToken altogether, if either of them fails for
  // some reason like DNS unavailable, do not return this container and keep it
  // in the newlyAllocatedContainers waiting to be refetched.
  public synchronized ContainersAndNMTokensAllocation {...}
{code}
I believe this is a duplicate of YARN-3112, so I am going to close this jira as 
a duplicate. Feel free to reopen it if you disagree.



was (Author: haibochen):
Thanks for the clarification, [~jlowe]. We have not made changes to preserve 
containers in MR. Tracing the code in more detail, I came to a similar 
conclusion as 
https://issues.apache.org/jira/browse/YARN-3112?focusedCommentId=14299003=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299003
MR relies on the YARN RM to get the NMTokens needed to launch containers on 
the NMs. Given the code today, it is possible that a null NMToken is sent to 
MR, which contradicts the javadoc here:
bq.
  // Create container token and NMToken altogether, if either of them fails for
  // some reason like DNS unavailable, do not return this container and keep it
  // in the newlyAllocatedContainers waiting to be refetched.
  public synchronized ContainersAndNMTokensAllocation {...}

I believe this is a duplicate of YARN-3112, so I am going to close this jira as 
a duplicate. Feel free to reopen it if you disagree.


> MR application fails with "No NMToken sent" exception after MRAppMaster 
> recovery
> 
>
> Key: MAPREDUCE-6834
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6834
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 2.7.0
> Environment: Centos 7
>Reporter: Aleksandr Balitsky
>Assignee: Aleksandr Balitsky
>Priority: Critical
> Attachments: YARN-6019.001.patch
>
>
> *Steps to reproduce:*
> 1) Submit an MR application (for example, the Pi app with 50 containers)
> 2) Find the MRAppMaster process id for the application
> 3) Kill the MRAppMaster with kill -9
> *Expected:* The ResourceManager launches a new MRAppMaster container and 
> MRAppAttempt, and the application finishes correctly.
> *Actual:* After the new MRAppMaster and MRAppAttempt are launched, the 
> application fails with the following exception:
> {noformat}
> 2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container 
> launch failed for container_1482408247195_0002_02_11 : 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
> for node1:43037
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.(ContainerManagementProtocolProxy.java:244)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> *Problem*:
> When RMCommunicator sends the "registerApplicationMaster" request to the RM, 
> the RM generates NMTokens for the new RMAppAttempt. Those new NMTokens are 
> transmitted to RMCommunicator in the RegisterApplicationMasterResponse 
> (getNMTokensFromPreviousAttempts method). But we don't 

[jira] [Commented] (MAPREDUCE-6834) MR application fails with "No NMToken sent" exception after MRAppMaster recovery

2017-03-01 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891327#comment-15891327
 ] 

Haibo Chen commented on MAPREDUCE-6834:
---

Thanks for the clarification, [~jlowe]. We have not made changes to preserve 
containers in MR. Tracing the code in more detail, I came to a similar 
conclusion as 
https://issues.apache.org/jira/browse/YARN-3112?focusedCommentId=14299003=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299003
MR relies on the YARN RM to get the NMTokens needed to launch containers on 
the NMs. Given the code today, it is possible that a null NMToken is sent to 
MR, which contradicts the javadoc here:
bq.
  // Create container token and NMToken altogether, if either of them fails for
  // some reason like DNS unavailable, do not return this container and keep it
  // in the newlyAllocatedContainers waiting to be refetched.
  public synchronized ContainersAndNMTokensAllocation {...}

I believe this is a duplicate of YARN-3112, so I am going to close this jira as 
a duplicate. Feel free to reopen it if you disagree.


> MR application fails with "No NMToken sent" exception after MRAppMaster 
> recovery
> 
>
> Key: MAPREDUCE-6834
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6834
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 2.7.0
> Environment: Centos 7
>Reporter: Aleksandr Balitsky
>Assignee: Aleksandr Balitsky
>Priority: Critical
> Attachments: YARN-6019.001.patch
>
>
> *Steps to reproduce:*
> 1) Submit an MR application (for example, the Pi app with 50 containers)
> 2) Find the MRAppMaster process id for the application
> 3) Kill the MRAppMaster with kill -9
> *Expected:* The ResourceManager launches a new MRAppMaster container and 
> MRAppAttempt, and the application finishes correctly.
> *Actual:* After the new MRAppMaster and MRAppAttempt are launched, the 
> application fails with the following exception:
> {noformat}
> 2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container 
> launch failed for container_1482408247195_0002_02_11 : 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
> for node1:43037
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.(ContainerManagementProtocolProxy.java:244)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> *Problem*:
> When RMCommunicator sends the "registerApplicationMaster" request to the RM, 
> the RM generates NMTokens for the new RMAppAttempt. Those new NMTokens are 
> transmitted to RMCommunicator in the RegisterApplicationMasterResponse 
> (getNMTokensFromPreviousAttempts method). But we don't handle these tokens 
> in the RMCommunicator.register method. The RM doesn't transmit these tokens 
> again for other allocate requests, and we don't have them in the 
> NMTokenCache. Accordingly, we get the "No NMToken sent for node" exception.
> I have found that this issue appears after the changes from 
> https://github.com/apache/hadoop/commit/9b272ccae78918e7d756d84920a9322187d61eed
> I tried the same scenario without the commit, and the application completed 
> successfully after MRAppMaster recovery.
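The handling the description says is missing can be sketched as below. This is a hedged, framework-free illustration, not the actual YARN API: the NMTokenSketch type and the plain map standing in for the NMTokenCache are hypothetical stand-ins for the real RegisterApplicationMasterResponse token list and cache.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the missing step: when the new attempt registers,
// copy the NMTokens returned for previous attempts into the AM's local token
// cache, so later container launches on those nodes can find a token.
class RegisterResponseSketch {
  static class NMTokenSketch {
    final String nodeAddress; // e.g. "node1:43037"
    final String token;       // opaque token payload
    NMTokenSketch(String nodeAddress, String token) {
      this.nodeAddress = nodeAddress;
      this.token = token;
    }
  }

  /** What a register() implementation would do with the response's token list. */
  static Map<String, String> cacheTokensFromPreviousAttempts(
      List<NMTokenSketch> tokensFromPreviousAttempts,
      Map<String, String> nmTokenCache) {
    for (NMTokenSketch t : tokensFromPreviousAttempts) {
      // Without this step the cache has no entry for the node, and a later
      // container launch fails with "No NMToken sent for <node>".
      nmTokenCache.put(t.nodeAddress, t.token);
    }
    return nmTokenCache;
  }
}
```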






[jira] [Commented] (MAPREDUCE-6850) Shuffle Handler keep-alive connections are closed from the server side

2017-03-01 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890346#comment-15890346
 ] 

Jason Lowe commented on MAPREDUCE-6850:
---

Thanks for the patch!

I'm wondering about the handling of the idle timeout.  I'm worried about the 
case where disks are busy or slow.  If it takes us longer than the idle timeout 
to respond to a request, then I think the idle handler will close the connection 
mid-request, which doesn't seem appropriate.  Unless I'm missing something, the 
idle handler needs to check whether we're in the middle of servicing a request 
or waiting for the next request, and only close the connection in the latter case.
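The check being suggested above can be sketched like this. It is an illustrative, framework-free stand-in (the real shuffle handler is Netty-based; the class and method names here are hypothetical): an in-flight counter distinguishes "busy servicing a request" from "waiting for the next request", and the idle-timeout logic only closes in the second case.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a connection that tracks in-flight requests so the
// idle-timeout timer never closes it mid-request, even if a slow disk stalls
// the response past the idle timeout.
class IdleAwareConnection {
  private final AtomicInteger inFlight = new AtomicInteger();
  private final AtomicLong lastActivityMillis =
      new AtomicLong(System.currentTimeMillis());
  private final long idleTimeoutMillis;
  private volatile boolean closed = false;

  IdleAwareConnection(long idleTimeoutMillis) {
    this.idleTimeoutMillis = idleTimeoutMillis;
  }

  void requestStarted()  { inFlight.incrementAndGet(); touch(); }
  void requestFinished() { inFlight.decrementAndGet(); touch(); }
  private void touch()   { lastActivityMillis.set(System.currentTimeMillis()); }

  /** Called by the idle-timeout timer; closes only a truly idle connection. */
  boolean onIdleCheck(long nowMillis) {
    boolean idle = nowMillis - lastActivityMillis.get() >= idleTimeoutMillis;
    // A request in flight means the connection is busy, not idle: a slow or
    // busy disk may legitimately push the response past the timeout.
    if (idle && inFlight.get() == 0) {
      closed = true;
    }
    return closed;
  }

  boolean isClosed() { return closed; }
}
```

With this shape, the timer fires as before, but a connection stalled on a slow response survives the check and is only reaped once it is genuinely waiting for the next request.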

> Shuffle Handler keep-alive connections are closed from the server side
> --
>
> Key: MAPREDUCE-6850
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6850
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: MAPREDUCE-6850.1.patch, MAPREDUCE-6850.2.patch, 
> MAPREDUCE-6850.3.patch, MAPREDUCE-6850.4.patch, With_Issue.png, 
> With_Patch.png, With_Patch_withData.png
>
>
> When performance testing the Tez shuffle handler (TEZ-3334), it was noticed 
> that keep-alive connections are closed from the server side. The client 
> silently recovers and logs the connection as keep-alive, despite 
> reestablishing a connection. This jira aims to remove the close on the 
> server side, fixing the bug that prevents keep-alive connections.






[jira] [Commented] (MAPREDUCE-6834) MR application fails with "No NMToken sent" exception after MRAppMaster recovery

2017-03-01 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890310#comment-15890310
 ] 

Jason Lowe commented on MAPREDUCE-6834:
---

Yes, which is why I've been asking all along if this is a case where MapReduce 
has been modified to preserve containers across AM attempts.  If that is indeed 
the case then this is essentially a bug report against an internal patch that 
is not part of Apache Hadoop.

> MR application fails with "No NMToken sent" exception after MRAppMaster 
> recovery
> 
>
> Key: MAPREDUCE-6834
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6834
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 2.7.0
> Environment: Centos 7
>Reporter: Aleksandr Balitsky
>Assignee: Aleksandr Balitsky
>Priority: Critical
> Attachments: YARN-6019.001.patch
>
>
> *Steps to reproduce:*
> 1) Submit an MR application (for example, the Pi app with 50 containers)
> 2) Find the MRAppMaster process id for the application
> 3) Kill the MRAppMaster with kill -9
> *Expected:* The ResourceManager launches a new MRAppMaster container and 
> MRAppAttempt, and the application finishes correctly.
> *Actual:* After the new MRAppMaster and MRAppAttempt are launched, the 
> application fails with the following exception:
> {noformat}
> 2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container 
> launch failed for container_1482408247195_0002_02_11 : 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
> for node1:43037
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.(ContainerManagementProtocolProxy.java:244)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> *Problem*:
> When RMCommunicator sends the "registerApplicationMaster" request to the RM, 
> the RM generates NMTokens for the new RMAppAttempt. Those new NMTokens are 
> transmitted to RMCommunicator in the RegisterApplicationMasterResponse 
> (getNMTokensFromPreviousAttempts method). But we don't handle these tokens 
> in the RMCommunicator.register method. The RM doesn't transmit these tokens 
> again for other allocate requests, and we don't have them in the 
> NMTokenCache. Accordingly, we get the "No NMToken sent for node" exception.
> I have found that this issue appears after the changes from 
> https://github.com/apache/hadoop/commit/9b272ccae78918e7d756d84920a9322187d61eed
> I tried the same scenario without the commit, and the application completed 
> successfully after MRAppMaster recovery.


