[ 
https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799747#comment-13799747
 ] 

Alejandro Abdelnur commented on YARN-1321:
------------------------------------------

We ran into this issue in Llama. Llama is a single JVM hosting multiple 
unmanaged ApplicationMasters that run at the same time (in parallel). Because 
NMTokenCache is a singleton, NMTokens for the same node from the different AMs 
overwrite each other.

The patch that I'm working on preserves the current behavior (singleton 
NMTokenCache) while allowing a client to set an NMTokenCache instance on the 
AMRMClient/NMClient (and their Async versions). If an instance is set, the 
NMTokens are stored in it instead of in the singleton. This preserves backward 
compatibility both in behavior and in API.
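
To illustrate the idea, here is a minimal self-contained sketch. It is not the patch itself: TokenCache, AmClient, setTokenCache and the node/token strings are hypothetical stand-ins for NMTokenCache and the AMRMClient/NMClient pair, and the real classes carry more state than a simple map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class NMTokenCacheSketch {

    /** Hypothetical stand-in for NMTokenCache: maps a node address to a token. */
    static final class TokenCache {
        private static final TokenCache SINGLETON = new TokenCache();
        private final Map<String, String> tokens = new ConcurrentHashMap<>();

        static TokenCache getSingleton() { return SINGLETON; }

        void setToken(String nodeAddr, String token) { tokens.put(nodeAddr, token); }

        String getToken(String nodeAddr) { return tokens.get(nodeAddr); }
    }

    /** Hypothetical stand-in for the clients owned by one ApplicationMaster. */
    static final class AmClient {
        // Default is the shared singleton, so existing callers see no change.
        private TokenCache cache = TokenCache.getSingleton();

        // Opt-in: give this AM its own cache instance.
        void setTokenCache(TokenCache cache) { this.cache = cache; }

        void onTokenReceived(String nodeAddr, String token) { cache.setToken(nodeAddr, token); }

        String tokenFor(String nodeAddr) { return cache.getToken(nodeAddr); }
    }

    public static void main(String[] args) {
        String node = "node1:8041"; // made-up node address

        // Two AMs sharing the singleton: the second token replaces the first.
        AmClient am1 = new AmClient();
        AmClient am2 = new AmClient();
        am1.onTokenReceived(node, "token-for-attempt-0001");
        am2.onTokenReceived(node, "token-for-attempt-0002");
        System.out.println("shared cache, am1 sees: " + am1.tokenFor(node));

        // Two AMs with their own caches: each keeps its own token.
        AmClient am3 = new AmClient();
        AmClient am4 = new AmClient();
        am3.setTokenCache(new TokenCache());
        am4.setTokenCache(new TokenCache());
        am3.onTokenReceived(node, "token-for-attempt-0003");
        am4.onTokenReceived(node, "token-for-attempt-0004");
        System.out.println("own cache, am3 sees: " + am3.tokenFor(node));
    }
}
```

In the shared case am1 ends up holding the token issued to am2, which is the mismatch the NodeManager rejects. With one cache per AM the tokens stay separate, and a client that never calls the setter keeps using the singleton as before.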




> NMTokenCache is a singleton, which prevents multiple AMs running in a single JVM 
> from working correctly.
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-1321
>                 URL: https://issues.apache.org/jira/browse/YARN-1321
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.2.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Blocker
>             Fix For: 2.2.1
>
>
> NMTokenCache is a singleton. Because of this, when multiple AMs run in a 
> single JVM, NMTokens for the same node from different AMs overwrite each other, 
> and starting containers fails due to mismatched tokens.
> The error observed on the client side is something like:
> {code}
> ERROR org.apache.hadoop.security.UserGroupInformation: 
> PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) 
> cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request 
> to start container. 
> NMToken for application attempt : appattempt_1382038445650_0002_000001 was 
> used for starting container with container token issued for application 
> attempt : appattempt_1382038445650_0001_000001
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)
