[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799963#comment-13799963 ]
Vinod Kumar Vavilapalli commented on YARN-1321:
-----------------------------------------------
bq. Llama is a single JVM hosting multiple unmanaged ApplicationMasters that
run at the same time (in parallel). Because NMTokenCache is a singleton,
NMTokens for the same node from different AMs step on each other.
Okay, that explains the context.
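The collision can be reproduced with a small plain-Java sketch. It assumes a hypothetical `SingletonTokenCache` that is only loosely modeled on NMTokenCache: a single static map keyed by NodeManager address and shared by every AM in the JVM. Tokens are plain strings here instead of real YARN `Token` records. This is an illustration, not the actual Hadoop code.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a JVM-wide token cache: one static map keyed only
// by NodeManager address, shared by every AM running in the process.
final class SingletonTokenCache {
    private static final Map<String, String> TOKENS = new HashMap<>();

    private SingletonTokenCache() {}

    static synchronized void setToken(String nodeAddr, String token) {
        TOKENS.put(nodeAddr, token);
    }

    static synchronized String getToken(String nodeAddr) {
        return TOKENS.get(nodeAddr);
    }
}

final class SingletonCollisionDemo {
    public static void main(String[] args) {
        String node = "nm-host-1:45454";
        // The AM for attempt 0001 receives an NMToken for the node.
        SingletonTokenCache.setToken(node, "nmtoken-for-appattempt_0001");
        // The AM for attempt 0002, in the same JVM, receives its own token for
        // the same node and silently overwrites the first one.
        SingletonTokenCache.setToken(node, "nmtoken-for-appattempt_0002");
        // AM 0001 now reads AM 0002's token when it starts a container.
        System.out.println(SingletonTokenCache.getToken(node));
    }
}
{code}

Because the key is only the node address, whichever AM writes last wins. The other AM then presents a token issued to a different attempt.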
bq. So far this is the only issue we've run into while using multiple AMs in a
single JVM.
That is good to know. You should add a simple test so that this assumption
isn't broken in the future.
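One possible shape for such a test is sketched below in plain Java, without JUnit or a mini cluster. It assumes a hypothetical per-AM cache class, `PerAmTokenCache`, which is not an existing YARN API. Two AMs in the same JVM store tokens for the same node, and each must read back only its own token.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-AM token cache: each ApplicationMaster owns its own instance
// instead of sharing one static map.
final class PerAmTokenCache {
    private final Map<String, String> tokens = new HashMap<>();

    synchronized void setToken(String nodeAddr, String token) {
        tokens.put(nodeAddr, token);
    }

    synchronized String getToken(String nodeAddr) {
        return tokens.get(nodeAddr);
    }
}

// Minimal check: two AMs in one JVM store tokens for the same node and must
// each read back their own.
final class MultipleAmsInOneJvmCheck {
    static void run() {
        String node = "nm-host-1:45454";
        PerAmTokenCache am1 = new PerAmTokenCache();
        PerAmTokenCache am2 = new PerAmTokenCache();
        am1.setToken(node, "nmtoken-for-appattempt_0001");
        am2.setToken(node, "nmtoken-for-appattempt_0002");
        if (!"nmtoken-for-appattempt_0001".equals(am1.getToken(node))) {
            throw new AssertionError("AM 1 saw a token that is not its own");
        }
        if (!"nmtoken-for-appattempt_0002".equals(am2.getToken(node))) {
            throw new AssertionError("AM 2 saw a token that is not its own");
        }
    }

    public static void main(String[] args) {
        run();
        System.out.println("ok");
    }
}
{code}

A real regression test would more likely drive two AMs against a test cluster. The invariant being checked stays the same.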
bq. It seems that after this patch goes in, all applications will need to
change to work correctly with the client libraries?
Sigh, that is true. Changing from static to non-static breaks apps. We can do
one of two things:
- Keep the statics around for the single-AM-per-JVM case, which I believe will
cover 99% of cases, and add new non-static APIs, or
- Do what Omkar is suggesting: add optional APIs to track NMTokens per app
attempt.
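The first option can be sketched as a class that keeps a static default instance behind the old static entry points and also allows separate instances for AMs that share a JVM. The class `CompatTokenCache` and its method names `getSingleton`, `setNMToken`, `getNMToken`, `setToken` and `getToken` are assumptions for illustration. They are not the signatures that went into YARN.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical token cache that supports both the legacy static usage and
// per-AM instances.
final class CompatTokenCache {
    // Shared default instance backing the legacy static API (one AM per JVM).
    private static final CompatTokenCache DEFAULT = new CompatTokenCache();

    private final ConcurrentMap<String, String> tokens = new ConcurrentHashMap<>();

    // New non-static API: an AM that shares its JVM creates its own instance.
    CompatTokenCache() {}

    static CompatTokenCache getSingleton() { return DEFAULT; }

    void setToken(String nodeAddr, String token) { tokens.put(nodeAddr, token); }

    String getToken(String nodeAddr) { return tokens.get(nodeAddr); }

    // Legacy static entry points delegate to DEFAULT, so existing single-AM
    // applications keep working without code changes.
    static void setNMToken(String nodeAddr, String token) { DEFAULT.setToken(nodeAddr, token); }

    static String getNMToken(String nodeAddr) { return DEFAULT.getToken(nodeAddr); }
}

final class CompatTokenCacheDemo {
    public static void main(String[] args) {
        String node = "nm-host-1:45454";
        // Legacy single-AM code path: the static calls are unchanged.
        CompatTokenCache.setNMToken(node, "legacy-token");
        // A second AM in the same JVM uses its own instance and leaves the default alone.
        CompatTokenCache secondAm = new CompatTokenCache();
        secondAm.setToken(node, "am2-token");
        System.out.println(CompatTokenCache.getNMToken(node) + " " + secondAm.getToken(node));
    }
}
{code}

In this approach, only applications that actually run several AMs per JVM have to opt in to the instance API. That is what would let the MR and dist-shell code stay untouched.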
Irrespective of the solution, I think we should skip the MR and dist-shell
changes altogether, at least to prove that the changes are compatible. We can
maybe fix them in a follow-up ticket.
> NMTokenCache is a singleton, prevents multiple AMs running in a single JVM
> from working correctly
> ----------------------------------------------------------------------------------------------
>
> Key: YARN-1321
> URL: https://issues.apache.org/jira/browse/YARN-1321
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Affects Versions: 2.2.0
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Priority: Blocker
> Fix For: 2.2.1
>
> Attachments: YARN-1321.patch
>
>
> NMTokenCache is a singleton. Because of this, when multiple AMs run in a
> single JVM, NMTokens for the same node from different AMs step on each other,
> and starting containers fails due to mismatched tokens.
> The error observed on the client side is something like:
> {code}
> ERROR org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE)
> cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request
> to start container.
> NMToken for application attempt : appattempt_1382038445650_0002_000001 was
> used for starting container with container token issued for application
> attempt : appattempt_1382038445650_0001_000001
> {code}
--
This message was sent by Atlassian JIRA
(v6.1#6144)