[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803703#comment-13803703 ]
Vinod Kumar Vavilapalli commented on YARN-1321:
-----------------------------------------------
I think we should try to keep both the client libraries decoupled from each
other. These are the cases:
- Common use case - using both libraries: User's AM doesn't do anything today.
- Less likely
-- Using AMRMClient only: User's AM accesses the static token-cache to
extract tokens and uses them for talking to NMs.
-- Using NMClient only: User's AM gets tokens from the protocol response
and puts them into the static cache.
Looks like whatever we do, we'll break compat. So, as much as I hate to do it,
we can add a config yarn.client.static-nmtokens-cache that is set to true by
default. If it is explicitly set to false (by llama), then AMRMClient and
NMClient can mandate setting the token-cache via setNMTokenCache() APIs.
- That way, when the flag is true (singleton mode), all the above use-cases
would just work as they are now.
- Otherwise, when the flag is set to false, we go to a mode where all AMs must
create Clients explicitly passing an NMTokenCache for use. If they don't, client
creation will fail.
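To make the two modes concrete, here is a rough, self-contained sketch of how the flag could behave. It does not use the real Hadoop classes: TokenCache, SketchClient and the boolean staticCache argument are hypothetical stand-ins for NMTokenCache, AMRMClient/NMClient and the proposed yarn.client.static-nmtokens-cache setting, and the node address and token strings are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Demo driver: shows the collision in singleton mode and isolation in explicit mode.
class StaticCacheFlagSketch {
    public static void main(String[] args) {
        String node = "node1:8041"; // made-up NM address

        // Flag true (default): both AMs silently share the singleton cache.
        SketchClient am1 = SketchClient.create(true, null);
        SketchClient am2 = SketchClient.create(true, null);
        am1.cache().setToken(node, "token-for-attempt-0001");
        am2.cache().setToken(node, "token-for-attempt-0002"); // overwrites am1's entry
        System.out.println("shared mode, am1 sees: " + am1.cache().getToken(node));

        // Flag false: each AM must pass its own cache, so entries stay separate.
        SketchClient am3 = SketchClient.create(false, new TokenCache());
        SketchClient am4 = SketchClient.create(false, new TokenCache());
        am3.cache().setToken(node, "token-for-attempt-0003");
        am4.cache().setToken(node, "token-for-attempt-0004");
        System.out.println("explicit mode, am3 sees: " + am3.cache().getToken(node));

        // Flag false and no cache supplied: client creation fails.
        try {
            SketchClient.create(false, null);
        } catch (IllegalStateException e) {
            System.out.println("creation failed: " + e.getMessage());
        }
    }
}

// Hypothetical stand-in for NMTokenCache: maps a node address to an NM token.
final class TokenCache {
    private static final TokenCache SINGLETON = new TokenCache();
    private final Map<String, String> tokens = new ConcurrentHashMap<>();

    static TokenCache getSingleton() { return SINGLETON; }

    void setToken(String nodeAddr, String token) { tokens.put(nodeAddr, token); }

    String getToken(String nodeAddr) { return tokens.get(nodeAddr); }
}

// Hypothetical stand-in for AMRMClient / NMClient creation under the proposed flag.
final class SketchClient {
    private final TokenCache cache;

    private SketchClient(TokenCache cache) { this.cache = cache; }

    // staticCache models the proposed config value (assumed default: true).
    static SketchClient create(boolean staticCache, TokenCache explicitCache) {
        if (staticCache) {
            // Singleton mode: any explicitly passed cache is ignored in this sketch.
            return new SketchClient(TokenCache.getSingleton());
        }
        if (explicitCache == null) {
            throw new IllegalStateException(
                "static cache disabled: a token cache must be supplied");
        }
        return new SketchClient(explicitCache);
    }

    TokenCache cache() { return cache; }
}
```

In the shared case the second AM's token replaces the first one for the same node, which is the collision the issue describes. In the explicit case each client only ever sees the cache it was given.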
Would that work? What do others think? Am I still sane?
> NMTokenCache is a singleton, prevents multiple AMs running in a single JVM
> to work correctly
> ----------------------------------------------------------------------------------------------
>
> Key: YARN-1321
> URL: https://issues.apache.org/jira/browse/YARN-1321
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Affects Versions: 2.2.0
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Priority: Blocker
> Attachments: YARN-1321.patch, YARN-1321.patch, YARN-1321.patch,
> YARN-1321.patch
>
>
> NMTokenCache is a singleton. Because of this, when multiple AMs run in a
> single JVM, NMTokens for the same node from different AMs step on each other,
> and starting containers fails due to mismatched tokens.
> The error observed on the client side is something like:
> {code}
> ERROR org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE)
> cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request
> to start container.
> NMToken for application attempt : appattempt_1382038445650_0002_000001 was
> used for starting container with container token issued for application
> attempt : appattempt_1382038445650_0001_000001
> {code}