[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310570#comment-14310570
 ] 

Harsh J commented on YARN-3021:
-------------------------------

[~vinodkv],

Many thanks for the response here!

bq. Though the patch unblocks the jobs in the short term, it seems like long 
term this is still bad.

I agree that it does not resolve the underlying problem. Our goal here is also 
short-term: to restore in MR2 a behaviour that MR1 allowed, even though both 
ultimately face the same issue.

The longer-term approach sounds like the right thing to do for a proper 
resolution, but given that some users are blocked by this behaviour change, 
I'd like to know if there would be any objections to adding the current 
approach as an interim fix (the doc for the property does/will state that it 
disables several necessary features of the job), and filing subsequent JIRAs 
for implementing the standalone renewer?

bq. Irrespective of how we decide to skip tokens, the way the patch is skipping 
renewal will not work. In secure mode, DelegationTokenRenewer drives the app 
state machine. So if you skip adding the app itself to DTR, the app will be 
completely stuck.

In our simple tests the app did run through successfully with such an approach, 
but there were multiple factors we did not test for (app recovery, task 
failures, etc.) that could be impacted. Would it be better if we added a 
morphed DelegationTokenRenewer (one whose actual renewal logic is a NOP), 
instead of skipping the renewer completely?
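
To make the NOP-renewer idea concrete, here is a minimal, self-contained Java 
sketch. All names in it (TokenRenewer, NoOpTokenRenewer, AppSubmission) are 
hypothetical stand-ins, not the actual YARN classes. It only illustrates the 
shape of the idea: the app is still registered with a renewer, so anything 
that depends on registration can proceed, while the renewal itself does 
nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the renewal hook; not a real YARN interface.
interface TokenRenewer {
    /** Registers the app; returns true only if a real renewal was scheduled. */
    boolean addApplication(String appId, List<String> tokens);
}

// Hypothetical NOP variant: tracks the app but never contacts a remote cluster.
final class NoOpTokenRenewer implements TokenRenewer {
    private final List<String> registeredApps = new ArrayList<>();

    @Override
    public boolean addApplication(String appId, List<String> tokens) {
        registeredApps.add(appId); // the app is still tracked
        return false;              // nothing was scheduled for renewal
    }

    List<String> registeredApps() {
        return registeredApps;
    }
}

// Hypothetical submission path that advances only after renewer registration.
final class AppSubmission {
    enum State { NEW, SUBMITTED }

    private State state = State.NEW;

    State state() {
        return state;
    }

    void submit(String appId, List<String> tokens, TokenRenewer renewer) {
        renewer.addApplication(appId, tokens); // always register, even if NOP
        state = State.SUBMITTED;               // state advances after registration
    }
}
```

Whether the real state machine would behave the same way under app recovery 
or task failures is exactly the part that remains untested.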

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-3021
>                 URL: https://issues.apache.org/jira/browse/YARN-3021
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 2.3.0
>            Reporter: Harsh J
>         Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails cause B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.
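
The behaviour proposed in the description above (attempt the renewal, and on 
failure skip only the scheduling) can be sketched roughly as follows. This is 
an illustration under assumptions, not the patch itself: RemoteRenewal and 
LenientRenewalScheduler are hypothetical names, and tokens are plain strings.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a renewDelegationToken-style remote call.
interface RemoteRenewal {
    /** Returns the next expiry time in milliseconds, or throws on failure. */
    long renew(String token) throws IOException;
}

// Hypothetical scheduler: a failed renewal is tolerated rather than propagated.
final class LenientRenewalScheduler {
    private final List<String> scheduled = new ArrayList<>();
    private final List<String> skipped = new ArrayList<>();

    /** Never throws on renewal failure, so app submission can continue. */
    void onSubmit(List<String> tokens, RemoteRenewal remote) {
        for (String token : tokens) {
            try {
                remote.renew(token);  // attempt the renewal once
                scheduled.add(token); // success: keep renewing automatically
            } catch (IOException e) {
                skipped.add(token);   // failure: skip scheduling only
            }
        }
    }

    List<String> scheduled() {
        return scheduled;
    }

    List<String> skipped() {
        return skipped;
    }
}
```

A token that cannot be renewed will simply expire at its original lifetime, 
which is the trade-off this approach accepts.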



