[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2964:
--------------------------
    Attachment: YARN-2964.1.patch

Uploaded a patch:
- The patch adds a new map that keeps track of all the tokens. If a token is 
already present, it will not add a new DelegationTokenToRenew instance for 
that token.
- It adds a conditional check in the requestNewHdfsDelegationToken method 
(this was missed in YARN-2704).
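
To illustrate the first point, here is a minimal sketch of the de-duplication 
idea. It is not the code in YARN-2964.1.patch: the class TokenTracker, its 
methods, and the use of plain strings for token and application ids are all 
hypothetical stand-ins, and the real DelegationTokenToRenew bookkeeping is 
reduced to a single map entry per token.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; names and types are assumptions, not the patch. */
class TokenTracker {
    // One entry per distinct token, shared by all apps that submit it.
    // The value is the id of the first app seen with that token.
    private final Map<String, String> firstAppByToken = new HashMap<>();

    /**
     * Registers a token for an app.
     * Returns true if the token was not tracked yet (a new renewal entry
     * would be created), false if an entry already exists and is reused.
     */
    synchronized boolean register(String tokenId, String appId) {
        if (firstAppByToken.containsKey(tokenId)) {
            return false; // already tracked: do not create a second entry
        }
        firstAppByToken.put(tokenId, appId);
        return true;
    }

    /** Number of distinct tokens currently tracked. */
    synchronized int trackedCount() {
        return firstAppByToken.size();
    }
}
```

With this shape, a main job and a sub-job that submit the same token end up 
sharing one tracked entry, so only one renewal would be scheduled for it.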

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---------------------------------------------------------------
>
>                 Key: YARN-2964
>                 URL: https://issues.apache.org/jira/browse/YARN-2964
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Daryn Sharp
>            Assignee: Jian He
>            Priority: Blocker
>         Attachments: YARN-2964.1.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after any sub-job completes.  If all 
> other sub-jobs complete within that 10 min window, then the issue goes 
> unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
