Jim Brennan created YARN-10348:
----------------------------------

             Summary: Allow RM to always cancel tokens after app completes
                 Key: YARN-10348
                 URL: https://issues.apache.org/jira/browse/YARN-10348
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn
    Affects Versions: 3.1.3, 2.10.0
            Reporter: Jim Brennan
            Assignee: Jim Brennan


(Note: this change was originally made on our internal branch by [~daryn].)

The RM currently lets a client disable token cancellation when a job 
completes. This feature was an initial attempt to address the use case of a 
job launching sub-jobs (e.g. the Oozie launcher) where the original job 
finishes before the sub-jobs complete: completion of the original job 
triggered premature cancellation of tokens needed by the sub-jobs.

Many years ago, [~daryn] added a more robust implementation that ref-counts 
tokens ([YARN-3055]). It defers cancellation of a token until all apps using 
the token have completed, which removed the need for a client to specify 
cancel=false. Unfortunately, the config option was not removed.
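
As a rough illustration of the ref-counting idea described above (not the 
actual YARN-3055 code), the sketch below tracks which apps still use a token 
and reports that it is safe to cancel only when the last one finishes. The 
class name TokenRefCounter, its methods, and the string ids are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is not the RM's token renewer code.
final class TokenRefCounter {
    // token id -> ids of the applications that still reference the token
    private final Map<String, Set<String>> appsByToken = new HashMap<>();

    /** Record that the given app uses the given token. */
    synchronized void register(String tokenId, String appId) {
        appsByToken.computeIfAbsent(tokenId, k -> new HashSet<>()).add(appId);
    }

    /**
     * Record that the given app has completed. Returns true only when no
     * other app still references the token, i.e. it may now be cancelled.
     */
    synchronized boolean release(String tokenId, String appId) {
        Set<String> apps = appsByToken.get(tokenId);
        if (apps == null) {
            return false; // unknown or already-released token
        }
        apps.remove(appId);
        if (apps.isEmpty()) {
            appsByToken.remove(tokenId);
            return true;
        }
        return false;
    }
}
```

With a launcher and a sub-job registered against the same token, releasing 
the launcher first returns false, so the sub-job keeps a valid token without 
the client ever needing cancel=false.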

We have seen cases where Oozie "java actions" and some users were explicitly 
disabling token cancellation. This can lead to a buildup of defunct tokens that 
may overwhelm the ZK buffer used by the KDC's backing store. At that point, the 
KMS fails to connect to ZK and is unable to issue or validate new tokens, 
leaving the KDC able to authenticate only pre-existing tokens. Production 
incidents have occurred due to the buffer size issue.

To avoid these issues, the RM should have the option to ignore/override the 
client's request to not cancel tokens.
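
The proposed behavior amounts to a simple decision rule, sketched below under 
assumptions: an RM-side "always cancel" switch that, when enabled, wins over 
the client's cancel=false. The class TokenCancelPolicy and the field name 
alwaysCancel are hypothetical and do not name a real YARN setting.

```java
// Hypothetical illustration of the proposed override; names are assumptions.
final class TokenCancelPolicy {
    // Assumed RM-side switch: when true, client opt-outs are ignored.
    private final boolean alwaysCancel;

    TokenCancelPolicy(boolean alwaysCancel) {
        this.alwaysCancel = alwaysCancel;
    }

    /**
     * Decide whether tokens should be cancelled once the app completes.
     * clientRequestedCancel is false when the client asked for cancel=false.
     */
    boolean shouldCancel(boolean clientRequestedCancel) {
        return alwaysCancel || clientRequestedCancel;
    }
}
```

With the switch off, the client's choice is honored as today; with it on, 
tokens are cancelled regardless of what the client requested.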



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
