Jason Lowe commented on YARN-4041:

The active apps already have the tokens and are running on the cluster, so I'm 
not sure why it's so pressing that we synchronously process token renewal upon 
recovery.  This should be made asynchronous, or even better, we shouldn't do 
any renewals just because we restarted.  Ideally the RM should be tracking when 
tokens need to be renewed and renew them at that point.  If we restart and some 
tokens are due for a renewal then we should go ahead and renew those, but I 
don't think the RM should blindly renew all tokens for apps that are already 
active and running on the cluster when it restarts.
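The scheduling proposed above (track when each token is due, renew only the overdue ones at restart, and keep that work off the recovery path) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the ResourceManager's actual code. `TrackedToken`, `TokenRenewalScheduler`, and `renew_fn` are invented names, and each renewal is assumed to return the absolute time at which the token is next due. On recovery, every token goes into a min-heap keyed by that due time. Only tokens that are already overdue are handed to a thread pool, so a slow or unreachable token server cannot stall the restart itself.

```python
import heapq
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass(order=True)
class TrackedToken:
    # Hypothetical record of one delegation token the RM is tracking.
    renew_at: float                       # absolute time the next renewal is due
    token_id: str = field(compare=False)  # opaque identifier (illustrative only)


class TokenRenewalScheduler:
    """Sketch: renew tokens only when due, never synchronously on recovery."""

    def __init__(self, renew_fn, clock=time.time, max_workers=4):
        # Assumption: renew_fn(token_id) contacts the (possibly slow) token
        # server and returns the absolute time the token is next due.
        self._renew_fn = renew_fn
        self._clock = clock
        self._heap = []                   # min-heap of TrackedToken by renew_at
        self._lock = threading.Lock()
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    def _renew(self, tok):
        # Runs on a pool thread: renew, then re-queue at the new due time.
        next_at = self._renew_fn(tok.token_id)
        with self._lock:
            heapq.heappush(self._heap, TrackedToken(next_at, tok.token_id))
        return tok.token_id

    def recover(self, tokens):
        """On restart: queue every token, then hand off only the overdue ones."""
        with self._lock:
            for tok in tokens:
                heapq.heappush(self._heap, tok)
        return self.poll()

    def poll(self):
        """Submit due renewals to the pool and return futures without waiting."""
        now = self._clock()
        due = []
        with self._lock:
            while self._heap and self._heap[0].renew_at <= now:
                due.append(heapq.heappop(self._heap))
        return [self._executor.submit(self._renew, tok) for tok in due]

    def pending(self):
        with self._lock:
            return sorted(t.token_id for t in self._heap)

    def shutdown(self):
        self._executor.shutdown(wait=True)


# Usage: at restart one token is already due and one is not.
now = 1000.0
renewed = []


def fake_renew(token_id):
    renewed.append(token_id)  # stands in for a slow RPC to a token server
    return now + 3600.0       # pretend the next renewal is due in an hour


sched = TokenRenewalScheduler(fake_renew, clock=lambda: now)
futures = sched.recover([
    TrackedToken(now - 5.0, "app1-hdfs"),    # overdue: renewed in the background
    TrackedToken(now + 600.0, "app2-hdfs"),  # not due yet: untouched by the restart
])
# recover() has already returned at this point; waiting is only for the demo.
for f in futures:
    f.result()
sched.shutdown()
```

A timer or background thread would call `poll()` periodically to pick up tokens as they come due. In the demo only `app1-hdfs` is renewed, and both tokens remain tracked afterwards.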

> Slow delegation token renewal can severely prolong RM recovery
> --------------------------------------------------------------
>                 Key: YARN-4041
>                 URL: https://issues.apache.org/jira/browse/YARN-4041
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
> When the RM does a work-preserving restart, it synchronously tries to renew 
> delegation tokens for every active application.  If a token server happens to 
> be down or is running slowly, and many of the active apps were using tokens 
> from that server, then it can have a huge impact on the time it takes the RM 
> to process the restart.

This message was sent by Atlassian JIRA