[ 
https://issues.apache.org/jira/browse/YARN-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645820#comment-16645820
 ] 

Wilfred Spiegelenburg commented on YARN-8865:
---------------------------------------------

Thank you for the feedback, [~jlowe] and [~daryn]. We're not sure why the tokens 
are not purged in the first place. We suspect that it is related to tokens 
expiring while the RM is down.

This is a dump of one of the tokens (just the relevant data):
{code}
**** sequence   6327582
**** kind       RM_DELEGATION_TOKEN
**** issuedate  2016-10-09 00:03:17,714+1100
**** maxdate    2016-10-16 00:03:17,714+1100
**** masterkey  106
{code}
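As a quick sanity check on the dump, the two dates can be parsed and compared. This is a minimal sketch; the class and method names are hypothetical, and the only inputs are the timestamps printed above. The gap between issue date and max date comes out at exactly 7 days, which is consistent with the default RM delegation token max lifetime ({{yarn.resourcemanager.delegation.token.max-lifetime}}, 7 days). Any check time after 2016-10-16 shows the token as past its max date, so it can no longer be renewed.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: parses the dates printed in the token dump and checks
// whether a token is past its max date (after which it cannot be renewed).
class TokenDumpDates {
    // Matches timestamps such as "2016-10-16 00:03:17,714+1100".
    static final DateTimeFormatter DUMP_FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSSZ");

    static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text, DUMP_FORMAT);
    }

    // A token is unusable once the check time is after its max date.
    static boolean isPastMaxDate(OffsetDateTime maxDate, OffsetDateTime now) {
        return now.isAfter(maxDate);
    }

    public static void main(String[] args) {
        OffsetDateTime issued = parse("2016-10-09 00:03:17,714+1100");
        OffsetDateTime max = parse("2016-10-16 00:03:17,714+1100");
        // prints "lifetime (days): 7"
        System.out.println("lifetime (days): " + Duration.between(issued, max).toDays());
        // prints "expired now: true"
        System.out.println("expired now: " + isPastMaxDate(max, OffsetDateTime.now()));
    }
}
```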

I checked the master keys that are in the store: we only have key IDs 877 to 
887, while this token references master key 106. Based on Jason's comment I now 
understand why we are not removing these tokens after recovery. During recovery 
we first recover the master keys, and only when that is done do we recover the 
tokens. If a token references a master key that is not currently known, we just 
skip the token and "forget" about it. This happens in 
{{addPersistedDelegationToken()}}. We thus never clean these tokens up if their 
master key is already gone.
That would also explain how the issue arises:
* the expired token removal thread runs and cleans up expired tokens (the 
problem tokens have not expired yet)
* the removal thread runs again; the master key has expired and is removed. The 
tokens also expire at that point, but they have not been cleaned up yet
* the RM goes down or fails over before the token removal thread cleans up the 
expired tokens
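The recovery order and the sequence of events above can be reproduced with a toy model. This is a sketch under stated assumptions, not Hadoop code: {{OrphanTimeline}}, its maps and the integer "times" are all hypothetical, and key and token cleanup are reduced to simple expiry checks. It shows the token from the dump (sequence 6327582, key 106) being skipped on recovery while remaining in the store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not Hadoop code: hypothetical names and integer "times".
class OrphanTimeline {
    // Persisted state: key ID -> key expiry time.
    final Map<Integer, Long> storedKeys = new HashMap<>();
    // Persisted state: token sequence number -> {master key ID, max date}.
    final Map<Integer, long[]> storedTokens = new HashMap<>();

    // Key cleanup: drop keys whose expiry time has passed.
    void removeExpiredKeys(long now) {
        storedKeys.values().removeIf(expiry -> expiry < now);
    }

    // Token cleanup: drop tokens whose max date has passed.
    void removeExpiredTokens(long now) {
        storedTokens.values().removeIf(t -> t[1] < now);
    }

    // Recovery: the recovered keys are whatever is left in storedKeys; tokens
    // are processed afterwards. A token whose key is unknown is skipped: it is
    // neither loaded nor deleted, so it stays in the store.
    List<Integer> recoverTokens() {
        List<Integer> loaded = new ArrayList<>();
        for (Map.Entry<Integer, long[]> e : storedTokens.entrySet()) {
            int keyId = (int) e.getValue()[0];
            if (storedKeys.containsKey(keyId)) {
                loaded.add(e.getKey());
            }
        }
        return loaded;
    }

    static OrphanTimeline scenario() {
        OrphanTimeline s = new OrphanTimeline();
        s.storedKeys.put(106, 100L);                        // key 106 expires at t=100
        s.storedTokens.put(6327582, new long[] {106, 100}); // token expires with its key
        s.removeExpiredTokens(50); // 1. token cleanup: token not expired yet
        s.removeExpiredKeys(101);  // 2. key cleanup: key 106 is removed
        // 3. RM fails over here, before removeExpiredTokens(101) runs.
        return s;
    }

    public static void main(String[] args) {
        OrphanTimeline s = scenario();
        System.out.println("loaded after recovery: " + s.recoverTokens()); // []
        System.out.println("still in store: " + s.storedTokens.keySet()); // [6327582]
    }
}
```

If the token cleanup in step 2 had run before the failover, the token would have been removed together with its key and nothing would be orphaned.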

From that point on, recovery ignores these tokens because their key has been 
removed. This should thus affect all state store types and not just the ZK 
store. Daryn, am I correct in saying that?
Logically, the change should then be in the ADTSM 
({{AbstractDelegationTokenSecretManager}}) method 
{{addPersistedDelegationToken()}}: remove the token when no key is found.
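A minimal sketch of that change, assuming the secret manager can call back into the state store to delete a token. The {{TokenStore.removeToken}} interface and the {{PurgeOrphansSketch}} class are hypothetical stand-ins, not the actual patch or the Hadoop API: a token whose master key is missing is deleted from the store instead of being silently skipped.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the actual patch: an orphaned token (no master key)
// is deleted from the state store instead of being silently skipped.
class PurgeOrphansSketch {
    // Hypothetical stand-in for the state store's "delete token" operation.
    interface TokenStore {
        void removeToken(int sequence);
    }

    final Set<Integer> allKeys = new HashSet<>();                // recovered key IDs
    final Map<Integer, Integer> currentTokens = new HashMap<>(); // seq -> key ID
    private final TokenStore store;

    PurgeOrphansSketch(TokenStore store) {
        this.store = store;
    }

    void addPersistedDelegationToken(int sequence, int masterKeyId) {
        if (!allKeys.contains(masterKeyId)) {
            // Without its master key the token can never be validated again,
            // so remove it rather than leave it behind in the store.
            store.removeToken(sequence);
            return;
        }
        currentTokens.put(sequence, masterKeyId);
    }

    // Recover tokens from a map of seq -> key ID that doubles as the "store".
    static PurgeOrphansSketch recover(Set<Integer> keyIds, Map<Integer, Integer> stored) {
        PurgeOrphansSketch sm = new PurgeOrphansSketch(seq -> stored.remove(seq));
        sm.allKeys.addAll(keyIds);
        // Iterate over a copy because removeToken mutates the backing map.
        for (Map.Entry<Integer, Integer> t : new HashMap<>(stored).entrySet()) {
            sm.addPersistedDelegationToken(t.getKey(), t.getValue());
        }
        return sm;
    }

    public static void main(String[] args) {
        Set<Integer> keys = new HashSet<>();
        for (int id = 877; id <= 887; id++) {
            keys.add(id); // key IDs found in the store
        }
        Map<Integer, Integer> stored = new HashMap<>();
        stored.put(6327582, 106); // token from the dump: key 106 is gone
        stored.put(7000000, 880); // hypothetical token with a known key
        recover(keys, stored);
        System.out.println("left in store: " + stored.keySet()); // [7000000]
    }
}
```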

> RMStateStore contains large number of expired RMDelegationToken
> ---------------------------------------------------------------
>
>                 Key: YARN-8865
>                 URL: https://issues.apache.org/jira/browse/YARN-8865
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Major
>         Attachments: YARN-8865.001.patch
>
>
> When the RM state store is restored, expired delegation tokens are restored 
> and added to the system. These expired tokens do not get cleaned up or 
> removed. The exact reason why the tokens are still in the store is not clear. 
> We have seen as many as 250,000 tokens in the store, some of which were 2 
> years old.
> This has two side effects:
> * for the ZooKeeper store this leads to a jute buffer exhaustion issue and 
> prevents the RM from becoming active.
> * restore takes longer than needed and heap usage is higher than it should be.
> We should not restore already expired tokens since they cannot be renewed or 
> used.
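The proposal to not restore expired tokens can be sketched as a filter applied while reading the store. This assumes each stored record carries a renew date (epoch milliseconds) after which the token is no longer valid; {{ExpiredOnRestoreSketch}} and {{StoredToken}} are hypothetical names, and actually deleting the collected records from the store is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: skip tokens that are already expired when restoring,
// and collect them so they can be deleted from the store.
class ExpiredOnRestoreSketch {
    // Minimal stand-in for a persisted token record (assumed fields).
    static final class StoredToken {
        final int sequence;
        final long renewDate; // epoch millis; the token is invalid after this

        StoredToken(int sequence, long renewDate) {
            this.sequence = sequence;
            this.renewDate = renewDate;
        }
    }

    // Returns the tokens worth loading; expired ones are appended to toDelete.
    static List<StoredToken> restore(List<StoredToken> stored, long now,
                                     List<StoredToken> toDelete) {
        List<StoredToken> active = new ArrayList<>();
        for (StoredToken t : stored) {
            if (t.renewDate < now) {
                toDelete.add(t); // cannot be renewed or used: do not load it
            } else {
                active.add(t);
            }
        }
        return active;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        List<StoredToken> stored = new ArrayList<>();
        stored.add(new StoredToken(1, now - 1_000));  // already expired
        stored.add(new StoredToken(2, now + 60_000)); // still valid
        List<StoredToken> toDelete = new ArrayList<>();
        List<StoredToken> active = restore(stored, now, toDelete);
        // prints "loaded=1 purge=1"
        System.out.println("loaded=" + active.size() + " purge=" + toDelete.size());
    }
}
```

Filtering at restore time also keeps the expired tokens out of the heap, which addresses the second side effect listed above.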



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
