[
https://issues.apache.org/jira/browse/YARN-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645820#comment-16645820
]
Wilfred Spiegelenburg commented on YARN-8865:
---------------------------------------------
Thank you for the feedback [~jlowe] and [~daryn]. We're not sure why the tokens
are not purged in the first place. We suspect that it has something to do with
the token expiring while the RM is down.
This is a dump of one of the tokens (just the relevant data):
{code}
**** sequence 6327582
**** kind RM_DELEGATION_TOKEN
**** issuedate 2016-10-09 00:03:17,714+1100
**** maxdate 2016-10-16 00:03:17,714+1100
**** masterkey 106
{code}
I checked the master keys in the store and we only have key IDs 877 to
887. Based on Jason's comment I now understand why we are not removing the
tokens after recovery. During recovery we first recover the master keys and
only then the tokens. If a token references a master key that is not currently
known, we skip the token and "forget" about it. This happens in
{{addPersistedDelegationToken()}}. We thus never clean up tokens whose
master key is already gone.
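To make the recovery order concrete, here is a minimal sketch of the behaviour as I understand it. It is a simplified model, not the Hadoop code: the class {{RecoverySkipSketch}}, the {{recover()}} method, and the second token are made up for illustration. Only the key IDs and the token from the dump come from our store.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical model of the recovery order described above.
// It is not the Hadoop implementation; all names here are illustrative.
class RecoverySkipSketch {

    /**
     * Returns the sequence numbers of the tokens that recovery loads into memory.
     * storedTokens maps token sequence number -> master key ID and is never modified.
     */
    static Set<Integer> recover(Set<Integer> storedKeyIds, Map<Integer, Integer> storedTokens) {
        // Step 1: recover the master keys.
        Set<Integer> knownKeyIds = new TreeSet<>(storedKeyIds);

        // Step 2: recover the tokens.
        Set<Integer> recovered = new TreeSet<>();
        for (Map.Entry<Integer, Integer> token : storedTokens.entrySet()) {
            if (!knownKeyIds.contains(token.getValue())) {
                // Unknown master key: the token is skipped and "forgotten",
                // but its entry stays in the store.
                continue;
            }
            recovered.add(token.getKey());
        }
        return recovered;
    }

    public static void main(String[] args) {
        Set<Integer> storedKeyIds = new TreeSet<>();
        for (int id = 877; id <= 887; id++) {
            storedKeyIds.add(id);
        }
        Map<Integer, Integer> storedTokens = new LinkedHashMap<>();
        storedTokens.put(6327582, 106); // token from the dump above
        storedTokens.put(7000001, 880); // made-up token with a known key

        Set<Integer> recovered = recover(storedKeyIds, storedTokens);
        System.out.println("recovered into memory: " + recovered);
        System.out.println("still in the store:    " + storedTokens.keySet());
    }
}
```

In this model the token with master key 106 never reaches memory, so nothing that works on the in-memory tokens can ever remove it from the store.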
That could also explain the issue:
* the expired token removal thread runs and cleans up expired tokens (the
problem tokens are not expired yet)
* the removal thread runs again, the master key has expired and gets removed;
the tokens also expire at that point but have not been cleaned up yet
* the RM goes down or fails over before the token removal thread cleans up the
expired tokens
Recovery from that point on will ignore the tokens because the key has been
removed. This should thus affect all state store types and not just ZK.
Daryn, am I correct in saying that?
Logically the change should then be in the ADTSM
({{AbstractDelegationTokenSecretManager}}) method
{{addPersistedDelegationToken()}}: remove the token from the store when no
master key is found.
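A rough sketch of that idea, again as a simplified model rather than a patch: {{RecoveryPurgeSketch}} and {{recoverAndPurge()}} are hypothetical names, and the store is modelled as a plain map from token sequence number to master key ID.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the proposed change: a token whose master key is
// unknown is removed from the store during recovery instead of being skipped.
class RecoveryPurgeSketch {

    /**
     * Removes every token with an unknown master key from storedTokens
     * and returns the sequence numbers of the removed tokens.
     */
    static List<Integer> recoverAndPurge(Set<Integer> knownKeyIds,
                                         Map<Integer, Integer> storedTokens) {
        List<Integer> purged = new ArrayList<>();
        Iterator<Map.Entry<Integer, Integer>> it = storedTokens.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Integer> token = it.next();
            if (!knownKeyIds.contains(token.getValue())) {
                // Record the sequence number first, then delete the orphan.
                purged.add(token.getKey());
                it.remove();
            }
        }
        return purged;
    }

    public static void main(String[] args) {
        Set<Integer> knownKeyIds = new TreeSet<>();
        for (int id = 877; id <= 887; id++) {
            knownKeyIds.add(id);
        }
        Map<Integer, Integer> storedTokens = new LinkedHashMap<>();
        storedTokens.put(6327582, 106); // token from the dump, key already gone
        storedTokens.put(7000001, 880); // made-up token with a known key

        System.out.println("purged: " + recoverAndPurge(knownKeyIds, storedTokens));
        System.out.println("left in the store: " + storedTokens.keySet());
    }
}
```

The real change would have to go through the state store removal path of each store type, which this model does not attempt to show.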
> RMStateStore contains large number of expired RMDelegationToken
> ---------------------------------------------------------------
>
> Key: YARN-8865
> URL: https://issues.apache.org/jira/browse/YARN-8865
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.1.0
> Reporter: Wilfred Spiegelenburg
> Assignee: Wilfred Spiegelenburg
> Priority: Major
> Attachments: YARN-8865.001.patch
>
>
> When the RM state store is restored expired delegation tokens are restored
> and added to the system. These expired tokens do not get cleaned up or
> removed. The exact reason why the tokens are still in the store is not clear.
> We have seen as many as 250,000 tokens in the store some of which were 2
> years old.
> This has two side effects:
> * for the zookeeper store this leads to a jute buffer exhaustion issue and
> prevents the RM from becoming active.
> * restore takes longer than needed and heap usage is higher than it should be
> We should not restore already expired tokens since they cannot be renewed or
> used.