[ 
https://issues.apache.org/jira/browse/YARN-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646622#comment-16646622
 ] 

Wilfred Spiegelenburg commented on YARN-8865:
---------------------------------------------

I am not sure what happened in the environment, or even whether the two cleanup 
intervals were set differently (key and token each have their own interval). I 
only have the ZooKeeper DB to work with and no logs from that time frame.

The ADTSM (AbstractDelegationTokenSecretManager) method 
{{addPersistedDelegationToken}} already has a safeguard: the secret manager 
cannot be running at the time we restore. That removes a lot of the problem. 
On the other side (specifically for the NN), HDFS uses its own version of 
{{addPersistedDelegationToken}}, implemented in DelegationTokenSecretManager 
(defined in org.apache.hadoop.hdfs.security.token.delegation). The HDFS side 
should thus not be affected by the change.
The other three uses are the YARN RM, YARN ATS and MR JHS. Based on what I can 
see, none of them has an issue.

If the change is still considered too risky, I think the option of still adding 
them with a null password is the best solution.
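
For illustration only, the restore-time behaviour discussed above can be 
sketched as follows. This is a simplified, hypothetical stand-in and not the 
Hadoop code: the class {{TokenRestoreSketch}}, its method names, the use of a 
plain string as token identifier and the unchecked exception are all 
assumptions. It shows the two ideas from this comment: restore is refused once 
the manager is running, and tokens whose renew date has already passed are 
skipped rather than added. The null password alternative is not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a delegation token secret manager.
// Names and signatures are assumptions made for this sketch only.
class TokenRestoreSketch {
    // token id -> renew date (milliseconds since epoch)
    private final Map<String, Long> currentTokens = new HashMap<>();
    private boolean running = false;

    // Marks the manager as running; after this, restore is no longer allowed.
    void start() {
        running = true;
    }

    // Restores one persisted token.
    // Returns true if the token was added, false if it was skipped as expired.
    boolean addPersistedToken(String tokenId, long renewDate, long now) {
        // Safeguard: restore must happen before the manager is started.
        if (running) {
            throw new IllegalStateException(
                "Cannot add a persisted token to a running secret manager");
        }
        // Skip tokens that have already expired: they cannot be renewed or used.
        if (renewDate < now) {
            return false;
        }
        currentTokens.put(tokenId, renewDate);
        return true;
    }

    // Number of tokens that were actually restored.
    int restoredCount() {
        return currentTokens.size();
    }
}
```

With this shape, a store holding many expired tokens would restore only the 
live ones, so restore time and heap usage depend on the number of usable 
tokens rather than on everything left behind in the store.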


> RMStateStore contains large number of expired RMDelegationToken
> ---------------------------------------------------------------
>
>                 Key: YARN-8865
>                 URL: https://issues.apache.org/jira/browse/YARN-8865
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Major
>         Attachments: YARN-8865.001.patch, YARN-8865.002.patch
>
>
> When the RM state store is restored, expired delegation tokens are restored 
> and added to the system. These expired tokens do not get cleaned up or 
> removed. The exact reason why the tokens are still in the store is not clear. 
> We have seen as many as 250,000 tokens in the store, some of which were 2 
> years old.
> This has two side effects:
> * for the zookeeper store this leads to a jute buffer exhaustion issue and 
> prevents the RM from becoming active.
> * restore takes longer than needed and heap usage is higher than it should be
> We should not restore already expired tokens since they cannot be renewed or 
> used.


