[ https://issues.apache.org/jira/browse/YARN-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645180#comment-16645180 ]

Jason Lowe commented on YARN-8865:
----------------------------------

Thanks for the report and patch! Do we have any idea how these are getting
leaked in the first place? If I recall correctly, there's a thread pool that
periodically tries to renew tokens, and when a token fails to renew because it
has expired, it is removed from the state store. Therefore, even after
recovery, the RM should try to renew these ancient tokens, fail because they're
expired, and then remove them from the state store. Is the state store removal
itself failing? Each secret manager is responsible for removing the expired
tokens it manages, so I'm wondering why that isn't happening here.
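A minimal model of the cleanup described above may help frame the question. It
is a sketch under stated assumptions, not the RM's actual code: `StoredToken`,
`StateStore` and `remove_expired` are hypothetical names, and a token is assumed
to become unusable once its `expiry` time has passed. The second case shows how
a failing state store removal would leave an expired token persisted.

```python
from dataclasses import dataclass


@dataclass
class StoredToken:
    # Hypothetical record: token id plus the time after which it cannot be renewed.
    token_id: int
    expiry: float


class StateStore:
    """Hypothetical stand-in for a persistent RM state store."""

    def __init__(self, fail_removals=False):
        self.tokens = {}
        self.fail_removals = fail_removals

    def store(self, tok):
        self.tokens[tok.token_id] = tok

    def remove(self, token_id):
        if self.fail_removals:
            raise IOError("state store removal failed")
        del self.tokens[token_id]


def remove_expired(in_memory, store, now):
    """One pass of the periodic cleanup: drop expired tokens from memory and store."""
    for token_id, tok in list(in_memory.items()):
        if tok.expiry < now:
            del in_memory[token_id]
            try:
                store.remove(token_id)
            except IOError:
                # The token is gone from memory but is still persisted, so it
                # would be reloaded the next time the state store is recovered.
                pass


store = StateStore()
mem = {}
for tok in (StoredToken(1, 100.0), StoredToken(2, 500.0)):
    store.store(tok)
    mem[tok.token_id] = tok
remove_expired(mem, store, now=200.0)
print(sorted(store.tokens))  # healthy store: only token 2 remains
```

If `StateStore(fail_removals=True)` is used instead, token 1 disappears from
memory but stays in `store.tokens`, which is the kind of divergence that would
let expired tokens accumulate across restarts if removals kept failing.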

Rather than having each state store implement this feature separately, which
is a maintenance burden, I'm wondering if the RMDelegationTokenSecretManager
should skip loading the expired tokens in the recovered RMDTSecretManagerState
and instead immediately remove them from the state store.
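The proposal above can be sketched as a single filtering pass at recovery time.
This is a hypothetical illustration, not the RMDelegationTokenSecretManager
implementation: the dictionary returned by `load_state` stands in for the token
map in the recovered RMDTSecretManagerState, and `StoredToken`, `StateStore` and
`recover_tokens` are made-up names. Because the check sits above the store
interface, every store implementation would benefit without changes of its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredToken:
    # Hypothetical: token id and the time after which it can no longer be used.
    token_id: int
    expiry: float


class StateStore:
    """Hypothetical store exposing only what this sketch needs."""

    def __init__(self, tokens):
        self.tokens = dict(tokens)

    def load_state(self):
        # Models the recovered token map as a snapshot copy.
        return dict(self.tokens)

    def remove(self, token_id):
        self.tokens.pop(token_id, None)


def recover_tokens(store, now):
    """Load only live tokens and purge expired ones from the store in one pass."""
    live = {}
    for token_id, tok in store.load_state().items():
        if tok.expiry < now:
            # Expired: never loaded into memory, removed from persistence.
            store.remove(token_id)
        else:
            live[token_id] = tok
    return live


store = StateStore({1: StoredToken(1, 10.0), 2: StoredToken(2, 1000.0)})
live = recover_tokens(store, now=100.0)
```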


> RMStateStore contains large number of expired RMDelegationToken
> ---------------------------------------------------------------
>
>                 Key: YARN-8865
>                 URL: https://issues.apache.org/jira/browse/YARN-8865
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Major
>         Attachments: YARN-8865.001.patch
>
>
> When the RM state store is restored, expired delegation tokens are restored
> and added to the system. These expired tokens are not cleaned up or removed.
> The exact reason why the tokens are still in the store is not clear.
> We have seen as many as 250,000 tokens in the store, some of which were 2
> years old.
> This has two side effects:
> * for the ZooKeeper store, this leads to jute buffer exhaustion, which
> prevents the RM from becoming active.
> * restore takes longer than needed and heap usage is higher than it should be.
> We should not restore already expired tokens, since they can neither be
> renewed nor used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
