[ 
https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151817#comment-16151817
 ] 

Manikandan R commented on YARN-6523:
------------------------------------

[~Naganarasimha]

I tried to follow the discussion in this JIRA and came up with the steps 
below to address it. Please review and share your thoughts. Can I work 
on this?

A) Sequence number flow:

1. Introduce an atomic long variable in RMContextImpl to hold the sequence 
number.
2. Pass the current value of the sequence number to nodes in the response to 
node registration.
3. Increment the sequence number whenever delegation tokens are updated, 
specifically in 
DelegationTokenRenewer#requestNewHdfsDelegationTokenAsProxyUser.
4. ResourceTrackerService#nodeHeartbeat would compare the sequence number with 
the number received in NodeHeartbeatRequest to decide whether to set 
SystemCredentialsForApps in NodeHeartbeatResponse.
5. NodeHeartbeatRequest will carry the sequence number. This requires a change 
in the corresponding proto class.
6. NodeHeartbeatResponse will carry the sequence number. This requires a 
change in the corresponding proto class.
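
To make the flow in (A) concrete, here is a minimal sketch of how the steps 
could fit together. It is not YARN code: the class CredentialSequenceSketch, 
its methods, and the String-based credentials map are hypothetical stand-ins 
for RMContextImpl, ResourceTrackerService, and SystemCredentialsForApps, and 
the proto changes from steps 5 and 6 are reduced to plain fields.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the RM side; none of these names are real YARN APIs.
class CredentialSequenceSketch {
    // Step 1: the counter that would live in RMContextImpl.
    private final AtomicLong tokenSequenceNo = new AtomicLong(0);
    // appId -> serialized credentials (stand-in for SystemCredentialsForApps).
    private final Map<String, String> systemCredentials = new ConcurrentHashMap<>();

    // Step 6: simplified heartbeat response carrying the sequence number.
    static final class HeartbeatResponse {
        final long sequenceNo;
        final Map<String, String> credentials; // empty when the node is up to date

        HeartbeatResponse(long sequenceNo, Map<String, String> credentials) {
            this.sequenceNo = sequenceNo;
            this.credentials = credentials;
        }
    }

    // Step 2: the registration response hands the node the current value.
    long register() {
        return tokenSequenceNo.get();
    }

    // Step 3: any token update bumps the counter. The map is updated first,
    // so a concurrent heartbeat can at worst pair new credentials with the
    // old number, which only causes one extra resend later.
    void updateToken(String appId, String token) {
        systemCredentials.put(appId, token);
        tokenSequenceNo.incrementAndGet();
    }

    // Steps 4 and 5: the node sends its last seen number; credentials are
    // attached only when that number differs from the current one.
    HeartbeatResponse nodeHeartbeat(long nodeSequenceNo) {
        long current = tokenSequenceNo.get(); // read before copying the map
        if (nodeSequenceNo == current) {
            return new HeartbeatResponse(current, Collections.emptyMap());
        }
        return new HeartbeatResponse(current, new HashMap<>(systemCredentials));
    }
}
```

In this sketch a node that echoes back the latest number gets an empty 
credentials map, so steady-state heartbeats create no per-app credential 
objects; only the heartbeat following a token update pays that cost.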

B) Caching system credentials objects

I will go through the code and share a proposal for this.

> RM requires large memory in sending out security tokens as part of Node 
> Heartbeat in large cluster
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6523
>                 URL: https://issues.apache.org/jira/browse/YARN-6523
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: RM
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Critical
>
> Currently, as part of the heartbeat response, the RM sets the tokens of all 
> applications, even though not all applications may be active on the node. 
> On top of that, NodeHeartbeatResponsePBImpl converts the tokens for each app 
> into SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, 
> too many SystemCredentialsForAppsProto objects were being created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with 
> 8GB RAM configured for the RM.



