[
https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151817#comment-16151817
]
Manikandan R commented on YARN-6523:
------------------------------------
[~Naganarasimha]
I tried to follow the discussion in this JIRA and came up with the steps
below to address it. Please review and share your thoughts. Can I work on
this?
A) Sequence number flow:
1. Introduce an atomic long variable in RMContextImpl to hold the sequence
number.
2. Pass the current value of this sequence number to nodes in the response
to node registration.
3. Increment the sequence number whenever delegation tokens are updated,
specifically in
DelegationTokenRenewer#requestNewHdfsDelegationTokenAsProxyUser.
4. ResourceTrackerService#nodeHeartbeat would compare this sequence number
with the one received in NodeHeartbeatRequest to decide whether to set
SystemCredentialsForApps in NodeHeartbeatResponse.
5. NodeHeartbeatRequest will carry the sequence number. This requires a
change in the corresponding proto class.
6. NodeHeartbeatResponse will carry the sequence number. This requires a
change in the corresponding proto class.
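
To make the flow in A) concrete, here is a minimal, self-contained sketch in
plain Java. It is only an illustration under stated assumptions and does not
use the actual YARN classes: TokenSequenceTracker, HeartbeatReply, current(),
updateCredentials() and onHeartbeat() are hypothetical names standing in for
the state in RMContextImpl, the token update path, and
ResourceTrackerService#nodeHeartbeat.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical reply type; stands in for NodeHeartbeatResponse.
final class HeartbeatReply {
    final long sequenceNo;
    final Map<String, byte[]> credentials;

    HeartbeatReply(long sequenceNo, Map<String, byte[]> credentials) {
        this.sequenceNo = sequenceNo;
        this.credentials = credentials;
    }
}

// Hypothetical stand-in for the RM-side state described in steps 1-4.
final class TokenSequenceTracker {
    // Step 1: atomic counter that changes whenever system credentials change.
    private final AtomicLong tokenSequenceNo = new AtomicLong(0);
    private final Map<String, byte[]> systemCredentials = new ConcurrentHashMap<>();

    // Step 2: value handed to a node in its registration response.
    long current() {
        return tokenSequenceNo.get();
    }

    // Step 3: store the new credentials first, then bump the counter, so a
    // heartbeat that observes the new number also observes the new credentials.
    void updateCredentials(String appId, byte[] credentials) {
        systemCredentials.put(appId, credentials);
        tokenSequenceNo.incrementAndGet();
    }

    // Step 4: send credentials only when the node's number is out of date.
    HeartbeatReply onHeartbeat(long nodeSequenceNo) {
        long rmSequenceNo = tokenSequenceNo.get();
        if (nodeSequenceNo == rmSequenceNo) {
            // Node is up to date: skip building the credentials payload.
            return new HeartbeatReply(rmSequenceNo,
                Collections.<String, byte[]>emptyMap());
        }
        // Node is behind: send a snapshot plus the number it should echo back.
        return new HeartbeatReply(rmSequenceNo, new HashMap<>(systemCredentials));
    }
}
```

In this sketch a node that echoes back the number from its last reply gets an
empty credentials map until the next token update. A heartbeat that races
with an update may receive the new credentials together with the old number,
which only costs one redundant resend on the following heartbeat.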
B) Caching system credentials objects:
I will go through the code and share a proposal for this.
> RM requires large memory in sending out security tokens as part of Node
> Heartbeat in large cluster
> --------------------------------------------------------------------------------------------------
>
> Key: YARN-6523
> URL: https://issues.apache.org/jira/browse/YARN-6523
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: RM
> Affects Versions: 2.8.0, 2.7.3
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Priority: Critical
>
> Currently, as part of the heartbeat response, the RM sets every
> application's tokens, even though not all applications may be active on the
> node. On top of that, NodeHeartbeatResponsePBImpl converts the tokens for
> each app into SystemCredentialsForAppsProto. Hence, for each node and each
> heartbeat, too many SystemCredentialsForAppsProto objects were created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with
> 8GB RAM configured for the RM.