[
https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982284#comment-15982284
]
Naganarasimha G R commented on YARN-6523:
-----------------------------------------
The right approach depends on why we send credentials for all apps, which I am
not completely clear about. IMO it should be sufficient to send the tokens only
for the apps (containers) active on the node.
Possible solutions:
# Send only the app credentials related to the node on each heartbeat.
# Send only the app credentials related to the node on each heartbeat, along
with the delta modifications for the node since the last heartbeat.
# Cache the SystemCredentialsForAppsProto objects themselves and reuse them
rather than recreating them for each node's heartbeat (if it is required to
send all the apps' tokens to the node).
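A rough sketch of how options 1 and 3 could combine is below. It is not YARN
code: HeartbeatCredentialsSketch, AppCredentials, updateCredentials and
credentialsForNode are hypothetical names, and AppCredentials only stands in
for SystemCredentialsForAppsProto. The idea is to build one immutable entry per
app when its tokens change, and have each node's heartbeat reuse those entries
for the apps active on that node.

```java
import java.nio.ByteBuffer;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical RM-side bookkeeping for illustration; not the actual YARN classes. */
class HeartbeatCredentialsSketch {

    /** Hypothetical stand-in for SystemCredentialsForAppsProto: built once per app, immutable. */
    static final class AppCredentials {
        final String appId;
        final ByteBuffer tokens;

        AppCredentials(String appId, byte[] tokens) {
            this.appId = appId;
            // Copy the bytes and expose them read-only so the entry can be shared safely.
            this.tokens = ByteBuffer.wrap(tokens.clone()).asReadOnlyBuffer();
        }
    }

    // One cached entry per application, shared by every node's heartbeat response.
    private final Map<String, AppCredentials> cache = new ConcurrentHashMap<>();

    /** Called when an app's tokens are added or renewed; replaces the cached entry. */
    void updateCredentials(String appId, byte[] tokens) {
        cache.put(appId, new AppCredentials(appId, tokens));
    }

    /** Called when an app finishes, so its entry is no longer sent to any node. */
    void removeApp(String appId) {
        cache.remove(appId);
    }

    /** Per-heartbeat view: only apps active on the node, reusing the cached objects. */
    Map<String, AppCredentials> credentialsForNode(Collection<String> activeAppIds) {
        Map<String, AppCredentials> result = new HashMap<>();
        for (String appId : activeAppIds) {
            AppCredentials cached = cache.get(appId);
            if (cached != null) {
                result.put(appId, cached);
            }
        }
        return result;
    }
}
```

With this shape the per-heartbeat cost is a small map of references per node,
and new credential objects are allocated only when an app's tokens change.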
P.S. credit goes to [~gu chi] for analysis of this issue.
> RM requires large memory in sending out security tokens as part of Node
> Heartbeat in large cluster
> --------------------------------------------------------------------------------------------------
>
> Key: YARN-6523
> URL: https://issues.apache.org/jira/browse/YARN-6523
> Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
> Affects Versions: 2.8.0, 2.7.3
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Priority: Critical
>
> Currently, as part of the heartbeat response, the RM sets all applications'
> tokens even though not all applications may be active on the node. On top of
> that, NodeHeartbeatResponsePBImpl converts the tokens for each app into
> SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, too
> many SystemCredentialsForAppsProto objects are created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with
> 8GB RAM configured for the RM.