[
https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996976#comment-15996976
]
Jason Lowe commented on YARN-6523:
----------------------------------
bq. well, actually what I was trying to say here was not a delta, but to send
the tokens for all apps for which at least one of the tokens gets renewed
(assuming that there will be fewer apps for which renewal happens).
When I last looked at the code, I thought the system credentials map only had
the tokens that the RM had to go get on behalf of the app? If that's indeed
the case, then the credentials being sent to each node on every heartbeat
are already the subset you are proposing. Looking at the code again, I only see
the system credentials being added by
DelegationTokenRenewer#requestNewHdfsDelegationTokenAsProxyUser, and that is
only called if the HDFS token is missing or bad for the app. In addition, it is
only adding the token that was retrieved, not the entire app credentials.
That already seems to be the minimum set of credentials that the RM needs to
send to all nodes if we're not doing a per-node delta approach.
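A minimal sketch of that reading of the code, with simplified stand-in types (plain Strings instead of ApplicationId and serialized Credentials, and a hypothetical class name, not the real RMContext/DelegationTokenRenewer types): only the single token the RM fetched on the app's behalf lands in the system credentials map, and the whole map is what every heartbeat response carries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the RM-side system credentials map (hypothetical
// names; the real map holds serialized Credentials per ApplicationId).
public class SystemCredentialsSketch {
    private final Map<String, String> systemCredentials = new ConcurrentHashMap<>();

    // Mirrors requestNewHdfsDelegationTokenAsProxyUser: store only the one
    // token the RM fetched, not the app's entire credential set.
    public void onProxyUserTokenFetched(String appId, String fetchedToken) {
        systemCredentials.put(appId, fetchedToken);
    }

    // What every node receives on every heartbeat today: the full map.
    public Map<String, String> heartbeatPayload() {
        return Map.copyOf(systemCredentials);
    }

    public static void main(String[] args) {
        SystemCredentialsSketch rm = new SystemCredentialsSketch();
        rm.onProxyUserTokenFetched("app_1", "hdfs-token-A");
        System.out.println(rm.heartbeatPayload());
    }
}
```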
Given that this is really bad on your cluster, it'd be good to understand why
the RM has put so many tokens in there, since it should only be putting in ones where
the app was either missing the HDFS token or the token couldn't be renewed for
some reason (e.g.: already expired). Maybe I'm missing something in the code.
bq. But having delta per node does not solve the first issue.
It does solve the issue if we're tracking the delta for all nodes in the
cluster, not just nodes that have run the app's containers in the past. I hope
we're all in agreement now that we cannot make this work if we're only sending
the system credentials for an app to a subset of the nodes. An optimal,
minimal data-transfer approach is one where we send only the changed credentials
for an app since the last time that node heartbeated. That credential delta will
be different for some nodes vs. others since their heartbeats can occur before
or after an app credential update. A bit complicated to implement in practice,
but it is doable.
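The per-node delta could be sketched roughly as below (hypothetical class and field names throughout; the real state would live somewhere like RMContext/RMNodeImpl and would also need node-removal and failover handling): each credential update bumps a global sequence number, each node remembers the last sequence it has seen, and a heartbeat response carries only credentials updated since then.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-node credential deltas, not the actual RM code.
public class CredentialDeltaSketch {
    private long nextSeq = 1;
    private final Map<String, Long> appCredentialSeq = new HashMap<>(); // appId -> seq of last update
    private final Map<String, String> appCredential = new HashMap<>();  // appId -> credential blob
    private final Map<String, Long> nodeLastSeenSeq = new HashMap<>();  // nodeId -> last seq sent

    public synchronized void updateCredential(String appId, String credential) {
        appCredential.put(appId, credential);
        appCredentialSeq.put(appId, nextSeq++);
    }

    // Returns only the credentials changed since this node's last heartbeat,
    // then advances the node's acknowledged sequence number.
    public synchronized Map<String, String> heartbeatDelta(String nodeId) {
        long lastSeen = nodeLastSeenSeq.getOrDefault(nodeId, 0L);
        Map<String, String> delta = new HashMap<>();
        for (Map.Entry<String, Long> e : appCredentialSeq.entrySet()) {
            if (e.getValue() > lastSeen) {
                delta.put(e.getKey(), appCredential.get(e.getKey()));
            }
        }
        nodeLastSeenSeq.put(nodeId, nextSeq - 1);
        return delta;
    }
}
```

Note that the delta differs per node, exactly as described above: a node that heartbeats before an update sees that update on its next heartbeat, while a node that heartbeats after it has already consumed it.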
bq. IIUC there is no version concept as of now between RM and NM
There is a version that is sent between the NM and RM when the NM registers.
That's how the yarn.nodemanager.resourcemanager.minimum.version functionality
works; see ResourceTrackerService#registerNodeManager. It is true that the
version the NM reports is discarded once it passes the minimum version check,
and we'd need to store the NM version (or a feature bit derived from the
version) somewhere like the RMNodeImpl to handle the heartbeat semantic change
properly.
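Deriving a feature bit from the registration version might look roughly like this (a hypothetical, simplified comparison on major.minor only; the real check in ResourceTrackerService#registerNodeManager compares full version strings, and the resulting boolean would be stored on something like RMNodeImpl):

```java
// Hypothetical helper for turning the NM's reported version into a
// feature bit instead of discarding it after the minimum-version check.
public class NmFeatureBitSketch {
    // Returns true if a "major.minor..." version is at least the given floor.
    public static boolean supportsCredentialDeltas(String nmVersion,
                                                   int floorMajor, int floorMinor) {
        String[] parts = nmVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return major > floorMajor || (major == floorMajor && minor >= floorMinor);
    }

    public static void main(String[] args) {
        System.out.println(supportsCredentialDeltas("2.9.0", 2, 9));
    }
}
```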
> RM requires large memory in sending out security tokens as part of Node
> Heartbeat in large cluster
> --------------------------------------------------------------------------------------------------
>
> Key: YARN-6523
> URL: https://issues.apache.org/jira/browse/YARN-6523
> Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
> Affects Versions: 2.8.0, 2.7.3
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Priority: Critical
>
> Currently, as part of the heartbeat response, the RM sends all applications'
> tokens even though not all applications may be active on the node. On top of
> that, NodeHeartbeatResponsePBImpl converts the tokens for each app into a
> SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, far
> too many SystemCredentialsForAppsProto objects get created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with
> 8 GB of RAM configured for the RM.
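At the scale in the report above, the object count works out as follows (a back-of-envelope sketch using the reported numbers, not the actual proto-building code):

```java
// Rough count of SystemCredentialsForAppsProto objects built per
// cluster-wide heartbeat round: one proto per app, rebuilt for every
// node's heartbeat response.
public class ProtoCountSketch {
    public static long protosPerHeartbeatRound(long nodes, long apps) {
        return nodes * apps;
    }

    public static void main(String[] args) {
        // 500 nodes x 2000 apps from the report.
        System.out.println(protosPerHeartbeatRound(500, 2000));
    }
}
```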
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)