[
https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984919#comment-15984919
]
Jason Lowe commented on YARN-6523:
----------------------------------
I don't know the full story behind the SystemCredentialsForApps thing. Looks
like something that was put in for Slider and other long-running services where
the initial tokens can expire. It would be good to get input from [~vinodkv]
and [~jianhe] since they were more involved in this.
I agree it seems silly for every node in the cluster to get _all_ apps' HDFS
credentials on _every heartbeat_. I suspect this was the simplest thing to
implement, but it's far from efficient. Going to the other extreme of just
sending the app credentials only once for just the apps that could be active on
the node is a lot more complicated. It's true that RMNodeImpl is tracking what
applications are on the node, but that tracking is _reactive_: it only reflects
what the node is already doing. There are some scenarios where the updated
tokens need to be on the node _before_ the container launch request arrives at
the node, and therefore before the app becomes active in the node's
RMNodeImpl. For example, a
Slider app runs for months. The initial tokens at app submit time have long
expired, so the RM has had to re-fetch the tokens. Then suddenly the Slider
app wants to launch a container on a node it's never touched before. The
node's RMNodeImpl doesn't know the app is active until a container starts
running on it, but the container can't localize without the updated tokens,
which the node has not yet received. So we'd need to send the credentials when the
scheduler allocates an app's container on the node for the first time and then
also when any of the app's credentials are updated (e.g., when a token is
replaced with a refreshed version). And then there's handling lost heartbeats,
node reconnect, etc. In short, an efficient delta scheme is a lot more
complicated.
Rather than going straight to the complicated, fully optimal implementation, we
could do something in-between. For example, we could have a sequence number
associated with the system credentials. Nodes would send the last sequence
number that they have received, and if it matches the current sequence number
then the RM does _not_ send the credentials in the heartbeat response. If the sequence
numbers don't match then the RM sends the current sequence number along with
the system credentials. It's still sending all the credentials instead of
optimal deltas, but at least they're only being sent when the node needs the
updated version. And yes, we should precompute the
SystemCredentialsForAppsProto once when the credentials change and re-send the
same object to any node that needs the updated credentials rather than recreate
the same object over and over and over. That should drastically cut down on
the number of objects related to system credentials in heartbeats and how often
we're sending them.
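
A rough sketch of the sequence-number idea in Java, purely for illustration.
All names here (SystemCredentialsTracker, Snapshot, forHeartbeat) are
hypothetical and are not existing YARN classes, and a byte[] stands in for the
serialized per-app credentials. The snapshot is rebuilt only when the
credentials change, and the same object is handed to every node that is behind.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; not an actual YARN class. */
final class SystemCredentialsTracker {

    /** Immutable payload, built once per change and shared by all responses. */
    static final class Snapshot {
        final long sequenceNumber;
        final Map<String, byte[]> credentialsByApp;

        private Snapshot(long sequenceNumber, Map<String, byte[]> source) {
            this.sequenceNumber = sequenceNumber;
            // Copy so later updates never mutate a snapshot already handed out.
            this.credentialsByApp =
                Collections.unmodifiableMap(new HashMap<>(source));
        }
    }

    private final Map<String, byte[]> current = new HashMap<>();
    private volatile Snapshot snapshot = new Snapshot(0L, current);

    /** Called when an app's tokens are first fetched or later refreshed. */
    synchronized void updateCredentials(String appId, byte[] tokens) {
        current.put(appId, tokens.clone());
        snapshot = new Snapshot(snapshot.sequenceNumber + 1, current);
    }

    /** Called when an app finishes and its credentials are dropped. */
    synchronized void removeCredentials(String appId) {
        if (current.remove(appId) != null) {
            snapshot = new Snapshot(snapshot.sequenceNumber + 1, current);
        }
    }

    /**
     * Heartbeat path: the node reports the last sequence number it received.
     * Empty means the node is up to date and nothing needs to be sent;
     * otherwise the shared, precomputed snapshot is returned.
     */
    Optional<Snapshot> forHeartbeat(long lastSeqSeenByNode) {
        Snapshot s = snapshot; // single volatile read
        return s.sequenceNumber == lastSeqSeenByNode
            ? Optional.empty()
            : Optional.of(s);
    }
}
```

A node that has lost a heartbeat response or has just reconnected simply
reports a stale (or never-issued) sequence number and gets the full current
set again. This sketch assumes the counter lives only in RM memory; RM restart
or failover would need extra handling that is not shown.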
> RM requires large memory in sending out security tokens as part of Node
> Heartbeat in large cluster
> --------------------------------------------------------------------------------------------------
>
> Key: YARN-6523
> URL: https://issues.apache.org/jira/browse/YARN-6523
> Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
> Affects Versions: 2.8.0, 2.7.3
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Priority: Critical
>
> Currently, as part of the heartbeat response, the RM sets all applications'
> tokens even though not all applications may be active on the node. On top of
> that, NodeHeartbeatResponsePBImpl converts the tokens for each app into
> SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, too
> many SystemCredentialsForAppsProto objects were being created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with
> 8GB RAM configured for the RM.