[jira] [Commented] (YARN-6523) RM requires large memory in sending out security tokens as part of Node Heartbeat in large cluster

Jason Lowe (JIRA) Wed, 26 Apr 2017 07:35:30 -0700

    [ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984919#comment-15984919 ]


Jason Lowe commented on YARN-6523:
----------------------------------

I don't know the full story behind the SystemCredentialsForApps thing.  Looks 
like something that was put in for Slider and other long-running services where 
the initial tokens can expire.  It would be good to get input from [~vinodkv] 
and [~jianhe] since they were more involved in this.

I agree it seems silly for every node in the cluster to get _all_ apps' HDFS 
credentials on _every heartbeat_.  I suspect this was the simplest thing to 
implement, but it's far from efficient.  Going to the other extreme of sending 
each app's credentials only once, and only to the nodes where the app could be 
active, is a lot more complicated.  It's true that RMNodeImpl tracks which 
applications are on the node, but that tracking is _reactive_ to what the node 
is already doing.  There are some scenarios where the updated tokens need to be 
on the node _before_ the container launch request arrives at the node, and 
therefore before the app becomes active in the node's RMNodeImpl.  For example, 
a Slider app runs for months.  The initial tokens at app submit time have long 
expired, so the RM has had to re-fetch the tokens.  Then suddenly the Slider 
app wants to launch a container on a node it's never touched before.  The 
node's RMNodeImpl doesn't know the app is active until a container starts 
running on it, but the container can't localize without the updated tokens, 
which the node has not yet received.  So we'd need to send the credentials when 
the scheduler allocates an app's container on the node for the first time and 
then also when any of the app's credentials are updated (e.g., when a token is 
replaced with a refreshed version).  And then there's handling lost heartbeats, 
node reconnect, etc.  In short, sending efficient deltas is a lot more 
complicated.

Rather than going straight to the complicated, fully optimal implementation we 
could do something in-between.  For example, we could have a sequence number 
associated with the system credentials.  Nodes would send the last sequence 
number that they have received, and if it matches the current sequence number 
then the RM does _not_ send the credentials in the heartbeat response.  If the 
sequence numbers don't match then the RM sends the current sequence number 
along with the system credentials.  It's still sending all the credentials 
instead of 
optimal deltas, but at least they're only being sent when the node needs the 
updated version.  And yes, we should precompute the 
SystemCredentialsForAppsProto once when the credentials change and re-send the 
same object to any node that needs the updated credentials rather than recreate 
the same object over and over and over.  That should drastically cut down on 
the number of objects related to system credentials in heartbeats and how often 
we're sending them.
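
To make the in-between idea concrete, here is a minimal sketch of the 
RM-side bookkeeping.  It is not the actual YARN code: SystemCredentialsCache, 
Snapshot, updateCredentials, removeApp and responseFor are hypothetical names, 
and the byte[] values are a stand-in for a prebuilt 
SystemCredentialsForAppsProto.  It only shows the two points above: bump a 
sequence number and rebuild the payload once per change, then hand the same 
object to every node whose reported sequence number is stale.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical RM-side holder for system credentials (illustration only). */
final class SystemCredentialsCache {

    /** Payload built once per credentials change and shared by all nodes. */
    static final class Snapshot {
        final long sequenceNumber;
        // Stand-in for the precomputed SystemCredentialsForAppsProto objects.
        final Map<String, byte[]> credentialsByApp;

        Snapshot(long sequenceNumber, Map<String, byte[]> credentialsByApp) {
            this.sequenceNumber = sequenceNumber;
            this.credentialsByApp = credentialsByApp;
        }
    }

    private final Map<String, byte[]> current = new HashMap<>();
    private Snapshot snapshot = new Snapshot(
        0L, Collections.unmodifiableMap(new HashMap<String, byte[]>()));

    /** Called when an app's tokens are added or refreshed. */
    synchronized void updateCredentials(String appId, byte[] tokens) {
        current.put(appId, tokens.clone());
        rebuild();
    }

    /** Called when an app finishes and its credentials are dropped. */
    synchronized void removeApp(String appId) {
        if (current.remove(appId) != null) {
            rebuild();
        }
    }

    /** Bumps the sequence number and rebuilds the shared payload once. */
    private void rebuild() {
        snapshot = new Snapshot(
            snapshot.sequenceNumber + 1,
            Collections.unmodifiableMap(new HashMap<>(current)));
    }

    /**
     * Heartbeat path: returns null when the node already has the current
     * version, otherwise the shared snapshot (same object for every node).
     */
    synchronized Snapshot responseFor(long nodeLastSeenSequence) {
        return nodeLastSeenSequence == snapshot.sequenceNumber ? null : snapshot;
    }
}
```

In this sketch a node that misses a heartbeat response would presumably just 
report its old sequence number again on the next heartbeat and get the full 
set re-sent, so no per-node delta state has to be kept on the RM.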


> RM requires large memory in sending out security tokens as part of Node 
> Heartbeat in large cluster
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6523
>                 URL: https://issues.apache.org/jira/browse/YARN-6523
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Critical
>
> Currently, as part of the heartbeat response, the RM sets all applications' 
> tokens even though not all applications might be active on the node. On top 
> of that, NodeHeartbeatResponsePBImpl converts the tokens for each app into 
> SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, too 
> many SystemCredentialsForAppsProto objects were being created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with 
> 8GB RAM configured for the RM.



