[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995030#comment-15995030 ]

Jason Lowe commented on YARN-6523:
----------------------------------

Sending the full list at registration time makes a lot of sense to me, and I 
also think we can get the delta to work with some effort.  Note, however, that 
the delta is _per node_, not some global delta, because nodes may be 
heartbeating at drastically different times.  Therefore there isn't going to be 
a good way to build a single, pre-computed SystemCredentialsForAppsProto for 
deltas.  Each node will have to receive the app tokens that have been renewed 
since its last heartbeat, and that will be a different list than for other 
nodes in the cluster.  Many nodes will share the same delta, but it won't be 
the same for all of them.
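A minimal sketch of the per-node delta idea, assuming the RM stamps every app credential update with a monotonically increasing sequence number and remembers, per node, the highest sequence number it has already delivered. All names here (`CredentialDeltaTracker`, `updateCredentials`, `fullSet`, `deltaFor`) are hypothetical, not YARN APIs; tokens are modeled as opaque byte arrays and apps/nodes as plain strings.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not a YARN API): per-node credential deltas.
 * Each update gets a global sequence number; each node remembers the
 * highest sequence number it has been sent, so nodes that heartbeat at
 * different times naturally receive different deltas.
 */
class CredentialDeltaTracker {
  /** An app's current tokens plus the sequence number of its last update. */
  private static final class Versioned {
    final byte[] tokens;
    final long seq;
    Versioned(byte[] tokens, long seq) { this.tokens = tokens; this.seq = seq; }
  }

  private final Map<String, Versioned> appCredentials = new HashMap<>();
  // Highest update sequence number already delivered to each node.
  private final Map<String, Long> lastSentSeq = new HashMap<>();
  private long nextSeq = 1;

  /** Record newly obtained or renewed tokens for an app. */
  synchronized void updateCredentials(String appId, byte[] tokens) {
    appCredentials.put(appId, new Versioned(tokens, nextSeq++));
  }

  /** Registration: return every app's tokens and mark the node as current. */
  synchronized Map<String, byte[]> fullSet(String nodeId) {
    Map<String, byte[]> all = new HashMap<>();
    for (Map.Entry<String, Versioned> e : appCredentials.entrySet()) {
      all.put(e.getKey(), e.getValue().tokens);
    }
    lastSentSeq.put(nodeId, nextSeq - 1);
    return all;
  }

  /** Heartbeat: return only apps updated since this node's last delivery. */
  synchronized Map<String, byte[]> deltaFor(String nodeId) {
    long since = lastSentSeq.getOrDefault(nodeId, 0L);
    Map<String, byte[]> delta = new HashMap<>();
    for (Map.Entry<String, Versioned> e : appCredentials.entrySet()) {
      if (e.getValue().seq > since) {
        delta.put(e.getKey(), e.getValue().tokens);
      }
    }
    lastSentSeq.put(nodeId, nextSeq - 1);
    return delta;
  }
}
```

This sketch advances a node's marker as soon as its delta is built, so a lost heartbeat response would leave that node permanently missing the update; a real design would need some acknowledgement from the NM (or a fallback to the full set). It also ignores apps that finish.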

Also note that there is going to be an interface change even with your 
proposal.  The current code assumes that the system credentials received in a 
heartbeat _replace_ the previous set of credentials.  If we suddenly start 
sending a delta in heartbeats instead of the full set then that's an 
incompatible semantic change even though the technical signature of the 
interface did not change.  Old NodeManagers will not do the correct thing 
during a rolling upgrade, and apps could fail.  So minimally the RM would need to 
check the NM version and always send the full system credentials in each 
heartbeat if the NM version is "old" and only use the delta when the NM is 
beyond a certain version.
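A minimal sketch of that version gate, assuming the RM knows each NM's version string from registration. The class, the `Mode` values and the cutoff version are hypothetical (the cutoff is supplied by the caller because no actual release boundary is given here); the comparison helper only handles dotted numeric versions with an optional `-SUFFIX`. A real patch would more likely reuse Hadoop's existing version-comparison utility in hadoop-common.

```java
/**
 * Hypothetical sketch (not a YARN API): choose credential semantics per NM.
 * Old or unknown NMs keep the current "heartbeat credentials replace the
 * previous set" behaviour; NMs at or above the cutoff get deltas.
 */
final class CredentialModeSelector {
  enum Mode { FULL_REPLACE, DELTA }

  // Minimum NM version assumed to understand delta semantics (caller-supplied).
  private final String deltaMinVersion;

  CredentialModeSelector(String deltaMinVersion) {
    this.deltaMinVersion = deltaMinVersion;
  }

  /** Compares versions like "2.8.0"; a suffix such as "-SNAPSHOT" is ignored. */
  static int compareVersions(String a, String b) {
    String[] pa = a.split("-")[0].split("\\.");
    String[] pb = b.split("-")[0].split("\\.");
    int n = Math.max(pa.length, pb.length);
    for (int i = 0; i < n; i++) {
      // Missing components count as 0, so "3.0" equals "3.0.0".
      int va = i < pa.length ? Integer.parseInt(pa[i]) : 0;
      int vb = i < pb.length ? Integer.parseInt(pb[i]) : 0;
      if (va != vb) {
        return Integer.compare(va, vb);
      }
    }
    return 0;
  }

  Mode modeFor(String nmVersion) {
    // Unknown or older NMs must always receive the full set in every heartbeat.
    if (nmVersion == null || compareVersions(nmVersion, deltaMinVersion) < 0) {
      return Mode.FULL_REPLACE;
    }
    return Mode.DELTA;
  }
}
```

Defaulting to `FULL_REPLACE` for an unknown version keeps the failure mode safe: the worst case is the existing memory cost, not apps failing on an NM that misreads a delta as the complete set.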

> RM requires large memory in sending out security tokens as part of Node 
> Heartbeat in large cluster
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6523
>                 URL: https://issues.apache.org/jira/browse/YARN-6523
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Critical
>
> Currently, as part of the heartbeat response, the RM sets all applications' 
> tokens even though not all applications may be active on the node. On top of 
> that, NodeHeartbeatResponsePBImpl converts the tokens for each app into a 
> SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, too 
> many SystemCredentialsForAppsProto objects were getting created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with 
> 8 GB of RAM configured for the RM.
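With the numbers in the description, a single heartbeat from each of the 500 nodes already implies 2000 × 500 = 1,000,000 per-app SystemCredentialsForAppsProto conversions, since every app's tokens go to every node. A minimal sketch of scoping the payload to the apps actually active on a node, assuming the RM can supply that set of app IDs; `NodeScopedCredentials` and `forNode` are hypothetical names, and tokens are again opaque byte arrays.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a YARN API): only ship tokens for apps active on the node. */
final class NodeScopedCredentials {
  private NodeScopedCredentials() {}

  static Map<String, byte[]> forNode(Map<String, byte[]> allAppCredentials,
                                     Collection<String> appsActiveOnNode) {
    Map<String, byte[]> scoped = new HashMap<>();
    for (String appId : appsActiveOnNode) {
      byte[] tokens = allAppCredentials.get(appId);
      if (tokens != null) { // the app may not have system credentials yet
        scoped.put(appId, tokens);
      }
    }
    return scoped;
  }
}
```

This cuts the per-heartbeat payload to the apps running on that node, but on its own it still rebuilds the list on every heartbeat; it is complementary to, not a replacement for, the per-node delta discussed in the comment.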



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
