[ 
https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994995#comment-15994995
 ] 

Naganarasimha G R commented on YARN-6523:
-----------------------------------------

Sorry for the delay in responding, [~jlowe].
Thanks for the very detailed response. I agree that the delta approaches 
initially mentioned can introduce a certain amount of complexity in the cases 
you mentioned.
Though the approach you suggested was initially appealing and less 
complicated, I was thinking of the following scenarios:
# When there is a large number of small jobs in a large cluster, we end up 
sending the tokens almost all the time, since the sequence keeps increasing as 
more and more jobs get submitted.
# Since we are modifying the interface anyway, it would be better to go for a 
complete solution so that it does not have to be revisited and deprecated later.

One other approach I can think of: send all the tokens during node 
registration (this avoids most of the corner cases), and as part of each 
heartbeat send all the tokens of only those apps whose tokens have been renewed 
(which can be done in an event-based model). Further, we can keep a 
pre-computed cache of the SystemCredentialsForAppsProto objects that are sent 
as part of the heartbeat, so that we reduce the memory footprint. This approach 
would also handle the case of a large number of small jobs, without an 
interface change. Thoughts?
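
To make the registration-plus-renewal idea concrete, here is a rough sketch, 
not actual RM code. All names in it (TokenDistributionSketch, 
CachedCredentials, registerNode, onTokensRenewed, onAppFinished, heartbeat) are 
hypothetical, and CachedCredentials only stands in for a pre-built 
SystemCredentialsForAppsProto. It also assumes a newly submitted app is 
reported through the same event as a renewal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: none of these names exist in YARN.
class TokenDistributionSketch {

    // Hypothetical stand-in for a pre-built SystemCredentialsForAppsProto.
    static final class CachedCredentials {
        final String appId;
        final byte[] serializedTokens;

        CachedCredentials(String appId, byte[] serializedTokens) {
            this.appId = appId;
            this.serializedTokens = serializedTokens;
        }
    }

    // One pre-computed entry per app, shared by every node's response.
    private final Map<String, CachedCredentials> cache = new HashMap<>();
    // Per registered node: apps whose tokens changed since its last heartbeat.
    private final Map<String, Set<String>> pendingByNode = new HashMap<>();

    // Node registration: hand over everything known so far.
    synchronized List<CachedCredentials> registerNode(String nodeId) {
        pendingByNode.put(nodeId, new LinkedHashSet<>());
        return new ArrayList<>(cache.values());
    }

    // Renewal event (assumption: a newly submitted app is handled the same way).
    synchronized void onTokensRenewed(String appId, byte[] serializedTokens) {
        cache.put(appId, new CachedCredentials(appId, serializedTokens));
        for (Set<String> pending : pendingByNode.values()) {
            pending.add(appId);
        }
    }

    // App completion: drop the cached entry so it is no longer sent.
    synchronized void onAppFinished(String appId) {
        cache.remove(appId);
    }

    // Heartbeat: return only what changed for this node, reusing cached objects.
    synchronized List<CachedCredentials> heartbeat(String nodeId) {
        List<CachedCredentials> changed = new ArrayList<>();
        Set<String> pending = pendingByNode.get(nodeId);
        if (pending == null) {
            return changed; // node never registered
        }
        for (String appId : pending) {
            CachedCredentials entry = cache.get(appId);
            if (entry != null) { // skip apps that finished in the meantime
                changed.add(entry);
            }
        }
        pending.clear();
        return changed;
    }

    public static void main(String[] args) {
        TokenDistributionSketch rm = new TokenDistributionSketch();
        rm.onTokensRenewed("app_1", new byte[] {1});
        System.out.println(rm.registerNode("node_1").size()); // 1: full set at registration
        System.out.println(rm.heartbeat("node_1").size());    // 0: nothing renewed since
        rm.onTokensRenewed("app_2", new byte[] {2});
        System.out.println(rm.heartbeat("node_1").size());    // 1: only app_2
    }
}
```

Because every node's heartbeat response reuses the same cached object per app, 
the number of such objects held would grow with the number of apps rather than 
with apps times nodes, which is the memory saving I am after.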

> RM requires large memory in sending out security tokens as part of Node 
> Heartbeat in large cluster
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6523
>                 URL: https://issues.apache.org/jira/browse/YARN-6523
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Critical
>
> Currently, as part of the heartbeat response, the RM sets all applications' 
> tokens even though not all applications may be active on the node. On top of 
> that, NodeHeartbeatResponsePBImpl converts the tokens for each app into 
> SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, too 
> many SystemCredentialsForAppsProto objects were getting created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with 
> 8GB RAM configured for the RM.


