[ 
https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000998#comment-16000998
 ] 

Naganarasimha G R commented on YARN-6523:
-----------------------------------------

Thanks [~jlowe] for helping to analyze this issue. Sorry to report that we were 
hitting it because of a private fix in our code base (the Spark JDBCServer was 
sending a different token when starting the AM and when launching executors 
later on). We did not realize that only our code was sending out tokens for 
every app each time an app was submitted. We are analyzing this further. 
Thanks for helping out with it. Given that open source already sends only newly 
created tokens as part of NM-RM communication, I think we do not need much 
further optimization, right? A token renewal, which happens once a day, does 
not need to be propagated to others; only when a new token is requested from 
the RM (after the old token expires after 7 days) do we need to inform the 
other containers using it?
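
To make the renew-versus-new-token distinction above concrete, here is a rough 
sketch. It is not the YARN ResourceManager code: the class and method names 
(TokenUpdateTracker, issue, renew, drainForHeartbeat) are hypothetical, tokens 
are plain strings, and a single pending queue stands in for per-node tracking. 
It assumes a renewal only extends the expiry and leaves the token itself 
unchanged.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the YARN ResourceManager implementation.
class TokenUpdateTracker {
    // Current token per application (a String stands in for real credentials).
    private final Map<String, String> tokenByApp = new HashMap<>();
    // Expiry time per application, extended on renewal.
    private final Map<String, Long> expiryByApp = new HashMap<>();
    // Applications whose token changed since the last heartbeat drain.
    private final Set<String> pendingApps = new LinkedHashSet<>();

    // A new token was issued: containers using it must be informed.
    void issue(String appId, String token, long expiryMillis) {
        tokenByApp.put(appId, token);
        expiryByApp.put(appId, expiryMillis);
        pendingApps.add(appId);
    }

    // A renewal only extends the expiry; the token is unchanged,
    // so nothing is queued for distribution.
    void renew(String appId, long newExpiryMillis) {
        if (tokenByApp.containsKey(appId)) {
            expiryByApp.put(appId, newExpiryMillis);
        }
    }

    // Returns the tokens to attach to the next heartbeat response
    // and clears the pending queue.
    Map<String, String> drainForHeartbeat() {
        Map<String, String> out = new HashMap<>();
        for (String appId : pendingApps) {
            out.put(appId, tokenByApp.get(appId));
        }
        pendingApps.clear();
        return out;
    }
}
```

With this shape, a daily renew() adds nothing to the heartbeat payload, and 
only a call to issue() (for example after the 7-day expiry) causes a token to 
be sent out again.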

> RM requires large memory in sending out security tokens as part of Node 
> Heartbeat in large cluster
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6523
>                 URL: https://issues.apache.org/jira/browse/YARN-6523
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Critical
>
> Currently, as part of the heartbeat response, the RM sets every application's 
> tokens even though not all applications may be active on the node. On top of 
> that, NodeHeartbeatResponsePBImpl converts the tokens for each app into 
> SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, too 
> many SystemCredentialsForAppsProto objects were getting created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with 
> 8GB RAM configured for the RM.
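
As a rough illustration of the scale described in the report, the sketch below 
counts per-app entries built in one heartbeat round (one response per node) and 
shows what restricting a response to the apps active on a node could look like. 
The names (HeartbeatTokenFilter, entriesPerRound, forNode) are hypothetical and 
are not YARN APIs; tokens are plain strings, and the size of each object is 
ignored because the report does not state it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names are illustrative, not YARN APIs.
class HeartbeatTokenFilter {
    // Per-app entries built in one heartbeat round (one response per node).
    static long entriesPerRound(int nodes, int appsPerResponse) {
        return (long) nodes * appsPerResponse;
    }

    // Keep only the credentials of apps that are active on the given node.
    static Map<String, String> forNode(Map<String, String> allAppTokens,
                                       Set<String> activeOnNode) {
        Map<String, String> out = new HashMap<>();
        for (String appId : activeOnNode) {
            String token = allAppTokens.get(appId);
            if (token != null) {
                out.put(appId, token);
            }
        }
        return out;
    }
}
```

With the numbers from the report, entriesPerRound(500, 2000) is 1,000,000 
entries per round when every response carries every app's tokens.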



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

