[ 
https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684104#comment-16684104
 ] 

Manikandan R commented on YARN-6523:
------------------------------------

{quote}Does the registration request and response really need a token sequence 
number field?{quote}

I had added the token sequence number only to the registration response. I 
thought it would be cleaner to have the sequence number up front and pass it in 
the first node heartbeat itself. Anyway, I have removed it now, so the NM's 
StatusUpdaterImpl passes 0 in the first heartbeat request, and from then on the 
value is set from the one received in the node heartbeat response from the RM.
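
The NM-side behaviour described above can be sketched roughly as follows. This 
is only an illustration, not code from the patch: the class names 
(NmHeartbeatSketch, NodeSide, HeartbeatResponse) and the LongFunction-based RM 
stub are hypothetical. Only the behaviour (send 0 first, then echo the value 
from the last response) comes from the comment.

```java
import java.util.function.LongFunction;

// Hypothetical sketch; these types are illustrative, not the classes in the patch.
class NmHeartbeatSketch {

    // Stand-in for the RM's heartbeat response; only the sequence number is modelled.
    static final class HeartbeatResponse {
        final long tokenSequenceNo;

        HeartbeatResponse(long tokenSequenceNo) {
            this.tokenSequenceNo = tokenSequenceNo;
        }
    }

    // Stand-in for the NM status updater.
    static final class NodeSide {
        // 0 means "nothing received yet", so the first heartbeat sends 0.
        private long tokenSequenceNo = 0L;

        // Sends one heartbeat and returns the sequence number that was sent.
        long heartbeat(LongFunction<HeartbeatResponse> rm) {
            long sent = tokenSequenceNo;
            HeartbeatResponse response = rm.apply(sent);
            // Remember the RM's value so the next heartbeat carries it.
            tokenSequenceNo = response.tokenSequenceNo;
            return sent;
        }
    }

    public static void main(String[] args) {
        NodeSide nm = new NodeSide();
        // Stub RM that always reports sequence number 7 (arbitrary value).
        LongFunction<HeartbeatResponse> rm = ignored -> new HeartbeatResponse(7L);
        System.out.println("first heartbeat sent: " + nm.heartbeat(rm));
        System.out.println("second heartbeat sent: " + nm.heartbeat(rm));
    }
}
```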

{quote}Has the RM failover scenario been considered?{quote}

RMContext holds tokenSequenceNo and initialises it to 1 at startup, so after 
any restart it is initialised to 1 again. Once all the NMs have re-registered, 
each NM's first node heartbeat response is then sure to carry credentials, 
because the two sequence numbers will differ.
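
A rough RM-side sketch of this reasoning follows. It is illustrative only and 
not taken from the patch: RmTokenSequenceSketch, ResourceManagerSide, Response 
and addCredentials are hypothetical names, credentials are modelled as plain 
strings, and it assumes that a re-registered NM sends 0 in its first heartbeat 
and that the restarted RM holds credentials again by then.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; these types are illustrative, not the classes in the patch.
class RmTokenSequenceSketch {

    static final class Response {
        final long tokenSequenceNo;
        final Map<String, String> credentials; // appId -> token; empty when nothing is sent

        Response(long tokenSequenceNo, Map<String, String> credentials) {
            this.tokenSequenceNo = tokenSequenceNo;
            this.credentials = credentials;
        }
    }

    static final class ResourceManagerSide {
        // Starts at 1 on every start, so a restart is simply a fresh instance.
        private final AtomicLong tokenSequenceNo = new AtomicLong(1L);
        private final Map<String, String> systemCredentials = new HashMap<>();

        // Assumption: the counter is bumped whenever the credential set changes.
        synchronized void addCredentials(String appId, String token) {
            systemCredentials.put(appId, token);
            tokenSequenceNo.incrementAndGet();
        }

        // Credentials are attached only when the NM's number differs from the RM's.
        synchronized Response heartbeat(long nmSequenceNo) {
            long current = tokenSequenceNo.get();
            Map<String, String> toSend;
            if (nmSequenceNo == current) {
                toSend = Collections.emptyMap();
            } else {
                toSend = new HashMap<>(systemCredentials);
            }
            return new Response(current, toSend);
        }
    }

    public static void main(String[] args) {
        ResourceManagerSide rm = new ResourceManagerSide();
        rm.addCredentials("app_1", "token-a");
        Response first = rm.heartbeat(0L);
        System.out.println("first heartbeat carries credentials: " + !first.credentials.isEmpty());
        Response second = rm.heartbeat(first.tokenSequenceNo);
        System.out.println("in-sync heartbeat carries credentials: " + !second.credentials.isEmpty());

        // Simulated restart: a fresh RM instance, then the NM re-registers and sends 0.
        ResourceManagerSide restarted = new ResourceManagerSide();
        restarted.addCredentials("app_1", "token-a");
        Response afterRestart = restarted.heartbeat(0L);
        System.out.println("after restart carries credentials: " + !afterRestart.credentials.isEmpty());
    }
}
```

Because the counter never takes the value 0 in this sketch, a heartbeat that 
carries 0 always differs from the RM's value and therefore always triggers a 
credentials transfer.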

I have taken care of all the other comments. Attaching the patch for review.



> Newly retrieved security Tokens are sent as part of each heartbeat to each 
> node from RM which is not desirable in large cluster
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6523
>                 URL: https://issues.apache.org/jira/browse/YARN-6523
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: RM
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Naganarasimha G R
>            Assignee: Manikandan R
>            Priority: Major
>         Attachments: YARN-6523.001.patch, YARN-6523.002.patch, 
> YARN-6523.003.patch, YARN-6523.004.patch, YARN-6523.005.patch
>
>
> Currently, as part of the heartbeat response, the RM sets the tokens of all 
> applications, even though not all applications may be active on the node. On 
> top of that, NodeHeartbeatResponsePBImpl converts the tokens for each app 
> into SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, 
> too many SystemCredentialsForAppsProto objects were being created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with 
> 8GB RAM configured for the RM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
