[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664285#comment-16664285 ]

Jason Lowe commented on YARN-6523:
----------------------------------

Thanks for updating the patch!

All PBImpl set methods must call maybeInitBuilder before modifying the builder. 
Otherwise the set method risks an NPE, or the get methods could still think the 
value is coming from a protocol buffer rather than the builder.
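To make the failure modes concrete, here is a minimal, self-contained sketch of the PBImpl convention. The message class `FooProto`, its `Builder`, the record `FooPBImpl`, and the field `tokenSequenceNo` are all hypothetical stand-ins (no generated protobuf code is used). The point is the ordering: if the setter skipped `maybeInitBuilder()`, `builder` could still be null (the NPE), or `viaProto` would stay true, so the getter would keep reading the stale proto.

```java
import java.util.Objects;

public class PbImplPatternSketch {

  // Hypothetical stand-in for an immutable generated protobuf message.
  static final class FooProto {
    private final long tokenSequenceNo;
    FooProto(long seq) { this.tokenSequenceNo = seq; }
    long getTokenSequenceNo() { return tokenSequenceNo; }
    Builder toBuilder() { return new Builder(tokenSequenceNo); }

    static final class Builder {
      private long tokenSequenceNo;
      Builder(long seq) { this.tokenSequenceNo = seq; }
      long getTokenSequenceNo() { return tokenSequenceNo; }
      Builder setTokenSequenceNo(long seq) { this.tokenSequenceNo = seq; return this; }
      FooProto build() { return new FooProto(tokenSequenceNo); }
    }
  }

  // Hypothetical record modeled on the YARN *PBImpl convention.
  static final class FooPBImpl {
    private FooProto proto;
    private FooProto.Builder builder; // null until the first modification
    private boolean viaProto;         // true while reads should come from proto

    FooPBImpl(FooProto proto) {
      this.proto = Objects.requireNonNull(proto);
      this.viaProto = true;
    }

    // Copy the current proto into a fresh builder before any field is modified.
    private void maybeInitBuilder() {
      if (viaProto || builder == null) {
        builder = proto.toBuilder();
      }
      viaProto = false;
    }

    long getTokenSequenceNo() {
      return viaProto ? proto.getTokenSequenceNo() : builder.getTokenSequenceNo();
    }

    void setTokenSequenceNo(long seq) {
      maybeInitBuilder(); // skipping this risks an NPE or a stale read via proto
      builder.setTokenSequenceNo(seq);
    }

    // Freeze the builder back into an immutable proto.
    FooProto getProto() {
      if (!viaProto) {
        proto = builder.build();
        viaProto = true;
      }
      return proto;
    }
  }

  public static void main(String[] args) {
    FooPBImpl record = new FooPBImpl(new FooProto(1L));
    record.setTokenSequenceNo(2L);
    System.out.println(record.getTokenSequenceNo());            // prints 2
    System.out.println(record.getProto().getTokenSequenceNo()); // prints 2
  }
}
```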

Do the registration request and response really need a token sequence number 
field?  I think the sequence number only needs to be carried in the heartbeat 
request, to let the RM know where the NM is with respect to the credentials 
timeline, and in the node heartbeat response, so the NM knows how to update its 
own concept of the credentials "timestamp."  I'm not seeing how it helps for the 
NM to report this in the registration request, and it seems actively harmful in 
the registration response, since the token sequence number could be updated on 
the NM side without the NM actually receiving the updated tokens.
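A simplified model of the heartbeat-only handshake described above, offered as a sketch under assumptions rather than the patch's design: the classes `RmCredentialsTracker`, `NmCredentialsState`, and `HeartbeatResponse` are hypothetical, and the RM here resends all credentials whenever the NM is behind. The key invariant is that the NM advances its sequence number only in the same step where it stores the tokens that came with it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class TokenSequenceSketch {

  // Hypothetical heartbeat response: credentials plus the sequence number they match.
  static final class HeartbeatResponse {
    final long tokenSequenceNo;
    final Map<String, byte[]> appCredentials;
    HeartbeatResponse(long seq, Map<String, byte[]> creds) {
      this.tokenSequenceNo = seq;
      this.appCredentials = creds;
    }
  }

  // RM side: bumps the sequence number whenever any app's tokens change.
  static final class RmCredentialsTracker {
    private long currentSeq = 0;
    private final Map<String, byte[]> credentials = new HashMap<>();

    synchronized void updateTokens(String appId, byte[] tokens) {
      credentials.put(appId, tokens);
      currentSeq++;
    }

    // Send credentials only when the NM's reported sequence number is behind.
    synchronized HeartbeatResponse onHeartbeat(long nmReportedSeq) {
      if (nmReportedSeq < currentSeq) {
        return new HeartbeatResponse(currentSeq, new HashMap<>(credentials));
      }
      return new HeartbeatResponse(currentSeq, Collections.emptyMap());
    }
  }

  // NM side: advances its sequence number only together with received tokens.
  static final class NmCredentialsState {
    long lastSeenSeq = 0;
    final Map<String, byte[]> credentials = new HashMap<>();

    void process(HeartbeatResponse response) {
      if (!response.appCredentials.isEmpty()) {
        credentials.putAll(response.appCredentials);
        lastSeenSeq = response.tokenSequenceNo;
      }
    }
  }

  public static void main(String[] args) {
    RmCredentialsTracker rm = new RmCredentialsTracker();
    rm.updateTokens("app_1", new byte[] {1});

    NmCredentialsState nm = new NmCredentialsState();
    nm.process(rm.onHeartbeat(nm.lastSeenSeq));
    System.out.println(nm.lastSeenSeq + " " + nm.credentials.size()); // prints "1 1"

    // A later heartbeat at the same sequence number carries no tokens.
    System.out.println(rm.onHeartbeat(nm.lastSeenSeq).appCredentials.isEmpty()); // prints true
  }
}
```

In this model, if a registration response had set `lastSeenSeq` to the RM's current value without delivering the tokens, every later `onHeartbeat` would return an empty map and the NM would never catch up, which is the hazard raised above.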

Has the RM failover scenario been considered?

Arbitrary thread sleeps are a pet peeve of mine and lead to flaky and/or 
unnecessarily slow unit tests.  It would be good to remove the sleeps from the 
unit tests, making them either directly event driven rather than polled (e.g.: 
through use of CountDownLatch/CyclicBarrier/etc.) or using 
GenericTestUtils.waitFor() with a small poll interval to wait for the necessary 
condition if it has to be polled.  I haven't personally run the unit test in 
this patch yet, but just looking at it I counted at least 90 seconds of 
sleeping, which makes for a long single test.
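Both alternatives can be sketched in plain Java. The `waitFor` helper below is a hypothetical stand-in modeled on the shape of Hadoop's `GenericTestUtils.waitFor()` (a condition, a poll interval, and an overall timeout) so the example runs without Hadoop on the classpath; in a real YARN test the Hadoop utility itself would be used. The `heartbeatProcessed` and `tokensUpdated` names are illustrative only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

public class NoSleepTestSketch {

  // Hypothetical stand-in modeled on GenericTestUtils.waitFor: poll a condition
  // at a short interval and fail with TimeoutException after a deadline.
  static void waitFor(Supplier<Boolean> check, long checkEveryMillis, long waitForMillis)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(waitForMillis);
    while (!check.get()) {
      if (System.nanoTime() - deadline > 0) {
        throw new TimeoutException("condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    // Event driven: the worker counts down the latch, and the test wakes
    // as soon as that happens instead of after a fixed sleep.
    CountDownLatch heartbeatProcessed = new CountDownLatch(1);
    new Thread(heartbeatProcessed::countDown).start();
    if (!heartbeatProcessed.await(10, TimeUnit.SECONDS)) {
      throw new AssertionError("heartbeat was never processed");
    }

    // Polled: when there is no hook to signal on, check every 10 ms
    // and give up after 10 s.
    AtomicBoolean tokensUpdated = new AtomicBoolean(false);
    new Thread(() -> tokensUpdated.set(true)).start();
    waitFor(tokensUpdated::get, 10, 10_000);
    System.out.println("both conditions observed");
  }
}
```

Either way the test returns as soon as the condition holds, so the common case costs milliseconds; the long timeout only matters when something is actually broken.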


> Newly retrieved security Tokens are sent as part of each heartbeat to each 
> node from RM which is not desirable in large cluster
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6523
>                 URL: https://issues.apache.org/jira/browse/YARN-6523
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: RM
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Naganarasimha G R
>            Assignee: Manikandan R
>            Priority: Major
>         Attachments: YARN-6523.001.patch, YARN-6523.002.patch, 
> YARN-6523.003.patch, YARN-6523.004.patch, YARN-6523.005.patch
>
>
> Currently, as part of the heartbeat response, the RM sets all applications' 
> tokens even though not all applications might be active on the node. On top of 
> that, NodeHeartbeatResponsePBImpl converts the tokens for each app into a 
> SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, too 
> many SystemCredentialsForAppsProto objects were getting created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with 
> 8GB of RAM configured for the RM.
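For scale, assuming every one of the 2000 apps has tokens and each of the 500 nodes heartbeats once, a single round of heartbeats would create 2000 × 500 = 1,000,000 SystemCredentialsForAppsProto objects. The sketch below shows one way to act on the description's observation that not every app is active on every node: filter the credentials down to the node's active apps. The method `credentialsForNode` and the `app_N` IDs are hypothetical, and this filtering is an illustration, not necessarily what the attached patches implement.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PerNodeCredentialsSketch {

  // Keep only the credentials of applications active on this node.
  static Map<String, byte[]> credentialsForNode(Map<String, byte[]> allAppCredentials,
                                                Set<String> appsActiveOnNode) {
    Map<String, byte[]> filtered = new HashMap<>();
    for (String appId : appsActiveOnNode) {
      byte[] tokens = allAppCredentials.get(appId);
      if (tokens != null) { // ignore apps the RM has no credentials for
        filtered.put(appId, tokens);
      }
    }
    return filtered;
  }

  public static void main(String[] args) {
    int apps = 2000;
    int nodes = 500;
    // Worst case from the description: every node receives every app's tokens.
    long protosPerHeartbeatRound = (long) apps * nodes;
    System.out.println(protosPerHeartbeatRound); // prints 1000000

    Map<String, byte[]> all = new HashMap<>();
    for (int i = 0; i < apps; i++) {
      all.put("app_" + i, new byte[] {(byte) i});
    }
    Set<String> active = new HashSet<>(Arrays.asList("app_1", "app_7", "app_missing"));
    System.out.println(credentialsForNode(all, active).size()); // prints 2
  }
}
```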



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
