[
https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664285#comment-16664285
]
Jason Lowe commented on YARN-6523:
----------------------------------
Thanks for updating the patch!
All PBImpl set methods must call maybeInitBuilder before touching the builder;
otherwise the set method risks an NPE, or the get methods could still think the
value is coming from the protocol buffer rather than the builder.
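To illustrate both failure modes, here is a minimal sketch of the pattern. SeqProto, its Builder and SeqRecordPBImpl are hypothetical stand-ins written for this example, not classes from the patch or from generated protobuf code; only the maybeInitBuilder/viaProto shape follows the usual PBImpl convention.

```java
// Hypothetical stand-in for a generated protobuf message and its builder.
final class SeqProto {
    final long tokenSequenceNo;

    SeqProto(long tokenSequenceNo) { this.tokenSequenceNo = tokenSequenceNo; }

    Builder toBuilder() {
        Builder b = new Builder();
        b.tokenSequenceNo = tokenSequenceNo;
        return b;
    }

    static final class Builder {
        long tokenSequenceNo;
        Builder setTokenSequenceNo(long v) { tokenSequenceNo = v; return this; }
        SeqProto build() { return new SeqProto(tokenSequenceNo); }
    }
}

// Hypothetical record following the usual PBImpl shape.
final class SeqRecordPBImpl {
    private SeqProto proto = new SeqProto(0L);
    private SeqProto.Builder builder = null;
    private boolean viaProto = false;

    SeqRecordPBImpl() { builder = new SeqProto.Builder(); }

    // Built from a received proto: there is no builder yet.
    SeqRecordPBImpl(SeqProto proto) { this.proto = proto; this.viaProto = true; }

    private void maybeInitBuilder() {
        if (viaProto || builder == null) {
            builder = proto.toBuilder();
        }
        viaProto = false;
    }

    long getTokenSequenceNo() {
        return viaProto ? proto.tokenSequenceNo : builder.tokenSequenceNo;
    }

    // Correct: switch to the builder before writing to it.
    void setTokenSequenceNo(long v) {
        maybeInitBuilder();
        builder.setTokenSequenceNo(v);
    }

    // Deliberately broken: writes to the builder without maybeInitBuilder.
    void setWithoutInit(long v) {
        builder.setTokenSequenceNo(v);
    }

    SeqProto getProto() {
        proto = viaProto ? proto : builder.build();
        viaProto = true;
        return proto;
    }
}
```

With the broken setter, a record built from a proto throws an NPE because the builder is null, and a record whose getProto() has already been called keeps returning the old value from the proto.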
Do the registration request and response really need a token sequence number
field? I think the sequence number only needs to be associated with the
heartbeat request to let the RM know where the NM is with respect to the
credentials timeline and in the node heartbeat response so the NM knows how to
update its own concept of the credentials "timestamp." I'm not seeing how it
helps for the NM to report this in the registration request, and it seems
actively harmful in the registration response since the token sequence number
could be updated on the NM side without actually receiving the updated tokens.
Has the RM failover scenario been considered?
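As a rough sketch of the heartbeat-only exchange I have in mind (all class and method names here are hypothetical and do not come from the patch), the NM reports its sequence number in the heartbeat request, and the RM returns its own number plus the credentials only when the two differ. The sketch assumes tokens are opaque strings keyed by application ID and does not attempt to handle RM failover.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical heartbeat response: sequence number plus credentials.
final class HeartbeatResponse {
    final long tokenSequenceNo;
    final Map<String, String> systemCredentials; // empty when the NM is current

    HeartbeatResponse(long tokenSequenceNo, Map<String, String> systemCredentials) {
        this.tokenSequenceNo = tokenSequenceNo;
        this.systemCredentials = systemCredentials;
    }
}

// Hypothetical RM-side state: bumps the sequence number on every token update.
final class RmCredentialsTracker {
    private long tokenSequenceNo = 0L;
    private final Map<String, String> credentials = new HashMap<>();

    synchronized void updateToken(String appId, String token) {
        credentials.put(appId, token);
        tokenSequenceNo++;
    }

    // Credentials are sent only when the NM's number differs from the RM's.
    synchronized HeartbeatResponse onHeartbeat(long nmSequenceNo) {
        if (nmSequenceNo == tokenSequenceNo) {
            return new HeartbeatResponse(tokenSequenceNo, Collections.emptyMap());
        }
        return new HeartbeatResponse(tokenSequenceNo, new HashMap<>(credentials));
    }
}

// Hypothetical NM-side state: the number advances only together with the tokens.
final class NmCredentialsView {
    private long tokenSequenceNo = 0L;
    private final Map<String, String> credentials = new HashMap<>();

    long currentSequenceNo() { return tokenSequenceNo; }

    int credentialCount() { return credentials.size(); }

    void apply(HeartbeatResponse response) {
        if (response.tokenSequenceNo != tokenSequenceNo) {
            credentials.clear();
            credentials.putAll(response.systemCredentials);
            tokenSequenceNo = response.tokenSequenceNo;
        }
    }
}
```

Because the NM only advances its number when a heartbeat response also carries the tokens, it cannot end up claiming a newer position on the credentials timeline than the tokens it actually holds.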
Arbitrary thread sleeps are a pet peeve of mine and lead to flaky and/or
unnecessarily slow unit tests. It would be good to remove the sleeps from the
unit tests making them either directly event driven rather than polled (e.g.:
through use of CountDownLatch/CyclicBarrier/etc) or use
GenericTestUtils.waitFor() with a small poll interval to wait for the necessary
condition if it has to be polled. I haven't personally run the unit test in
this patch yet, but just looking at it I counted at least 90 seconds of
sleeping which makes for a long, single test.
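For reference, a minimal sketch of the two alternatives using only the JDK. SleepFreeWaits and both of its methods are hypothetical helpers written for this example; the polling helper is a simplified stand-in for the idea behind GenericTestUtils.waitFor() and does not reproduce its signature or behavior.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class SleepFreeWaits {
    private SleepFreeWaits() { }

    // Event driven: the worker signals completion, and the caller wakes up
    // as soon as that happens instead of sleeping for a fixed time.
    static boolean runAndAwait(Runnable work, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            executor.submit(() -> {
                try {
                    work.run();
                } finally {
                    done.countDown();
                }
            });
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    // Polled: check the condition at a small interval and give up at the
    // timeout; returns as soon as the condition holds.
    static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!check.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

Either way the test takes only as long as the condition needs, and the timeout is an upper bound for the failure case rather than a cost paid on every run.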
> Newly retrieved security Tokens are sent as part of each heartbeat to each
> node from RM which is not desirable in large cluster
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-6523
> URL: https://issues.apache.org/jira/browse/YARN-6523
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: RM
> Affects Versions: 2.8.0, 2.7.3
> Reporter: Naganarasimha G R
> Assignee: Manikandan R
> Priority: Major
> Attachments: YARN-6523.001.patch, YARN-6523.002.patch,
> YARN-6523.003.patch, YARN-6523.004.patch, YARN-6523.005.patch
>
>
> Currently, as part of the heartbeat response, the RM sets all applications'
> tokens even though not all applications may be active on the node. On top of
> that, NodeHeartbeatResponsePBImpl converts the tokens for each app into
> SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, too
> many SystemCredentialsForAppsProto objects were being created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with
> 8GB RAM configured for the RM.