[
https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708868#comment-16708868
]
Jason Lowe commented on YARN-6523:
----------------------------------
Thanks for updating the patch! If a unit test just added in a patch fails in
the precommit build then there's usually something wrong with the test even if
it passes locally. It's likely to be a racy test, as the precommit builds are
notorious for running unit tests with a different timing than seen locally.
The problem with these tests is they still aren't really unit tests but rather
integration tests where it is spinning up an RM and an NM. The first test
should only create a DelegationTokenRenewer with a mock RMContext and verify
that RMContext#incrTokenSequenceNo is called when the appropriate token is
created and when it is renewed. No server start ups, heartbeats, etc. All of
that tends to be racy as async dispatchers are usually involved making it hard
to know when something is done processing and therefore safe to examine for
assertions. DelegationTokenRenewer#addApplicationSync can be used to test the
case where a token is created, and we can make
DelegationTokenRenewer#requestNewHdfsDelegationTokenIfNeeded package-private so
we can call it from a test with a token that needs to be renewed to test the
renewal case.
The second test is designed to test the ResourceTrackerService is properly
handling the token sequence number, so there should be a unit test that
verifies that the system credentials are sent when the token sequence number
mismatches and not sent when they match. That test should be in
TestResourceTrackerService, since that's what we're testing. If we pass a mock
RMContext to the ResourceTrackerService when we construct it for the test, it
makes it easy to manipulate it, along with the credentials payload, to verify
in the test that the credentials are only sent when expected.
NodeHeartbeatResponse should get/set a Collection rather than a List. That
allows ResourceTrackerService to pass the values of its tracking map directly
rather than needing to convert it into a list first.
Typo in NodeHeartbeatResponse comment: "logAggreations"
NodeHeartbeatResponsePBImpl#setSystemCredentialsForApps should pass the
collection directly to the ArrayList constructor so it doesn't have to guess at
the initial size of the array then immediately discard it to reallocate a new
one when the collection is larger than the initial guess. Passing directly to
the constructor allows ArrayList to allocate the correct array size the first
time and reduces unnecessary garbage.
Nit: The name "systemCredentialsForAppsProto" in NodeHeartbeatResponsePBImpl
implies it is a single proto rather than a collection of multiple. Maybe just
"systemCredentials"?
YarnServerBuilderUtils should pass the desired capacity to the ArrayList or
HashMap constructor since it's trivial to compute and eliminates the
possibility of needing to resize the collection due to a poor initial guess in
the default constructor.
> Newly retrieved security Tokens are sent as part of each heartbeat to each
> node from RM which is not desirable in large cluster
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-6523
> URL: https://issues.apache.org/jira/browse/YARN-6523
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: RM
> Affects Versions: 2.8.0, 2.7.3
> Reporter: Naganarasimha G R
> Assignee: Manikandan R
> Priority: Major
> Attachments: YARN-6523.001.patch, YARN-6523.002.patch,
> YARN-6523.003.patch, YARN-6523.004.patch, YARN-6523.005.patch,
> YARN-6523.006.patch, YARN-6523.007.patch, YARN-6523.008.patch,
> YARN-6523.009.patch
>
>
> Currently as part of heartbeat response RM sets all application's tokens
> though all applications might not be active on the node. On top of it
> NodeHeartbeatResponsePBImpl converts tokens for each app into
> SystemCredentialsForAppsProto. Hence for each node and each heartbeat too
> many SystemCredentialsForAppsProto objects were getting created.
> We hit a OOM while testing for 2000 concurrent apps on 500 nodes cluster with
> 8GB RAM configured for RM
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]