[jira] [Commented] (YARN-6523) Newly retrieved security Tokens are sent as part of each heartbeat to each node from RM which is not desirable in large cluster

Jason Lowe (JIRA) Tue, 04 Dec 2018 07:29:07 -0800


    [ 
https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708868#comment-16708868
 ]


Jason Lowe commented on YARN-6523:
----------------------------------

Thanks for updating the patch!  If a unit test newly added by a patch fails in 
the precommit build, there's usually something wrong with the test even if it 
passes locally.  It's likely a racy test, since the precommit builds are 
notorious for running unit tests with different timing than is seen locally.

The problem with these tests is that they still aren't really unit tests but 
rather integration tests that spin up an RM and an NM.  The first test should 
only create a DelegationTokenRenewer with a mock RMContext and verify that 
RMContext#incrTokenSequenceNo is called when the appropriate token is created 
and when it is renewed.  No server startups, heartbeats, etc.  All of that 
tends to be racy because async dispatchers are usually involved, which makes 
it hard to know when something is done processing and therefore safe to 
examine for assertions.  DelegationTokenRenewer#addApplicationSync can be used 
to test the case where a token is created, and we can make 
DelegationTokenRenewer#requestNewHdfsDelegationTokenIfNeeded package-private so 
we can call it from a test with a token that needs to be renewed to test the 
renewal case.
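
To illustrate the shape of such a test, here is a minimal sketch that uses a 
hand-rolled counting fake in place of a mocking framework.  SequenceContext, 
SketchTokenRenewer, and CountingContext are hypothetical stand-ins, not the 
real RMContext or DelegationTokenRenewer, whose signatures are not reproduced 
here.  The point is only that nothing is started and nothing is asynchronous.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the one RMContext method the test cares about.
interface SequenceContext {
    void incrTokenSequenceNo();
}

// Hypothetical, heavily simplified renewer. Only the sequence-number bump
// is modeled; the real class does far more.
final class SketchTokenRenewer {
    private final SequenceContext context;

    SketchTokenRenewer(SequenceContext context) {
        this.context = context;
    }

    // Models the synchronous "token created for a new app" path.
    void tokenCreated(String appId) {
        context.incrTokenSequenceNo();
    }

    // Models the renewal path: bump only when a new token is really needed.
    void renewIfNeeded(String appId, boolean needsRenewal) {
        if (needsRenewal) {
            context.incrTokenSequenceNo();
        }
    }
}

// Hand-rolled fake that records calls, playing the role of a mock.
final class CountingContext implements SequenceContext {
    final AtomicInteger calls = new AtomicInteger();

    @Override
    public void incrTokenSequenceNo() {
        calls.incrementAndGet();
    }
}
```

Because every call is synchronous, the test can assert on the call count 
immediately after each invocation with no sleeps or polling.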

The second test is designed to verify that the ResourceTrackerService handles 
the token sequence number properly, so there should be a unit test that 
verifies that the system credentials are sent when the token sequence numbers 
mismatch and not sent when they match.  That test should be in 
TestResourceTrackerService, since that's what we're testing.  If we pass a mock 
RMContext to the ResourceTrackerService when we construct it for the test, it 
is easy to manipulate the context, along with the credentials payload, and 
verify in the test that the credentials are only sent when expected. 
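
The behavior under test can be sketched as a pure function.  This is an 
assumption about the shape of the logic, not the patch's actual code: the 
class name, the method name, and the use of String as the credentials type 
are all hypothetical.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical sketch of the decision a heartbeat handler would make.
final class HeartbeatCredentialsSketch {
    // Returns the credentials to attach to the heartbeat response: the full
    // set when the node's last-seen sequence number differs from the RM's
    // current one, and nothing when the node is already up to date.
    static Collection<String> credentialsFor(long nodeSeqNo, long rmSeqNo,
            Collection<String> systemCredentials) {
        if (nodeSeqNo == rmSeqNo) {
            return Collections.emptyList();
        }
        return systemCredentials;
    }
}
```

A unit test then needs only two cases: matching sequence numbers yield an 
empty payload, and mismatching ones yield the full payload.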

NodeHeartbeatResponse should get/set a Collection rather than a List.  That 
allows ResourceTrackerService to pass the values of its tracking map directly 
rather than needing to convert it into a list first.
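
As a rough illustration, assuming the tracking map is keyed by an application 
id (the class, the field, and the String types below are hypothetical), a 
Collection-typed setter accepts Map#values as-is:

```java
import java.util.Collection;
import java.util.Map;

// Hypothetical response object with Collection-typed accessors.
final class CollectionSetterSketch {
    private Collection<String> systemCredentials;

    void setSystemCredentials(Collection<String> creds) {
        this.systemCredentials = creds;
    }

    Collection<String> getSystemCredentials() {
        return systemCredentials;
    }

    // The caller hands over the map's values view directly; no intermediate
    // list has to be built just to satisfy a List-typed parameter.
    static CollectionSetterSketch fromTrackingMap(Map<String, String> tracking) {
        CollectionSetterSketch response = new CollectionSetterSketch();
        response.setSystemCredentials(tracking.values());
        return response;
    }
}
```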

Typo in NodeHeartbeatResponse comment: "logAggreations"

NodeHeartbeatResponsePBImpl#setSystemCredentialsForApps should pass the 
collection directly to the ArrayList constructor so it doesn't have to guess at 
the initial size of the array then immediately discard it to reallocate a new 
one when the collection is larger than the initial guess.  Passing directly to 
the constructor allows ArrayList to allocate the correct array size the first 
time and reduces unnecessary garbage.
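
A small sketch of the two approaches, assuming a guessed capacity constant 
(GUESS and the method names are made up for illustration).  Both produce the 
same list, but only the second sizes the backing array from the source 
collection up front:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class CopySketch {
    // Hypothetical guess at the capacity; any fixed number has the same flaw.
    private static final int GUESS = 4;

    // Guess-then-grow: if src holds more than GUESS elements, the array
    // allocated here is thrown away as soon as addAll has to grow the list.
    static List<String> copyWithGuess(Collection<String> src) {
        List<String> out = new ArrayList<>(GUESS);
        out.addAll(src);
        return out;
    }

    // Direct: the constructor sizes the backing array from src itself.
    static List<String> copyDirect(Collection<String> src) {
        return new ArrayList<>(src);
    }
}
```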

Nit: The name "systemCredentialsForAppsProto" in NodeHeartbeatResponsePBImpl 
implies it is a single proto rather than a collection of multiple.  Maybe just 
"systemCredentials"?

YarnServerBuilderUtils should pass the desired capacity to the ArrayList or 
HashMap constructor since it's trivial to compute and eliminates the 
possibility of needing to resize the collection due to a poor initial guess in 
the default constructor.
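
For example, a presized conversion might look like the sketch below.  The 
class, the method names, and the String element types are hypothetical.  For 
HashMap, the requested capacity is divided by the default load factor of 0.75 
so the map can hold the expected number of entries without resizing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PresizeSketch {
    // The element count is known up front, so ask for exactly that capacity.
    static List<String> valuesOf(Map<String, String> source) {
        List<String> out = new ArrayList<>(source.size());
        for (String value : source.values()) {
            out.add(value);
        }
        return out;
    }

    // HashMap resizes once size exceeds capacity * loadFactor (0.75 by
    // default), so request enough capacity to hold every entry.
    static Map<String, String> indexById(List<String> appIds) {
        int capacity = (int) (appIds.size() / 0.75f) + 1;
        Map<String, String> out = new HashMap<>(capacity);
        for (String id : appIds) {
            out.put(id, "creds-for-" + id); // placeholder payload
        }
        return out;
    }
}
```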


> Newly retrieved security Tokens are sent as part of each heartbeat to each 
> node from RM which is not desirable in large cluster
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6523
>                 URL: https://issues.apache.org/jira/browse/YARN-6523
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: RM
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Naganarasimha G R
>            Assignee: Manikandan R
>            Priority: Major
>         Attachments: YARN-6523.001.patch, YARN-6523.002.patch, 
> YARN-6523.003.patch, YARN-6523.004.patch, YARN-6523.005.patch, 
> YARN-6523.006.patch, YARN-6523.007.patch, YARN-6523.008.patch, 
> YARN-6523.009.patch
>
>
> Currently, as part of the heartbeat response, the RM sets the tokens of all 
> applications even though not all applications may be active on the node. On 
> top of that, NodeHeartbeatResponsePBImpl converts the tokens for each app 
> into SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, 
> too many SystemCredentialsForAppsProto objects were being created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with 
> 8GB RAM configured for the RM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

