[ https://issues.apache.org/jira/browse/YARN-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191351#comment-16191351 ]
Daniel Templeton commented on YARN-7262:
----------------------------------------
My comments after a closer look:
# The new property and default should have javadocs
# Please don't you start with the {{null != x}} stuff, too... \*sigh\*
# Please add assert messages, and let's not mix {{assertX()}} with
{{Assert.assertX()}} calls.
# I feel like you should test with more than just a split index of 0 or 1.
# You don't need to store {{token3}} in
{{testDelegationTokenNodeWithSplitChangeAcrossRestarts()}}.
# In {{initInternal()}}, shouldn't you consider 0 a valid split index?
# "Unknown child node with name: " could be a bit more descriptive. Child of
what? What caused it? What should the admin do about it? Same for the
messages in {{checkRemoveParentZnode()}}.
# In {{loadDelegationTokenFromNode()}}, can I get an _else_ instead of an early
return?
# I don't like reassigning the {{splitIdx}} parameter in {{getLeafZnodePath()}}.
# May as well split the long line on the equals in {{RMStateStore}}.
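A minimal sketch of the shape the split-index points above are asking for ({{initInternal()}} accepting 0, an _else_ instead of an early return, and not reassigning {{splitIdx}} in {{getLeafZnodePath()}}). It assumes a valid split index range of 0 to 4 with 0 meaning "no hierarchy"; those bounds, the class {{LeafZnodePathSketch}}, and the helpers {{validateSplitIndex()}} and {{leafZnodePath()}} are hypothetical and not taken from the patch.

```java
/**
 * Hypothetical sketch, not code from YARN-7262: builds a delegation token znode
 * path with an optional split index, treating 0 as a valid "no hierarchy" value.
 */
public final class LeafZnodePathSketch {

  // Assumed bounds for the split index; 0 disables the hierarchy.
  static final int MIN_SPLIT_INDEX = 0;
  static final int MAX_SPLIT_INDEX = 4;
  static final int DEFAULT_SPLIT_INDEX = 0;

  private LeafZnodePathSketch() {
  }

  /** Returns the configured value if it is in range (0 included), otherwise the default. */
  public static int validateSplitIndex(int configured) {
    if (configured >= MIN_SPLIT_INDEX && configured <= MAX_SPLIT_INDEX) {
      return configured;
    } else {
      return DEFAULT_SPLIT_INDEX;
    }
  }

  /**
   * Returns parentPath/nodeName when splitIdx is 0; otherwise moves the last
   * splitIdx characters of nodeName into a child znode under a new parent.
   * The splitIdx parameter is never reassigned; the cut point is a new local.
   */
  public static String leafZnodePath(String parentPath, String nodeName, int splitIdx) {
    if (splitIdx == 0) {
      return parentPath + "/" + nodeName;
    } else if (nodeName.length() <= splitIdx) {
      throw new IllegalArgumentException("Znode name '" + nodeName
          + "' under " + parentPath + " is too short to split at index " + splitIdx);
    } else {
      int cut = nodeName.length() - splitIdx;
      return parentPath + "/" + nodeName.substring(0, cut) + "/" + nodeName.substring(cut);
    }
  }

  public static void main(String[] args) {
    int splitIdx = validateSplitIndex(2);
    // Illustrative path and name only; not necessarily what ZKRMStateStore uses.
    System.out.println(leafZnodePath(
        "/rmstore/RMDelegationTokensRoot", "RMDelegationToken_0042", splitIdx));
  }
}
```

Running {{main}} prints {{/rmstore/RMDelegationTokensRoot/RMDelegationToken_00/42}}. The too-short case throws with a message that names the parent and the split index, in the spirit of the point about more descriptive error messages.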
I still want to take a closer look at the ZK code, but I need more sleep first.
> Add a hierarchy into the ZKRMStateStore for delegation token znodes to
> prevent jute buffer overflow
> ---------------------------------------------------------------------------------------------------
>
> Key: YARN-7262
> URL: https://issues.apache.org/jira/browse/YARN-7262
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.6.0
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Attachments: YARN-7262.001.patch
>
>
> We've seen users who are running into a problem where the RM is storing so
> many delegation tokens in the {{ZKRMStateStore}} that the _listing_ of those
> znodes is larger than the jute buffer. This is fine during normal operation,
> but becomes a problem on a failover because the RM will try to read in all of
> the token znodes (i.e. call {{getChildren}} on the parent znode). This is
> particularly bad because everything appears to be okay, but then if a
> failover occurs you end up with no active RMs.
> There was a similar problem with the YARN application data that was fixed in
> YARN-2962 by adding a (configurable) hierarchy of znodes so the RM could pull
> subchildren without overflowing the jute buffer (though it's off by default).
> We should add a hierarchy similar to that of YARN-2962, but for the
> delegation token znodes.
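To make the fan-out idea in the description concrete, here is a toy model in Java. It splits each token znode name at a split index and counts how many entries the largest single {{getChildren}} call would return, counting both the listing of the intermediate parents and the children of the fullest parent. The class {{TokenZnodeFanout}}, the {{RMDelegationToken_}} name prefix, and the {{%04d}} zero-padding of sequence numbers are assumptions for illustration, not details taken from YARN-2962 or the patch.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical illustration, not code from YARN-2962 or YARN-7262: estimates the
 * largest getChildren() listing for N token znodes under a given split index.
 */
public final class TokenZnodeFanout {

  private TokenZnodeFanout() {
  }

  /** Assumes zero-padded sequence numbers so every name is long enough to split. */
  public static String tokenNodeName(int sequenceNumber) {
    return "RMDelegationToken_" + String.format(Locale.ROOT, "%04d", sequenceNumber);
  }

  /** Maps each intermediate parent znode name to its number of child znodes. */
  public static Map<String, Integer> childrenPerParent(int tokenCount, int splitIdx) {
    Map<String, Integer> counts = new TreeMap<>();
    for (int seq = 1; seq <= tokenCount; seq++) {
      String name = tokenNodeName(seq);
      // With splitIdx 0 every token sits directly under one shared parent ("").
      String parent = splitIdx == 0 ? "" : name.substring(0, name.length() - splitIdx);
      counts.merge(parent, 1, Integer::sum);
    }
    return counts;
  }

  /**
   * Largest single listing: either the list of intermediate parents or the
   * children of the fullest parent, whichever has more entries.
   */
  public static int largestListing(int tokenCount, int splitIdx) {
    Map<String, Integer> counts = childrenPerParent(tokenCount, splitIdx);
    int largestChildList = 0;
    for (int count : counts.values()) {
      largestChildList = Math.max(largestChildList, count);
    }
    return Math.max(counts.size(), largestChildList);
  }

  public static void main(String[] args) {
    int tokens = 50_000;
    for (int splitIdx = 0; splitIdx <= 4; splitIdx++) {
      System.out.printf("split index %d: largest listing = %d znodes%n",
          splitIdx, largestListing(tokens, splitIdx));
    }
  }
}
```

In this model, 50,000 tokens produce one 50,000-entry listing with no split, but only 501 entries at split index 2. At split index 4 a single parent again holds 10,000 children, so a larger split index is not automatically better. Entry counts are only a proxy here, since the jute buffer limit applies to the serialized response size in bytes.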
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)