[ https://issues.apache.org/jira/browse/YARN-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated YARN-7262:
--------------------------------
    Attachment: YARN-7262.002.patch

Thanks for the feedback [~templedf].  
I've had a chance to actually use it in a real cluster and everything looks 
good.

{quote}The new property and default should have javadocs{quote}
It is documented in yarn-default.xml, and most of the other properties in 
{{YarnConfiguration}} don't have Javadocs.

The 002 patch:
- Changed my {{null !=}} comparison - that's what I get for copy-pasting existing code.
- Replaced all {{Assert.assertX}} with simply {{assertX}}
- Added messages to some assert statements
- Added tests for split index 2, 3, and 4.
- No longer stores {{token3}}
- {{initInternal}} now considers 0 a valid value.  I also fixed that for the 
app split index config.
- Made the "Unknown child node with name" message more descriptive, moved it to 
the debug level, and updated it to not erroneously complain about the "1", "2", 
"3", and "4" znodes.  I also made similar improvements for the similar code 
used for app spliting.
- Updated {{loadDelegationTokenFromNode}} to use {{else}} instead of an early 
{{return}}
- Introduced a new variable in {{getLeafZnodePath}} instead of reusing 
{{splitIdx}} (a rough sketch of the split-path idea follows this list)
- Split the long line in {{RMStateStore}}
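
For reference, here's a minimal sketch of the split-path idea (my illustration, 
not the patch code), assuming the same scheme YARN-2962 uses for application 
ids: the last {{splitIdx}} characters of the token znode name become the leaf, 
and the preceding prefix becomes an intermediate parent znode. The class and 
method names below, and the root path, are hypothetical.

{code:java}
// Hedged sketch of the split-index layout (illustrative names/paths only).
// With splitIdx = 0 the layout stays flat; otherwise the last splitIdx
// characters of the node name become the leaf under a prefix parent, e.g.
//   buildLeafPath("/rmstore/tokens", "RMDelegationToken_1234567", 2)
//     -> /rmstore/tokens/RMDelegationToken_12345/67
public final class TokenZnodeSplitSketch {

  static String buildLeafPath(String rootPath, String nodeName, int splitIdx) {
    if (splitIdx <= 0 || splitIdx >= nodeName.length()) {
      // 0 is a valid setting and means "no split" (flat layout).
      return rootPath + "/" + nodeName;
    }
    int cut = nodeName.length() - splitIdx;
    return rootPath + "/" + nodeName.substring(0, cut)
        + "/" + nodeName.substring(cut);
  }

  public static void main(String[] args) {
    String root = "/rmstore/tokens";  // illustrative root path only
    for (int splitIdx : new int[] {0, 2, 3, 4}) {
      System.out.println(splitIdx + " -> "
          + buildLeafPath(root, "RMDelegationToken_1234567", splitIdx));
    }
  }
}
{code}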

> Add a hierarchy into the ZKRMStateStore for delegation token znodes to 
> prevent jute buffer overflow
> ---------------------------------------------------------------------------------------------------
>
>                 Key: YARN-7262
>                 URL: https://issues.apache.org/jira/browse/YARN-7262
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: YARN-7262.001.patch, YARN-7262.002.patch
>
>
> We've seen users run into a problem where the RM stores so many delegation 
> tokens in the {{ZKRMStateStore}} that the _listing_ of those znodes is larger 
> than the jute buffer. This is fine during normal operation, but becomes a 
> problem on a failover because the RM will try to read in all of the token 
> znodes (i.e. call {{getChildren}} on the parent znode).  This is particularly 
> bad because everything appears to be okay until a failover occurs, at which 
> point you end up with no active RMs.
> There was a similar problem with the YARN application data; it was fixed in 
> YARN-2962 by adding a (configurable) hierarchy of znodes so the RM could pull 
> subsets of children without overflowing the jute buffer (though it's off by 
> default).
> We should add a hierarchy similar to that of YARN-2962, but for the 
> delegation token znodes.
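
To make the jute buffer point above concrete, here's a rough, hedged 
illustration (mine, not from the patch): with a flat layout a single 
{{getChildren}} response must carry every token znode name, while with a split 
index each parent only lists the tokens sharing its prefix, so no single 
listing grows with the total token count. All names below are illustrative.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch (not the patch code): bucketing token znodes under prefix
// parents bounds the size of any single getChildren() listing.
public final class ListingSizeSketch {

  // Group node names by the parent they would live under for a given split index.
  static Map<String, List<String>> bucket(List<String> names, int splitIdx) {
    Map<String, List<String>> buckets = new HashMap<>();
    for (String name : names) {
      int cut = Math.max(0, name.length() - splitIdx);
      String parent = splitIdx == 0 ? "<root>" : name.substring(0, cut);
      buckets.computeIfAbsent(parent, k -> new ArrayList<>()).add(name);
    }
    return buckets;
  }

  public static void main(String[] args) {
    List<String> names = new ArrayList<>();
    for (int seq = 0; seq < 1_000_000; seq++) {
      names.add("RMDelegationToken_" + seq);
    }
    for (int splitIdx : new int[] {0, 2, 4}) {
      int maxChildren = bucket(names, splitIdx).values().stream()
          .mapToInt(List::size).max().orElse(0);
      System.out.println("splitIdx=" + splitIdx
          + " -> largest single listing: " + maxChildren + " znodes");
    }
  }
}
{code}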


