[
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526289#comment-14526289
]
Arun Suresh commented on YARN-2962:
-----------------------------------
bq. we can simplify it without special hierarchies by having RM recursively
read nodes all the time. As we already specifically look for the "application_"
prefix to read app data, having application_1234_0456 right next to a 00/
directory is simply going to work without much complexity
Hmmm.. Currently, while reading, the code expects only leaf nodes to have data.
We could modify it to descend into child nodes while loading RMState, but
updating an app's state would require some thought. Consider updating the state
of app id _1000000: the update code would first have to check both the
/.._1000000 and /.._10000/00 znodes. Also, retrieving state during load_all and
update_single might be hairy, since paths can be ambiguous: a znode path might
not be unique across the two schemes. For example, /.._10000 will exist in both
the new and old schemes. In the old scheme it can contain data, but in the new
scheme it shouldn't (it is an intermediate node for /.._10000/\[00-99\]).
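To make the ambiguity concrete, here is a rough sketch of the two path schemes
(not the actual ZKRMStateStore code; the method names and the two-digit split
point are just illustrative assumptions):

{code:java}
public class ZnodePathSchemes {

  // Old (flat) scheme: one leaf znode per application, holding its data.
  static String flatPath(String appRoot, String appId) {
    return appRoot + "/" + appId;
  }

  // New (split) scheme: the last two digits of the app id become a child
  // znode and everything before them becomes an intermediate parent znode.
  static String splitPath(String appRoot, String appId) {
    int split = appId.length() - 2;
    return appRoot + "/" + appId.substring(0, split)
        + "/" + appId.substring(split);
  }

  public static void main(String[] args) {
    String root = "/rmstore/ZKRMStateRoot/RMAppRoot";
    // New-scheme app _1000000 lives at .../application_..._10000/00, so
    // .../application_..._10000 exists only as a data-less intermediate node.
    System.out.println(splitPath(root, "application_1234567890123_1000000"));
    // Old-scheme app _10000 lives at .../application_..._10000 as a leaf that
    // does hold data -- the same path name, hence the ambiguity during load.
    System.out.println(flatPath(root, "application_1234567890123_10000"));
  }
}
{code}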
Although option 2 can be done, I'd prefer your first suggestion (storing under
RM_APP_ROOT/hierarchies). We can have the RM read the old-style znodes, but new
apps and updates to old apps will go under the new root. We can even delete the
old-scheme root once it has no children.
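A minimal sketch of what that could look like, assuming a hypothetical
"hierarchies" child of RM_APP_ROOT and illustrative helper names (none of these
are existing ZKRMStateStore methods):

{code:java}
import java.util.HashSet;
import java.util.Set;

public class HierarchiesRootSketch {

  static final String RM_APP_ROOT = "/rmstore/ZKRMStateRoot/RMAppRoot";
  static final String HIERARCHIES_ROOT = RM_APP_ROOT + "/hierarchies";

  // App ids found as flat children of RM_APP_ROOT when RMState was loaded.
  private final Set<String> oldStyleApps = new HashSet<>();

  // All writes -- new apps as well as updates to apps originally loaded from
  // the flat root -- go under the hierarchies root, so the old scheme is
  // effectively read-only and there is no path ambiguity on store/update.
  String writePathFor(String appId) {
    int split = appId.length() - 2;
    return HIERARCHIES_ROOT + "/" + appId.substring(0, split)
        + "/" + appId.substring(split);
  }

  // When an old-style app is updated (rewritten under the new root) or
  // removed, its flat znode is no longer needed.
  void oldStyleEntryGone(String appId) {
    oldStyleApps.remove(appId);
  }

  // Once no old-style apps remain, the flat entries under RM_APP_ROOT (other
  // than the hierarchies root itself) could be deleted.
  boolean canCleanUpOldScheme() {
    return oldStyleApps.isEmpty();
  }
}
{code}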
> ZKRMStateStore: Limit the number of znodes under a znode
> --------------------------------------------------------
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Karthik Kambatla
> Assignee: Varun Saxena
> Priority: Critical
> Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch
>
>
> We ran into this issue where we were hitting the ZK server's default message
> size limits, primarily because the message contained too many znodes, even
> though individually they were all small.