Varun Saxena commented on YARN-2962:

bq. Also was wondering, should we hard code the NO_INDEX_SPLITTING logic to 4 ? 
Essentially, is it always guaranteed that sequence number will always be 
exactly 4 digits ?
No, it is not always guaranteed to be 4 digits. The sequence number is a minimum 
of 4 digits and can grow up to the limit of an integer, which would be 10 digits.
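To make the padding behavior concrete, here is a minimal sketch. The {{%04d}} format is an assumption based on how {{ApplicationId#toString}} pads the sequence number in Hadoop; the class and method names below are illustrative, not part of the patch:

```java
// Illustrative sketch: the sequence number is zero-padded to a minimum of
// 4 digits and simply grows wider once it exceeds 9999.
public class AppIdFormat {
    static String appId(long clusterTimestamp, int seqNum) {
        // Assumed format, mirroring ApplicationId#toString's "%04d" padding.
        return String.format("application_%d_%04d", clusterTimestamp, seqNum);
    }
}
```

So {{appId(ts, 7)}} yields a 4-digit suffix ({{_0007}}), while {{appId(ts, 123456)}} yields a 6-digit suffix ({{_123456}}), which is why the split index cannot assume a fixed width.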
But we have another configuration for the maximum number of apps in the state 
store, which is 10000 by default. So, effectively, there won't be more than this 
number of apps in the state store. That is why I considered the split index to 
be a maximum of 4. It is also the simplest configuration: supporting up to 10 
digits and allowing a split index > 4 would cause issues, and I could not think 
of a simpler config.

But thanks for pointing this out; it actually brought my attention to an 
important issue. I think the split index should carry out the split from the 
end of the sequence number.
Let us say we have apps up to {{application_\{cluster_timestamp\}_9999}}. The 
next app would be {{application_\{cluster_timestamp\}_10000}}. As the max 
number of apps in the state store is 10000, 
{{application_\{cluster_timestamp\}_0000}} would be deleted from the state 
store.
Consider a split index of 2. If I count the split index from the beginning, 
applications {{application_\{cluster_timestamp\}_10000}} to 
{{application_\{cluster_timestamp\}_10999}} would go under znode 
{{application_\{cluster_timestamp\}_10}}, where apps 
{{application_\{cluster_timestamp\}_1000}} to 
{{application_\{cluster_timestamp\}_1099}} already exist. This can cause the 
original problem (of exceeding the jute maxbuffer size) to recur. Please note 
that apps {{application_\{cluster_timestamp\}_1000}} to 
{{application_\{cluster_timestamp\}_1099}} won't be deleted anytime soon, which 
would leave this znode with a lot of children.

If we split from the end instead, 
{{application_\{cluster_timestamp\}_10000}} to 
{{application_\{cluster_timestamp\}_10099}} would go under znode 
{{application_\{cluster_timestamp\}_100}}. I think this approach is more 
correct than the currently implemented one.
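The difference between the two split strategies can be sketched as follows. This is a hypothetical illustration (not the YARN-2962 patch itself); the method names and timestamp are made up for the example:

```java
// Contrasts prefix-based vs suffix-based splitting of the application ID's
// sequence number when choosing the parent znode.
public class AppIdSplitter {
    // Prefix split: parent keeps the FIRST splitIndex digits of the sequence
    // number. Sequence numbers of different lengths can map to the SAME
    // parent (e.g. 1000-1099 and 10000-10999 both start with "10").
    static String prefixParent(String appId, int splitIndex) {
        int sep = appId.lastIndexOf('_');
        String seq = appId.substring(sep + 1);
        return appId.substring(0, sep + 1) + seq.substring(0, splitIndex);
    }

    // Suffix split: parent drops the LAST splitIndex digits, so parents stay
    // distinct as sequence numbers grow longer.
    static String suffixParent(String appId, int splitIndex) {
        return appId.substring(0, appId.length() - splitIndex);
    }
}
```

With split index 2, {{prefixParent}} sends both {{..._1000}} and {{..._10000}} under {{..._10}} (the collision described above), while {{suffixParent}} sends them under {{..._10}} and {{..._100}} respectively, keeping the groups apart.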

> ZKRMStateStore: Limit the number of znodes under a znode
> --------------------------------------------------------
>                 Key: YARN-2962
>                 URL: https://issues.apache.org/jira/browse/YARN-2962
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Karthik Kambatla
>            Assignee: Varun Saxena
>            Priority: Critical
>         Attachments: YARN-2962.01.patch
> We ran into this issue when we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes, even though 
> individually they were all small.
