[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255497#comment-14255497
 ] 

Varun Saxena commented on YARN-2962:
------------------------------------

[~rakeshr], thanks for your input. ApplicationID in YARN is of the format 
{noformat}application_[cluster timestamp]_[sequence number]{noformat}
Here the sequence number has 4 digits and is in the range 0000-9999.
Going along the lines of what you are saying, I think we can split the sequence
number part of the ApplicationID, as the cluster timestamp will probably be the
same for most application IDs. My suggestion is to have it as
{noformat}(app_root)\application_[cluster timestamp]_\[first 2 digits of 
sequence number]\[last 2 digits]{noformat}
We can view it as follows:
{noformat}
   * |--- RM_APP_ROOT
   * |     |----- (application_{cluster timestamp}_)
   * |     |        |----- (00 to 99)
   * |     |        |        |------ (00 to 99)
   * |     |        |        |         |----- (#ApplicationAttemptIds)
{noformat}
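
To make the proposed mapping concrete, here is a minimal sketch of how an application ID could be turned into a znode path under this layout. It is only an illustration, not the ZKRMStateStore code: the function name, the "/RM_APP_ROOT" root, and the use of "/" as the ZooKeeper path separator are assumptions made for the example.

```python
def app_znode_path(app_id, app_root="/RM_APP_ROOT", first_digits=2):
    """Map an application ID to a hierarchical znode path (hypothetical helper).

    Assumes the ID looks like application_<cluster timestamp>_<4-digit sequence>,
    as described in the comment above.
    """
    prefix, cluster_ts, seq = app_id.split("_")
    if prefix != "application" or not cluster_ts.isdigit():
        raise ValueError("unexpected application ID: " + app_id)
    if len(seq) != 4 or not seq.isdigit():
        raise ValueError("expected a 4-digit sequence number: " + seq)
    if not 0 < first_digits < 4:
        raise ValueError("first_digits must be 1, 2 or 3")

    # Parent znode shared by all applications with the same cluster timestamp.
    parent = "application_" + cluster_ts + "_"
    # Split the sequence number into a leading bucket and a trailing part.
    return "/".join([app_root, parent, seq[:first_digits], seq[first_digits:]])


print(app_znode_path("application_1418987654321_0042"))
print(app_znode_path("application_1418987654321_0042", first_digits=1))
```

With first_digits=2 this gives the 2 + 2 split from the tree above; first_digits=1 gives a 1 + 3 split instead.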

[~rakeshr] and [~kasha], kindly comment on the approach. One drawback is that 
this would require more round trips to ZK when the RM is recovering.
I am not sure how many znodes under a single parent it takes to reach the 1 MB 
limit. We could also split the sequence number as the first digit and the last 
3 digits.
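
As a rough way to compare the two splits, the sketch below counts the maximum number of children per znode and the number of child listings needed to enumerate every application under one cluster timestamp. It assumes a 4-digit sequence number and one listing call per parent znode during recovery; the function and key names are made up for the example, and the real recovery cost may differ.

```python
def fan_out(first_digits, total_digits=4):
    """Worst-case fan-out for a given split of the sequence number."""
    if not 0 < first_digits < total_digits:
        raise ValueError("first_digits must be between 1 and total_digits - 1")
    first_level = 10 ** first_digits                    # buckets under the timestamp znode
    second_level = 10 ** (total_digits - first_digits)  # application znodes per bucket
    return {
        "first_level_children": first_level,
        "second_level_children": second_level,
        # One listing of the timestamp znode plus one listing per bucket.
        "list_calls_per_timestamp": 1 + first_level,
    }


for digits in (1, 2):
    print(digits, fan_out(digits))
```

Under these assumptions the 2 + 2 split caps every znode at 100 children but needs up to 101 listings per cluster timestamp, while the 1 + 3 split needs up to 11 listings but allows up to 1000 children in a bucket.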

Moreover, I don't see much of an issue with application attempt znodes, as 
max-attempts is limited to 2 by default. 

> ZKRMStateStore: Limit the number of znodes under a znode
> --------------------------------------------------------
>
>                 Key: YARN-2962
>                 URL: https://issues.apache.org/jira/browse/YARN-2962
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Karthik Kambatla
>            Assignee: Varun Saxena
>            Priority: Critical
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes even though 
> individually they were all small.



