Varun Saxena commented on YARN-2962:

[~rakeshr], thanks for your input. ApplicationID in YARN is of the format 
{noformat}application_[cluster timestamp]_[sequence number]{noformat}
Here the sequence number has 4 digits and is in the range 0000-9999. 
Going along the lines of what you are saying, I think we can split the sequence 
number part of the ApplicationID, as the cluster timestamp will probably be the 
same for most application IDs. My suggestion is to have it as 
{noformat}(app_root)\application_[cluster timestamp]_\[first 2 digits of 
sequence number]\[last 2 digits]{noformat}
We can view it as follows:
   * |--- RM_APP_ROOT
   * |     |----- (application_{cluster timestamp}_)
   * |     |        |----- (00 to 99)
   * |     |        |        |------ (00 to 99)
   * |     |        |        |         |----- (#ApplicationAttemptIds)
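
The layout above can be sketched as a small path-building helper. This is only an illustration of the proposal, not code from ZKRMStateStore: the class name, method name, and the example root "/rmstore/apps" are hypothetical, and it assumes the 4-digit sequence number described above. ZooKeeper paths use "/" as the separator, so the sketch uses that.

```java
/** Hypothetical helper illustrating the proposed layout; not part of ZKRMStateStore. */
final class SplitZnodePath {
    private static final String APP_PREFIX = "application_";

    private SplitZnodePath() {}

    /**
     * Builds (appRoot)/application_[cluster timestamp]_/[first digits]/[remaining digits].
     *
     * @param appRoot parent znode of all applications (assumed name)
     * @param appId   ID of the form application_[cluster timestamp]_[sequence number]
     * @param splitAt how many leading sequence digits form the first level (2 for a 2/2 split)
     */
    static String toZnodePath(String appRoot, String appId, int splitAt) {
        int lastUnderscore = appId.lastIndexOf('_');
        if (!appId.startsWith(APP_PREFIX) || lastUnderscore < APP_PREFIX.length()) {
            throw new IllegalArgumentException("Unexpected application ID: " + appId);
        }
        // "application_[cluster timestamp]_" including the trailing underscore
        String timestampPart = appId.substring(0, lastUnderscore + 1);
        String sequence = appId.substring(lastUnderscore + 1);
        if (splitAt <= 0 || splitAt >= sequence.length()) {
            throw new IllegalArgumentException("Cannot split '" + sequence + "' at " + splitAt);
        }
        return appRoot + "/" + timestampPart
            + "/" + sequence.substring(0, splitAt)
            + "/" + sequence.substring(splitAt);
    }
}
```

For example, toZnodePath("/rmstore/apps", "application_1418470446447_0049", 2) gives /rmstore/apps/application_1418470446447_/00/49. Application attempt znodes would then sit under that last node.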

[~rakeshr] and [~kasha], kindly comment on the approach. One drawback is that 
this would require more round trips to ZK when the RM is recovering.
I am not sure how many znodes under a single parent it takes to reach the 1 MB 
limit. Alternatively, we could split the sequence number into its first digit 
and its last 3 digits.
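
To get a feel for the trade-off, here is a rough back-of-the-envelope sketch. It rests on assumptions that are not confirmed in this thread: that the limit in question is ZooKeeper's default jute.maxbuffer of 0xfffff bytes, that each child name in a listing costs its byte length plus a 4-byte length prefix, and that response headers can be ignored. The class and method names are hypothetical.

```java
/** Hypothetical estimate of znode fan-out; all constants are assumptions. */
final class ZnodeFanOutEstimate {
    // Assumption: ZooKeeper's default jute.maxbuffer (just under 1 MB).
    static final long ASSUMED_MAX_BUFFER_BYTES = 0xfffffL;
    // Assumption: each child name is serialized with a 4-byte length prefix.
    static final int ASSUMED_PER_NAME_OVERHEAD_BYTES = 4;

    private ZnodeFanOutEstimate() {}

    /** Roughly how many child names of the given length fit in one listing response. */
    static long maxChildNames(int nameLengthBytes) {
        return ASSUMED_MAX_BUFFER_BYTES / (nameLengthBytes + ASSUMED_PER_NAME_OVERHEAD_BYTES);
    }

    /** Largest number of children any single znode can have for a given split. */
    static long maxChildrenPerZnode(int firstLevelDigits, int totalDigits) {
        return Math.max(pow10(firstLevelDigits), pow10(totalDigits - firstLevelDigits));
    }

    /**
     * Worst-case child-listing calls needed to enumerate all applications under
     * one cluster timestamp: one for the timestamp znode, one per first-level bucket.
     */
    static long listingCalls(int firstLevelDigits) {
        return 1 + pow10(firstLevelDigits);
    }

    private static long pow10(int exponent) {
        long result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= 10;
        }
        return result;
    }
}
```

Under these assumptions, a flat layout with 30-character names such as application_1418470446447_0049 would hit the limit at roughly 30,840 children. A 2/2 split caps any znode at 100 children at the cost of up to 101 listing calls per cluster timestamp, while a 1/3 split caps it at 1000 children with up to 11 listing calls. Reads of the application and attempt data themselves are not counted here.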

Moreover, I don't see much of an issue with application attempt znodes, as 
max-attempts is limited to 2 by default. 

> ZKRMStateStore: Limit the number of znodes under a znode
> --------------------------------------------------------
>                 Key: YARN-2962
>                 URL: https://issues.apache.org/jira/browse/YARN-2962
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Karthik Kambatla
>            Assignee: Varun Saxena
>            Priority: Critical
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes even though 
> individually they were all small.
