[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

Joep Rottinghuis (JIRA) Mon, 21 Sep 2015 09:26:34 -0700

    [ https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900927#comment-14900927 ]


Joep Rottinghuis commented on YARN-4178:
----------------------------------------

[~varun_saxena] if you mean o.a.h.yarn.api.records.ApplicationId, then no, that 
will _not_ do.
Its toString() is defined as:
{code}
return appIdStrPrefix + this.getClusterTimestamp() + "_"
        + appIdFormat.get().format(getId());
{code}
The appIdFormat zero-pads the counter to a minimum of 4 digits 
(fmt.setMinimumIntegerDigits(4)), but it does not fix the width.
When the counter grows past that, to 10K, 100K or 1M (our clusters regularly 
run several million apps before the RM gets restarted), the lexical sort order 
breaks, as per my comment in YARN-4074, which is why [~sangjin.park]

For example, application_1442351767756_10000 sorts lexically before 
application_1442351767756_9999.
We need the applications to be ordered correctly, even at those boundaries.
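A minimal sketch reproducing the problem, assuming the formatting quoted above (grouping off, at least 4 integer digits) behaves like a java.text.NumberFormat configured as below. The class name and `toSortableString` are hypothetical; the fixed-width variant only illustrates that lexical and numeric order agree once every numeric part has a constant width.

```java
import java.text.NumberFormat;
import java.util.Locale;

class AppIdLexicalOrderDemo {

  // Formats the counter roughly the way the quoted toString() does:
  // no grouping separators, zero-padded to at least 4 digits (never truncated).
  static String toAppIdString(long clusterTimestamp, int id) {
    NumberFormat fmt = NumberFormat.getInstance(Locale.US);
    fmt.setGroupingUsed(false);
    fmt.setMinimumIntegerDigits(4);
    return "application_" + clusterTimestamp + "_" + fmt.format(id);
  }

  // Hypothetical alternative: pad both numeric parts to a fixed width that
  // covers their full range (19 digits for a long, 10 for an int), so that
  // String.compareTo agrees with numeric order for non-negative values.
  static String toSortableString(long clusterTimestamp, int id) {
    return "application_"
        + String.format(Locale.ROOT, "%019d", clusterTimestamp)
        + "_"
        + String.format(Locale.ROOT, "%010d", id);
  }

  public static void main(String[] args) {
    long ts = 1442351767756L;
    String a = toAppIdString(ts, 9999);
    String b = toAppIdString(ts, 10000);
    // Lexically, the newer app (10000) sorts before the older one (9999).
    System.out.println(b + " < " + a + " : " + (b.compareTo(a) < 0));

    String sa = toSortableString(ts, 9999);
    String sb = toSortableString(ts, 10000);
    // With fixed-width padding the order is numeric again.
    System.out.println(sa + " < " + sb + " : " + (sa.compareTo(sb) < 0));
  }
}
```

Both comparisons print true: the variable-width strings invert at the 9999/10000 boundary, the fixed-width ones do not.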

In fact, I think we may have to store Long.MAX_VALUE - X for both the timestamp 
and the counter parts, so that keys sort in descending order for both the 
counter and the RM restart epoch (cluster timestamp) part.
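A sketch of the Long.MAX_VALUE - X idea, assuming row keys are compared as unsigned lexicographic bytes (which is how HBase orders row keys). The class and method names are hypothetical, and the encoding assumes both the cluster timestamp and the counter are non-negative: the stored values then stay non-negative, so their big-endian byte order matches numeric order, and subtracting from Long.MAX_VALUE flips that to newest-first.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InvertedAppIdKey {

  // Encodes the two numeric parts as big-endian longs of (Long.MAX_VALUE - x).
  // Assumes clusterTimestamp >= 0 and id >= 0.
  static byte[] encode(long clusterTimestamp, int id) {
    return ByteBuffer.allocate(2 * Long.BYTES) // ByteBuffer is big-endian by default
        .putLong(Long.MAX_VALUE - clusterTimestamp)
        .putLong(Long.MAX_VALUE - id)
        .array();
  }

  // Recovers "clusterTimestamp_id" from an encoded key.
  static String decode(byte[] key) {
    ByteBuffer buf = ByteBuffer.wrap(key);
    long ts = Long.MAX_VALUE - buf.getLong();
    long id = Long.MAX_VALUE - buf.getLong();
    return ts + "_" + id;
  }

  // Unsigned lexicographic byte comparison, the ordering HBase applies to row keys.
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    long ts = 1442351767756L;
    List<byte[]> keys = new ArrayList<>(Arrays.asList(
        encode(ts, 9999),
        encode(ts, 10000),
        encode(ts + 86_400_000L, 1))); // an app started after a later RM restart
    keys.sort(InvertedAppIdKey::compareUnsigned);
    for (byte[] key : keys) {
      System.out.println(decode(key)); // newest first
    }
  }
}
```

The app from the later RM epoch comes first, then counter 10000, then 9999, which is the descending order asked for above, with no string padding involved.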

The fact that all application IDs are hardcoded with the application_ prefix in 
YARN seems a bit silly to me. It makes much more sense to me that applications 
should be able to indicate an application type, and that each type would have a 
different prefix. That way one can quickly distinguish between MapReduce apps, 
Tez, Spark, Impala, Presto, what-have-you.
This may not matter much on smaller clusters with less usage, but on larger 
clusters running several tens of thousands of jobs per day this option would be 
very handy. Hence my suggestion to keep the application_ part at the end of the 
sort, to make the key layout future-proof (maybe wishful thinking on my part).
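To illustrate why the prefix belongs at the end of the key, here is a hypothetical sketch contrasting two layouts. The per-type prefix tez_ is an assumed example (YARN currently hardcodes application_), and the inverted fixed-width numeric encoding is the same assumption as the Long.MAX_VALUE - X idea above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class TypedAppIdKey {

  // Hypothetical layout: inverted numeric parts first, type prefix last.
  static byte[] prefixLast(String typePrefix, long clusterTimestamp, int id) {
    byte[] prefix = typePrefix.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(2 * Long.BYTES + prefix.length)
        .putLong(Long.MAX_VALUE - clusterTimestamp)
        .putLong(Long.MAX_VALUE - id)
        .put(prefix)
        .array();
  }

  // Contrasting layout: type prefix first, which groups keys by type.
  static byte[] prefixFirst(String typePrefix, long clusterTimestamp, int id) {
    byte[] prefix = typePrefix.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(prefix.length + 2 * Long.BYTES)
        .put(prefix)
        .putLong(Long.MAX_VALUE - clusterTimestamp)
        .putLong(Long.MAX_VALUE - id)
        .array();
  }

  // Unsigned lexicographic byte comparison, as HBase applies to row keys.
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    long ts = 1442351767756L;
    // An older app with the current prefix and a newer one with a hypothetical
    // Tez prefix, both from the same RM epoch.
    boolean lastKeepsTime = compareUnsigned(
        prefixLast("tez_", ts, 10000), prefixLast("application_", ts, 9999)) < 0;
    boolean firstKeepsTime = compareUnsigned(
        prefixFirst("tez_", ts, 10000), prefixFirst("application_", ts, 9999)) < 0;
    System.out.println("prefix last, newest first: " + lastKeepsTime);
    System.out.println("prefix first, newest first: " + firstKeepsTime);
  }
}
```

With the prefix last, keys of different types still interleave in newest-first order; with the prefix first, every application_ key sorts ahead of every tez_ key regardless of age.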


> [storage implementation] app id as string can cause incorrect ordering
> ----------------------------------------------------------------------
>
>                 Key: YARN-4178
>                 URL: https://issues.apache.org/jira/browse/YARN-4178
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>
> Currently the app id is used in various places as part of row keys and in 
> column names. However, they are treated as strings for the most part. This 
> will cause a problem with ordering when the id portion of the app id rolls 
> over to the next digit.
> For example, "app_1234567890_100" will be considered *earlier* than 
> "app_1234567890_99". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
