[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering
[ https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933544#comment-14933544 ]

Hadoop QA commented on YARN-4178:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 15m 52s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 7m 58s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 5s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 52s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 2m 49s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 40m 34s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764035/YARN-4178-YARN-2928.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / def22b9 |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/9287/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9287/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9287/console |

This message was automatically generated.

> [storage implementation] app id as string can cause incorrect ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: YARN-2928
> Reporter: Sangjin Lee
> Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch
>
>
> Currently the app id is used in various places as part of row keys and in
> column names. However, they are treated as strings for the most part. This
> will cause a problem with ordering when the id portion of the app id rolls
> over to the next digit.
> For example, "app_1234567890_100" will be considered *earlier* than
> "app_1234567890_99". We should correct this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
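The ordering problem in the issue description can be reproduced in isolation. The following is a minimal sketch, not code from the patch: the class `AppIdOrderingDemo` and its helpers are hypothetical names, and the ids are the illustrative ones from the description. It sorts the two ids once as plain strings and once by their numeric sequence part.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical demo class, not part of YARN or of the attached patch.
final class AppIdOrderingDemo {

  // Extracts the trailing sequence number from an id shaped like prefix_timestamp_sequence.
  static long sequenceOf(String appId) {
    return Long.parseLong(appId.substring(appId.lastIndexOf('_') + 1));
  }

  // Plain string sort: compares character by character, so '1' < '9' decides the order.
  static String[] sortedAsStrings(String... ids) {
    String[] copy = ids.clone();
    Arrays.sort(copy);
    return copy;
  }

  // Numeric sort on the sequence part, which is the order the issue asks for.
  static String[] sortedBySequence(String... ids) {
    String[] copy = ids.clone();
    Arrays.sort(copy, Comparator.comparingLong(AppIdOrderingDemo::sequenceOf));
    return copy;
  }

  public static void main(String[] args) {
    String a = "app_1234567890_99";
    String b = "app_1234567890_100";
    System.out.println(Arrays.toString(sortedAsStrings(a, b)));
    System.out.println(Arrays.toString(sortedBySequence(a, b)));
  }
}
```

With string comparison, `app_1234567890_100` lands before `app_1234567890_99`; comparing the sequence numbers as numbers restores the expected order.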
[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering
[ https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933487#comment-14933487 ]

Varun Saxena commented on YARN-4178:

After giving it some further thought, in the patch I have not stored application_ in the row key and am using ApplicationId for the conversion. The reason is the same as above: the ApplicationId class is used across YARN to represent the app id, so any change in the app id format would be reflected in this class too. If we do not use this class, we have to write our own custom conversion method, and from a maintenance point of view any future change in the app id format may be missed here.

If we really want to store the application_ part because of the example given by Joep, maybe we can add a method in ApplicationId that takes the prefix as an additional parameter instead of taking it from the static field. Thoughts?

Also, if we follow the approach in the patch, the conversion from string to ApplicationId can be done while filling the context itself. But as we need to refactor the reader code and have a common context for reader and writer, we can do this later.
[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering
[ https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933506#comment-14933506 ]

Varun Saxena commented on YARN-4178:

Also, for the sake of consistency, I have done the conversion wherever the app id is used in a row key, even where it might not be necessary, as in the app-to-flow table.
[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering
[ https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901057#comment-14901057 ]

Varun Saxena commented on YARN-4178:

[~jrottinghuis],
No, I did not mean that we can use ApplicationId#toString to create a string which can be stored in the row key, if that is what you meant. The app id is already in that format.

What I was suggesting was that on the write path we can store only the cluster timestamp and sequence number (12 bytes: one long and one int) in the row key and skip storing the "application_" part. Storing them as a long and an int, or as 2 longs, would ensure correct ordering (although ascending). So, as you said above, Long.MAX_VALUE - X should be used to ensure descending order.

I was talking about ApplicationId#toString in the context of the read path. On the read path we can read these 12 bytes from the row key and call ApplicationId#newInstance and ApplicationId#toString to turn the timestamp and id back into the application_-prefixed app id in string format, which can then be sent back to the client. And if the prefix changes, ApplicationId will be changed as well (as it is used all over YARN).

However, your comment about storing the application_ part at the end to make the row key future proof makes sense. We can go with it.
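The write and read paths described in this comment can be sketched as follows. This is only an illustration under stated assumptions, not the code in the patch: `AppIdRowKeyCodec` is a hypothetical class, the string is rebuilt by hand (with the "application_" prefix and the 4-digit minimum mentioned elsewhere in the thread) so the example runs without Hadoop on the classpath, and `compareBytes` stands in for the unsigned byte-wise ordering a store such as HBase applies to row keys. In real code the read path would presumably go through ApplicationId#newInstance and ApplicationId#toString as the comment suggests.

```java
import java.nio.ByteBuffer;
import java.util.Locale;

// Hypothetical sketch: 12-byte app id encoding (one long + one int).
final class AppIdRowKeyCodec {
  // Assumed prefix, mirroring the "application_" prefix discussed in the thread.
  static final String PREFIX = "application_";

  // Write path: 8-byte cluster timestamp followed by the 4-byte sequence number.
  static byte[] encode(long clusterTimestamp, int sequence) {
    return ByteBuffer.allocate(Long.BYTES + Integer.BYTES)
        .putLong(clusterTimestamp) // ByteBuffer writes big-endian by default
        .putInt(sequence)
        .array();
  }

  // Read path: rebuild the string form from the 12 bytes.
  // Real code would likely delegate to ApplicationId instead of formatting by hand.
  static String decode(byte[] key) {
    ByteBuffer buf = ByteBuffer.wrap(key);
    long clusterTimestamp = buf.getLong();
    int sequence = buf.getInt();
    return PREFIX + clusterTimestamp + "_" + String.format(Locale.ROOT, "%04d", sequence);
  }

  // Unsigned lexicographic comparison of two keys.
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    byte[] k99 = encode(1442351767756L, 99);
    byte[] k100 = encode(1442351767756L, 100);
    System.out.println(compareBytes(k99, k100) < 0); // ascending: 99 sorts before 100
    System.out.println(decode(k100));
  }
}
```

Because both values are non-negative and written big-endian, byte order matches numeric order, which is the ascending ordering the comment refers to.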
[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering
[ https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900927#comment-14900927 ]

Joep Rottinghuis commented on YARN-4178:

[~varun_saxena] if you mean o.a.h.yarn.api.records.ApplicationId then no, that will _not_ do. Its toString is defined as
{code}
return appIdStrPrefix + this.getClusterTimestamp() + "_" + appIdFormat.get().format(getId());
{code}
The appIdFormat uses a minimum of 4 digits: fmt.setMinimumIntegerDigits(4);
When the counter part wraps over to 10K or 100K or 1M (our clusters regularly run several million apps before the RM gets restarted) the sort order gets all wrong as per my comment in YARN-4074, which is why [~sangjin.park]
For example, lexically application_1442351767756_1 < application_1442351767756_

We need the applications to be ordered correctly, even at those boundaries. In fact, I think we may have to store Long.MAX_VALUE - X for the timestamp and counter parts so that these will properly order in descending order for both the counter and the RM restart epoch part.

The fact that all application IDs are hardcoded with application_ in YARN seems a bit silly to me. It makes much more sense to me that applications should be able to indicate an application type and that those would have a different prefix. That way one can quickly distinguish between MapReduce apps, Tez, Spark, Impala, Presto, what-have-you. This may not matter much on smaller clusters with less usage, but to make this an option for larger clusters with several tens of thousands of jobs per day this would be really, really handy. Hence my suggestion to keep the application_ part at the end of the sort, to make the key layout future proof (maybe wishful thinking on my part).
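The Long.MAX_VALUE - X idea can be illustrated with a short sketch. It is an assumption-laden example rather than the eventual implementation: `DescendingAppIdKey` and its methods are hypothetical names, both parts are stored as longs, and `compareBytes` models unsigned byte-wise key ordering as used by a store such as HBase.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: inverted values so that newer apps sort first.
final class DescendingAppIdKey {

  // Long.MAX_VALUE - X: larger inputs map to smaller stored values. Self-inverse.
  static long invert(long value) {
    return Long.MAX_VALUE - value;
  }

  // Key layout assumed here: inverted cluster timestamp, then inverted counter.
  static byte[] encode(long clusterTimestamp, long counter) {
    return ByteBuffer.allocate(2 * Long.BYTES)
        .putLong(invert(clusterTimestamp))
        .putLong(invert(counter))
        .array();
  }

  // Returns {clusterTimestamp, counter} recovered from the key.
  static long[] decode(byte[] key) {
    ByteBuffer buf = ByteBuffer.wrap(key);
    long clusterTimestamp = invert(buf.getLong());
    long counter = invert(buf.getLong());
    return new long[] {clusterTimestamp, counter};
  }

  // Unsigned lexicographic comparison of two keys.
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    long ts = 1442351767756L;
    // Counter 10000 sorts before 9999, i.e. descending across the 4-digit boundary.
    System.out.println(compareBytes(encode(ts, 10000), encode(ts, 9999)) < 0);
    // A later RM epoch sorts before an earlier one regardless of the counter.
    System.out.println(compareBytes(encode(ts + 1, 1), encode(ts, 1000000)) < 0);
  }
}
```

The inversion keeps the stored values non-negative for non-negative inputs, so big-endian byte order still agrees with numeric order, only reversed.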
[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering
[ https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877046#comment-14877046 ]

Varun Saxena commented on YARN-4178:

bq. we certainly have to store the application_ part.
I think we can use the ApplicationId class for it. If the prefix changes, ApplicationId will change as well. As I said above, we can use ApplicationId#toString to reconvert it. Wouldn't it be fair to assume that any change in the application id format will be reflected in the ApplicationId class?
[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering
[ https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877044#comment-14877044 ]

Varun Saxena commented on YARN-4178:

On second thought, as appId is part of the entity table row key, containers and app attempts shouldn't be an issue.
[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering
[ https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804423#comment-14804423 ]

Vrushali C commented on YARN-4178:

In hRaven, we started with storing Hadoop job ids as a tuple of the JT/RM start time and the sequence number, exactly for this reason: to maintain the right ordering. But this is fine only as long as the prefix for app ids is "application_". If something changes and we have a different prefix, then querying older data (older-format row keys) becomes harder.

Column name ordering may not be an issue, I think. For row keys, where do we see this incorrect ordering? In the applications table? But I think there is a prefix of "user!cluster!flow! flow runid! " on each row key before the application id, no?
[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering
[ https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1480#comment-1480 ]

Li Lu commented on YARN-4178:

We can rely on the ApplicationId class in the YARN API to fix this, right? As the app id is encapsulated in {{TimelineCollectorContext}}, shall we change the appID part into an ApplicationId-typed object, and have an internal method to convert an ApplicationId to bytes for HBase storage?

I suspect this is a whole-flow change if we want to use ApplicationId in TimelineCollectorContext. Let's try not to break ongoing patches with this change.
[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering
[ https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804414#comment-14804414 ]

Varun Saxena commented on YARN-4178:

ApplicationId is basically a combination of the cluster timestamp and a monotonically increasing sequence number/id. We can hence store the application id as a sequence of 2 longs or 2 ints in the row key to ensure that order is maintained. We can encode it on the way in and decode it as a string on the way out by using ApplicationId#toString.

We are, however, storing app attempt ids and container ids in the same way. They will go into the entity table.
[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering
[ https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804487#comment-14804487 ]

Joep Rottinghuis commented on YARN-4178:

[~vrushalic] we certainly have to store the application_ part.
[~gtCarrera9] for sure this can be done separately. We should do this in one fell swoop in a consistent manner across the board.

If we do store three parts separately, we should probably store the epoch timestamp first, then the app counter part (integer/long) and then the application_ prefix. As far as I know, it would be possible to imagine that the RM would hand out app ids differently for Spark, or Tez, or MR, or whatever the app framework asks for. I'd imagine that we then have something like application__0001, spark__0002, application__0003, tez__0004 etc., where the numbers still increase for each subsequent app.
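The three-part layout suggested here (epoch timestamp, then counter, then prefix) can be sketched as below. Everything in it is an assumption for illustration: `ThreePartAppKey` is a hypothetical class, the prefixes other than "application_" are the imagined ones from the comment, and the ascending order is kept for simplicity even though the thread later discusses inverting the numeric parts.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: epoch timestamp, then counter, then prefix at the end of the key.
final class ThreePartAppKey {
  private static final int NUMERIC_BYTES = 2 * Long.BYTES;

  static byte[] encode(String prefix, long epoch, long counter) {
    byte[] prefixBytes = prefix.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(NUMERIC_BYTES + prefixBytes.length)
        .putLong(epoch)
        .putLong(counter)
        .put(prefixBytes) // prefix goes last so it never dominates the sort
        .array();
  }

  static long counterOf(byte[] key) {
    return ByteBuffer.wrap(key).getLong(Long.BYTES);
  }

  static String prefixOf(byte[] key) {
    return new String(key, NUMERIC_BYTES, key.length - NUMERIC_BYTES, StandardCharsets.UTF_8);
  }

  // Unsigned lexicographic comparison of two keys.
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    long epoch = 1442351767756L;
    byte[] spark2 = encode("spark_", epoch, 2);
    byte[] app3 = encode("application_", epoch, 3);
    // The counter decides the order even though "application_" < "spark_" as strings.
    System.out.println(compareBytes(spark2, app3) < 0);
    System.out.println(prefixOf(spark2) + counterOf(spark2));
  }
}
```

With the prefix first, all "application_" keys would group ahead of all "spark_" keys; placing it last lets apps of different types interleave by epoch and counter.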