[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933544#comment-14933544
 ] 

Hadoop QA commented on YARN-4178:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 52s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 58s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m  5s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 16s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 52s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   2m 49s | Tests passed in hadoop-yarn-server-timelineservice. |
| | |  40m 34s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764035/YARN-4178-YARN-2928.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / def22b9 |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/9287/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9287/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9287/console |


This message was automatically generated.

> [storage implementation] app id as string can cause incorrect ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch
>
>
> Currently the app id is used in various places as part of row keys and in 
> column names. However, they are treated as strings for the most part. This 
> will cause a problem with ordering when the id portion of the app id rolls 
> over to the next digit.
> For example, "app_1234567890_100" will be considered *earlier* than 
> "app_1234567890_99". We should correct this.
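
The ordering problem in the description can be reproduced with plain string sorting. The following is only a minimal sketch; the class and method names are hypothetical and nothing here is taken from the timeline service code.

```java
import java.util.Arrays;

class AppIdStringOrder {
    // Returns a sorted copy using natural String order (character by character).
    static String[] sortedLexically(String... ids) {
        String[] copy = ids.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] sorted = sortedLexically("app_1234567890_99", "app_1234567890_100");
        // After the shared prefix, '1' sorts before '9', so the _100 id comes first.
        System.out.println(Arrays.toString(sorted));
    }
}
```

The two ids share everything up to the last underscore, so the comparison is decided by '1' versus '9' and the numerically larger id is placed first.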



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-28 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933487#comment-14933487
 ] 

Varun Saxena commented on YARN-4178:


After giving it some further thought, in the patch I have not stored the 
application_ prefix in the row key and am using ApplicationId for the conversion.
The reason is the same as above: the ApplicationId class is used across YARN to 
represent the app id, so any change in the app id format would be reflected in 
this class too.
If we do not use this class, we have to write our own custom conversion method, 
and from a maintenance point of view any future change in the app id format may 
be missed here.
If we really want to store the application_ part because of the example given 
by Joep, maybe we can add a method in ApplicationId which takes the prefix as an 
additional parameter instead of taking it from a static field. 
Thoughts?

Also, if we follow the approach in the patch, the conversion from string to 
ApplicationId can be done while filling the context itself. But as we need to 
refactor the reader code and have a common context for reader and writer, we 
can do this later.



[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-28 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933506#comment-14933506
 ] 

Varun Saxena commented on YARN-4178:


Also, for the sake of consistency, I have done the conversion wherever the app 
id is used in a row key, even where it might not be necessary, as in the 
app-to-flow table.



[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-21 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901057#comment-14901057
 ] 

Varun Saxena commented on YARN-4178:


[~jrottinghuis],
No, I did not mean that we can use ApplicationId#toString to create a string 
which can be stored in the row key, if that is what you meant. The app id is 
already in that format.

What I was suggesting was that on the write path we store only the cluster 
timestamp and sequence number (12 bytes: one long and one int) in the row key 
and skip storing the "application_" part. Storing them as a long and an int, or 
as 2 longs, would ensure correct ordering (although ascending). So, as you said 
above, Long.MAX_VALUE - X should be used to ensure descending order.
I was talking about ApplicationId#toString in the context of the read path. On 
the read path we can read these 12 bytes from the row key and call 
ApplicationId#newInstance and ApplicationId#toString to turn the timestamp and 
id back into the application_-prefixed app id in string format, which can then 
be sent back to the client. And if the prefix changes, ApplicationId will change 
as well (as it is used all over YARN).

However, your comment about storing the application_ part at the end to make 
the row key future proof makes sense. We can go with it.
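
The write and read paths described in this comment can be sketched roughly as follows. This is not the code from the patch: the class name, the string parsing, and the use of Integer.MAX_VALUE - X for the int part are assumptions made for illustration, and plain string formatting stands in for ApplicationId#newInstance and ApplicationId#toString so that the example has no Hadoop dependency.

```java
import java.nio.ByteBuffer;
import java.util.Locale;

class AppIdRowKeyCodec {
    // Hypothetical constant; in YARN the prefix lives inside ApplicationId.
    static final String PREFIX = "application_";

    // Write path: keep only the cluster timestamp (long) and sequence number (int).
    static byte[] encode(String appId) {
        String[] parts = appId.split("_");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected app id: " + appId);
        }
        long clusterTimestamp = Long.parseLong(parts[1]);
        int sequence = Integer.parseInt(parts[2]);
        // Inverting both values makes larger (newer) ids produce smaller keys.
        return ByteBuffer.allocate(Long.BYTES + Integer.BYTES)
                .putLong(Long.MAX_VALUE - clusterTimestamp)
                .putInt(Integer.MAX_VALUE - sequence)
                .array();
    }

    // Read path: rebuild the string form from the 12 bytes.
    static String decode(byte[] key) {
        ByteBuffer buf = ByteBuffer.wrap(key);
        long clusterTimestamp = Long.MAX_VALUE - buf.getLong();
        int sequence = Integer.MAX_VALUE - buf.getInt();
        return PREFIX + clusterTimestamp + "_"
                + String.format(Locale.ROOT, "%04d", sequence);
    }

    public static void main(String[] args) {
        byte[] key = encode("application_1442351767756_0001");
        System.out.println(key.length + " bytes -> " + decode(key));
    }
}
```

In real code the decode step would presumably go through ApplicationId, as suggested above, so that a change in the id format is picked up in one place.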



[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-21 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900927#comment-14900927
 ] 

Joep Rottinghuis commented on YARN-4178:


[~varun_saxena] if you mean o.a.h.yarn.api.records.ApplicationId then no, that 
will _not_ do.
Its toString is defined as
{code}
return appIdStrPrefix + this.getClusterTimestamp() + "_"
+ appIdFormat.get().format(getId());
{code}
The appIdFormat uses a minimum of 4 digits: fmt.setMinimumIntegerDigits(4);
When the counter part wraps over to 10K or 100K or 1M (our clusters regularly 
run several million apps before the RM gets restarted) the sort order gets all 
wrong as per my comment in YARN-4074, which is why [~sangjin.park]

For example, lexically application_1442351767756_1 < 
application_1442351767756_
We need the applications to be ordered correctly, even at those boundaries.
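
To illustrate the boundary problem, here is a small sketch that formats ids with a minimum of 4 digits and compares them as strings. The formatter setup is an approximation of the one quoted above: Locale.ROOT and the disabled grouping are assumptions of this sketch (grouping is turned off so that 10000 is not printed as 10,000), and the class name is hypothetical.

```java
import java.text.NumberFormat;
import java.util.Locale;

class PaddedCounterOrder {
    static String format(long clusterTimestamp, int id) {
        NumberFormat fmt = NumberFormat.getInstance(Locale.ROOT);
        fmt.setGroupingUsed(false);
        fmt.setMinimumIntegerDigits(4); // pads 1 to 0001, leaves 10000 as is
        return "application_" + clusterTimestamp + "_" + fmt.format(id);
    }

    public static void main(String[] args) {
        long ts = 1442351767756L;
        // Within 4 digits the zero padding keeps string order equal to numeric order.
        System.out.println(format(ts, 999).compareTo(format(ts, 1000)) < 0);
        // At the 5-digit boundary the order flips: "10000" sorts before "9999".
        System.out.println(format(ts, 9999).compareTo(format(ts, 10000)) < 0);
    }
}
```

The padding only postpones the problem: string order matches numeric order up to 9999 and breaks at every later digit rollover.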

In fact, I think we may have to store Long.MAX_VALUE - X for the timestamp and 
counter parts so that these will sort properly in descending order for both 
the counter and the RM restart epoch part.
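
A rough sketch of the Long.MAX_VALUE - X idea follows. It assumes that both parts are stored as big-endian longs and that row keys are compared as unsigned bytes, which is how HBase orders them; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

class DescendingKeyOrder {
    // Stores Long.MAX_VALUE - X for both parts; ByteBuffer is big-endian by default.
    static byte[] invertedKey(long clusterTimestamp, long counter) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(Long.MAX_VALUE - clusterTimestamp)
                .putLong(Long.MAX_VALUE - counter)
                .array();
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long ts = 1442351767756L;
        // Counter 100 now sorts before counter 99 (descending order).
        System.out.println(compareBytes(invertedKey(ts, 100), invertedKey(ts, 99)) < 0);
        // An app from a later RM start sorts before any app from an earlier one.
        System.out.println(compareBytes(invertedKey(ts + 1, 1), invertedKey(ts, 100)) < 0);
    }
}
```

Because the timestamp and counter are non-negative, the inverted values are non-negative too, so their big-endian bytes compare the same way as the numbers themselves.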

The fact that all application IDs are hardcoded with application_ in yarn seems 
a bit silly to me. It makes much more sense to me that applications should be 
able to indicate an application type and that those would have a different 
prefix. That way one can quickly distinguish between mapreduce apps, Tez, 
Spark, Impala, Presto, what-have-you.
This may not matter much on smaller clusters with less usage, but as an option 
for larger clusters with several tens of thousands of jobs per day this would 
be really handy. Hence my suggestion to keep the application_ part at the end 
of the sort, to make the key layout future proof (maybe wishful thinking on my 
part).




[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877046#comment-14877046
 ] 

Varun Saxena commented on YARN-4178:


bq. we certainly have to store the application_ part.
I think we can use the ApplicationId class for it. If the prefix changes, 
ApplicationId will change as well. As I said above, we can use 
ApplicationId#toString to convert it back. Wouldn't it be fair to assume that 
any change in the application id format will be reflected in the ApplicationId 
class?



[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877044#comment-14877044
 ] 

Varun Saxena commented on YARN-4178:


As appId is part of the entity table row key, on second thought, containers and 
app attempts shouldn't be an issue.



[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-17 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804423#comment-14804423
 ] 

Vrushali C commented on YARN-4178:
--


In hRaven, we started with storing hadoop job ids as a tuple of JT/RM start 
time and the sequence number, exactly for this reason: to maintain the right 
ordering. 

But this is good only as long as the prefix for app ids is "application_". If 
something changes and we have a different prefix, then querying older data 
(older format row keys) becomes harder. 

Column name ordering may not be an issue, I think.

For row keys, where do we see this incorrect ordering? In the applications 
table? But I think there is a prefix of "user!cluster!flow!flow run id!" on 
each row key before the application id, no? 





[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-17 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1480#comment-1480
 ] 

Li Lu commented on YARN-4178:
-

We can rely on the ApplicationId class in the YARN API to fix this, right? As 
encapsulated in {{TimelineCollectorContext}}, shall we change the appID part 
into an ApplicationId-typed object, and have an internal method to convert an 
ApplicationId to bytes for HBase storage? I suspect this is a whole-flow change 
if we want to use ApplicationId in TimelineCollectorContext. Let's try not to 
break ongoing patches with this change. 



[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-17 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804414#comment-14804414
 ] 

Varun Saxena commented on YARN-4178:


ApplicationId is basically a combination of a cluster timestamp and a 
monotonically increasing sequence number/id. 
We can hence store the application id as a sequence of 2 longs, or a long and 
an int, in the row key to ensure order is maintained.

We can encode it on the way in and decode it to a string on the way out by 
using ApplicationId#toString.

We are, however, storing app attempt ids and container ids in the same way. 
They will go into the entity table.





[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-17 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804487#comment-14804487
 ] 

Joep Rottinghuis commented on YARN-4178:


[~vrushalic] we certainly have to store the application_ part.
[~gtCarrera9] for sure this can be done separately. We should do this in one 
fell swoop, in a consistent manner across the board.

If we do store the three parts separately, we should probably store the epoch 
timestamp first, then the app counter part (integer/long), and then the 
application_.
As far as I know, it is possible to imagine that the RM would hand out app ids 
differently for Spark, or Tez, or MR, or whatever the app framework asks for. 
I'd imagine that we would then have something like 
application__0001, spark__0002, 
application__0003, tez__0004 etc., where the 
number still increases for each subsequent app.
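
The three-part layout suggested here could look roughly like the sketch below, with the textual prefix stored last so that it does not influence the sort. The class name, the field widths (two longs), the UTF-8 prefix encoding, and the "spark" and "tez" prefixes are all assumptions for illustration; YARN does not hand out such ids today.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ThreePartKey {
    // Layout: [epoch timestamp: 8 bytes][counter: 8 bytes][prefix: UTF-8 bytes]
    static byte[] key(String prefix, long epoch, long counter) {
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(2 * Long.BYTES + p.length)
                .putLong(epoch).putLong(counter).put(p).array();
    }

    static long counterOf(byte[] key) {
        return ByteBuffer.wrap(key).getLong(Long.BYTES);
    }

    static String prefixOf(byte[] key) {
        int offset = 2 * Long.BYTES;
        return new String(key, offset, key.length - offset, StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic comparison, as used for byte-ordered row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long epoch = 1442351767756L;
        List<byte[]> keys = new ArrayList<>();
        keys.add(key("tez", epoch, 4));
        keys.add(key("application", epoch, 1));
        keys.add(key("spark", epoch, 2));
        keys.add(key("application", epoch, 3));
        keys.sort(ThreePartKey::compareUnsigned);
        for (byte[] k : keys) {
            // Rows come out in counter order regardless of the prefix.
            System.out.println(prefixOf(k) + " " + counterOf(k));
        }
    }
}
```

This sketch sorts ascending; a descending layout would store inverted values for the two numeric parts instead.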
