[jira] [Commented] (YARN-3984) Rethink event column key issue

2016-07-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369717#comment-15369717
 ] 

Hudson commented on YARN-3984:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #10074 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10074/])
YARN-3984. Adjusted the event column key schema and avoided missing (sjlee: rev 
9422d9b50d90a99062880cf648dd86a764bf97ec)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/TimelineWriterUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityRowKey.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/TestHBaseTimelineWriterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/HBaseTimelineWriterImpl.java


> Rethink event column key issue
> --
>
> Key: YARN-3984
> URL: https://issues.apache.org/jira/browse/YARN-3984
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Vrushali C
> Fix For: YARN-2928
>
> Attachments: YARN-3984-YARN-2928.001.patch
>
>
> Currently, the event column key is event_id?info_key?timestamp, which is not 
> so friendly to fetching all the events of an entity and sorting them in a 
> chronologic order. IMHO, timestamp?event_id?info_key may be a better key 
> schema. I open this jira to continue the discussion about it which was 
> commented on YARN-3908.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-08-05 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659183#comment-14659183
 ] 

Zhijie Shen commented on YARN-3984:
---

bq. If the info map is not empty, this record would be redundant and will take 
up storage space.

Makes sense. The patch looks good to me. I will commit it.



[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650105#comment-14650105
 ] 

Hadoop QA commented on YARN-3984:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 15m 52s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 57s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 48s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 38m 46s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12748270/YARN-3984-YARN-2928.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / df0ec47 |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8744/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8744/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8744/console |


This message was automatically generated.



[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-31 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649598#comment-14649598
 ] 

Zhijie Shen commented on YARN-3984:
---

Sure, that's fine too. One question: do we make sure every event has such a 
column, or does only an event without info have it? Personally, I prefer the 
former option, which makes event processing uniform.



[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-31 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649601#comment-14649601
 ] 

Vrushali C commented on YARN-3984:
--

bq. One question, do we make sure every event has such a column or only the 
event without info has it? 

Yes, I was thinking of doing this for every event id/timestamp. 



[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-31 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650014#comment-14650014
 ] 

Li Lu commented on YARN-3984:
-

Hi [~vrushalic], thanks for the patch. The current patch LGTM (pending 
Jenkins). Just to double-check: on the read path, do we first try to fetch the 
column written for the case where the info map is empty, and only if that fails 
start reading all info keys? Thanks! 



[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-30 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648540#comment-14648540
 ] 

Sangjin Lee commented on YARN-3984:
---

To be clear, with the latter option, if we want to look for an event by id, we 
can use {{ColumnPrefixFilter}} for {{e! eventId}}, right? So in that case we 
won't need to fetch all columns, correct?
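
The prefix lookup asked about above can be sketched without an HBase cluster.
The following is only a rough stand-in: a sorted in-memory map plays the role
of one row's columns, and scanPrefix imitates what a column prefix filter
would return. The class name, the sample values, the literal separators and
the fixed-width decimal rendering of the inverse timestamp are all assumptions
made for illustration, not the encoding used by the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class PrefixScanSketch {
    // Hypothetical qualifier layout: e!eventid#inverse_event_timestamp?eventkey.
    // The inverse timestamp is rendered as fixed-width decimal (an assumption)
    // so that string order matches numeric order.
    static String qualifier(String eventId, long ts, String eventKey) {
        String inverse = String.format("%019d", Long.MAX_VALUE - ts);
        return "e!" + eventId + "#" + inverse + "?" + eventKey;
    }

    // Hypothetical sample row; the sorted map stands in for HBase column order.
    static NavigableMap<String, String> sampleRow() {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put(qualifier("INITED", 50L, "USER"), "alice");
        row.put(qualifier("KILLED", 100L, "DIAGNOSTICS"), "xyz");
        row.put(qualifier("KILLED", 300L, "DIAGNOSTICS"), "pqr");
        return row;
    }

    // Returns the values of all columns whose qualifier starts with the prefix.
    // Matching columns are contiguous in sorted order, so the loop can stop at
    // the first qualifier that no longer matches.
    static List<String> scanPrefix(NavigableMap<String, String> columns, String prefix) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : columns.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            hits.add(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Only the KILLED columns are visited; the INITED column is skipped.
        System.out.println(scanPrefix(sampleRow(), "e!KILLED#"));
    }
}
```

Including the separator after the event id in the prefix keeps an id such as
KILLED from also matching a longer id that merely starts with the same letters.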



[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-30 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648521#comment-14648521
 ] 

Li Lu commented on YARN-3984:
-

Thanks! I think I'm leaning towards eventid#inverse_event_timestamp?eventKey 
then, if we have to do the sorting in memory anyway. 



[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-30 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648516#comment-14648516
 ] 

Vrushali C commented on YARN-3984:
--

If the query has the exact timestamp as well as the event id, then we can. But 
for queries like "Give me information about CONTAINER KILLED events for this 
application", we won't be able to return this information without querying for 
all events in this application. 



[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-30 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648491#comment-14648491
 ] 

Vrushali C commented on YARN-3984:
--

To reach a conclusion on this: 

If everyone/most folks are +1 for putting the event timestamp before the event 
id itself,
{code}
e! inverse_event_timestamp # eventid ? eventkey
{code}
I can go ahead and create the patch.
Note that by doing so, we will *always* have to query for all event ids and all 
timestamps regardless of the query (unless we know the exact timestamp).

If not, the other option is to put the event timestamp after the event id but 
before the event key:
{code}
e! eventid # inverse_event_timestamp ? eventkey
{code}
With this option, we can query for a particular event id. 

In both cases, we need to fetch all records, construct TimelineEvent objects, 
and sort them into chronological order. 
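
To make the two layouts concrete, here is a minimal sketch of how columns
would be ordered under each option. It assumes that the inverse timestamp is
Long.MAX_VALUE minus the timestamp, written as 8 big-endian bytes, and that
columns sort in unsigned byte order; the class name, helper names and sample
events are hypothetical and are not taken from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

class InverseTimestampKeySketch {
    // Unsigned lexicographic byte order, assumed to match the store's column order.
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    };

    // Assumed encoding: larger timestamps yield smaller values, so newer sorts first.
    static byte[] invert(long ts) {
        return ByteBuffer.allocate(8).putLong(Long.MAX_VALUE - ts).array();
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] concat(byte[]... parts) {
        int len = 0;
        for (byte[] p : parts) {
            len += p.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(len);
        for (byte[] p : parts) {
            buf.put(p);
        }
        return buf.array();
    }

    // Option 1: e! inverse_event_timestamp # eventid ? eventkey
    static byte[] timestampFirst(String eventId, long ts, String eventKey) {
        return concat(utf8("e!"), invert(ts), utf8("#" + eventId + "?" + eventKey));
    }

    // Option 2: e! eventid # inverse_event_timestamp ? eventkey
    static byte[] eventIdFirst(String eventId, long ts, String eventKey) {
        return concat(utf8("e!" + eventId + "#"), invert(ts), utf8("?" + eventKey));
    }

    // Labels of three hypothetical events in the order the columns would be stored.
    static List<String> storedOrder(boolean tsFirst) {
        Object[][] sample = {{"INITED", 100L}, {"KILLED", 200L}, {"INITED", 300L}};
        TreeMap<byte[], String> columns = new TreeMap<>(BYTE_ORDER);
        for (Object[] e : sample) {
            String id = (String) e[0];
            long ts = (Long) e[1];
            byte[] key = tsFirst
                ? timestampFirst(id, ts, "DIAGNOSTICS")
                : eventIdFirst(id, ts, "DIAGNOSTICS");
            columns.put(key, id + "@" + ts);
        }
        return new ArrayList<>(columns.values());
    }

    public static void main(String[] args) {
        System.out.println("timestamp first: " + storedOrder(true));
        System.out.println("event id first:  " + storedOrder(false));
    }
}
```

With the timestamp first, columns come back newest first across all event
ids. With the event id first, they are grouped by id and newest first only
within each group, which is what makes a per-id prefix lookup possible.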






[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-30 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648499#comment-14648499
 ] 

Li Lu commented on YARN-3984:
-

Hi [~vrushalic], one quick question. I'm a little bit confused by this:
bq. This would mean that we would never be able to query for a specific event.

Maybe you're assuming that the timestamp information is missing for some of our 
use cases? Otherwise, because the timestamp is one of the two parts of a 
timeline event's id, I'm not sure why we cannot directly locate that specific 
column. 



[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-30 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648428#comment-14648428
 ] 

Vrushali C commented on YARN-3984:
--

Hi Zhijie,

bq. the current query we want to support now (in YARN-3051 and YARN-3049) is to 
retrieve all events belonging to an entity (e.g. application, attempt, 
container and etc.).
Yes, the fetch-all-events query is supported with all types of row key designs. 
Fetching all events is not affected by the row key order. The reader would 
construct a set/list of TimelineEvents in any case and then sort them in the 
code. The timestamp will help with ordering, but you don't know when to stop 
the scan, so all events for all timestamps have to be fetched, and sorting and 
filtering out the latest events has to be done in the code in any case when we 
fetch all events. 

bq.  In this case, the most efficient way is to put timestamp even before the 
event ID, so that we don't need to order the events in memory
This would mean that we would *never* be able to query for a specific event. We 
would *always* have to fetch all events belonging to all timestamps and perform 
client side filtering. 

I see the point about the info map being empty/null. I will add a case to store 
event id and timestamp when the info map is null. 





[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-30 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648605#comment-14648605
 ] 

Vrushali C commented on YARN-3984:
--

Yes, right. Given an event id, when the column key is {code} e! eventid # 
inverse_event_timestamp ? eventkey {code}, we can query for a particular 
event.






[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646924#comment-14646924
 ] 

Zhijie Shen commented on YARN-3984:
---

[~vrushalic], thanks for picking it up. The aforementioned cases are definitely 
good to support, but the query we want to support now (in YARN-3051 and 
YARN-3049) is to retrieve all events belonging to an entity (e.g. application, 
attempt, container, etc.). With this basic query, we can easily distill the 
details of what happened to the entity, such as the diagnostic msg of the kill 
event. In this case, the most efficient way is to put the timestamp even before 
the event ID, so that we don't need to order the events in memory.
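
The ordering argument can be illustrated with a small sketch. It compares
the layout from the issue description (event_id?info_key?timestamp) with the
timestamp-first layout, using a sorted map as a stand-in for the store's
lexicographic column order. The class name, the sample events and the
zero-padded decimal timestamp are assumptions made only for this
illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ColumnKeyOrderSketch {
    // Hypothetical sample data: {event id, info key, timestamp}.
    static final String[][] SAMPLE = {
        {"KILLED", "DIAGNOSTICS", "300"},
        {"INITED", "USER", "100"},
        {"STARTED", "HOST", "200"},
    };

    // Fixed-width decimal so that string order matches numeric order
    // (an assumption of this sketch, not the real encoding).
    static String pad(long ts) {
        return String.format("%013d", ts);
    }

    // Layout from the issue description: event_id?info_key?timestamp
    static String currentKey(String eventId, String infoKey, long ts) {
        return eventId + "?" + infoKey + "?" + pad(ts);
    }

    // Suggested layout: timestamp?event_id?info_key
    static String proposedKey(String eventId, String infoKey, long ts) {
        return pad(ts) + "?" + eventId + "?" + infoKey;
    }

    // Event ids in the order a lexicographically sorted store would return them.
    static List<String> storedOrder(boolean timestampFirst) {
        TreeMap<String, String> columns = new TreeMap<>();
        for (String[] e : SAMPLE) {
            long ts = Long.parseLong(e[2]);
            String key = timestampFirst
                ? proposedKey(e[0], e[1], ts)
                : currentKey(e[0], e[1], ts);
            columns.put(key, e[0]);
        }
        return new ArrayList<>(columns.values());
    }

    public static void main(String[] args) {
        System.out.println("current:  " + storedOrder(false));
        System.out.println("proposed: " + storedOrder(true));
    }
}
```

With the event id first, the columns come back in alphabetical order of the
id, so the reader has to sort by timestamp itself. With the timestamp first,
the stored order is already chronological.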

In addition to the key composition, I find another significant problem with the 
event store schema. If the event doesn't contain any info, it will be ignored. 
And we cannot always guarantee that the user will put something into info. For 
example, a user may define a KILL event without any diagnostic msg.
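
A minimal sketch of this problem, and of one possible fix, follows. It
assumes a writer that emits one column per info entry, which is why an event
with an empty info map leaves no trace. The variant below writes a single
placeholder column carrying only the event id and timestamp when the info map
is null or empty. All names and the qualifier layout are hypothetical and are
not the actual HBaseTimelineWriterImpl code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EmptyInfoEventSketch {
    // Hypothetical qualifier layout, used only for illustration.
    static String qualifier(String eventId, long ts, String infoKey) {
        return "e!" + eventId + "#" + (Long.MAX_VALUE - ts) + "?" + infoKey;
    }

    // One column per info entry: an event without info produces no columns,
    // so it is silently dropped.
    static List<String> columnsPerInfoKeyOnly(String eventId, long ts, Map<String, Object> info) {
        List<String> columns = new ArrayList<>();
        if (info != null) {
            for (String infoKey : info.keySet()) {
                columns.add(qualifier(eventId, ts, infoKey));
            }
        }
        return columns;
    }

    // Variant: when there is no info, write one column with an empty info key
    // so that the event id and timestamp are still recorded.
    static List<String> columnsWithPlaceholder(String eventId, long ts, Map<String, Object> info) {
        if (info == null || info.isEmpty()) {
            return Collections.singletonList(qualifier(eventId, ts, ""));
        }
        return columnsPerInfoKeyOnly(eventId, ts, info);
    }

    public static void main(String[] args) {
        Map<String, Object> noInfo = new HashMap<>();
        System.out.println("per-info-key only: " + columnsPerInfoKeyOnly("KILL", 100L, noInfo));
        System.out.println("with placeholder:  " + columnsWithPlaceholder("KILL", 100L, noInfo));
    }
}
```

The placeholder is written only when the info map is empty, so events that
do carry info do not pay for a redundant extra column.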



[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646930#comment-14646930
 ] 

Sangjin Lee commented on YARN-3984:
---

{quote}
In addition to the key composition, I find another significant problem with the 
event store schema. If the event doesn't contain any info, it will be ignored 
then. And we cannot always guarantee user will put something into info. For 
example, user may define a KILL event without any diagnostic msg.
{quote}

Thanks for spotting that issue [~zjshen]. That's definitely a huge issue. We 
should address that as part of this JIRA...



[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646935#comment-14646935
 ] 

Zhijie Shen commented on YARN-3984:
---

In fact, metrics have the same problem, but it may still be okay to ignore a 
metric without any data.



[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-29 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646940#comment-14646940
 ] 

Li Lu commented on YARN-3984:
-

Yes. This defect is blocking YARN-3049. Linking the two JIRAs together. 



[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-29 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646942#comment-14646942
 ] 

Li Lu commented on YARN-3984:
-

See [~zjshen]'s comments as well as the discussions in YARN-3049 for more 
details about the current limitation of our HBaseWriter on TimelineEvents. 



[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-27 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643758#comment-14643758
 ] 

Vrushali C commented on YARN-3984:
--

I can take this up. Please feel free to reassign it; if someone else wants it, 
please let me know on the JIRA and we can redistribute it. 

To add to my previous comment, let's take an example. Say the event id is 
KILLED and it occurs 3 times for whatever reason. Now let's say: 
at ts1, for key DIAGNOSTICS, the value is xyz. 
at ts1, for key SOMETHING ELSE, the value is something
at ts2, for key DIAGNOSTICS, the value is abc 
at ts3, for key DIAGNOSTICS, the value is pqr
at ts3, for key SOMETHING ELSE, the value is something even more

where ts1 < ts2 < ts3. So ts3 is the most recent timestamp.

Now which of these queries is the most commonly required:
1. For this application, what is the diagnostic message for the most recent 
KILLED event id? Or all of the diagnostics in the KILLED id?
2. For this application, what are the most recent key(s) in the KILLED event id?
3. For this application, what are the keys (and values) that occurred between 
ts2 and ts3 for the KILLED event id? 

If we think #2 and #3 are the most commonly run queries, then we can go with 
timestamp before the key.
If we think #1 is the most commonly run query, then we can go with key before 
timestamp. 

Now if we choose timestamp before key, then we can never pull back the value 
given an event and a key without fetching all keys in that event for all 
timestamps. 

If we choose key before timestamp, we can't easily pull back the most recently 
occurring key within an event. 

In any case, we can't know which event was the most recent in the application. 
For example, in this case, the INITED event record will be stored before the 
KILLED event record since I < K and HBase will sort them lexicographically.

So if we are interested in knowing which event itself occurred most recently, 
then we need to fetch all events (along with event keys and timestamps), sort 
them by timestamp, and then return the most recent event.
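
The fetch-everything-then-sort step can be sketched as follows, using the
KILLED example above with ts1, ts2 and ts3 set to 1, 2 and 3, plus one
hypothetical INITED cell at timestamp 0. The Event class is a simplified
stand-in for a timeline event, and the cell format is an assumption made for
this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MostRecentEventSketch {
    // Simplified, hypothetical stand-in for a timeline event.
    static final class Event {
        final String id;
        final long timestamp;
        final Map<String, String> info = new LinkedHashMap<>();

        Event(String id, long timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }
    }

    // Cells as {event id, timestamp, info key, value}, listed in the
    // lexicographic order of the event id (INITED before KILLED).
    static String[][] sampleCells() {
        return new String[][] {
            {"INITED", "0", "USER", "alice"},
            {"KILLED", "1", "DIAGNOSTICS", "xyz"},
            {"KILLED", "1", "SOMETHING ELSE", "something"},
            {"KILLED", "2", "DIAGNOSTICS", "abc"},
            {"KILLED", "3", "DIAGNOSTICS", "pqr"},
            {"KILLED", "3", "SOMETHING ELSE", "something even more"},
        };
    }

    // Groups cells into events by (id, timestamp), then sorts newest first.
    static List<Event> assemble(String[][] cells) {
        Map<String, Event> byIdAndTs = new LinkedHashMap<>();
        for (String[] cell : cells) {
            long ts = Long.parseLong(cell[1]);
            Event event = byIdAndTs.computeIfAbsent(
                cell[0] + "@" + ts, k -> new Event(cell[0], ts));
            event.info.put(cell[2], cell[3]);
        }
        List<Event> events = new ArrayList<>(byIdAndTs.values());
        events.sort(Comparator.comparingLong((Event e) -> e.timestamp).reversed());
        return events;
    }

    public static void main(String[] args) {
        Event latest = assemble(sampleCells()).get(0);
        System.out.println(latest.id + " at " + latest.timestamp + " " + latest.info);
    }
}
```

Whatever the stored order is, the most recent event is found only after all
cells have been read and grouped, which is the cost described above.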




[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-07-27 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643746#comment-14643746
 ] 

Vrushali C commented on YARN-3984:
--

Thanks for filing the jira [~zjshen]. The key design will be based on the 
query/access pattern that we think is most relevant to this information. How do 
we envision this event information being accessed? Given an event id, do we see 
querying for most recent keys within an event as the primary access pattern? Or 
any time range based queries in fact. If yes, then putting the timestamp 
*before* the event key will be better. If the primary access pattern will be 
based on the name of the key in the event id, then putting the timestamp 
*after* the event key will make it work better. 

Do you have any example queries/access requests in mind? How was this 
information queried for in ATSv1? Who might be wanting this information? I 
think these questions will help us arrive at a solution. 
