Vrushali C commented on YARN-3984:

I can take this up. If someone else wants it, please let me know on the jira 
and feel free to reassign it. 

To add to my previous comment, let's take an example. Say the event id is 
KILLED and it occurs 3 times for whatever reason. Now let's say: 
at ts1, for key "DIAGNOSTICS", the value is "xyz"
at ts1, for key "SOMETHING ELSE", the value is "something"
at ts2, for key "DIAGNOSTICS", the value is "abc"
at ts3, for key "DIAGNOSTICS", the value is "pqr"
at ts3, for key "SOMETHING ELSE", the value is "something even more"

where ts1 < ts2 < ts3. So ts3 is the most recent timestamp.

Now, which of these queries is the most commonly required?
1. For this application, what is the diagnostic message for the most recent 
KILLED event id? Or all of the diagnostics in the KILLED event id?
2. For this application, what are the most recent key(s) in the KILLED event 
id?
3. For this application, what are the keys (& values) that occurred between 
ts2 and ts3 for the KILLED event id? 

If we think #2 and #3 are the most commonly run queries, then we can go with 
timestamp before the key.
If we think #1 is the most commonly run query, then we can go with key before 
timestamp.

Now if we choose timestamp before key, then we can never pull back the value 
given an event and a key without fetching all keys in that event for all 
timestamps.

If we choose key before timestamp, we can't easily pull back the most recently 
occurred key within an event. 
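
To make the trade-off concrete, here is a minimal sketch using the example 
above. It is not the timeline service code: tuples stand in for composite 
column keys, a sorted Python list stands in for the store's key ordering, the 
integer timestamps are placeholders, and the scan() helper is hypothetical.

```python
from bisect import bisect_left

TS1, TS2, TS3 = 1, 2, 3  # placeholder timestamps with ts1 < ts2 < ts3

# (event_id, timestamp, info_key, value) cells from the example above.
cells = [
    ("KILLED", TS1, "DIAGNOSTICS", "xyz"),
    ("KILLED", TS1, "SOMETHING ELSE", "something"),
    ("KILLED", TS2, "DIAGNOSTICS", "abc"),
    ("KILLED", TS3, "DIAGNOSTICS", "pqr"),
    ("KILLED", TS3, "SOMETHING ELSE", "something even more"),
]

# Layout A: timestamp before key. Layout B: key before timestamp.
layout_a = sorted((e, ts, k, v) for e, ts, k, v in cells)
layout_b = sorted((e, k, ts, v) for e, ts, k, v in cells)


def scan(rows, start, stop):
    """Return the contiguous slice of sorted rows with start <= row < stop."""
    return rows[bisect_left(rows, start):bisect_left(rows, stop)]


# Query #3 on layout A: keys between ts2 and ts3 form one contiguous range.
between = scan(layout_a, ("KILLED", TS2), ("KILLED", TS3 + 1))

# Query #1 on layout B: all DIAGNOSTICS values form one contiguous range,
# and the last entry in that range is the most recent one.
diagnostics = scan(
    layout_b, ("KILLED", "DIAGNOSTICS"), ("KILLED", "DIAGNOSTICS", float("inf"))
)
latest_diagnostic = diagnostics[-1][3]

# Query #1 on layout A: every cell of the event has to be read and filtered.
whole_event = scan(layout_a, ("KILLED",), ("KILLED", float("inf")))
all_diag_a = [v for _e, _ts, k, v in whole_event if k == "DIAGNOSTICS"]
```

In this model each layout answers its preferred query with a single range 
scan, while the other query degrades to reading the whole event and filtering.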

In any case, we can't know which event was the most recent in the application. 
For example, in this case, the INITED event record will be stored before the 
KILLED event record, since I < K and HBase will sort them lexicographically.
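
A small sketch of that ordering problem, assuming only that stored keys are 
compared as raw byte strings. The occurrence times and the STARTED event id 
are made up for illustration.

```python
# Hypothetical occurrence times; only INITED and KILLED come from the
# discussion above, STARTED is invented for this sketch.
occurred_at = {"KILLED": 300, "INITED": 100, "STARTED": 200}

# Python compares bytes objects lexicographically, which mimics a store
# that orders keys byte by byte.
stored_order = sorted(event_id.encode("utf-8") for event_id in occurred_at)

# The order in which the events actually happened.
chronological = sorted(occurred_at, key=occurred_at.get)
```

Here the stored order is INITED, KILLED, STARTED, while the chronological 
order is INITED, STARTED, KILLED, so position in the stored order says nothing 
about recency.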

So if we are interested in knowing which event itself occurred most recently, 
then we need to fetch all events (along with event keys and timestamps), 
sort by timestamp, and then return the most recent event.
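
A rough sketch of that client-side step, assuming the event columns have 
already been fetched as (event_id, info_key, timestamp) triples. The sample 
data and the most_recent_event() helper are hypothetical. Taking the maximum 
timestamp gives the same answer as sorting and returning the last entry.

```python
# Hypothetical result of fetching every event column for one application,
# in stored (lexicographic) order rather than chronological order.
fetched = [
    ("INITED", "SOME KEY", 100),
    ("KILLED", "DIAGNOSTICS", 300),
    ("KILLED", "DIAGNOSTICS", 310),
    ("STARTED", "SOME KEY", 200),
]


def most_recent_event(columns):
    """Return the id of the event with the greatest timestamp.

    Raises ValueError if columns is empty.
    """
    latest = {}
    for event_id, _info_key, ts in columns:
        # Keep only the newest timestamp seen for each event id.
        latest[event_id] = max(ts, latest.get(event_id, ts))
    return max(latest, key=latest.get)


newest = most_recent_event(fetched)
```

The cost is proportional to the number of event columns fetched, which is the 
overhead neither key layout avoids.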

> Rethink event column key issue
> ------------------------------
>                 Key: YARN-3984
>                 URL: https://issues.apache.org/jira/browse/YARN-3984
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Vrushali C
>             Fix For: YARN-2928
> Currently, the event column key is event_id?info_key?timestamp, which is not 
> so friendly to fetching all the events of an entity and sorting them in a 
> chronologic order. IMHO, timestamp?event_id?info_key may be a better key 
> schema. I open this jira to continue the discussion about it which was 
> commented on YARN-3908.
