Sangjin Lee commented on YARN-3908:

[~jrottinghuis], [~vrushalic], and I had offline chats, and we feel that we may 
need to revisit how we store events.

Currently (with this patch) we store the event with the column name 
"e!eventId?infoKey" and the column value being the info value. The event 
timestamp is stored as the cell timestamp. We're realizing that this may not be 
a correct way to store events.

I'm basing this on the discussion we had when we talked about the equality 
and identity semantics of 
{{TimelineEvent}}. Namely, the id *and* the timestamp form the identity of a 
{{TimelineEvent}}. Then I think storing the timestamp in the HBase cell 
timestamp does not work.
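
To make the concern concrete, here is a rough sketch. It is a toy in-memory model, not HBase client code: {{EventCellSketch}}, {{columnName}}, {{putEvent}}, the event id "EVENT_X", and the info key "host" are all made up for illustration. It models a row as column name -> (cell timestamp -> value) and shows that two events with the same id but different timestamps land in the same column and differ only by cell version.

```java
import java.util.Map;
import java.util.TreeMap;

public class EventCellSketch {
    // Hypothetical helper mirroring the "e!eventId?infoKey" column name.
    static String columnName(String eventId, String infoKey) {
        return "e!" + eventId + "?" + infoKey;
    }

    // Toy model of a single row: column name -> (cell timestamp -> value).
    // The event timestamp is used as the cell timestamp, as in the patch.
    static void putEvent(Map<String, TreeMap<Long, String>> row,
                         String eventId, long timestamp,
                         String infoKey, String infoValue) {
        row.computeIfAbsent(columnName(eventId, infoKey), k -> new TreeMap<>())
           .put(timestamp, infoValue);
    }

    public static void main(String[] args) {
        Map<String, TreeMap<Long, String>> row = new TreeMap<>();

        // Two distinct events under id + timestamp identity.
        putEvent(row, "EVENT_X", 100L, "host", "node1");
        putEvent(row, "EVENT_X", 200L, "host", "node2");

        // Both events share one column; only the cell version differs.
        System.out.println("columns: " + row.size());
        TreeMap<Long, String> versions = row.get(columnName("EVENT_X", "host"));
        System.out.println("versions in that column: " + versions.size());
        System.out.println("latest value: " + versions.lastEntry().getValue());
    }
}
```

In real HBase, whether the older version is retained or returned depends on the column family's max-versions setting and on how many versions the reader asks for, so the first event could be hidden or lost.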

Some questions for you, [~zjshen] and [~gtCarrera9].

(1) *What defines the identity of a {{TimelineEvent}}?*
Is it the event id + timestamp? How about the event type? If you look at the 
{{equals()}} and the {{hashCode()}} implementations of {{TimelineEvent}}, it 
uses the timestamp, the event type, and even the info as a whole, but the id is 
not used for equality. How does that square with the stated intent that the 
event id and the timestamp form the identity?
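
For reference, this is roughly what I would expect equality to look like if the id and the timestamp really do form the identity. It is only a sketch: {{EventKey}} is a hypothetical stand-in, not the actual {{TimelineEvent}} class, and it deliberately ignores the event type and the info.

```java
import java.util.Objects;

public class EventIdentitySketch {
    // Hypothetical stand-in for the identity of an event (id + timestamp).
    static final class EventKey {
        final String id;
        final long timestamp;

        EventKey(String id, long timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }

        // Equality is based on the id and the timestamp only.
        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof EventKey)) {
                return false;
            }
            EventKey other = (EventKey) o;
            return timestamp == other.timestamp && Objects.equals(id, other.id);
        }

        // hashCode uses the same two fields so it stays consistent with equals.
        @Override
        public int hashCode() {
            return Objects.hash(id, timestamp);
        }
    }

    public static void main(String[] args) {
        EventKey a = new EventKey("EVENT_X", 100L);
        EventKey b = new EventKey("EVENT_X", 100L);
        EventKey c = new EventKey("EVENT_X", 200L);
        System.out.println(a.equals(b)); // same id and timestamp
        System.out.println(a.equals(c)); // same id, different timestamp
    }
}
```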

(2) *What would be the access pattern for {{TimelineEvents}}?*
Is pretty much the only access pattern "give me all the events that belong to 
this entity"?

Also, specifically, would you ever query for an event with the id *and* the 
timestamp? It is not reasonable to expect readers to provide the event 
timestamp in queries, right?

Would you also query for just the event id? What other access patterns need to 
be supported?

Clarifying those things would help us correctly implement the schema. Thanks!

> Bugs in HBaseTimelineWriterImpl
> -------------------------------
>                 Key: YARN-3908
>                 URL: https://issues.apache.org/jira/browse/YARN-3908
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Vrushali C
>         Attachments: YARN-3908-YARN-2928.001.patch, 
> YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, 
> YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, 
> YARN-3908-YARN-2928.005.patch
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
> fields of a timeline entity plus events. However, entity#info map is not 
> stored at all.
> 2. event#timestamp is also not persisted.

This message was sent by Atlassian JIRA
