[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369717#comment-15369717 ]

Hudson commented on YARN-3984:

SUCCESS: Integrated in Hadoop-trunk-Commit #10074 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10074/])
YARN-3984. Adjusted the event column key schema and avoided missing (sjlee: rev 9422d9b50d90a99062880cf648dd86a764bf97ec)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/TimelineWriterUtils.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityRowKey.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/TestHBaseTimelineWriterImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/HBaseTimelineWriterImpl.java

> Rethink event column key issue
> ------------------------------
>
> Key: YARN-3984
> URL: https://issues.apache.org/jira/browse/YARN-3984
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Zhijie Shen
> Assignee: Vrushali C
> Fix For: YARN-2928
>
> Attachments: YARN-3984-YARN-2928.001.patch
>
>
> Currently, the event column key is event_id?info_key?timestamp, which is not
> so friendly to fetching all the events of an entity and sorting them in a
> chronologic order. IMHO, timestamp?event_id?info_key may be a better key
> schema. I open this jira to continue the discussion about it which was
> commented on YARN-3908.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659183#comment-14659183 ]

Zhijie Shen commented on YARN-3984:

bq. If the info map is not empty, this record would be redundant and will take up storage space.

Makes sense. The patch looks good to me. Will commit it.
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650105#comment-14650105 ]

Hadoop QA commented on YARN-3984:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 15m 52s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 57s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 48s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 38m 46s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12748270/YARN-3984-YARN-2928.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / df0ec47 |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8744/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8744/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8744/console |

This message was automatically generated.
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649598#comment-14649598 ]

Zhijie Shen commented on YARN-3984:

Sure, it's fine too. One question: do we make sure every event has such a column, or only the event without info has it? Personally, I prefer the former option, which makes the processing of events uniform.
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649601#comment-14649601 ]

Vrushali C commented on YARN-3984:

bq. One question, do we make sure every event has such a column or only the event without info has it?

Yes, I was thinking of doing this for every event id/timestamp.
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650014#comment-14650014 ]

Li Lu commented on YARN-3984:

Hi [~vrushalic], thanks for the patch. The current patch LGTM (pending Jenkins). Just to double check: do we first need to grab the event for the case where its info map is empty, and then, if that fails, start reading all info keys? Thanks!
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648540#comment-14648540 ]

Sangjin Lee commented on YARN-3984:

To be clear, with the latter option, if we want to look for an event by id, we can use {{ColumnPrefixFilter}} for {{e! eventId}}, right? So in that case we won't need to fetch all columns, correct?
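
The prefix lookup asked about above can be illustrated without an HBase cluster. The sketch below is only an emulation: it uses a sorted `TreeMap` of column qualifiers in place of a real row, and the class name, the method names and the shortened inverted timestamps (`0001`, `0002`, ...) are all hypothetical. It relies on the fact that, in sorted order, every qualifier sharing a prefix sits in one contiguous run, which is what makes a prefix filter cheaper than fetching every column.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical emulation of a column-prefix lookup over one sorted row.
// This is NOT the HBase ColumnPrefixFilter itself, only the same idea in plain Java.
final class ColumnPrefixScanSketch {

    // Return only the columns whose qualifier starts with the given prefix.
    static NavigableMap<String, String> withPrefix(NavigableMap<String, String> row, String prefix) {
        NavigableMap<String, String> matches = new TreeMap<>();
        // Jump straight to the first qualifier >= prefix, then stop at the first non-match.
        for (Map.Entry<String, String> column : row.tailMap(prefix, true).entrySet()) {
            if (!column.getKey().startsWith(prefix)) {
                break;
            }
            matches.put(column.getKey(), column.getValue());
        }
        return matches;
    }

    // Sample row using the "e! eventid # inverse_event_timestamp ? eventkey" layout.
    // The four-digit numbers are made-up stand-ins for inverted timestamps.
    static NavigableMap<String, String> sampleRow() {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("e!INITED#0003?", "");
        row.put("e!KILLED#0001?DIAGNOSTICS", "pqr");
        row.put("e!KILLED#0002?DIAGNOSTICS", "abc");
        row.put("e!STARTED#0004?", "");
        return row;
    }

    public static void main(String[] args) {
        withPrefix(sampleRow(), "e!KILLED#")
            .forEach((qualifier, value) -> System.out.println(qualifier + " = " + value));
    }
}
```

With the timestamp-first layout the same prefix trick would need the exact inverted timestamp, since the event id no longer leads the qualifier.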
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648521#comment-14648521 ]

Li Lu commented on YARN-3984:

Thanks! I think I'm leaning towards eventid#inverse_event_timestamp?eventKey then, if we have to do the sorting in memory anyway.
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648516#comment-14648516 ]

Vrushali C commented on YARN-3984:

If the query has the exact timestamp as well as the event id, then we can. But for queries like "Give me information about CONTAINER KILLED events for this application", we won't be able to return this information without querying for all events in this application.
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648491#comment-14648491 ]

Vrushali C commented on YARN-3984:

To reach a conclusion on this:

If everyone/most folks are +1 for putting the event timestamp before the event id itself
{code}
e! inverse_event_timestamp # eventid ? eventkey
{code}
I can go ahead and create the patch. Note that by doing so, we will *always* have to query for all event ids and all timestamps regardless of the query (unless we know the exact timestamp).

If not, the other option is to put the event timestamp after the event id but before the event key.
{code}
e! eventid # inverse_event_timestamp ? eventkey
{code}
In this option, we have the option of querying for a particular event id.

In both cases, we need to fetch all records, construct TimelineEvent objects and sort them for chronological order.
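
To make the two layouts concrete, here is a rough sketch of how each one orders columns. It is not taken from the patch: the class and method names are hypothetical, the inverse timestamp is assumed to be `Long.MAX_VALUE - timestamp`, and a zero-padded decimal string is assumed as a stand-in for a fixed-width byte encoding so that plain `String` ordering mimics a lexicographic byte sort.

```java
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper comparing the two proposed column key layouts.
// Not the actual TimelineWriterUtils code.
final class EventColumnKeyLayouts {

    // Assumption: invert so that newer timestamps become smaller and sort first.
    static long invert(long timestamp) {
        return Long.MAX_VALUE - timestamp;
    }

    // Zero-pad to 19 digits so that String order matches numeric order
    // (stand-in for a fixed-width encoding of a non-negative long).
    static String pad(long value) {
        return String.format(Locale.ROOT, "%019d", value);
    }

    // Option 1: e! inverse_event_timestamp # eventid ? eventkey
    static String timestampFirst(String eventId, long timestamp, String eventKey) {
        return "e!" + pad(invert(timestamp)) + "#" + eventId + "?" + eventKey;
    }

    // Option 2: e! eventid # inverse_event_timestamp ? eventkey
    static String eventIdFirst(String eventId, long timestamp, String eventKey) {
        return "e!" + eventId + "#" + pad(invert(timestamp)) + "?" + eventKey;
    }

    public static void main(String[] args) {
        String[] ids = {"INITED", "KILLED", "KILLED"};
        long[] timestamps = {100L, 200L, 300L};
        TreeSet<String> option1 = new TreeSet<>();
        TreeSet<String> option2 = new TreeSet<>();
        for (int i = 0; i < ids.length; i++) {
            option1.add(timestampFirst(ids[i], timestamps[i], "DIAGNOSTICS"));
            option2.add(eventIdFirst(ids[i], timestamps[i], "DIAGNOSTICS"));
        }
        System.out.println("timestamp first (newest event leads):");
        option1.forEach(System.out::println);
        System.out.println("event id first (grouped by id, newest first within an id):");
        option2.forEach(System.out::println);
    }
}
```

Under these assumptions, option 1 interleaves event ids in time order, while option 2 keeps all columns of one event id adjacent.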
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648499#comment-14648499 ]

Li Lu commented on YARN-3984:

Hi [~vrushalic], one quick question. I'm a little bit confused by this:

bq. This would mean that we would never be able to query for a specific event.

Maybe here you're assuming that the timestamp information is missing for some of our use cases? Otherwise, because the timestamp is one of the two parts of the id of a timeline event, I'm not sure why we cannot directly locate that specific column.
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648428#comment-14648428 ]

Vrushali C commented on YARN-3984:

Hi Zhijie,

bq. the current query we want to support now (in YARN-3051 and YARN-3049) is to retrieve all events belonging to an entity (e.g. application, attempt, container and etc.).

Yes, the fetch-all-events query is supported with all types of row key designs. Fetching all events is not affected by the row key order. The reader would construct a set/list of TimelineEvents in any case and then sort them in the code. The timestamp will help in ordering, but you don't know when to stop the scan, so all events belonging to all timestamps have to be fetched, and sorting and filtering out the latest events has to be done in the code in any case when we fetch all events.

bq. In this case, the most efficient way is to put timestamp even before the event ID, so that we don't need to order the events in memory

This would mean that we would *never* be able to query for a specific event. We would *always* have to fetch all events belonging to all timestamps and perform client side filtering.

I see the point about the info map being empty/null. I will add a case to store the event id and timestamp when the info map is null.
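
The empty-info case mentioned at the end of the comment above could look roughly like the following. This is only a sketch under assumptions, not the committed code: the class and method names are hypothetical, the qualifier follows the `e! eventid # inverse_event_timestamp ? eventkey` layout discussed in this thread, and the marker column is assumed to be the same qualifier with an empty event key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of deciding which columns to write for one event.
final class EventColumnsSketch {

    // Returns the column qualifiers that would be written for the event.
    static List<String> columnsFor(String eventId, long timestamp, Map<String, String> info) {
        // Assumption: inverse timestamp is Long.MAX_VALUE - timestamp.
        String prefix = "e!" + eventId + "#" + (Long.MAX_VALUE - timestamp) + "?";
        List<String> columns = new ArrayList<>();
        if (info == null || info.isEmpty()) {
            // Without this branch an event with no info would leave no trace in the row.
            columns.add(prefix);
            return columns;
        }
        // One column per info key; no extra marker column, which would be redundant here.
        for (String infoKey : info.keySet()) {
            columns.add(prefix + infoKey);
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, String> info = new TreeMap<>();
        info.put("DIAGNOSTICS", "xyz");
        System.out.println(columnsFor("KILLED", 10L, info));
        System.out.println(columnsFor("KILLED", 20L, null));
    }
}
```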
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648605#comment-14648605 ]

Vrushali C commented on YARN-3984:

Yes, right. Given an event id, in the case the column key is
{code}
e! eventid # inverse_event_timestamp ? eventkey
{code}
we can query for a particular event.
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646924#comment-14646924 ]

Zhijie Shen commented on YARN-3984:

[~vrushalic], thanks for picking it up. The aforementioned cases are definitely good to support, while the query we want to support now (in YARN-3051 and YARN-3049) is to retrieve all events belonging to an entity (e.g. application, attempt, container, etc.). With this basic query, we can easily distill the details of what happened to the entity, such as the diagnostic msg of the kill event. In this case, the most efficient way is to put the timestamp even before the event ID, so that we don't need to order the events in memory.

In addition to the key composition, I find another significant problem with the event store schema. If the event doesn't contain any info, it will be ignored. And we cannot always guarantee that the user will put something into info. For example, a user may define a KILL event without any diagnostic msg.
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646930#comment-14646930 ]

Sangjin Lee commented on YARN-3984:

{quote}
In addition to the key composition, I find another significant problem with the event store schema. If the event doesn't contain any info, it will be ignored then. And we cannot always guarantee user will put something into info. For example, user may define a KILL event without any diagnostic msg.
{quote}

Thanks for spotting that issue [~zjshen]. That's definitely a huge issue. We should address that as part of this JIRA...
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646935#comment-14646935 ]

Zhijie Shen commented on YARN-3984:

In fact, metrics have the same problem, but it may still be okay to ignore a metric without any data.
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646940#comment-14646940 ]

Li Lu commented on YARN-3984:

Yes. This defect is blocking YARN-3049. Linking the two JIRAs together.
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646942#comment-14646942 ]

Li Lu commented on YARN-3984:

See [~zjshen]'s comments as well as the discussions in YARN-3049 for more details about the current limitations of our HBaseWriter on TimelineEvents.
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643758#comment-14643758 ]

Vrushali C commented on YARN-3984:

I can take this up. Please feel free to reassign, or if someone else wants it, please let me know on the jira and we can redistribute the jira.

To add to my previous comment, let's take an example. Say the event id is KILLED and it occurs 3 times for whatever reason. Now let's say:
at ts1, for key DIAGNOSTICS, the value is xyz
at ts1, for key SOMETHING ELSE, the value is something
at ts2, for key DIAGNOSTICS, the value is abc
at ts3, for key DIAGNOSTICS, the value is pqr
at ts3, for key SOMETHING ELSE, the value is something even more
where ts1 < ts2 < ts3. So ts3 is the most recent timestamp.

Now which of the queries is the most commonly required:
- for this application, what is the diagnostic message for the most recent KILLED event id? Or all of the diagnostics in the KILLED id?
- for this application, what is the most recent key(s) in the KILLED event id?
- for this application, what are the keys (and values) that occurred between ts2 and ts3 for the KILLED event id?

If we think #2 and #3 are the most commonly run queries, then we can go with timestamp before the key. If we think #1 is the most commonly run query, then we can go with key before timestamp.

Now if we choose timestamp before key, then we can never pull back the value given an event and a key without fetching all keys in that event for all timestamps. If we choose key before timestamp, we can't easily pull back the most recently occurred key within an event.

In any case, we can't know which event was the most recent in the application. For example, in this case, the INITED event record will be stored before the KILLED event record, since I < K and hbase will sort it lexicographically. So if we are interested in knowing which event itself occurred most recently, then we need to fetch all events (along with event keys and timestamps), sort by timestamp and then return the most recent event.
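
The last point, fetching everything and sorting in the reader, can be sketched in a few lines. The `Event` holder and the numeric timestamps below are hypothetical stand-ins (5 for the INITED event, and 10, 20, 30 for ts1, ts2, ts3); the real reader would build TimelineEvent objects from the fetched columns instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the client-side sort needed to find the most recent event.
final class MostRecentEventSketch {

    // Minimal stand-in for a timeline event: just an id and a timestamp.
    static final class Event {
        final String id;
        final long timestamp;

        Event(String id, long timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }
    }

    // Sort a copy of the fetched events so that the newest one comes first.
    static List<Event> newestFirst(List<Event> fetched) {
        List<Event> sorted = new ArrayList<>(fetched);
        sorted.sort(Comparator.comparingLong((Event e) -> e.timestamp).reversed());
        return sorted;
    }

    // Events in the order an id-first layout would return them: INITED before KILLED.
    static List<Event> sample() {
        List<Event> events = new ArrayList<>();
        events.add(new Event("INITED", 5L));
        events.add(new Event("KILLED", 30L));
        events.add(new Event("KILLED", 20L));
        events.add(new Event("KILLED", 10L));
        return events;
    }

    public static void main(String[] args) {
        Event latest = newestFirst(sample()).get(0);
        System.out.println(latest.id + " @ " + latest.timestamp);
    }
}
```

The first element in storage order (INITED) is the oldest event here, which is why the storage order alone cannot answer "which event happened last".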
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643746#comment-14643746 ]

Vrushali C commented on YARN-3984:

Thanks for filing the jira [~zjshen].

The key design will be based on the query/access pattern that we think is most relevant to this information. How do we envision this event information being accessed? Given an event id, do we see querying for the most recent keys within an event as the primary access pattern? Or, in fact, any time-range-based queries? If yes, then putting the timestamp *before* the event key will be better. If the primary access pattern will be based on the name of the key in the event id, then putting the timestamp *after* the event key will work better.

Do you have any example queries/access requests in mind? How was this information queried for in ATSv1? Who might want this information? I think these questions will help us arrive at a solution.