[jira] [Commented] (YARN-3914) Entity created time should be part of the row key of entity table
[ https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386190#comment-15386190 ] Vrushali C commented on YARN-3914:

+1 on closing this issue.

> Entity created time should be part of the row key of entity table
>
> Key: YARN-3914
> URL: https://issues.apache.org/jira/browse/YARN-3914
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
> Labels: YARN-5355
>
> Entity created time should be part of the row key of entity table, between
> entity type and entity Id. The reason to have it is to index the entities.
> Though we cannot index the entities for all kinds of information, indexing
> them according to the created time is very necessary. Without it, every query
> for the latest entities that belong to an application and a type will scan
> through all the entities that belong to them. For example, this is the case
> if we want to list the 100 latest started containers in a YARN app.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
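To make the proposal concrete, here is a minimal sketch of such a row key, under several assumptions that are not taken from the issue: components are joined with a hypothetical `!` separator, the created time is stored as a fixed-width 8-byte value inverted against Java's `Long.MAX_VALUE` so that newer entities sort first, and a sorted Python list with a `bisect`-based loop stands in for an HBase table and a prefix scan. The helper names and the simplified container ids are illustrative only.

```python
import bisect

MAX_LONG = 2**63 - 1  # Java Long.MAX_VALUE, used to invert timestamps (assumption)
SEP = b"!"            # hypothetical separator between row-key components


def type_prefix(user, cluster, flow, run, app_id, entity_type):
    """Row-key prefix shared by all entities of one type within one app."""
    parts = (user, cluster, flow, str(run), app_id, entity_type)
    return SEP.join(p.encode() for p in parts) + SEP


def entity_row_key(prefix, created_time, entity_id):
    """Hypothetical key: <type prefix><inverted created_time>!<entity_id>."""
    # Fixed-width big-endian bytes compare like numbers; inverting the time
    # makes newer entities sort first under lexicographic row-key order.
    inverted = (MAX_LONG - created_time).to_bytes(8, "big")
    return prefix + inverted + SEP + entity_id.encode()


def entity_id_of(prefix, key):
    """Strip the prefix, the 8-byte timestamp and the separator."""
    return key[len(prefix) + 8 + len(SEP):].decode()


def scan_prefix(table, prefix, limit=None):
    """Emulate a prefix scan with a row limit over a sorted list of keys."""
    i = bisect.bisect_left(table, prefix)
    rows = []
    while i < len(table) and table[i].startswith(prefix):
        if limit is not None and len(rows) == limit:
            break  # stop early: the remaining (older) rows are never read
        rows.append(table[i])
        i += 1
    return rows


prefix = type_prefix("alice", "cluster1", "flow1", 1, "application_1_0001", "YARN_CONTAINER")
table = sorted(
    entity_row_key(prefix, 1000 + n, f"container_{n:06d}") for n in range(500)
)
latest = [entity_id_of(prefix, k) for k in scan_prefix(table, prefix, limit=3)]
print(latest)  # ['container_000499', 'container_000498', 'container_000497']
```

Because the newest rows of a type come first in this layout, the "100 latest containers" query from the description becomes a prefix scan that stops after 100 rows instead of reading every container of the app.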
[ https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386123#comment-15386123 ] Sangjin Lee commented on YARN-3914:

I am comfortable with closing this issue. I think your comments are still valid (and so are mine above). I'm +1 on closing it unless there is an objection.
[ https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382970#comment-15382970 ] Li Lu commented on YARN-3914:

I'm checking through the JIRA list of YARN-5355 and saw this. Right now the schema of the HBase tables is pretty much finalized, right? Personally I'd prefer the current way since it makes queries for (entityID, entityType) much easier. Also, determining the "start time" of a timeline entity is error-prone (as with YARN-5340, where we spent quite a while accurately locating the problem). My question here is: given that this issue has been hanging for a year, shall we close it or work out some alternative solutions?
[ https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630125#comment-14630125 ] Sangjin Lee commented on YARN-3914:

[~zjshen], we have been discussing this. While adding the entity creation time to the row key may solve this problem, the concern is that it may introduce others. If the row key is (user/cluster/flow/run/app_id/entity_type/created_time/entity_id), then even the most basic query for (entity_type + entity_id) will get much more complicated, right? We cannot expect readers to provide the creation time every time they query for an entity by id. Also, as you said, we cannot always accommodate different query vectors by adding more to the row key, or we would risk blowing up the row key size or breaking other queries. We should be really judicious about what goes into the row key.

I think it's reasonable to expect that the entity id order would be either completely or nearly identical to the chronological order (e.g. app id, or container id). So perhaps we could rely on the entity id order to help mitigate this problem. Thoughts?
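The trade-off raised in this comment can be sketched with a toy model. This is an illustration under stated assumptions, not the timeline service's code: tables are sorted lists of tuple row keys (tuple comparison standing in for lexicographic row-key order), the shared user/cluster/flow/run/app_id prefix is omitted, the created time is simply negated so newer rows sort first, and the container ids are hypothetical zero-padded strings issued in creation order.

```python
import bisect

# Toy model (assumption): a table is a sorted list of tuple row keys, and tuple
# comparison stands in for HBase's lexicographic row-key order. The shared
# user/cluster/flow/run/app_id prefix is omitted for brevity.


def get_by_id_current(table, entity_type, entity_id):
    """Layout (entity_type, entity_id): a single seek finds the row."""
    i = bisect.bisect_left(table, (entity_type, entity_id))
    examined = 1 if i < len(table) else 0
    found = i < len(table) and table[i] == (entity_type, entity_id)
    return found, examined


def get_by_id_proposed(table, entity_type, entity_id):
    """Layout (entity_type, -created_time, entity_id): a reader who does not
    know the created time must walk the rows of the type one by one."""
    i = bisect.bisect_left(table, (entity_type,))
    examined = 0
    while i < len(table) and table[i][0] == entity_type:
        examined += 1
        if table[i][2] == entity_id:
            return True, examined
        i += 1
    return False, examined


def latest_by_id_order(table, entity_type, n):
    """Suggested mitigation: if ids sort in creation order, the last n keys
    of the type (a reverse scan) are the n newest entities."""
    lo = bisect.bisect_left(table, (entity_type,))
    hi = bisect.bisect_left(table, (entity_type + "\x00",))  # end of the type's range
    return [key[1] for key in reversed(table[max(lo, hi - n):hi])]


# Hypothetical container ids, zero-padded and issued in creation order.
entities = [(f"container_{n:06d}", 1000 + n) for n in range(500)]
current = sorted(("YARN_CONTAINER", eid) for eid, _ in entities)
proposed = sorted(("YARN_CONTAINER", -created, eid) for eid, created in entities)

print(get_by_id_current(current, "YARN_CONTAINER", "container_000010"))    # (True, 1)
print(get_by_id_proposed(proposed, "YARN_CONTAINER", "container_000010"))  # (True, 490)
print(latest_by_id_order(current, "YARN_CONTAINER", 3))
```

With the created time in the key, a lookup by id that lacks the created time degrades into a walk over the type (490 rows here), while the id-keyed layout needs one seek. The last function only approximates "latest" as well as the ids actually follow creation order, which is exactly the assumption the comment makes.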
[ https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628681#comment-14628681 ] Li Lu commented on YARN-3914:

Hi [~zjshen], do you think this will affect the data schema design of the aggregation storage as well, or is it an "entity table only" change? I think this is independent of the aggregation implementations, but would like to double check. Thanks!
[ https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623170#comment-14623170 ] Zhijie Shen commented on YARN-3914:

This will not block the implementation of getEntities (YARN-3049), but the performance will be bad without it, especially when the number of entities per type per app becomes huge, e.g., for a big job.
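The cost described here can be illustrated with a minimal sketch of a "latest N" query against a layout without the created time in the row key. For illustration only, it assumes rows are keyed by (entity_type, entity_id) with the created time read from an ordinary column, modeled as a list of (key, created_time) pairs. A bounded min-heap keeps memory at O(N), but every row of the type still has to be read.

```python
import heapq

# Toy model (assumption): rows are keyed by (entity_type, entity_id) and
# created_time is an ordinary column, so key order gives no help in finding
# the newest entities.


def latest_n_full_scan(table, entity_type, n):
    """Return (ids of the n newest entities, rows read) for one entity type."""
    heap = []  # min-heap of (created_time, entity_id), never larger than n
    rows_read = 0
    for (etype, entity_id), created_time in table:
        if etype != entity_type:
            continue
        rows_read += 1  # every row of the type is read, however small n is
        if len(heap) < n:
            heapq.heappush(heap, (created_time, entity_id))
        elif heap and created_time > heap[0][0]:
            heapq.heapreplace(heap, (created_time, entity_id))
    newest = sorted(heap, reverse=True)
    return [entity_id for _, entity_id in newest], rows_read


for size in (1_000, 100_000):
    # Hypothetical containers of one app, created at times 1000, 1001, ...
    table = sorted(
        (("YARN_CONTAINER", f"container_{i:06d}"), 1000 + i) for i in range(size)
    )
    ids, rows_read = latest_n_full_scan(table, "YARN_CONTAINER", 3)
    print(size, rows_read, ids)  # rows_read equals size in both runs
```

The number of rows read grows with the number of containers in the app even though only three are returned, which is the scaling problem for big jobs mentioned above.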