[jira] [Commented] (YARN-3914) Entity created time should be part of the row key of entity table

2016-07-20 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386190#comment-15386190
 ] 

Vrushali C commented on YARN-3914:
--

+1 on closing this issue.

> Entity created time should be part of the row key of entity table
> -
>
> Key: YARN-3914
> URL: https://issues.apache.org/jira/browse/YARN-3914
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>  Labels: YARN-5355
>
> Entity created time should be part of the row key of the entity table, between 
> entity type and entity id. The reason to have it is to index the entities. 
> Though we cannot index the entities by every kind of information, indexing 
> them by created time is essential. Without it, every query for the latest 
> entities that belong to an application and a type has to scan through all 
> the entities that belong to them, for example when listing the 100 most 
> recently started containers in a YARN app.
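To make the proposal concrete, here is a minimal sketch, assuming a simplified string row key of the form app_id!entity_type!inverted_created_time!entity_id, with a sorted `TreeMap` standing in for an HBase table (HBase also keeps rows sorted by key). The `!` separator, the inverted and zero-padded timestamp, and the `CreatedTimeRowKey` helpers are illustrative assumptions, not the actual timeline service schema; the sketch also assumes the separator never appears inside ids or types. Because the inverted time sorts newer entities first, the n latest entities of a type are just the first n rows under the (app_id, entity_type) prefix, so no full scan is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CreatedTimeRowKey {
    // Hypothetical separator; not the actual ATSv2 encoding.
    static final String SEP = "!";

    // Invert the timestamp so newer entities sort first, and zero-pad it to
    // 19 digits so lexicographic order matches numeric order.
    static String invert(long createdTime) {
        return String.format("%019d", Long.MAX_VALUE - createdTime);
    }

    // Simplified row key: app_id ! entity_type ! inverted created_time ! entity_id
    static String rowKey(String appId, String type, long createdTime, String entityId) {
        return appId + SEP + type + SEP + invert(createdTime) + SEP + entityId;
    }

    // Read only the first n rows under the (app_id, entity_type) prefix.
    static List<String> latest(TreeMap<String, String> table, String appId, String type, int n) {
        String prefix = appId + SEP + type + SEP;
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : table.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix) || result.size() == n) {
                break; // left the prefix range, or already have n rows
            }
            result.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>(); // stands in for a sorted table
        String app = "application_1_0001";
        table.put(rowKey(app, "CONTAINER", 1000L, "c1"), "c1");
        table.put(rowKey(app, "CONTAINER", 3000L, "c3"), "c3");
        table.put(rowKey(app, "CONTAINER", 2000L, "c2"), "c2");
        table.put(rowKey(app, "TASK", 4000L, "t1"), "t1");
        System.out.println(latest(table, app, "CONTAINER", 2)); // [c3, c2]
    }
}
```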



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3914) Entity created time should be part of the row key of entity table

2016-07-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386123#comment-15386123
 ] 

Sangjin Lee commented on YARN-3914:
---

I am comfortable with closing this issue. I think your comments are still valid 
(and so are mine above). I'm +1 on closing it unless there is an objection.




[jira] [Commented] (YARN-3914) Entity created time should be part of the row key of entity table

2016-07-18 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382970#comment-15382970
 ] 

Li Lu commented on YARN-3914:
-

I'm checking through the JIRA list of YARN-5355 and saw this. Right now the 
schema of the HBase tables is pretty much finalized, right? Personally I'd prefer 
the current layout, since it makes querying by (entityID, entityType) much easier. 
Also, determining the "start time" of a timeline entity is error-prone 
(see YARN-5340, where we spent quite a while accurately locating the 
problem). 

My question here is, given that this issue has been open for a 
year, shall we close it or work out some alternative solutions? 




[jira] [Commented] (YARN-3914) Entity created time should be part of the row key of entity table

2015-07-16 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630125#comment-14630125
 ] 

Sangjin Lee commented on YARN-3914:
---

[~zjshen], we have been discussing this. While adding entity creation time to 
the row key may solve this problem, the concern is that it may introduce others.

If the row key is 
(user/cluster/flow/run/app_id/entity_type/created_time/entity_id), then even 
the most basic query for (entity_type + entity_id) will get much more 
complicated, right? We cannot expect readers to provide the creation time every 
time they query for an entity by id.

Also, as you said, we cannot always accommodate different query vectors by 
adding more to the row key, or we would risk blowing up the row key size 
or breaking other queries. We should be really judicious about what goes into 
the row key...

I think it's reasonable to expect that the entity id order would be either 
completely or nearly identical to the chronological order (e.g. app id, or 
container id). So perhaps we could rely on the entity id order to help mitigate 
this problem.
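A sketch of that mitigation, assuming (as the comment does) that ids such as container ids are assigned in roughly chronological order, and additionally assuming that the last `_`-separated field of the id is a numeric sequence number. `IdOrderProxy`, `sequenceOf` and `latestById` are hypothetical helpers. Comparing that field numerically avoids a pitfall of plain string comparison, which treats "999999" as greater than "1000000".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class IdOrderProxy {
    // Hypothetical helper: parse the last '_'-separated field of an id such as
    // "container_1436226987654_0001_01_000042" as a sequence number (42 here).
    static long sequenceOf(String entityId) {
        return Long.parseLong(entityId.substring(entityId.lastIndexOf('_') + 1));
    }

    // Approximate "the n latest entities" by the n highest sequence numbers.
    // Only as accurate as the assumption that ids follow creation order.
    static List<String> latestById(List<String> ids, int n) {
        List<String> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.<String>comparingLong(IdOrderProxy::sequenceOf).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList(
                "container_1436226987654_0001_01_000002",
                "container_1436226987654_0001_01_000010",
                "container_1436226987654_0001_01_000007");
        System.out.println(latestById(ids, 2));
    }
}
```

This is only an approximation: an id assigned earlier does not guarantee an earlier start, which is why the order is described as "nearly identical" rather than exact.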

Thoughts?



[jira] [Commented] (YARN-3914) Entity created time should be part of the row key of entity table

2015-07-15 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628681#comment-14628681
 ] 

Li Lu commented on YARN-3914:
-

Hi [~zjshen], do you think this will affect the data schema design of 
aggregation storages as well, or is it an "entity table only" change? I think 
this is independent of the aggregation implementations but would like to double 
check it. Thanks! 



[jira] [Commented] (YARN-3914) Entity created time should be part of the row key of entity table

2015-07-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623170#comment-14623170
 ] 

Zhijie Shen commented on YARN-3914:
---

This will not block the implementation of getEntities (YARN-3049), but 
performance will be bad without it, especially when the number of entities per 
type per app becomes huge, e.g., for a big job.
