[ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15512200#comment-15512200 ]

Sangjin Lee edited comment on YARN-5585 at 9/22/16 5:26 AM:
------------------------------------------------------------

I am also catching up on this discussion (sorry it got delayed).

Generally I agree with Varun and Vrushali on the possible approaches. I'd like
to add a few more thoughts to refine the idea.

(1) supporting chronological order sorting
I think that even for framework-specific entities (e.g. Tez vertices, MR task
entities, etc.), the "sorting" order cannot be completely arbitrary. Because we
made a strong design decision to reflect recency in the row keys, the natural
sorting order should be the *chronological order*; otherwise listings would not
reflect recency, and strange things would result.

For YARN entities, the id order satisfies this for the most part (and ditto
for MR entities). If Tez can craft the ids such that the lexicographical order
is also the chronological order, that would be by far the simplest solution to
the problem. I'm not sure how feasible it is for Tez to add padding etc. to
preserve the chronological order in the entity ids. [~rohithsharma], can we
change the ids so that they order properly?
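To illustrate why padding matters, here is a minimal Python sketch. It assumes
a hypothetical id format `<prefix>_<sequence>` in which the sequence number
grows over time (this is not Tez's actual id format): unpadded sequence numbers
break lexicographical order once they reach two digits, while zero-padding to a
fixed width makes the lexicographical order match the chronological order.

```python
def make_id(prefix: str, seq: int, width: int = 0) -> str:
    # Hypothetical id format "<prefix>_<seq>"; zfill pads seq with leading
    # zeros up to `width` digits (width=0 leaves it unpadded).
    return f"{prefix}_{str(seq).zfill(width)}"

# Sequence numbers listed in creation (chronological) order.
seqs = [1, 2, 9, 10, 11]

unpadded = [make_id("vertex", s) for s in seqs]
padded = [make_id("vertex", s, width=6) for s in seqs]

# Sorting the raw strings models a lexicographically ordered row key.
print(sorted(unpadded))         # ['vertex_1', 'vertex_10', 'vertex_11', 'vertex_2', 'vertex_9']
print(sorted(padded) == padded)  # True: string order equals creation order
```

The padding width is an assumption; it has to be large enough for the biggest
sequence number the framework will ever generate, or the ordering breaks again.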

If the framework cannot make the lexicographical order of the ids match the
chronological order, then we might have to introduce the notion of sort bytes
provided by the framework (and an auxiliary table) to support this, as
suggested by Vrushali and Varun. But that would come at some cost. All things
being equal, I would prefer not to populate another table on the write path.

Also note that we still need to support single-entity queries (i.e. queries by
entity id). How would we be able to support queries by id in that design?
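One possible shape for the framework-provided-bytes approach, as a minimal
in-memory Python sketch. Dictionaries stand in for HBase tables, and the sort
key (an inverted creation timestamp encoded as 8 big-endian bytes, followed by
the id) and the `TwoTableStore` class are hypothetical illustrations, not the
timeline service v.2 schema. Because the entity table stays keyed by id, a
single-entity query remains a direct lookup, while listing scans the auxiliary
table in byte order. The write-path cost mentioned above is visible here: every
write touches two tables.

```python
import struct
from typing import Dict, List, Optional

MAX_LONG = 2**63 - 1

def sort_key(created_time_ms: int, entity_id: str) -> bytes:
    # Hypothetical sort key: inverting the timestamp makes newer entities sort
    # first under plain byte-order comparison; the id breaks ties.
    return struct.pack(">q", MAX_LONG - created_time_ms) + entity_id.encode("utf-8")

class TwoTableStore:
    """Toy model: an entity table keyed by id plus an auxiliary index table."""

    def __init__(self) -> None:
        self.entity_table: Dict[str, dict] = {}
        self.index_table: Dict[bytes, str] = {}

    def write(self, entity_id: str, created_time_ms: int, info: dict) -> None:
        # Two writes per entity: the price of supporting a custom order.
        self.entity_table[entity_id] = dict(info, created_time=created_time_ms)
        self.index_table[sort_key(created_time_ms, entity_id)] = entity_id

    def get(self, entity_id: str) -> Optional[dict]:
        # Single-entity query by id: the sort key is not needed.
        return self.entity_table.get(entity_id)

    def list_recent_first(self, limit: int) -> List[str]:
        # Emulate an ordered scan over the index table's row keys.
        return [self.index_table[k] for k in sorted(self.index_table)[:limit]]

store = TwoTableStore()
store.write("vertex_2", 2000, {})
store.write("vertex_10", 1000, {})
store.write("vertex_9", 3000, {})
print(store.list_recent_first(3))  # ['vertex_9', 'vertex_2', 'vertex_10']
print(store.get("vertex_10"))      # {'created_time': 1000}
```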

(2) setting the created time field
In timeline service v.2, the strong assumption/requirement is that the created
time is set by the client. It sounds like the current Tez code does not set the
created time. It should be set; that's the contract we're using. We're not
expecting an empty created time when entities are written.
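A minimal Python sketch of that contract. The `Entity` record, `new_entity` and
`write_entity` are hypothetical stand-ins, not the Java `TimelineEntity` API:
the client stamps the created time once when it creates the entity, and the
writer rejects entities that arrive without one instead of silently storing an
empty value.

```python
import time
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Entity:
    entity_type: str
    entity_id: str
    created_time: Optional[int] = None  # epoch millis; must be set by the client

def new_entity(entity_type: str, entity_id: str) -> Entity:
    # Client side: stamp the created time once, at creation.
    return Entity(entity_type, entity_id, created_time=int(time.time() * 1000))

def write_entity(entity: Entity, sink: List[Entity]) -> None:
    # Writer side: enforce the contract rather than storing an empty created time.
    if entity.created_time is None:
        raise ValueError(
            f"created time not set for {entity.entity_type}/{entity.entity_id}")
    sink.append(entity)
```

Failing fast on the writer side is a design choice of this sketch; it surfaces
the missing field at write time rather than as odd ordering at read time.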

(3) TimelineEntity.compareTo()
This is a good catch by Rohith. It escaped the review, but it does appear that
when the created time is empty, the ids are sorted in the opposite order from
what they should be: they should be sorted in descending order, but the current
code sorts them in ascending order. We can either fix it here or open a
separate subtask for it; either way, it should be fixed.
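The intended ordering can be sketched in Python with `functools.cmp_to_key`.
This mirrors the description above rather than the actual Java
`TimelineEntity.compareTo()`: entities are grouped by type, more recent created
times come first, and when the created time is missing the ids are compared in
descending order. Placing entities without a created time after those with one,
and using descending id as the tie-breaker for equal created times, are
assumptions of this sketch.

```python
from functools import cmp_to_key
from typing import NamedTuple, Optional

class Entity(NamedTuple):
    entity_type: str
    entity_id: str
    created_time: Optional[int] = None

def _desc(a, b) -> int:
    # Reversed three-way comparison: negative when a should come first (a > b).
    return (a < b) - (a > b)

def compare(e1: Entity, e2: Entity) -> int:
    if e1.entity_type != e2.entity_type:
        # Types ascending.
        return (e1.entity_type > e2.entity_type) - (e1.entity_type < e2.entity_type)
    t1, t2 = e1.created_time, e2.created_time
    if t1 is not None and t2 is not None and t1 != t2:
        return _desc(t1, t2)  # newer created time first
    if (t1 is None) != (t2 is None):
        return 1 if t1 is None else -1  # assumption: missing created time sorts last
    return _desc(e1.entity_id, e2.entity_id)  # ids in descending order

entities = [Entity("t", "e_01", 100), Entity("t", "e_02", 200),
            Entity("t", "e_03"), Entity("t", "e_04")]
ordered = sorted(entities, key=cmp_to_key(compare))
print([e.entity_id for e in ordered])  # ['e_02', 'e_01', 'e_04', 'e_03']
```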


> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve the
> applications. Along with those, it would be good to add a new filter, fromId,
> so that entities can be retrieved after the fromId.
> Current behavior: the default limit is set to 100. If there are 1000 entities,
> then a REST call returns the first/last 100 entities. How can the next set of
> 100 entities be retrieved, i.e. 101 to 200 OR 900 to 801?
> Example: if applications app-1, app-2 ... app-10 are stored in the database,
> *getApps?limit=5* returns app-1 to app-5, but there is no way to retrieve the
> next 5 apps.
> So the proposal is to have fromId in the filter, like
> *getApps?limit=5&fromId=app-5*, which returns the apps from app-6 to
> app-10.
> Since ATS targets storage of a large number of entities, it is a very common
> use case to get the next set of entities using fromId rather than querying all
> the entities. This is very useful for pagination in the web UI.
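A minimal Python sketch of the proposed fromId semantics over entities already
held in storage order. The `get_page` function is hypothetical, and its
exclusive handling of fromId follows the example in the description
(fromId=app-5 returns app-6 onwards); a real reader would seek to the
corresponding row key rather than search a list.

```python
from typing import List, Optional

def get_page(ordered_ids: List[str], limit: int,
             from_id: Optional[str] = None) -> List[str]:
    """Return up to `limit` ids that come after `from_id` in storage order."""
    start = 0
    if from_id is not None:
        # list.index raises ValueError if from_id is unknown.
        start = ordered_ids.index(from_id) + 1
    return ordered_ids[start:start + limit]

apps = [f"app-{i}" for i in range(1, 11)]  # app-1 ... app-10, in storage order
first = get_page(apps, limit=5)
second = get_page(apps, limit=5, from_id=first[-1])
print(first)   # ['app-1', 'app-2', 'app-3', 'app-4', 'app-5']
print(second)  # ['app-6', 'app-7', 'app-8', 'app-9', 'app-10']
```

Here `apps` is in creation order; sorting these unpadded ids as plain strings
would put app-10 before app-2, which is the ordering concern raised in the
comment's point (1).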



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
