[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532764#comment-15532764
 ] 

Sangjin Lee commented on YARN-5585:
-----------------------------------

Thanks [~rohithsharma] for your comments and input!

I'd like to structure the proposal in a way that hopefully answers some of your 
questions and moves this forward.

To me, one of the key goals here is to keep writes lean. In other words, we 
would like to avoid write amplification (no more auxiliary tables or double 
writes). It follows that, since the prefix is part of the row key, the client 
would need to provide this entity prefix not only when the entity is written 
for the first time but also *on all subsequent updates*.
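To make the "all subsequent updates" point concrete, here is a minimal Java sketch in which a `TreeMap` stands in for a sorted HBase table. The class name `PrefixedRowKeySketch` and the key layout (a zero-padded prefix, a `!` separator, then the entity id) are hypothetical assumptions, not the actual ATSv2 row key format. Because each write is a single blind put with no index lookup, an update that omits or changes the prefix lands in a different row instead of updating the original one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap stands in for a sorted HBase table whose row
// key embeds the entity prefix. The key layout is an assumption, not ATSv2's.
final class PrefixedRowKeySketch {
    // Row key -> (column -> value), kept in row-key order like an HBase table.
    private final TreeMap<String, Map<String, String>> rows = new TreeMap<>();

    // Row key = 19-digit zero-padded prefix + "!" + entity id. Zero padding makes
    // lexicographic key order match numeric prefix order (prefix must be >= 0).
    static String rowKey(long entityPrefix, String entityId) {
        if (entityPrefix < 0) {
            throw new IllegalArgumentException("prefix must be >= 0: " + entityPrefix);
        }
        return String.format("%019d!%s", entityPrefix, entityId);
    }

    // One blind put per write: no read of existing rows and no auxiliary index
    // table, so the caller must pass the same prefix on every update.
    void write(long entityPrefix, String entityId, String column, String value) {
        rows.computeIfAbsent(rowKey(entityPrefix, entityId), k -> new HashMap<>())
            .put(column, value);
    }

    int rowCount() {
        return rows.size();
    }
}
```

Writing `attempt_1` twice with prefix 1000 updates one row; a later write of the same id with prefix 0 silently creates a second row, which is exactly the failure mode that forces the prefix onto every update.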

Providing this entity prefix on all writes and updates may not be practical or 
desired in all cases. I can certainly see that it is not practical for 
YARN-generic entities (e.g. containers). So IMO *optionality* is a must here. 
If you don't need a sort order different from the entity id order, you 
shouldn't be forced to provide a prefix.
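A sketch of what optionality could mean for ordering, assuming (hypothetically) that an unset prefix defaults to 0: entities that never set a prefix all share the same prefix and therefore sort purely by entity id, while entities that do set one sort by prefix first. `OptionalPrefixOrderSketch` and `DEFAULT_PREFIX` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of row ordering when the entity prefix is optional.
final class OptionalPrefixOrderSketch {
    // Assumed default used when the client never sets a prefix.
    static final long DEFAULT_PREFIX = 0L;

    static final class Entity {
        final long prefix;
        final String id;

        Entity(long prefix, String id) {
            this.prefix = prefix;
            this.id = id;
        }

        // No prefix supplied: fall back to the default, so order is by id alone.
        Entity(String id) {
            this(DEFAULT_PREFIX, id);
        }
    }

    // Row order: prefix first, then entity id.
    static final Comparator<Entity> ROW_ORDER =
        Comparator.comparingLong((Entity e) -> e.prefix).thenComparing(e -> e.id);

    // Returns the entity ids in the order a row-key scan would see them.
    static List<String> sortedIds(List<Entity> entities) {
        List<Entity> copy = new ArrayList<>(entities);
        copy.sort(ROW_ORDER);
        List<String> ids = new ArrayList<>();
        for (Entity e : copy) {
            ids.add(e.id);
        }
        return ids;
    }
}
```

Under this assumption, a client that ignores prefixes entirely sees plain id order, with no extra bookkeeping on writes.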

In terms of what the entity prefix should be when you do need it, a strong 
argument can be made for everyone using the created time. However, again, 
providing the created timestamp on all subsequent writes may not be practical: 
the AM would need to keep track of the created time of all its entities at all 
times. That may be trivial for some AMs but not for others, which is all the 
more reason to come up with a simple prefix scheme that can easily be provided 
in many situations. For example, if there is a number that can easily be 
computed for your entity, it would be a perfect candidate for the entity prefix.
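Two possible ways to choose a prefix, sketched in Java. Both helpers are hypothetical: the first assumes the entity id ends in a numeric sequence (as container ids do), so the prefix can be recomputed from the id on every update with no extra state; the second uses the created time, inverted as `Long.MAX_VALUE - createdTime` so newer entities sort first in an ascending scan. That inversion is a common HBase idiom; whether ATSv2 would invert the value is an assumption here.

```java
// Hypothetical helpers illustrating two ways to pick an entity prefix.
final class PrefixChoiceSketch {
    // Option A: derive the prefix from the id's trailing digits. Any later update
    // can recompute it from the id alone, so the writer keeps no extra state.
    static long prefixFromIdSequence(String entityId) {
        int start = entityId.length();
        while (start > 0 && Character.isDigit(entityId.charAt(start - 1))) {
            start--;
        }
        if (start == entityId.length()) {
            throw new IllegalArgumentException("no trailing digits in " + entityId);
        }
        return Long.parseLong(entityId.substring(start));
    }

    // Option B: use the created time, inverted so newer entities get smaller
    // prefixes and therefore come first in an ascending scan. The writer must
    // remember createdTime to supply the same prefix on every update.
    static long prefixFromCreatedTime(long createdTimeMillis) {
        if (createdTimeMillis < 0) {
            throw new IllegalArgumentException("created time must be >= 0");
        }
        return Long.MAX_VALUE - createdTimeMillis;
    }
}
```

Option A is the kind of "easily computed number" described above; Option B gives time ordering but carries the tracking burden on the AM.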

For Tez, if we introduce the entity prefix and you use the created time for 
it, things would look exactly the same from the Tez perspective either way. 
Whether we have a more flexible entity prefix or an explicit created time (both 
would be in the row key), it would work the same. The client code would do 
either
{code}
entity.setEntityPrefix(createdTime);
client.writeEntity(entity); // pseudo-code
{code}
or
{code}
entity.setCreatedTime(createdTime);
client.writeEntity(entity); // pseudo-code
{code}
The rest of the server code, i.e. how data is written, fetched, and sorted, 
would work the same way.

Unfortunately I won't be able to attend today's call as I am away at a 
conference. Hopefully this helps move the discussion forward.

> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
>                 Key: YARN-5585
>                 URL: https://issues.apache.org/jira/browse/YARN-5585
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelinereader
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-5585-workaround.patch, YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide many filters for retrieving 
> applications. In addition to those, it would be good to add a new filter, 
> fromId, so that entities can be retrieved starting after fromId.
> Current behavior: the default limit is 100. If there are 1000 entities, a 
> REST call returns the first (or last) 100 entities. How do we retrieve the 
> next set of 100 entities, i.e. 101 to 200, or 900 down to 801?
> Example: if applications app-1, app-2, ..., app-10 are stored in the 
> database, *getApps?limit=5* gives app-1 to app-5, but there is no way to 
> retrieve the next 5 apps.
> So the proposal is to add fromId to the filters, e.g. 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10.
> Since ATS targets storage of a large number of entities, getting the next 
> set of entities using fromId, rather than querying all the entities, is a 
> very common use case. This is very useful for pagination in the web UI.
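A minimal sketch of the proposed fromId semantics, assuming entities are kept in a sorted set: return up to `limit` entities strictly after `fromId`, so *getApps?limit=5&fromId=app-5* yields app-6 to app-10. The comparator and the `app-N` id format are assumptions made to match the example (plain `String` order would put app-10 before app-2); `FromIdPaginationSketch` and `page` are hypothetical names, not the TimelineReader API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical sketch of fromId-based pagination over a sorted store.
final class FromIdPaginationSketch {
    // Orders ids like "app-7" by numeric suffix so app-10 follows app-9.
    // Assumes suffixes are unique; ids with equal suffixes would collide.
    static final Comparator<String> BY_NUMERIC_SUFFIX =
        Comparator.comparingLong(
            (String id) -> Long.parseLong(id.substring(id.lastIndexOf('-') + 1)));

    // Returns up to `limit` ids strictly after `fromId` in store order;
    // a null fromId means "start from the beginning".
    static List<String> page(NavigableSet<String> store, String fromId, int limit) {
        Iterable<String> tail = (fromId == null) ? store : store.tailSet(fromId, false);
        List<String> out = new ArrayList<>();
        for (String id : tail) {
            if (out.size() == limit) {
                break;
            }
            out.add(id);
        }
        return out;
    }
}
```

Passing `store.descendingSet()` gives the reverse walk the description asks about (e.g. 900 down to 801): paging from app-6 then returns app-5 down to app-1.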



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)