[
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532764#comment-15532764
]
Sangjin Lee commented on YARN-5585:
-----------------------------------
Thanks [~rohithsharma] for your comments and input!
I'd like to structure the proposal in a way that hopefully answers some of your
questions and moves this forward.
To me, one of the key goals here is to keep writes lean. In other words, we
would like to avoid write amplification (no additional auxiliary tables or
double writes). It then follows that the client would need to provide this
entity prefix not only when the entity is written for the first time but also
*on all subsequent updates*.
Providing this entity prefix on all writes and updates may not be practical or
desirable in all cases. I can certainly see that this is not practical for
YARN-generic entities (e.g. containers). So IMO *optionality* is a must
here. If you don't need a sort order other than the entity id order, you
shouldn't be forced to supply a prefix.
In terms of what the entity prefix should be if you need it, a strong argument
can be made for using the created time for everyone. However, again, providing
the created timestamp on all subsequent writes may not be practical. That would
mean that the AM would need to keep track of the created time for all of its
entities at all times. Perhaps that is trivial for certain AMs, and not for
others. That is all the more reason to come up with a simple prefix scheme that
can be easily provided in many situations. For example, if there is a number
that can be easily computed for your entity, that would be a perfect candidate
for the entity prefix.
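To make the "easily computed number" idea concrete, here is a minimal sketch. It is not the ATSv2 schema: the class name, the key layout and the rule of deriving the prefix from a numeric id suffix (e.g. "dag_10" gives 10) are all assumptions for illustration. A sorted map stands in for a store that orders rows by key. Because the prefix is recomputed from the id on every write, the client keeps no extra state, and an update lands on the same row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical illustration only; not the timeline service storage code.
class EntityPrefixSketch {
    // Sorted map standing in for a store that orders rows by row key.
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Assumed prefix rule: the numeric suffix of an id such as "dag_10".
    static long prefixOf(String entityId) {
        return Long.parseLong(entityId.substring(entityId.lastIndexOf('_') + 1));
    }

    // Fixed-width, zero-padded prefix so that string order matches numeric
    // order (valid for non-negative prefixes only).
    static String rowKey(long prefix, String entityId) {
        return String.format(Locale.ROOT, "%019d!%s", prefix, entityId);
    }

    // Every write, first or subsequent, rebuilds the same row key.
    void write(String entityId, String value) {
        rows.put(rowKey(prefixOf(entityId), entityId), value);
    }

    // Entity ids in row-key order.
    List<String> scanIds() {
        List<String> ids = new ArrayList<>();
        for (String key : rows.keySet()) {
            ids.add(key.substring(key.indexOf('!') + 1));
        }
        return ids;
    }

    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        EntityPrefixSketch store = new EntityPrefixSketch();
        store.write("dag_2", "created");
        store.write("dag_10", "created");
        store.write("dag_1", "created");
        store.write("dag_10", "finished"); // update: same prefix, same row
        System.out.println(store.scanIds());  // [dag_1, dag_2, dag_10]
        System.out.println(store.rowCount()); // 3
    }
}
```

Without the prefix, plain id order would put "dag_10" before "dag_2"; with it, the scan comes back in numeric order and the update does not create a second row.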
For Tez, if we introduce the entity prefix and you use the created time for
it, it would look exactly the same from the Tez perspective as an explicit
created time. Whether we have a more flexible entity prefix or an explicit
created time (both would be in the row key), it would work the same. The
client code would do either
{code}
entity.setEntityPrefix(createdTime);
client.writeEntity(entity); // pseudo-code
{code}
or
{code}
entity.setCreatedTime(createdTime);
client.writeEntity(entity); // pseudo-code
{code}
The rest of the server code or how data is written, fetched and sorted would
work in the same manner.
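As a rough illustration of the fetch side, here is a sketch of fromId-style paging over rows sorted by such a key. It is only a sketch under stated assumptions: the class and method names are hypothetical, a sorted map stands in for the backing store, and each row is assumed to have a unique numeric key (whether that number came from an entity prefix or a created time makes no difference to the reader logic).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only; not the TimelineReader implementation.
class FromIdPagingSketch {
    // Returns up to `limit` entity ids whose key is strictly greater than
    // `fromKey`; a null `fromKey` means "start from the first row".
    static List<String> page(NavigableMap<Long, String> rows, Long fromKey, int limit) {
        NavigableMap<Long, String> tail =
            (fromKey == null) ? rows : rows.tailMap(fromKey, false);
        List<String> ids = new ArrayList<>();
        for (String id : tail.values()) {
            if (ids.size() >= limit) {
                break;
            }
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> rows = new TreeMap<>();
        for (long i = 1; i <= 10; i++) {
            rows.put(i, "app-" + i);
        }
        System.out.println(page(rows, null, 5)); // [app-1, app-2, app-3, app-4, app-5]
        System.out.println(page(rows, 5L, 5));   // [app-6, app-7, app-8, app-9, app-10]
    }
}
```

The cursor is exclusive here, matching the issue description's example where a request from app-5 returns app-6 onward.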
Unfortunately, I won't be able to attend today's call as I am away at a
conference. Hopefully this will help the discussion move forward.
> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelinereader
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Priority: Critical
> Attachments: YARN-5585-workaround.patch, YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve
> applications. Along with those, it would be good to add a new filter, i.e.
> fromId, so that entities can be retrieved after the fromId.
> Current behavior: the default limit is set to 100. If there are 1000 entities,
> the REST call returns the first/last 100 entities. How do we retrieve the next
> set of 100 entities, i.e. 101 to 200 or 900 to 801?
> Example: if applications app-1, app-2 ... app-10 are stored in the database,
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the
> next 5 apps.
> So the proposal is to have fromId in the filter, like
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to
> app-10.
> Since ATS is targeting storage of a large number of entities, it is a very
> common use case to get the next set of entities using fromId rather than
> querying all the entities. This is very useful for pagination in the web UI.