[
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532592#comment-15532592
]
Varun Saxena commented on YARN-5585:
------------------------------------
bq. In a distributed cluster, we can expect entities of the same type to
originate from different JVMs. For example in MR, what if a YarnChild wants to
publish its entities with taskId? How can each YarnChild know about
entityPrefixId? The only uniqueness in the cluster will be the timestamp.
By design, application-level entities will be published by the AM. Only the AM
has access to the collector address and, in a secure setup, to the token
needed to publish to collectors. We do not forward this information to
containers. An AM can, however, forward it to other processes, which can then
potentially publish entities; but if a specific AM can do that, it can just as
easily pass along the prefix as well. Task-level entities and their child
entities, however, will be different and will have their own unique prefixes.
bq. If entityPrefixId is string
We were thinking of making it a long. The intention of the prefix is to help
get a sort order, and numbers can easily achieve that. We haven't reached a
conclusion on this though; it needs further discussion.
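
To illustrate why a numeric prefix helps with sort order, here is a minimal
sketch. It assumes a store that sorts rows by raw key bytes and a non-negative
prefix; the function names and the key layout are hypothetical and are not the
ATSv2 schema.

```python
import struct

def encode_prefix(prefix: int) -> bytes:
    """Pack a non-negative 64-bit prefix as 8 big-endian bytes.

    With a fixed-width big-endian encoding, byte-wise (lexicographic)
    order of the encoded keys matches numeric order of the prefixes.
    """
    return struct.pack(">Q", prefix)

def row_key(prefix: int, entity_id: str) -> bytes:
    # Hypothetical layout: prefix bytes followed by the entity id.
    return encode_prefix(prefix) + entity_id.encode("utf-8")

# Sort keys the way a byte-ordered store would.
keys = sorted(row_key(p, f"entity_{p}") for p in (10, 9, 100))
ordered_prefixes = [struct.unpack(">Q", k[:8])[0] for k in keys]

# The same values as plain strings sort lexicographically instead.
string_order = sorted(["10", "9", "100"])

print(ordered_prefixes)
print(string_order)
```

A string prefix would need zero-padding or a similar convention to sort
numerically, which is one argument for using a long.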
bq. If we look at the problem, this issue is from storage layer.
We cannot necessarily say ordering is a storage issue, as no storage would
naturally provide created-time sort ordering. Even insertion order is not
necessarily preserved. We had to do some plumbing even for LevelDB, and this
would be even more difficult for HDFS storage. Even for the timeline service
as a whole (irrespective of storage), technically it should be fine if it
provides you a way to retrieve the entities you want.
I understand, though, that entity retrieval in created-time sort order is the
most common use case. That is why even I was initially of the opinion that we
should have inherent support for created-time ordering. We can go with an
index table for created time, as suggested earlier, but this would incur a
read-side penalty. Or we can have created time as part of the entity table row
key, but this would mean a write-side penalty too, because you would not know
the created time of the entity supplied. We can, however, force the user to
send the created time in every entity.
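
The two options can be sketched roughly as follows. This is an in-memory
illustration only: the dicts stand in for tables, and all names are
hypothetical rather than the actual HBase schema.

```python
# Option A: entity table keyed by id, plus a created-time index table.
entity_table = {}   # entity id -> info
created_index = {}  # (created time, entity id) -> entity id

def put_with_index(entity_id, created_time, info):
    entity_table[entity_id] = info
    created_index[(created_time, entity_id)] = entity_id  # extra index write

def read_oldest_first(limit):
    # Read-side penalty: scan the index, then fetch each entity.
    ids = [created_index[k] for k in sorted(created_index)[:limit]]
    return [(i, entity_table[i]) for i in ids]

# Option B: created time is part of the entity table row key.
keyed_table = {}  # (created time, entity id) -> info

def put_keyed(entity_id, info, created_time=None):
    if created_time is None:
        # Write-side penalty: the row cannot be addressed without the
        # created time, so it has to be looked up first.
        matches = [k for k in keyed_table if k[1] == entity_id]
        if not matches:
            raise KeyError(f"created time unknown for {entity_id}")
        created_time = matches[0][0]
    keyed_table[(created_time, entity_id)] = info

put_with_index("e2", 200, "second")
put_with_index("e1", 100, "first")
index_read = read_oldest_first(2)

put_keyed("e2", "second", created_time=200)
put_keyed("e1", "first", created_time=100)
put_keyed("e1", "first-updated")  # update without created time
keyed_read = [(k[1], keyed_table[k]) for k in sorted(keyed_table)]
```

Forcing the user to send the created time in every entity corresponds to
always passing `created_time` in option B, which avoids the lookup.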
As you were not there in the last meeting, your point of view was missing. We
can revisit this in today's meeting.
The only way this can be solved at the timeline service layer without an API
change is to have another table to assist in retrieval. But this would then
incur read/write penalties. Can we do something in a coprocessor, i.e. do
something in prePut or preScan, to support the created-time use case? I am not
really aware of the cost this would incur, so we will have to discuss.
bq. In future, if any other storage is plugged in, entity prefix would become
stale.
Maybe, maybe not. Other storage implementations can potentially use it for
indexing as well.
> [Atsv2] Add a new filter fromId in REST endpoints
> -------------------------------------------------
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelinereader
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Priority: Critical
> Attachments: YARN-5585-workaround.patch, YARN-5585.v0.patch
>
>
> TimelineReader REST APIs provide a lot of filters to retrieve the
> applications. Along with those, it would be good to add a new filter, i.e.
> fromId, so that entities can be retrieved after the fromId.
> Current Behavior : Default limit is set to 100. If there are 1000 entities
> then the REST call gives the first/last 100 entities. How to retrieve the
> next set of 100 entities, i.e. 101 to 200 OR 900 to 801?
> Example : If applications are stored in the database, app-1 app-2 ... app-10.
> *getApps?limit=5* gives app-1 to app-5. But to retrieve the next 5 apps,
> there is no way to achieve this.
> So the proposal is to have fromId in the filter like
> *getApps?limit=5&fromId=app-5* which gives the list of apps from app-6 to
> app-10.
> Since ATS is targeting storage of a large number of entities, it is a very
> common use case to get the next set of entities using fromId rather than
> querying all the entities. This is very useful for pagination in a web UI.
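
The paging behaviour described in the proposal can be sketched as below. This
is only an illustration over an in-memory, already ordered list; `get_apps`
is a hypothetical stand-in for the REST call, and fromId is treated as
exclusive because the example in the description returns app-6 to app-10 for
fromId=app-5.

```python
def get_apps(apps, limit=100, from_id=None):
    """Return up to `limit` apps that come after `from_id`.

    `apps` is assumed to be in the order the store would return them.
    """
    start = 0
    if from_id is not None:
        start = apps.index(from_id) + 1  # skip past fromId itself
    return apps[start:start + limit]

apps = [f"app-{i}" for i in range(1, 11)]

first_page = get_apps(apps, limit=5)
# A UI would pass the last id of the previous page as fromId.
second_page = get_apps(apps, limit=5, from_id=first_page[-1])

print(first_page)
print(second_page)
```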