[
https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857267#comment-15857267
]
Rohith Sharma K S commented on YARN-6027:
-----------------------------------------
Thanks [~varun_saxena] for the review.
bq. Do we need cluster ID in fromId because we are ignoring it completely?
Yes, it is required even though it is ignored, given how fromId is used. We do
not want the user to parse an ID and pass only part of it as fromId. The user
can pass the flow entity ID directly as fromId and let the reader server handle
it. A cluster ID check can be added to verify that the context cluster and the
cluster ID in fromId are equal. Ideally both should match; otherwise we can
throw an exception.
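A minimal sketch of such a cluster ID check follows. The class name, the "/"
separator, the sample ID layout, and the exception type are assumptions made for
illustration; none of them is taken from the patch. The check uses a prefix
comparison so that fromId itself never has to be split.

```java
/** Hypothetical helper for illustration; not part of the YARN-6027 patch. */
final class FromIdClusterCheck {
  // Assumed separator between the cluster ID and the rest of the flow entity ID.
  private static final String SEPARATOR = "/";

  private FromIdClusterCheck() {
  }

  /**
   * Verifies that fromId begins with the cluster ID of the request context.
   * A null or empty fromId means no fromId was supplied, so nothing is checked.
   *
   * @throws IllegalArgumentException if the cluster IDs do not match
   */
  static void verifyCluster(String contextClusterId, String fromId) {
    if (fromId == null || fromId.isEmpty()) {
      return;
    }
    if (!fromId.startsWith(contextClusterId + SEPARATOR)) {
      throw new IllegalArgumentException("fromId " + fromId
          + " does not belong to context cluster " + contextClusterId);
    }
  }

  public static void main(String[] args) {
    // Sample ID in an assumed cluster/dayTimestamp/user@flowName layout.
    String fromId = "cluster1/1486512000000/user1@flow1";
    verifyCluster("cluster1", fromId); // same cluster, no exception
    try {
      verifyCluster("cluster2", fromId); // different cluster
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```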
bq. If there is a / in cluster ID we may have to escape it to avoid parsing
errors.
If parsing errors are a concern, then why does the flow entity ID provide the
full row key as its ID? I think the flow entity ID format itself needs to change.
bq. If we use collapse, even with fromId, there seems to be a full table scan
which will impact
Yes, it does a table scan. But collapse is expected to be used with a date
range; otherwise the default behavior of /flows should be changed to return one
day of flows rather than the full table data. It is an engineering issue, and
maybe we can mention that performance will be a bit slow.
bq. Maybe we can send the last real ID in info field of last flow activity
entity if previous query was made with collapse field
Initially the idea was to send the last real ID as fromId in the info field.
But flows are stored per day for each user, so that is not useful. Note that
when collapse is used, we must scan to get all entities and then apply fromId.
The scan cannot stop halfway, because that would end up in redundant entries for
the user. Given the previous comment is satisfied, this should not be an issue.
bq. you have mentioned that fromId validation is happening in getResult method.
Could not find it
Ahh, I think I missed it at the global level. I am validating in only one
condition. Will validate at the global level.
bq. In processResults we first get the result from backend while applying limit
and then process result for collapse and fromId filters.
If you look at the patch, I have removed the PageFilter while scanning, so the
scan returns all the data. One optimization I can make is to apply PageFilter
in non-collapse mode, because in non-collapse mode scanning starts from the
given fromId. But the same logic cannot be used for collapse mode.
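The ordering constraint in collapse mode can be sketched as below: every scanned
per-day row is collapsed first, and only then are fromId and the limit applied.
This is an in-memory illustration under stated assumptions, not the reader
implementation. The FlowRow type, the choice to keep the first row seen per
user and flow, and treating fromId as inclusive are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; not code from the YARN-6027 patch. */
final class CollapseSketch {
  /** One per-day flow activity row; all fields are illustrative. */
  static final class FlowRow {
    final String id;
    final String user;
    final String flow;

    FlowRow(String id, String user, String flow) {
      this.id = id;
      this.user = user;
      this.flow = flow;
    }
  }

  private CollapseSketch() {
  }

  /** Collapses all scanned rows, then applies fromId (inclusive) and limit. */
  static List<FlowRow> collapseAndPage(List<FlowRow> scanned, String fromId,
      int limit) {
    // Step 1: collapse. The whole scan result is needed here, so no page
    // limit can be applied while scanning. First row per (user, flow) wins.
    Map<List<String>, FlowRow> collapsed = new LinkedHashMap<>();
    for (FlowRow row : scanned) {
      collapsed.putIfAbsent(Arrays.asList(row.user, row.flow), row);
    }
    // Step 2: skip entries before fromId, then collect up to limit entries.
    List<FlowRow> page = new ArrayList<>();
    boolean started = (fromId == null);
    for (FlowRow row : collapsed.values()) {
      if (!started && row.id.equals(fromId)) {
        started = true;
      }
      if (started) {
        if (page.size() >= limit) {
          break;
        }
        page.add(row);
      }
    }
    return page;
  }

  public static void main(String[] args) {
    // IDs use a made-up cluster/day/user@flow layout, newest day first.
    List<FlowRow> scanned = Arrays.asList(
        new FlowRow("c/3/u1@f1", "u1", "f1"),
        new FlowRow("c/3/u2@f2", "u2", "f2"),
        new FlowRow("c/2/u1@f1", "u1", "f1"),
        new FlowRow("c/2/u3@f3", "u3", "f3"));
    for (FlowRow row : collapseAndPage(scanned, "c/3/u2@f2", 10)) {
      System.out.println(row.id);
    }
  }
}
```

In non-collapse mode the first step disappears, which is why a page limit can
be pushed down to the scan there.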
> Improve /flows API for more flexible filters fromid, collapse, userid
> ---------------------------------------------------------------------
>
> Key: YARN-6027
> URL: https://issues.apache.org/jira/browse/YARN-6027
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Labels: yarn-5355-merge-blocker
> Attachments: YARN-6027-YARN-5355.0001.patch
>
>
> In YARN-5585, fromId is supported for retrieving entities. We need a similar
> filter for flows/flowRun apps and flow run and flow as well.
> Along with supporting fromId, this JIRA should also discuss the following points:
> * Should we throw an exception for entities/entity retrieval if duplicates
> are found?
> * TimelineEntity:
> ** Should the equals method also check for idPrefix?
> ** Is idPrefix part of the identifiers?
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)