[
https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798315#comment-15798315
]
Rohith Sharma K S commented on YARN-6027:
-----------------------------------------
Thanks folks for the discussion.
I had an offline talk with Sunil regarding YARN UI integration with ATSv2. Some
of the discussion points are:
# When flows are queried without any filters, duplicate flow entities are sent
from the reader. Each duplicated flow contains aggregated flow run details.
CMIIAW, I think this is because when the same flow is run daily, there will be
2 entries (assuming it has run on 2 days) in FlowActivityTable.
*While reading, if no filters are given, a full table scan happens on
FlowActivityTable.* This results in duplicate entries for the same flow name.
To me, *the current behavior of retrieving flows should be restricted to the
current day only* (not even the last 24 hours, which can cause duplicate
entries).
# Now let's take single-day flow activities. If the number of flows run is
huge, let's say 1000, then the REST API result contains only 100 flow names,
where 100 is the limit. The user can query with the limit increased to 1000,
but that is not ideal for UI rendering, which would go for a toss with many
issues like browser OOM. The issue is that the UI does not know how many flows
exist, and a better solution here is to render page by page for a single day.
*At least pagination should be supported for single-day flow activities.* I
know that with the current HBase schema of the flowActivity table pagination
would be difficult to achieve, but at the API layer there should be a filter
for it. Otherwise it is very painful for UI developers who rely on ATSv2 data.
# Date range and limit filters do not solve the UI rendering issues that
pagination solves. They can only reduce the number of flows returned. Date
range is supported, but ranges within a day, such as 10 AM to 11 AM, are not.
# I also see that flow entities contain all the flow run details. Do we really
need to embed flow run details in flow entities? Doesn't that make them heavy?
I think flow run information in flow entities should be treated as a filter.
However, there is a separate API to get all the flow runs.
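
To make points 1 and 2 concrete, here is a minimal sketch in Python. It is not
ATSv2 code: FlowActivity, merge_duplicate_flows and page_of_flows are
hypothetical names, the rows stand in for per-day FlowActivityTable entries,
and treating from_id as an inclusive starting point is an assumption made only
for this illustration. It shows per-day rows of the same flow being collapsed
into one entry, and a fromId-style cursor that lets a UI fetch one page at a
time.

```python
import bisect
from collections import OrderedDict
from typing import NamedTuple, Tuple


class FlowActivity(NamedTuple):
    """Hypothetical per-day activity row: one flow that ran on one day."""
    day: str                  # e.g. "2017-01-03"
    user: str
    flow_name: str
    run_ids: Tuple[int, ...]  # flow runs recorded for that day


def merge_duplicate_flows(rows):
    """Collapse per-day rows into one entry per (user, flow_name)."""
    merged = OrderedDict()
    for row in rows:
        key = (row.user, row.flow_name)
        merged.setdefault(key, []).extend(row.run_ids)
    return merged


def page_of_flows(flow_ids, limit, from_id=None):
    """Return (page, next_from_id) over the sorted flow ids.

    from_id is inclusive here (an assumption of this sketch). next_from_id
    is None once the last page has been returned.
    """
    ordered = sorted(flow_ids)
    start = 0
    if from_id is not None:
        # First position whose id is >= from_id.
        start = bisect.bisect_left(ordered, from_id)
    page = ordered[start:start + limit]
    next_index = start + limit
    next_from_id = ordered[next_index] if next_index < len(ordered) else None
    return page, next_from_id
```

A UI could call page_of_flows(ids, 100) for the first page and pass the
returned next_from_id back on each subsequent request until it is None, so it
never has to raise the limit to fetch everything at once.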
> Support fromId for flows/flowrun apps
> -------------------------------------
>
> Key: YARN-6027
> URL: https://issues.apache.org/jira/browse/YARN-6027
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Labels: yarn-5355-merge-blocker
>
> In YARN-5585, fromId is supported for retrieving entities. We need a similar
> filter for flows/flowRun apps, flow runs, and flows as well.
> Along with supporting fromId, this JIRA should also discuss the following
> points:
> * Should we throw an exception for entities/entity retrieval if duplicates
> are found?
> * TimelineEntity:
> ** Should the equals method also check idPrefix?
> ** Is idPrefix part of the identifiers?