[jira] [Commented] (YARN-6027) Support fromId for flows/flowrun apps

Rohith Sharma K S (JIRA) Wed, 04 Jan 2017 05:56:17 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798315#comment-15798315
 ]


Rohith Sharma K S commented on YARN-6027:
-----------------------------------------

Thanks folks for the discussion. 

I had an offline talk with Sunil regarding YARN UI integration with ATSv2. Some 
of the discussion points are:
# When flows are queried without any filters, the reader sends duplicate flow 
entities. Each duplicated flow contains aggregated flow run details. CMIIAW, I 
think this is because when the same flow is run daily, there will be 2 entries 
(assuming it has run on 2 days) in FlowActivityTable. 
*While reading, if no filters are given, a full table scan happens on 
FlowActivityTable.* This results in duplicate entries for the same flow name. 
To me, *the current behavior of retrieving flows should be restricted to the 
current day only* (not even the last 24 hours, which can cause duplicated 
entries). 
# Now take the flow activities of a single day. If the number of flows run is 
huge, let's say 1000, then the REST API result contains only 100 flow names, 
where 100 is the limit. The user can raise the limit to 1000, but that is not 
ideal for UI rendering, which would go for a toss with many issues like browser 
OOM. The issue is that the UI does not know how many flows exist, and the 
better solution here is to render page by page for a single day. *At least 
pagination should be supported for single-day flow activities.* I know that 
with the current HBase schema of the flowActivity table, pagination would be 
difficult to achieve, but at the API layer there should be a filter for it. 
Otherwise it is very painful for UI developers who rely on ATSv2 data. 
# Date range and limit filters do not solve the UI rendering issues that 
pagination solves. They can only reduce the number of flows returned. Date 
range is supported, but ranges within a day, like 10 AM to 11 AM, are not. 
# I also see that flow entities contain all the flow run details. Do we really 
need to embed flow run details in flow entities? Doesn't that make them heavy? 
I think flow run information in flow entities should be treated as a filter. 
However, there is a separate API to get all the flow runs.
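
To illustrate points 1 and 2, here is a minimal Python sketch. It is a toy 
in-memory model, not ATSv2 or HBase code: the FlowActivity tuple, the sample 
rows, and the helper names (scan_all, flows_for_day, pages) are all 
hypothetical, and treating from_id as inclusive is an assumption. It shows why 
an unfiltered scan over per-day rows repeats flow names, and how a 
fromId-style cursor could page through a single day without the UI knowing 
the total count up front.

```python
from collections import namedtuple

# Hypothetical stand-in for one flow activity row: one row per (day, flow).
FlowActivity = namedtuple("FlowActivity", ["day", "flow_name", "run_ids"])

ROWS = [
    FlowActivity("2017-01-03", "etl_daily", [101]),
    FlowActivity("2017-01-03", "report", [102]),
    FlowActivity("2017-01-04", "etl_daily", [201]),
    FlowActivity("2017-01-04", "ingest", [202]),
    FlowActivity("2017-01-04", "report", [203]),
]


def scan_all(rows):
    """Unfiltered read: every (day, flow) row comes back, so names repeat."""
    return [r.flow_name for r in rows]


def flows_for_day(rows, day, limit, from_id=None):
    """Return up to `limit` flow names for one day, starting at `from_id`.

    `from_id` is inclusive here (an assumption made for this sketch).
    """
    names = sorted(r.flow_name for r in rows if r.day == day)
    if from_id is not None:
        names = [n for n in names if n >= from_id]
    return names[:limit]


def pages(rows, day, page_size):
    """Yield pages of flow names for one day without knowing the total."""
    from_id = None
    while True:
        # Ask for one extra name; if present, it is the next page's cursor.
        batch = flows_for_day(rows, day, page_size + 1, from_id)
        yield batch[:page_size]
        if len(batch) <= page_size:
            return
        from_id = batch[page_size]
```

With these sample rows, scan_all(ROWS) returns "etl_daily" and "report" twice 
each, while list(pages(ROWS, "2017-01-04", 2)) yields two pages: the first 
with "etl_daily" and "ingest", the second with "report". Fetching one extra 
name per request is just one way to detect the last page; the real API could 
signal it differently.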

> Support fromId for flows/flowrun apps
> -------------------------------------
>
>                 Key: YARN-6027
>                 URL: https://issues.apache.org/jira/browse/YARN-6027
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>              Labels: yarn-5355-merge-blocker
>
> In YARN-5585, fromId is supported for retrieving entities. We need a similar 
> filter for flows/flowRun apps, flow runs, and flows as well. 
> Along with supporting fromId, this JIRA should also discuss the following 
> points:
> * Should we throw an exception for entities/entity retrieval if duplicates 
> are found?
> * TimelineEntity:
> ** Should the equals method also check idPrefix?
> ** Is idPrefix part of the identifiers?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

