[jira] [Commented] (YARN-6027) Support fromId for flows/flowrun apps

Sangjin Lee (JIRA) Tue, 03 Jan 2017 11:02:42 -0800

    [ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795862#comment-15795862 ]


Sangjin Lee commented on YARN-6027:
-----------------------------------

Catching up on this discussion after the break.

I am +1 on adding pagination support for flow runs and apps.
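
A minimal sketch of fromId-style pagination for flow runs or flow apps. It assumes two things: fromId is inclusive (the page starts at the given id), and the reader scans row keys in sorted order. `fetch_page` and `iter_pages` are hypothetical helpers over an in-memory list. They are not the timeline reader API:

```python
from bisect import bisect_left
from typing import Iterator, List, Optional, Tuple


def fetch_page(ids: List[str], limit: int,
               from_id: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return up to `limit` ids starting AT from_id (inclusive), plus the
    fromId to send for the next page, or None when nothing is left.

    `ids` is a hypothetical stand-in for row keys already sorted in storage.
    """
    start = 0 if from_id is None else bisect_left(ids, from_id)
    # Read one extra row: if it exists, its id becomes the next fromId.
    window = ids[start:start + limit + 1]
    page = window[:limit]
    next_from_id = window[limit] if len(window) > limit else None
    return page, next_from_id


def iter_pages(ids: List[str], limit: int) -> Iterator[List[str]]:
    """Walk every page by feeding each returned fromId into the next call."""
    from_id: Optional[str] = None
    while True:
        page, from_id = fetch_page(ids, limit, from_id)
        yield page
        if from_id is None:
            return
```

Reading limit + 1 rows lets the extra row's id serve as the next fromId, so an inclusive fromId never repeats an entity across pages. For example, with ids run_00 through run_04 and limit=2, iter_pages yields [run_00, run_01], then [run_02, run_03], then [run_04].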

Regarding flows (flow activity), I think [~varun_saxena] explained well how 
that table is meant to be used and served. The flow activity table is 
deliberately organized by *dates*; i.e. it is stored as daily activities. 
Therefore, the natural way to segment and paginate it is with date-range 
filters.
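
A rough sketch of date-range paging over daily activity buckets. The `FlowActivity` dict and the `page_by_days` helper are hypothetical. They only mirror the "daily activities" idea and do not reflect the actual table layout or REST parameters:

```python
from datetime import date, timedelta
from typing import Dict, List, Set, Tuple

# Hypothetical model: one bucket of flow names per day. This mirrors the idea
# of "daily activities" only; it is not the real flow activity table schema.
FlowActivity = Dict[date, Set[str]]


def flows_in_range(activity: FlowActivity, start: date,
                   end: date) -> List[Tuple[date, str]]:
    """(day, flow) pairs with start <= day <= end, most recent day first."""
    rows: List[Tuple[date, str]] = []
    day = end
    while day >= start:
        for flow in sorted(activity.get(day, set())):
            rows.append((day, flow))
        day -= timedelta(days=1)
    return rows


def page_by_days(activity: FlowActivity, newest: date, days_per_page: int,
                 page: int) -> List[Tuple[date, str]]:
    """Page 0 covers the `days_per_page` days ending at `newest`; page 1 is
    the window just before that, and so on back in time."""
    end = newest - timedelta(days=days_per_page * page)
    start = end - timedelta(days=days_per_page - 1)
    return flows_in_range(activity, start, end)
```

Each page is simply a date-range filter, so the newest activity always comes first and older days are only read when the user pages back.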

Please note that "flows" and "flow runs" are different. Flows are closer to the 
application that drives runs. Flow runs are *actualized instances* of those 
flows. For example, if you run an MR sleep job every hour, there would be *one* 
flow named "Sleep Job", and on any given day there would be 24 flow runs 
belonging to that flow.
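
To make the flow vs. flow run distinction concrete, here is a hypothetical data model. The `Flow`/`FlowRun` classes, the `user` field, and the use of the start time as the run id are all assumptions for illustration:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class FlowRun:
    run_id: int        # hypothetical: run start time in epoch millis
    started: datetime


@dataclass
class Flow:
    user: str          # hypothetical identifying fields for the flow
    name: str
    runs: List[FlowRun] = field(default_factory=list)


def schedule_hourly(flow: Flow, day: datetime) -> None:
    """Record one run of `flow` per hour of `day` (24 runs in total)."""
    for hour in range(24):
        started = day + timedelta(hours=hour)
        flow.runs.append(FlowRun(int(started.timestamp() * 1000), started))


sleep_job = Flow(user="alice", name="Sleep Job")
schedule_hourly(sleep_job, datetime(2017, 1, 3, tzinfo=timezone.utc))
print(len(sleep_job.runs))  # prints 24: 24 runs, but still a single Flow
```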

The flow activity table surfaces all the flows (not the flow runs) that had 
activity on a given day, and that should be a good landing point for users, 
much like the current RM page, which shows the most recently active YARN 
applications. In the sleep job example above, it would show only *one entry*, 
"Sleep Job". You can then drill down (i.e. fetch the list of flow runs for that 
flow) to reach the 24 flow runs.
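
The following sketch shows how per-run records collapse into one flow activity entry per flow per day, and how the drill-down recovers the individual runs. The tuple shape and helper names are hypothetical. Keying activity only on each run's start date is a simplification:

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List, Set, Tuple

Run = Tuple[str, datetime]  # (flow name, run start); hypothetical shape


def flow_activity(runs: List[Run]) -> Dict[date, Set[str]]:
    """Collapse runs into one entry per flow per day they started on."""
    activity: Dict[date, Set[str]] = defaultdict(set)
    for flow_name, started in runs:
        activity[started.date()].add(flow_name)
    return dict(activity)


def drill_down(runs: List[Run], flow_name: str, day: date) -> List[datetime]:
    """List the individual runs behind one flow activity entry."""
    return sorted(s for f, s in runs if f == flow_name and s.date() == day)


runs = [("Sleep Job", datetime(2017, 1, 3, hour)) for hour in range(24)]
print(flow_activity(runs))  # {datetime.date(2017, 1, 3): {'Sleep Job'}}
print(len(drill_down(runs, "Sleep Job", date(2017, 1, 3))))  # 24
```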

bq. Landing page should be list of all flows.

I think we had this discussion in the past, but I forget the JIRA id. Given the 
current structure of the data, listing all flows would not be very feasible. 
Also, on any sufficiently old cluster, you can easily imagine how big (and 
slow) this result could be. Even if we introduced pagination, this landing page 
would start with the flows whose names begin with "A", and you'd need to page 
through many pages to reach your own flows. Again, as with the RM landing page, 
IMO *recency* is the key to the usefulness of this UI, and that's why we 
organized the flow activity that way: most users would find their flows within 
the first few pages (at most) of the data. Hope that helps.
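
A toy illustration of the recency argument, using a hypothetical page size of 20 and made-up flow names. On an old cluster, a flow the user ran today lands on page 50 of an alphabetical listing but on page 0 of a recency-ordered one:

```python
from datetime import date
from typing import Dict, List

PAGE_SIZE = 20  # hypothetical page size


def alphabetical(last_active: Dict[str, date]) -> List[str]:
    return sorted(last_active)


def by_recency(last_active: Dict[str, date]) -> List[str]:
    # Most recently active first; ties broken by name for a stable order.
    return sorted(last_active, key=lambda n: (-last_active[n].toordinal(), n))


def page_of(ordered: List[str], flow: str, size: int = PAGE_SIZE) -> int:
    """0-based landing-page number on which `flow` shows up."""
    return ordered.index(flow) // size


# An old cluster: 1000 stale flows plus one flow the user ran today.
last_active = {f"flow_{i:04d}": date(2015, 1, 1) for i in range(1000)}
last_active["zz_my_job"] = date(2017, 1, 3)
print(page_of(alphabetical(last_active), "zz_my_job"))  # 50
print(page_of(by_recency(last_active), "zz_my_job"))    # 0
```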

cc [~jrottinghuis] [~vrushalic]

> Support fromId for flows/flowrun apps
> -------------------------------------
>
>                 Key: YARN-6027
>                 URL: https://issues.apache.org/jira/browse/YARN-6027
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>              Labels: yarn-5355-merge-blocker
>
> In YARN-5585, fromId is supported for retrieving entities. We need a similar 
> filter for flow/flowRun apps, and for flow runs and flows as well. 
> Along with supporting fromId, this JIRA should also discuss the following points:
> * Should we throw an exception during entities/entity retrieval if duplicates 
> are found?
> * TimelineEntity:
> ** Should the equals method also check idPrefix?
> ** Is idPrefix part of the identifier?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
