[ 
https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798450#comment-15798450
 ] 

Varun Saxena commented on YARN-6027:
------------------------------------

Yes, flow activity is meant to display activity for a day. So if a flow is 
active on two days, it will be shown under each of those two days. It is not a 
list of flows but a list of flow activities, so the behavior is expected. We 
need to drill down from flow activities to flow runs to applications.
In the UI, we can show them as flow activities grouped by date. The page could 
be designed something like the one below. Flows within a day will be 
differentiated by user.
{noformat}
 ______________________________
|  Date   |   List of Flows    |
|_________|____________________|
|  20 Jan |  flow1             |
|         |  flow2             |
|_________|____________________|
|         |                    |
|  19 Jan |  flow2             |
|         |  flow3             |
|_________|____________________| 
{noformat}
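
A rough sketch of the grouping behind such a page, assuming the reader hands 
the UI a flat list of (day, user, flow name) records. The function name, the 
record shape, the user names and the year are all made up for illustration and 
are not taken from the timeline service code.

```python
from collections import defaultdict
from datetime import date


def group_activities_by_day(activities):
    """Group (day, user, flow) records into [(day, [(user, flow), ...]), ...]."""
    by_day = defaultdict(set)
    for day, user, flow in activities:
        # A flow active on two days appears once under each of those days.
        by_day[day].add((user, flow))
    # Newest day first; within a day, order by user and then by flow name.
    return [(day, sorted(by_day[day])) for day in sorted(by_day, reverse=True)]


# Hypothetical sample data mirroring the table above.
activities = [
    (date(2017, 1, 20), "alice", "flow1"),
    (date(2017, 1, 20), "bob", "flow2"),
    (date(2017, 1, 19), "bob", "flow2"),
    (date(2017, 1, 19), "alice", "flow3"),
]

for day, flows in group_activities_by_day(activities):
    print(day.isoformat(), flows)
```

Note that flow2 shows up under both days, which is the expected behavior for a 
list of flow activities.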

Frankly, it is unlikely that the number of flows per day would run into 
thousands if somebody is specifying flows in YARN tags. But as we are designing 
a generic system, we should not make assumptions. In an ad-hoc query system, 
each query may very well be treated as a separate flow if flows are not 
specified, and that could run into several thousand flows per day.
So some sort of pagination support can be added.

By the way, I am not sure why user comes before flow name in the row key. I 
think the order in the row key can be reversed, and we can then use fromFlowId 
as a new filter along with a date range (with reasonable restrictions) to 
achieve pagination. 
Pagination can also be done within a day. I mean, in the example given above, 
we can have a sub-table under the List of Flows column (in the UI).
[~sjlee0], do you remember why the user was kept before the flow name in the 
row key? Was it to achieve user-level offline aggregation?

Regarding flow run details under a specific flow activity, I had in fact raised 
a JIRA to limit the number of flow runs returned under a flow activity. But 
realistically speaking, the number of runs may not be that large in a day. Also, 
it may not be feasible to do it on the HBase side with the current schema 
design. It can be done on the reader side, though, to reduce the payload. We 
send minimal information about each flow run anyway, as we only keep the run id 
and version in the flow activity table. 
Also, we need to see how we design our UI. If we want to support drilling down 
to flow runs for a specific date, we would in fact need all the flow runs.
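
A small sketch of what reader-side trimming could look like, assuming a flow 
activity is a plain dict holding a list of runs with only run id and version. 
The dict shape and trim_flow_runs are hypothetical, and the sketch assumes a 
larger run id means a more recent run.

```python
def trim_flow_runs(flow_activity, max_runs):
    """Return a copy of the activity keeping only the max_runs latest runs."""
    # Assumption: run ids are timestamp-like, so larger means more recent.
    runs = sorted(flow_activity["runs"], key=lambda r: r["run_id"], reverse=True)
    trimmed = dict(flow_activity)  # shallow copy; the input is left untouched
    trimmed["runs"] = runs[:max_runs]
    return trimmed


# Hypothetical activity record carrying only run id and version per run.
activity = {
    "flow": "flow2",
    "user": "bob",
    "runs": [
        {"run_id": 1002, "version": "v1"},
        {"run_id": 1005, "version": "v2"},
        {"run_id": 1001, "version": "v1"},
    ],
}

print(trim_flow_runs(activity, 2)["runs"])
```

If the UI needs to drill down to every run for a date, the limit would simply 
not be applied for that request.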
  
I am not sure what you meant when you said that using flow run information in 
flow entities should be treated as a filter. Can you elaborate?

> Support fromId for flows/flowrun apps
> -------------------------------------
>
>                 Key: YARN-6027
>                 URL: https://issues.apache.org/jira/browse/YARN-6027
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>              Labels: yarn-5355-merge-blocker
>
> In YARN-5585, fromId is supported for retrieving entities. We need a similar 
> filter for flows/flowRun apps, flow runs, and flows as well. 
> Along with supporting fromId, this JIRA should also discuss the following 
> points:
> * Should we throw an exception for entities/entity retrieval if duplicates 
> are found?
> * TimelineEntity:
> ** Should the equals method also check for idPrefix?
> ** Is idPrefix part of the identifiers?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)