Sangjin Lee commented on YARN-3391:

OK, just to clarify, we're talking about a case where one flow (run) is one 
YARN app. The only debate is whether the repeated runs of the (essentially) 
same YARN app should be grouped as different runs of the same flow, or all 
different flows altogether. In other words, *if it ran 100 times, should we 
have 100 flow runs of one flow, or 100 flows each of which has exactly one flow 
run?*

To me it seems a no-brainer (thanks [~vrushalic] for reminding me) that we do 
want to group the runs of the same YARN app. If a user is running TestDFSIO 
over and over, those runs should be recognized as different instances of the 
same flow.

One mitigating factor is we would modify the mapreduce code to provide the flow 
name/id in case it's not set. Then the default behavior won't kick in for the 
most part. But I think it is important enough to group them and surface them as 
instances of the same flow.
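
To make the two options concrete, here is a minimal sketch in Python. It is 
only an illustration of the grouping question, not the timeline service API: 
the record shape (flow name, flow run id, app id), the function name, and the 
app id strings are all hypothetical.

```python
from collections import defaultdict

def group_runs_by_flow(apps):
    """Group (flow_name, flow_run_id, app_id) records into flows by flow name."""
    flows = defaultdict(list)
    for flow_name, flow_run_id, app_id in apps:
        flows[flow_name].append((flow_run_id, app_id))
    # Order the runs inside each flow by their numeric run id.
    return {name: sorted(runs) for name, runs in flows.items()}

# 100 submissions of the same job; the app ids are placeholders.
apps = [("TestDFSIO", run_id, f"app_{run_id:04d}") for run_id in range(1, 101)]

# Option A: the flow name is shared, so we get one flow with 100 flow runs.
grouped = group_runs_by_flow(apps)

# Option B: the flow name falls back to something unique per app (here the
# app id), so we get 100 flows with exactly one flow run each.
ungrouped = group_runs_by_flow(
    [(app_id, run_id, app_id) for _, run_id, app_id in apps]
)

print(len(grouped), len(grouped["TestDFSIO"]))  # flows, runs in the one flow
print(len(ungrouped))                           # flows under option B
```

With a shared flow name the repeated runs can be surfaced together; with a 
per-app default they cannot be related to one another after the fact.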

> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>                 Key: YARN-3391
>                 URL: https://issues.apache.org/jira/browse/YARN-3391
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-3391.1.patch
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?
