Zhijie Shen commented on YARN-3391:

I had an offline discussion with Vrushali. Here's a summary:

1. We agree that, by default, each individual application should belong to its 
own flow run. 

2. While Vrushali thinks different applications with the same name should 
belong to the same flow (name), I prefer that each individual application 
belong to a different flow (name).

My opinion is that each individual application should be completely separate 
at every level of the flow notation unless users specify name/version/run 
explicitly, to minimize the interaction with other applications. For example, 
the aggregation for this application won't affect others and won't be affected 
by others.

One technical problem with using the application name is that it's "N/A" by 
default, unless users set it explicitly in the framework code. Similarly, the 
other field that we could choose for the flow name is the application type, 
which is "YARN" by default. Therefore, using either name or type will 
potentially put most users' applications into the flow (name) "N/A" or "YARN".
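
To make the difference between the two defaults concrete, here is a minimal 
sketch. It is plain Python, not YARN code: the function names, the "flow_" 
prefix and the application IDs are all made up for illustration. The only thing 
taken from the discussion above is that the application name defaults to "N/A".

```python
from collections import defaultdict

# Default application name when the framework does not set one (see above).
DEFAULT_APP_NAME = "N/A"


def flow_name_by_app_name(app_id, app_name=DEFAULT_APP_NAME):
    """Hypothetical default A: the flow name is the application name."""
    return app_name


def flow_name_per_app(app_id, app_name=DEFAULT_APP_NAME, flow_name=None):
    """Hypothetical default B: every application gets its own flow name
    unless the user sets one explicitly. The "flow_" prefix is an assumption."""
    return flow_name if flow_name is not None else "flow_" + app_id


def group(apps, policy):
    """Group application IDs by the flow name the given policy assigns."""
    flows = defaultdict(list)
    for app_id, app_name in apps:
        flows[policy(app_id, app_name)].append(app_id)
    return dict(flows)


# Made-up applications: two left the name unset, one set it explicitly.
apps = [
    ("application_1_0001", DEFAULT_APP_NAME),
    ("application_1_0002", DEFAULT_APP_NAME),
    ("application_1_0003", "wordcount"),
]

by_name = group(apps, flow_name_by_app_name)
per_app = group(apps, flow_name_per_app)
```

Under default A the two unrelated applications that never set a name end up 
aggregated together in the flow "N/A"; under default B each application stays 
in its own flow until the user opts in to grouping.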

However, the more essential question is whether it makes sense to group 
applications by application name/type by default at the flow (name) level, and 
whether the flow-level aggregation info makes sense for this default grouping 
(e.g. all wordcount jobs of zjshen). [~sjlee0] and [~vinodkv], any comments?

> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>                 Key: YARN-3391
>                 URL: https://issues.apache.org/jira/browse/YARN-3391
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-3391.1.patch
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?

This message was sent by Atlassian JIRA
