Sangjin Lee commented on YARN-3391:

The easiest case is when the client didn't set anything (e.g. a standalone MR 
app). Then we can apply default values for all three (flow name = YARN app 
name, flow version = 1, flow run id = app submit time).
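
As a rough sketch of that defaulting rule (the class and method names here are 
hypothetical and are not existing YARN APIs; I'm also assuming the version is 
carried as a string and the run id as a number, which is still an open 
question on this JIRA):

```java
// Hypothetical helper, not part of YARN: fills in flow context defaults
// when the client did not set them.
final class FlowContextDefaults {

    // Simple value holder for the three flow attributes (illustrative only).
    static final class FlowContext {
        final String flowName;
        final String flowVersion;
        final long flowRunId;

        FlowContext(String flowName, String flowVersion, long flowRunId) {
            this.flowName = flowName;
            this.flowVersion = flowVersion;
            this.flowRunId = flowRunId;
        }
    }

    // A null argument means "the client did not set this value".
    static FlowContext resolve(String flowName, String flowVersion,
                               Long flowRunId, String yarnAppName,
                               long appSubmitTimeMillis) {
        String name = (flowName != null) ? flowName : yarnAppName;
        String version = (flowVersion != null) ? flowVersion : "1";
        long runId = (flowRunId != null) ? flowRunId : appSubmitTimeMillis;
        return new FlowContext(name, version, runId);
    }
}
```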

One interesting case is a real multi-app flow that sets the flow id but not 
the run id. Suppose a particular flow run has 3 YARN apps. If we fall back to 
the YARN app submit time, these 3 YARN apps would get different run ids 
because their app submit times differ, and that's not what we want. The 
submit time of the flow itself is not readily apparent during the interaction 
between the flow client and YARN.
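
To make the problem concrete, here is a small illustration (hypothetical names 
and made-up submit times, not YARN code) of what the submit-time fallback does 
to the apps of a single flow run:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration, not YARN code.
final class RunIdFallbackDemo {

    // Fallback under discussion: use the app submit time when the client
    // did not set a run id (null means "not set").
    static long fallbackRunId(Long clientRunId, long appSubmitTimeMillis) {
        return (clientRunId != null) ? clientRunId : appSubmitTimeMillis;
    }

    // Collects the distinct run ids that the apps of one flow run end up with.
    static Set<Long> runIdsFor(Long clientRunId, long[] appSubmitTimes) {
        Set<Long> runIds = new TreeSet<>();
        for (long submitTime : appSubmitTimes) {
            runIds.add(fallbackRunId(clientRunId, submitTime));
        }
        return runIds;
    }
}
```

With three apps submitted at three different times and no client-supplied run 
id, runIdsFor returns three distinct values, so the storage would see three 
flow runs instead of one. If the client sets the run id, all three apps share 
it.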

So one option in this case is to reject the submission if the flow id is set 
but the flow run id is not set, but there may be better ways of handling cases 
like that.
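
For what it's worth, the rejection option could be as small as the check 
below. This is only a sketch: the class name, method name, and exception type 
are my assumptions, not an existing YARN API, and where such a check would 
live (client side or RM side) is left open.

```java
// Hypothetical validation, not part of YARN.
final class FlowSubmissionValidator {

    // Rejects a submission that names a flow but omits the flow run id.
    // A null argument means "the client did not set this value".
    static void validate(String flowId, Long flowRunId) {
        if (flowId != null && flowRunId == null) {
            throw new IllegalArgumentException(
                "flow id '" + flowId + "' was set without a flow run id");
        }
    }
}
```

A standalone app that sets neither value still passes and would get the 
defaults described above.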

> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>                 Key: YARN-3391
>                 URL: https://issues.apache.org/jira/browse/YARN-3391
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?
