Zhijie Shen commented on YARN-3391:

Thanks for your input, Joep!

bq. Therefore it seems to me that adding the app_id to the flow_id by default 
does not add any value,

Yeah, I agree that using the app_id doesn't add value, but IMHO it also 
doesn't cause a problem. Going back to the aforementioned example, if 
Sleep_...1 -- Sleep_...40 use the application name as the flow name, and 
Sleep_...41 -- Sleep_...50 are explicitly set to be part of flow XYZ, I'll get 
something weird on the web UI: "Sleep 40 runs cost 4/5 x". It misleads users 
into thinking there are 40 sleep jobs instead of 50.

bq. Then are we thinking on the future RM UI, would we show 1 line for each:

Instead, showing this information sounds more like aggregating applications 
by application name/type. We can aggregate along these dimensions, just as we 
aggregate by queue and so on.
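
To make the difference concrete, here is a minimal sketch of counting the 50 
sleep applications along two dimensions. Everything in it is hypothetical and 
for illustration only: the App class, its fields, and the countBy helper are 
not YARN or timeline service APIs, and it assumes all 50 applications share 
the application name "Sleep", with the first 40 defaulting their flow name to 
it and the last 10 explicitly set to flow XYZ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FlowAggregationSketch {

  /** Hypothetical application record; not a YARN class. */
  static final class App {
    final String appName;
    final String flowName;

    App(String appName, String flowName) {
      this.appName = appName;
      this.flowName = flowName;
    }
  }

  /** Counts applications grouped by the given dimension (sorted by key). */
  static Map<String, Long> countBy(List<App> apps,
                                   Function<App, String> dimension) {
    return apps.stream().collect(
        Collectors.groupingBy(dimension, TreeMap::new, Collectors.counting()));
  }

  /** Builds the assumed example: apps 1-40 default to flow "Sleep", 41-50 use "XYZ". */
  static List<App> sampleApps() {
    List<App> apps = new ArrayList<>();
    for (int i = 1; i <= 50; i++) {
      String flowName = (i <= 40) ? "Sleep" : "XYZ";
      apps.add(new App("Sleep", flowName));
    }
    return apps;
  }

  public static void main(String[] args) {
    List<App> apps = sampleApps();
    // Grouping by flow name splits the sleep jobs across two flows.
    System.out.println(countBy(apps, a -> a.flowName)); // {Sleep=40, XYZ=10}
    // Grouping by application name sees all 50 sleep jobs.
    System.out.println(countBy(apps, a -> a.appName));  // {Sleep=50}
  }
}
```

Grouping by flow name reports 40 runs under "Sleep", while grouping by 
application name reports all 50, which is why the per-name view reads more 
like an application-level aggregation than a flow-level one.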

> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>                 Key: YARN-3391
>                 URL: https://issues.apache.org/jira/browse/YARN-3391
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-3391.1.patch
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?

This message was sent by Atlassian JIRA