[
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391549#comment-14391549
]
Zhijie Shen commented on YARN-3391:
-----------------------------------
I had an offline discussion with Vrushali. Here's a summary:
1. We agree that, by default, each individual application should belong to its
own flow run.
2. While Vrushali thought different applications of the same name should belong
to the same flow (name), I prefer that each individual application belong to a
different flow (name).
My opinion is that each individual application should be completely separated
at every level of the flow notation unless users specify name/version/run
explicitly, to minimize the interaction with other applications. For example,
the aggregation for this application won't affect others and won't be affected
by others.
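To make that default concrete, here is a minimal sketch of the behavior I have
in mind. The class, method, and default values (a flow name derived from the
application id, version "1", run id taken from the application start time) are
all hypothetical and only illustrate the idea; they are not taken from the
attached patch or the existing YARN code.

```java
import java.util.Objects;

// Hypothetical sketch: names and default values are illustrative assumptions,
// not the YARN timeline service API.
final class FlowContext {
    final String flowName;
    final String flowVersion;
    final long flowRunId;

    FlowContext(String flowName, String flowVersion, long flowRunId) {
        this.flowName = Objects.requireNonNull(flowName);
        this.flowVersion = Objects.requireNonNull(flowVersion);
        this.flowRunId = flowRunId;
    }

    // Any value the user did not set (null) falls back to a per-application
    // default, so two applications never share a flow unless asked to.
    static FlowContext resolve(String appId, long appStartTimeMs,
                               String userFlowName, String userFlowVersion,
                               Long userFlowRunId) {
        // Assumed default: derive the flow name from the unique application id
        // instead of the application name ("N/A") or type ("YARN").
        String name = (userFlowName != null) ? userFlowName : "flow_" + appId;
        // Assumed default version.
        String version = (userFlowVersion != null) ? userFlowVersion : "1";
        // Assumed default run id: a number, here the application start time.
        long runId = (userFlowRunId != null) ? userFlowRunId : appStartTimeMs;
        return new FlowContext(name, version, runId);
    }

    public static void main(String[] args) {
        FlowContext byDefault =
            resolve("application_1_0001", 1000L, null, null, null);
        FlowContext explicit =
            resolve("application_1_0002", 2000L, "wordcount", "2", 7L);
        System.out.println(byDefault.flowName + " / " + explicit.flowName);
    }
}
```

With these assumed defaults, two applications submitted without any flow
settings land in different flows, while applications that explicitly share
name/version/run are grouped and aggregated together.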
One technical problem with using the application name is that it's "N/A" by
default, unless users set it explicitly in the framework code. Similarly, the
other field that we could choose for the flow name is application type, which
is "YARN" by default. Therefore, using either name or type will potentially put
most users' applications in the flow (name) "N/A" or "YARN".
However, the more essential question is whether it makes sense to group
applications by application name/type by default at the flow (name) level, and
whether the flow-level aggregation info makes sense for this default grouping
(e.g. all wordcount jobs of zjshen). [~sjlee0] and [~vinodkv], any comments?
> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>
> Key: YARN-3391
> URL: https://issues.apache.org/jira/browse/YARN-3391
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
> Attachments: YARN-3391.1.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed
> into the collector and to the storage eventually?
> - Should the flow run id be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not
> set it)
> - How do we handle flow attributes in case of nested levels of flows?