[
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391554#comment-14391554
]
Joep Rottinghuis commented on YARN-3391:
----------------------------------------
Whether the app_id needs to be part of the default flow name or not seems to
boil down to how we think about flows.
Let's say somebody runs the Sleep job, wordcount, TestDFSIO, or an application
that doesn't use MapReduce, for example a Spark app (where we could default to
the app name).
Then, in the future RM UI, would we show one line for each:
{noformat}
Sleep 50 runs cost x
wordcount 12 runs cost y
TestDFSIO 10 runs cost z
{noformat}
Or would we show one line per run:
{noformat}
Sleep_...1 1 runs cost x/50
Sleep_...2 1 runs cost x/50
...
Sleep_...49 1 runs cost x/50
Sleep_...50 1 runs cost x/50
wordcount_...1 1 runs cost y/12
wordcount_...2 1 runs cost y/12
wordcount_...3 1 runs cost y/12
...
wordcount_...11 1 runs cost y/12
wordcount_...12 1 runs cost y/12
TestDFSIO_1 1 runs cost z/10
TestDFSIO_2 1 runs cost z/10
TestDFSIO_3 1 runs cost z/10
...
TestDFSIO_9 1 runs cost z/10
TestDFSIO_10 1 runs cost z/10
{noformat}
It would seem that we already have the UI with individual application ids, so
users can already see each individual yarn app that way. We'd also be able to
drill into the wordcount flow name and see 12 runs, each with their unique yarn
app id.
Therefore it seems to me that adding the app_id to the flow_id by default does
not add any value, but setting the flow_id to the app name does add value. We
don't want to map it to a static value as pointed out earlier (we'd see a huge
number of runs for a single flow called "1" or something similar), but forcing
every flow to be unique seems to overlap with what we already have with runs.
We'd force each flow to be unique with only 1 run.
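To make the two options concrete, here is a minimal sketch of how each default would roll up in a per-flow summary. Everything in it is an assumption for illustration only: the record layout, the app ids, the cost numbers, and the function names are hypothetical and are not taken from the timeline service code or the attached patch.

```python
from collections import defaultdict

# Hypothetical sample data: (app name, YARN app id, cost). All values are made up.
APPS = [
    ("wordcount", "app_0001", 3.0),
    ("wordcount", "app_0002", 5.0),
    ("Sleep", "app_0003", 1.0),
]


def flow_name_from_app_name(app_name, app_id):
    """Option A: the default flow name is just the app name."""
    return app_name


def flow_name_with_app_id(app_name, app_id):
    """Option B: the default flow name also embeds the app id."""
    return f"{app_name}_{app_id}"


def summarize(apps, default_flow_name):
    """Group apps by their default flow name; return {flow: (runs, total cost)}."""
    runs = defaultdict(int)
    cost = defaultdict(float)
    for app_name, app_id, app_cost in apps:
        flow = default_flow_name(app_name, app_id)
        runs[flow] += 1
        cost[flow] += app_cost
    return {flow: (runs[flow], cost[flow]) for flow in runs}


if __name__ == "__main__":
    for option in (flow_name_from_app_name, flow_name_with_app_id):
        print(option.__name__)
        for flow, (n_runs, total) in summarize(APPS, option).items():
            print(f"  {flow} {n_runs} runs cost {total}")
```

With option A the two wordcount apps collapse into a single flow with 2 runs, while option B yields one flow per app, each with exactly 1 run, which is the same breakdown the per-application view already gives.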
> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>
> Key: YARN-3391
> URL: https://issues.apache.org/jira/browse/YARN-3391
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
> Attachments: YARN-3391.1.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not
> set it)
> - How do we handle flow attributes in case of nested levels of flows?