[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

Joep Rottinghuis (JIRA) Wed, 01 Apr 2015 14:57:01 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391554#comment-14391554
 ]


Joep Rottinghuis commented on YARN-3391:
----------------------------------------

Whether the app_id needs to be part of the default flow name or not seems to 
boil down to how we think about flows.
Let's say somebody runs the Sleep job, wordcount, TestDFSIO, or an application 
that doesn't use MapReduce, for example a Spark app (where we could default to 
the app name).

Then, in the future RM UI, would we show one line per flow:
{noformat}
Sleep 50 runs cost x 
wordcount 12 runs cost y
TestDFSIO 10 runs cost z
{noformat}

Or would we show one line per run:
{noformat}
Sleep_...1 1 runs cost x/50 
Sleep_...2 1 runs cost x/50 
...
Sleep_...49 1 runs cost x/50 
Sleep_...50 1 runs cost x/50 

wordcount_...1 1 runs cost y/12
wordcount_...2 1 runs cost y/12
wordcount_...3 1 runs cost y/12
...
wordcount_...11 1 runs cost y/12
wordcount_...12 1 runs cost y/12

TestDFSIO_1 1 runs cost z/10
TestDFSIO_2 1 runs cost z/10
TestDFSIO_3 1 runs cost z/10
...
TestDFSIO_9 1 runs cost z/10
TestDFSIO_10 1 runs cost z/10
{noformat}

It would seem that we already have the UI with individual application ids, so 
users can already see each individual YARN app that way. We'd also be able to 
drill into the wordcount flow name and see 12 runs, each with its own unique 
YARN app id.
Therefore it seems to me that adding the app_id to the flow_id by default does 
not add any value, but setting the flow_id to the app name does add value. We 
don't want to map it to a static value as pointed out earlier (we'd see a huge 
number of runs for a single flow called "1" or something similar), but forcing 
every flow to be unique seems to overlap with what we already have with runs: 
each flow would be unique and would have only 1 run.
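
To make the two options concrete, here is a minimal Python sketch of how the 
aggregation would differ. It is only an illustration: the record layout, the 
app ids, the costs, and the function names are all made up for this example and 
are not part of any YARN or timeline service API.

```python
from collections import defaultdict

# Hypothetical records: (yarn_app_id, app_name, cost). All values are invented.
APPS = [
    ("application_1_0001", "Sleep", 2.0),
    ("application_1_0002", "Sleep", 3.0),
    ("application_1_0003", "wordcount", 5.0),
    ("application_1_0004", "TestDFSIO", 1.0),
    ("application_1_0005", "wordcount", 4.0),
]


def flow_by_app_name(app_id, app_name):
    """Default flow name = app name, so repeated runs share one flow."""
    return app_name


def flow_by_app_name_and_id(app_id, app_name):
    """Default flow name = app name + app id, so every flow is unique."""
    return f"{app_name}_{app_id}"


def summarize(apps, default_flow_name):
    """Group apps by flow name; return {flow_name: (run_count, total_cost)}."""
    runs = defaultdict(int)
    cost = defaultdict(float)
    for app_id, app_name, app_cost in apps:
        flow = default_flow_name(app_id, app_name)
        runs[flow] += 1
        cost[flow] += app_cost
    return {flow: (runs[flow], cost[flow]) for flow in runs}


per_flow = summarize(APPS, flow_by_app_name)
per_run = summarize(APPS, flow_by_app_name_and_id)

for name, (n, c) in per_flow.items():
    print(f"{name} {n} runs cost {c}")
```

With the app name as the default, the summary has one row per flow and the run 
count carries information. With the app id appended, the summary has as many 
rows as there are apps and every row has exactly 1 run, which is the same view 
the per-application UI already gives.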


> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>
>                 Key: YARN-3391
>                 URL: https://issues.apache.org/jira/browse/YARN-3391
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-3391.1.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
