[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

Vrushali C (JIRA) Mon, 30 Mar 2015 14:53:53 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387491#comment-14387491
 ]


Vrushali C commented on YARN-3391:
----------------------------------

bq. I propose:
flow name: String: default(cluster_<appId without "app" prefix>)

AppId is a string like application_<epoch_timestamp>_<some_number>. I don't 
think just the numerical part, without the "application_" prefix, will be easy 
for users to relate to. What would be easy to relate to is something like the 
mapreduce.job.name param from the config (in the case of Hadoop jobs). 
[~zjshen] do you know of any such config or context parameter that can be set 
so that we can pick up the flow name from there for all YARN applications?
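
As a rough illustration of the lookup described above, here is a minimal 
sketch in Python. It is not YARN code: resolve_flow_name, the name_keys 
parameter, and the fallback to the full application id are all assumptions 
made for the example. Only mapreduce.job.name comes from the discussion, and 
whether a generic equivalent exists for all YARN applications is exactly the 
open question.

```python
from typing import Mapping, Sequence


def resolve_flow_name(conf: Mapping[str, str], app_id: str,
                      name_keys: Sequence[str] = ("mapreduce.job.name",)) -> str:
    """Pick a user-recognizable flow name from the config, if one is set.

    Hypothetical helper: the key list and the fallback are assumptions,
    not the default proposed in the thread.
    """
    for key in name_keys:
        value = conf.get(key, "").strip()
        if value:
            return value
    # Nothing user-supplied was found; fall back to the full application id
    # (a placeholder choice for this sketch only).
    return app_id


if __name__ == "__main__":
    app_id = "application_1427752433000_0001"  # made-up id for illustration
    print(resolve_flow_name({"mapreduce.job.name": "Sleep job"}, app_id))
    print(resolve_flow_name({}, app_id))
```

The candidate keys are passed in rather than hard-coded so that a generic 
key, if one exists or is added later, can be tried before the MapReduce one.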

bq. flow version: String: default("1")
A default string of "1" is fine. 

bq. flow run: long: default(1)

Using a default run id of 1 means every run falls into the same bucket if no 
one sets the run id. There needs to be a way to ensure the run id is set; if it 
is not, the default needs to be something variable, like submit_time. Otherwise 
a default run id of 1 causes a problem. For example, if I run a sleep job 10 
times and don't set the run id, the information from each run is overwritten by 
the next (since all of them have a run id of 1).
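
The overwrite concern can be shown with a small simulation. This is only a 
sketch under stated assumptions: the store is a plain dict, and the row key 
(flow name, flow version, flow run id) is a hypothetical layout chosen for the 
example, not the actual timeline storage schema. The names write_run and 
simulate are likewise made up.

```python
from typing import Dict, Optional, Tuple

# Hypothetical row key: (flow name, flow version, flow run id).
RowKey = Tuple[str, str, int]


def write_run(store: Dict[RowKey, dict], flow_name: str, submit_time_ms: int,
              info: dict, run_id: Optional[int] = None,
              default_to_submit_time: bool = False,
              flow_version: str = "1") -> RowKey:
    """Store one run's info and return the key it was written under."""
    if run_id is None:
        # Client did not set a run id: use either the constant default of 1
        # or a variable default derived from the submit time.
        run_id = submit_time_ms if default_to_submit_time else 1
    key = (flow_name, flow_version, run_id)
    store[key] = info  # a later write with the same key replaces the earlier one
    return key


def simulate(default_to_submit_time: bool, runs: int = 10) -> int:
    """Submit a 'sleep job' several times without a run id; count rows kept."""
    store: Dict[RowKey, dict] = {}
    for i in range(runs):
        # Arbitrary epoch milliseconds, one submission per minute.
        submit_time_ms = 1427752433000 + i * 60_000
        write_run(store, "sleep job", submit_time_ms, {"attempt": i},
                  default_to_submit_time=default_to_submit_time)
    return len(store)


if __name__ == "__main__":
    print(simulate(default_to_submit_time=False))  # constant default of 1
    print(simulate(default_to_submit_time=True))   # submit-time default
```

With the constant default, the ten submissions share one key and only the 
last run survives; with the submit-time default, each submission gets its own 
key as long as the submit times differ.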





> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>
>                 Key: YARN-3391
>                 URL: https://issues.apache.org/jira/browse/YARN-3391
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
