Sangjin Lee commented on YARN-3040:

bq. I can see the benefit. For example, if it represents the timestamp, we can 
filter the flow runs and say give me the runs in the last 5 mins. But my 
concern is whether it's a general enough way for users to describe a run.

The design doc says the flow runs for a given flow must have "unique and 
totally ordered run identifiers". We obviously had numbers in mind when we wrote 
that (mostly because numbers are easy to sort and order in the storage). And 
that's the convention we will push frameworks to use. I think it is important 
that we make it a number (long). However, there is a difference between having 
numbers as run ids and having timestamps as run ids. I don't think we need to 
go so far as requiring timestamps as run ids. As long as they are numbers, I 
think it would be fine. I can imagine some flows using run ids like "1", "2", 
and so on.

We could allow any arbitrary scheme to generate the run ids, but the challenge 
is that it might seriously hamper our ability to store and sort them efficiently. 
And, in most cases, the timestamp of the flow start is quite a natural scheme, 
and I would think most frameworks will just adopt it. What do you think?
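To illustrate the sorting concern, here is a minimal Java sketch comparing how run ids order when kept as strings versus as longs. The class and method names are hypothetical and not part of any YARN API; the point is only that lexicographic order puts "10" before "9", while numeric order gives the total ordering the design doc asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration (not a YARN API): string vs. numeric ordering of run ids.
public class RunIdOrdering {

    // Sorts run ids as strings, i.e. lexicographically ("10" sorts before "2").
    static List<String> sortAsStrings(List<String> ids) {
        List<String> copy = new ArrayList<>(ids);
        Collections.sort(copy);
        return copy;
    }

    // Parses each run id as a long and sorts numerically.
    static List<Long> sortAsLongs(List<String> ids) {
        List<Long> copy = new ArrayList<>();
        for (String id : ids) {
            copy.add(Long.parseLong(id));
        }
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("9", "10", "2", "1");
        System.out.println(sortAsStrings(ids)); // [1, 10, 2, 9]
        System.out.println(sortAsLongs(ids));   // [1, 2, 9, 10]
    }
}
```

Millisecond timestamps happen to sort correctly either way as long as they have the same number of digits, but plain sequence numbers like "1", "2", ... only sort correctly as numbers.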

On a related note, we should also generate a default run id if it is missing. 
I realize this could be a bit tricky. If the flow id is also missing, then we're 
treating this single YARN app as a flow in and of itself. Then we can set 
flow/version/run id = (YARN app name)/("1")/(app submission timestamp). This is 
also mentioned in the design doc.
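A minimal sketch of that defaulting rule, assuming a hypothetical `FlowContext` holder (not an existing YARN class) that takes the app name and the app submission timestamp as inputs:

```java
// Hypothetical holder for the flow attributes attached to a YARN app.
public final class FlowContext {
    final String flowId;
    final String flowVersion;
    final long flowRunId;

    FlowContext(String flowId, String flowVersion, long flowRunId) {
        this.flowId = flowId;
        this.flowVersion = flowVersion;
        this.flowRunId = flowRunId;
    }

    // No flow id given: treat the single app as its own flow, i.e.
    // flow/version/run id = (YARN app name)/("1")/(app submission timestamp).
    static FlowContext defaultFor(String appName, long submitTimeMillis) {
        return new FlowContext(appName, "1", submitTimeMillis);
    }

    @Override
    public String toString() {
        return flowId + "/" + flowVersion + "/" + flowRunId;
    }

    public static void main(String[] args) {
        FlowContext ctx = defaultFor("wordcount", 1421000000000L);
        System.out.println(ctx); // wordcount/1/1421000000000
    }
}
```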

However, if the flow id is provided but not the flow run id, it can be tricky 
as there can be multiple YARN apps for the given flow run. One obvious solution 
might be to reject app submission if the flow client (not the timeline client) 
sets the flow id but not the flow run id. For that we'd need some kind of 
common layer for checks. Thoughts?
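One possible shape for such a check, as a hypothetical sketch: a validator (not an existing YARN API) that rejects a submission whose flow id is set while the flow run id is not, and lets everything else through. Representing a missing value as `null` is an assumption made here for illustration.

```java
// Hypothetical submission-time check; not an existing YARN API.
public final class FlowSubmissionCheck {

    // Rejects a submission that names a flow but no flow run: several YARN apps
    // may belong to that run, so no single app can safely pick the run id.
    static void validate(String flowId, Long flowRunId) {
        if (flowId != null && flowRunId == null) {
            throw new IllegalArgumentException(
                "flow id '" + flowId + "' is set but flow run id is missing");
        }
    }

    // Returns true if validate() accepts the given pair.
    static boolean isAccepted(String flowId, Long flowRunId) {
        try {
            validate(flowId, flowRunId);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAccepted(null, null));            // true: defaults apply
        System.out.println(isAccepted("etl", 1421000000000L)); // true
        System.out.println(isAccepted("etl", null));           // false: rejected
    }
}
```

Whether this lives in the client library or on the RM side is exactly the open question above; the sketch only shows the rule itself.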

> [Data Model] Make putEntities operation be aware of the app's context
> ---------------------------------------------------------------------
>                 Key: YARN-3040
>                 URL: https://issues.apache.org/jira/browse/YARN-3040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>         Attachments: YARN-3040.1.patch, YARN-3040.2.patch
> Per the design in YARN-2928, implement client-side API for handling *flows*. 
> Frameworks should be able to define and pass in all attributes of flows and 
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.

This message was sent by Atlassian JIRA
