[ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371899#comment-14371899
 ] 

Sangjin Lee commented on YARN-3040:
-----------------------------------

Hi [~zjshen], thanks much for working on this. I just took a quick look at the 
patch and the discussion. It seems like you'll update it soon, but I'll pass 
along my comments just in case.

One high-level comment: the original intent of this JIRA is more of an 
end-to-end flow of the flow information (flow name, flow version, and flow run 
id). How can individual frameworks (MR, Tez, ...) set these attributes and pass 
them to the RM at the time of the application launch? How does that information 
get passed to the TimelineClient and to the timeline collector? We also need 
the API for the beginning portion of that end-to-end picture.

bq. new TimelineClient is constructed per application, and in the context of 
one application, we can reasonably assume this context information should be 
unchanged.

There are a couple of things to consider here (and it sounds like that may be 
part of the offline discussion). We need to make sure we handle the case of 
NMs writing container-related info. It sounds like each NM will need to have 
multiple timeline clients (one for each application).

More importantly, we need to think about the RM use case. The RM will have its 
own collector, and it does not go through the TimelineClient API. How would 
that work?

More individual comments:
- flowId should be flowName (that's the standard terminology we're using)
- flow version seems to be missing from this; while flow version is not part of 
the primary key of the entity, it is a necessary attribute
- I think flow run id can (and should) be a long; it doesn't have to be a 
generic string
- in light of this, it might be slightly better to have a (flow) context API 
rather than individual arguments where you can set all these flow-related 
attributes
- the default cluster id should be just the cluster name; I'm not sure why we 
need to add the cluster start timestamp; it would mean that every restart of 
the resource manager would create a new logical cluster in the timeline 
service; I'm not sure I agree with that
- hopefully isUnitTest can be removed with the changes I made in the previous 
commit
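
As a rough illustration of the (flow) context API idea in the list above, here 
is a minimal Java sketch. The class name FlowContext and its accessors are 
hypothetical; they are not taken from YARN-3040.1.patch or from any existing 
YARN API. The sketch only shows the three flow attributes grouped into one 
object, with the flow run id typed as a long.

```java
import java.util.Objects;

// Hypothetical sketch only: FlowContext is an illustrative name, not a class
// from the patch or from YARN. It groups the flow-related attributes so they
// can be passed as one argument instead of several individual ones.
final class FlowContext {
    private final String flowName;     // "flowName" rather than "flowId"
    private final String flowVersion;  // not part of the entity's primary key
    private final long flowRunId;      // a long rather than a generic string

    FlowContext(String flowName, String flowVersion, long flowRunId) {
        this.flowName = Objects.requireNonNull(flowName, "flowName");
        this.flowVersion = Objects.requireNonNull(flowVersion, "flowVersion");
        this.flowRunId = flowRunId;
    }

    String getFlowName() {
        return flowName;
    }

    String getFlowVersion() {
        return flowVersion;
    }

    long getFlowRunId() {
        return flowRunId;
    }

    @Override
    public String toString() {
        return "FlowContext{flowName=" + flowName
            + ", flowVersion=" + flowVersion
            + ", flowRunId=" + flowRunId + "}";
    }
}
```

A caller would build one such object per application and hand it over in a 
single argument, so adding another flow attribute later would not change 
every method signature that carries the context.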


> [Data Model] Make putEntities operation be aware of the app's context
> ---------------------------------------------------------------------
>
>                 Key: YARN-3040
>                 URL: https://issues.apache.org/jira/browse/YARN-3040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>         Attachments: YARN-3040.1.patch
>
>
> Per design in YARN-2928, implement client-side API for handling *flows*. 
> Frameworks should be able to define and pass in all attributes of flows and 
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
