[https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371899#comment-14371899]
Sangjin Lee commented on YARN-3040:
-----------------------------------
Hi [~zjshen], thanks much for working on this. I just took a quick look at the
patch and the discussion. It seems like you'll update it soon, but I'll pass
along my comments just in case.
One high-level comment: the original intent of this JIRA is more about the
end-to-end path of the flow information (flow name, flow version, and flow run
id). How can individual frameworks (MR, Tez, ...) set these attributes and pass
them to the RM when the application is launched? How does that information
then reach the TimelineClient and the timeline collector? We need the API for
the beginning of that end-to-end path as well.
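To make the question concrete, here is a rough sketch of one option the issue
description mentions, YARN application tags: the framework client encodes the
flow attributes as "key:value" tags at submission time and the RM decodes them.
The FlowTags class and the tag keys are hypothetical (not in the patch or in
YARN), and the sketch works on plain strings rather than the real submission
context API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of YARN or the attached patch): carries flow
// attributes as "key:value" application tags from the framework client to the RM.
final class FlowTags {
  static final String FLOW_NAME = "flow_name";
  static final String FLOW_VERSION = "flow_version";
  static final String FLOW_RUN_ID = "flow_run_id";
  private static final Set<String> KEYS =
      new HashSet<>(Arrays.asList(FLOW_NAME, FLOW_VERSION, FLOW_RUN_ID));

  private FlowTags() {}

  // Framework side: build the tags to attach to the application at submission time.
  static Set<String> encode(String flowName, String flowVersion, long flowRunId) {
    Set<String> tags = new LinkedHashSet<>();
    tags.add(FLOW_NAME + ":" + flowName);
    tags.add(FLOW_VERSION + ":" + flowVersion);
    tags.add(FLOW_RUN_ID + ":" + flowRunId);
    return tags;
  }

  // RM side: pull the flow attributes back out, ignoring unrelated tags.
  static Map<String, String> decode(Iterable<String> tags) {
    Map<String, String> flow = new HashMap<>();
    for (String tag : tags) {
      int sep = tag.indexOf(':');
      if (sep <= 0) {
        continue; // not a key:value tag
      }
      String key = tag.substring(0, sep);
      if (KEYS.contains(key)) {
        flow.put(key, tag.substring(sep + 1));
      }
    }
    return flow;
  }
}
```

Any real encoding would also need tag keys that cannot collide with
user-supplied tags, and would have to account for whatever validation or
normalization YARN applies to application tags.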
bq. new TimelineClient is constructed per application, and in the context of
one application, we can reasonably assume this context information should be
unchanged.
There are a couple of things to consider here (and it sounds like they may have
been part of the offline discussion). We need to make sure we handle the case
of NMs writing container-related info. It sounds like each NM will need to have
multiple timeline clients (one for each application).
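For illustration only, the per-application clients on an NM could be kept in a
small registry keyed by application id, created on first use and stopped when
the application finishes. AppTimelineClient and PerAppTimelineClients are
hypothetical stand-ins, not the TimelineClient API, and application ids are
plain strings here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a timeline client bound to one application.
final class AppTimelineClient {
  final String appId;

  AppTimelineClient(String appId) {
    this.appId = appId;
  }

  void stop() {
    // release resources tied to this application's collector
  }
}

// Sketch of an NM-side registry holding one client per running application.
final class PerAppTimelineClients {
  private final ConcurrentMap<String, AppTimelineClient> clients =
      new ConcurrentHashMap<>();

  // Returns the client for appId, creating it on first use (atomic per key).
  AppTimelineClient get(String appId) {
    return clients.computeIfAbsent(appId, AppTimelineClient::new);
  }

  // Called when the application finishes on this node.
  void remove(String appId) {
    AppTimelineClient client = clients.remove(appId);
    if (client != null) {
      client.stop();
    }
  }

  int size() {
    return clients.size();
  }
}
```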
More importantly, we need to think about the RM use case. The RM will have its
own collector, and it does not go through the TimelineClient API. How would
that work?
More individual comments:
- flowId should be flowName (that's the standard terminology we're using)
- flow version seems to be missing from this; while flow version is not part of
the primary key of the entity, it is a necessary attribute
- I think flow run id can (and should) be a long; it doesn't have to be a
generic string
- in light of this, it might be slightly better to have a (flow) context API in
which you can set all these flow-related attributes, rather than individual
arguments
- the default cluster id should be just the cluster name. I'm not sure why we
need to add the cluster start timestamp: it would mean that every restart of
the resource manager creates a new logical cluster in the timeline service,
and I'm not sure I agree with that
- hopefully isUnitTest can be removed with the changes I made in the previous
commit
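A minimal sketch of what such a flow context could look like, assuming a simple
immutable value class. FlowContext and its getters are hypothetical names, not
part of the patch; the field comments map each field to the points above.

```java
import java.util.Objects;

// Hypothetical value class (not from the patch): one flow context object
// instead of separate flow arguments on the client API.
final class FlowContext {
  // Just the cluster name, with no RM start timestamp, so an RM restart does
  // not create a new logical cluster in the timeline service.
  private final String clusterId;
  private final String flowName;    // "flowName", not "flowId"
  private final String flowVersion; // not in the entity's primary key, but still required
  private final long flowRunId;     // numeric rather than a generic string

  FlowContext(String clusterId, String flowName, String flowVersion, long flowRunId) {
    this.clusterId = Objects.requireNonNull(clusterId, "clusterId");
    this.flowName = Objects.requireNonNull(flowName, "flowName");
    this.flowVersion = Objects.requireNonNull(flowVersion, "flowVersion");
    this.flowRunId = flowRunId;
  }

  String getClusterId() { return clusterId; }
  String getFlowName() { return flowName; }
  String getFlowVersion() { return flowVersion; }
  long getFlowRunId() { return flowRunId; }
}
```

The client (or the RM's own collector) could then accept one such object
instead of separate arguments; how it gets wired in is left open here.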
> [Data Model] Make putEntities operation be aware of the app's context
> ---------------------------------------------------------------------
>
> Key: YARN-3040
> URL: https://issues.apache.org/jira/browse/YARN-3040
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Zhijie Shen
> Attachments: YARN-3040.1.patch
>
>
> Per design in YARN-2928, implement client-side API for handling *flows*.
> Frameworks should be able to define and pass in all attributes of flows and
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)