[jira] [Comment Edited] (YARN-3040) [Data Model] Implement client-side API for handling flows

Zhijie Shen (JIRA) Wed, 18 Mar 2015 15:43:05 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368064#comment-14368064
 ]


Zhijie Shen edited comment on YARN-3040 at 3/18/15 10:42 PM:
-------------------------------------------------------------

I've just uploaded a patch. It's an e2e modification that lets the context 
information be passed from the client to the backend storage. The context 
information includes *clusterId*, *userId*, *flowId*, *flowRunId* and *appId*. 
According to YARN-3240, a new TimelineClient is constructed per application, and 
within the context of one application we can reasonably assume this context 
information does not change. Therefore, it only needs to be specified when 
the client is constructed. The context information should be gathered by or 
passed to the AM and NM so that they can construct the timeline client properly. 
For example, for the AM, this information can be passed via env inside the CLC. 
Anyway, that is out of the scope of this Jira; we will cover that integration 
once we make some particular framework AM use the new timeline client.

Back to the context information: some of it can be null, and some of it 
doesn't need to be specified explicitly:

* *clusterId*: The application should specify a unique cluster ID; otherwise 
the cluster ID defaults to cluster_<start timestamp of RM>.
* *userId*: The user doesn't need to specify this information. Instead, it is 
obtained from the current ugi of the client.
* *flowId*: The user either passes in a flowId or, if it is an orphan 
application, the flowId will be the appId with its prefix replaced by "flow".
* *flowRunId*: If it is an orphan application, it's 0. The reason it should 
be 0 instead of the current timestamp when creating the timeline client is that 
there may be multiple clients in the AM and NMs, constructed at different 
times. They need to be synced on the same flowRunId.
* *appId*: It's the only mandatory context information, as we defined before. 
The client is constructed to work with only one application.
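
To make the defaulting rules above concrete, here is a minimal sketch in plain 
Java. It is not the code in the patch: the class name, the method names, and 
the parameter list are all hypothetical, the current user is passed in as a 
string instead of being read from the ugi, and "the prefix" of the appId is 
assumed to be everything before the first underscore.

```java
import java.util.Objects;

/** Hypothetical holder for the five context fields; not the class in the patch. */
final class TimelineContextSketch {
    final String clusterId;
    final String userId;
    final String flowId;
    final long flowRunId;
    final String appId;

    private TimelineContextSketch(String clusterId, String userId, String flowId,
                                  long flowRunId, String appId) {
        this.clusterId = clusterId;
        this.userId = userId;
        this.flowId = flowId;
        this.flowRunId = flowRunId;
        this.appId = appId;
    }

    /**
     * Applies the defaults described in the list. Null means "not specified".
     * currentUser stands in for the user name taken from the client's ugi.
     */
    static TimelineContextSketch resolve(String clusterId, String currentUser,
                                         String flowId, Long flowRunId,
                                         String appId, long rmStartTimestamp) {
        // appId is the only mandatory field.
        Objects.requireNonNull(appId, "appId is mandatory");
        Objects.requireNonNull(currentUser, "currentUser must be known");

        String cluster = clusterId != null ? clusterId : "cluster_" + rmStartTimestamp;
        // Orphan application: derive the flowId from the appId.
        String flow = flowId != null ? flowId : defaultFlowId(appId);
        // Orphan application: a fixed 0 so that every client agrees on the run.
        long run = flowRunId != null ? flowRunId : 0L;
        return new TimelineContextSketch(cluster, currentUser, flow, run, appId);
    }

    /** Assumption: the prefix is the text before the first underscore. */
    static String defaultFlowId(String appId) {
        int sep = appId.indexOf('_');
        return sep < 0 ? "flow_" + appId : "flow" + appId.substring(sep);
    }
}
```

Because the orphan flowRunId is a constant and not a timestamp, clients 
created at different times in the AM and NMs resolve to the same value without 
any coordination.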

I changed the web service endpoint accordingly to make it RESTful, and changed 
the writer interface to pass in the context information when putting the 
entity. In addition, I've modified the FS-based writer implementation to 
reflect the change. The entity file will be written to 
{{root/entities/<clusterId>/<userId>/<flowId>/<flowRunId>/<appId>/<entityType>/<entityId>.thist}}.
 It has been verified by TestDistributedShell and 
TestFileSystemTimelineWriterImpl.
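
As an illustration of the layout above, the following sketch assembles the 
entity file path from the context fields. It is only a sketch: the class and 
method names are hypothetical, "/" is assumed as the separator, and the real 
FS-based writer in the patch may build the path differently.

```java
/** Hypothetical helper that mirrors the directory layout quoted above. */
final class EntityPathSketch {
    // File extension taken from the path pattern in the comment.
    static final String EXTENSION = ".thist";

    static String entityFile(String root, String clusterId, String userId,
                             String flowId, long flowRunId, String appId,
                             String entityType, String entityId) {
        // Each context field becomes one directory level, in the order
        // cluster, user, flow, flow run, application, entity type.
        return String.join("/",
                root, "entities", clusterId, userId, flowId,
                Long.toString(flowRunId), appId, entityType,
                entityId + EXTENSION);
    }
}
```

With this layout, all entities of one application sit under a single 
directory, and all applications of one flow run share a common parent.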




> [Data Model] Implement client-side API for handling flows
> ---------------------------------------------------------
>
>                 Key: YARN-3040
>                 URL: https://issues.apache.org/jira/browse/YARN-3040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>         Attachments: YARN-3040.1.patch
>
>
> Per design in YARN-2928, implement client-side API for handling *flows*. 
> Frameworks should be able to define and pass in all attributes of flows and 
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
