Sangjin Lee commented on YARN-2928:

Does this mean that containers (non-AM) of an application will face a 
penalty if they need to write data to ATS, as all data will need to be routed to 
the aggregator on the AM host first?

One could say that it is a "penalty" in the sense that it incurs an extra hop 
before it can be written to the storage. However, if we let the timeline 
aggregators write directly from the nodes on which the containers run, we would 
have a situation where all nodes have many connections (as many as the number 
of apps running on the node) open to the backing storage. For example, with an 
HBase store with hundreds of region servers and a writing Hadoop cluster with 
thousands of nodes, you'd easily end up with a criss-cross connection pattern in 
which each region server takes traffic/connections from every single Hadoop 
worker node.

With the current design, each application would retain a single (mostly) stable 
connection to a single region server (assuming it is designed so that data for 
an application resides in a single region server), which leads to far fewer 
connections overall. Also, if container data is collected at the app level, the 
timeline aggregator can be a little smarter about it and aggregate/update 
values appropriately before writing.
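
To make the connection-count argument concrete, here is a rough 
back-of-the-envelope sketch. The function names and the cluster sizes in the 
usage lines are hypothetical, picked only for illustration; they are not 
measurements from any real cluster, and the model ignores connection pooling 
and region splits.

```python
def direct_write_connections(worker_nodes, apps_per_node):
    """Upper bound on open storage connections if writers on every node
    talk to the backing storage directly (one connection per app per node)."""
    return worker_nodes * apps_per_node


def per_app_aggregator_connections(running_apps):
    """Open storage connections if each application funnels its container
    data through one app-level aggregator holding a single connection."""
    return running_apps


# Hypothetical cluster: 5000 worker nodes, 20 apps with containers on each
# node, 1000 applications running in total.
direct = direct_write_connections(worker_nodes=5000, apps_per_node=20)
per_app = per_app_aggregator_connections(running_apps=1000)

print(f"direct writes from nodes: up to {direct} connections")
print(f"one aggregator per app:   about {per_app} connections")
```

Under these assumed numbers the per-node approach could hold up to two orders 
of magnitude more connections than the per-app approach, which is the trade 
being made for the extra hop.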

> Application Timeline Server (ATS) next gen: phase 1
> ---------------------------------------------------
>                 Key: YARN-2928
>                 URL: https://issues.apache.org/jira/browse/YARN-2928
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Priority: Critical
>         Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.
