[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274597#comment-14274597 ]

Vinod Kumar Vavilapalli commented on YARN-2928:
-----------------------------------------------

Thanks for the design summary, Sangjin.

For public disclosure: a number of YARN community members synced offline on 
this design discussion. Thanks to Joep Rottinghuis, Karthik Kambatla, Li Lu, 
Mayank Bansal, Maysam Yabandeh, Mohammad Kamrul Islam, Ram Venkatesh, Robert 
Kanter, Sangjin Lee, Vinod Kumar Vavilapalli, Vrushali Channapattan and Zhijie 
Shen (in no particular order).

Overall, I'd like to push related efforts such as YARN-2141 and YARN-1012 to 
fit into the architecture proposed in this JIRA, so that we don't duplicate 
stats collection across efforts.

One suggestion for the proposal: for the first cut, instead of spawning a 
per-AM container (Section 4.1) to represent an Application Level Aggregator 
(call it ALA), we can have a per-node agent that serves multiple AMs running on 
the same node. Nothing else changes: NMs sending data still have to discover 
the ALA, only the ALAs can send data to the underlying storage, etc. It's just 
that the ALA is not a special container to begin with. The advantage is that we 
can postpone the hard parts, scheduling and fault tolerance of a special ALA 
container, until after we have wired up everything else. Even long term, for 
small apps in a cluster, an ALA running inside or alongside the NM with rate 
limits reduces the 'heaviness' of the system. This per-node agent is also very 
useful outside of this context. As an additional shortcut for now, we could 
potentially embed the ALA inside the NM using, say, Aux Services. Obviously, the 
biggest problem with a single ALA per node, or an ALA embedded in each NM, is 
resource management, which we can defer for now, given that it still runs 
system code, until we have everything else figured out.
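To make the per-node variant concrete, here is a minimal Java sketch under 
simplifying assumptions. One aggregator instance per node keeps a separate 
buffer for each application it serves, NMs hand it data, and only the 
aggregator writes to storage. All names (PerNodeAggregator, 
TimelineStorageWriter, registerApplication, put, flush, serves) are 
hypothetical and not part of the YARN codebase or the attached design docs; 
entities are plain strings, and the per-app buffer cap is only a crude stand-in 
for the rate limits mentioned above. It does not use the real Aux Services API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sink standing in for the underlying timeline storage. */
interface TimelineStorageWriter {
    void write(String appId, List<String> entities);
}

/**
 * Hypothetical per-node ALA: a single instance serves every AM on the node,
 * keeping one buffer per application. Only this class talks to storage.
 */
class PerNodeAggregator {
    private final TimelineStorageWriter writer;
    // Crude per-app cap on buffered entities (stand-in for real rate limiting).
    private final int maxBufferedPerApp;
    private final Map<String, List<String>> buffers = new HashMap<>();

    PerNodeAggregator(TimelineStorageWriter writer, int maxBufferedPerApp) {
        this.writer = writer;
        this.maxBufferedPerApp = maxBufferedPerApp;
    }

    /** Called when an AM on this node starts; creates its buffer. */
    synchronized void registerApplication(String appId) {
        buffers.putIfAbsent(appId, new ArrayList<>());
    }

    /** Lets an NM check whether this aggregator serves the given app. */
    synchronized boolean serves(String appId) {
        return buffers.containsKey(appId);
    }

    /** Accepts an entity from an NM; false if app unknown or buffer full. */
    synchronized boolean put(String appId, String entity) {
        List<String> buf = buffers.get(appId);
        if (buf == null || buf.size() >= maxBufferedPerApp) {
            return false;
        }
        buf.add(entity);
        return true;
    }

    /** Writes the app's buffered entities to storage and clears the buffer. */
    synchronized void flush(String appId) {
        List<String> buf = buffers.get(appId);
        if (buf == null || buf.isEmpty()) {
            return;
        }
        writer.write(appId, new ArrayList<>(buf));
        buf.clear();
    }

    /** Called when an app finishes: flush what is left, then drop its buffer. */
    synchronized void unregisterApplication(String appId) {
        flush(appId);
        buffers.remove(appId);
    }
}
```

Keeping one buffer per application means apps sharing the node stay logically 
separate even though they share one process; how to fairly divide that 
process's resources between apps is exactly the resource-management question 
the comment proposes to defer.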

On the process side, I propose we do the work on a branch, with the goal of 
borrowing whatever code we can from the current Timeline service.

Regarding timelines (pun intended), I'd like to think we can have a first alpha 
release of this as part of, say, 2.8.

> Application Timeline Server (ATS) next gen: phase 1
> ---------------------------------------------------
>
>                 Key: YARN-2928
>                 URL: https://issues.apache.org/jira/browse/YARN-2928
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Critical
>         Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf
>
>
> We have the application timeline server implemented in YARN per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
