[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276358#comment-14276358
 ] 

Sangjin Lee commented on YARN-2928:
-----------------------------------

Regarding the per-node approach, I do have some questions (and observations) on 
the approach in addition to the aspect of losing the isolation/attribution as 
already discussed.

(1)
While it may be faster to allocate with the per-node companions, you would end 
up spending more capacity with the per-node approach, since these per-node 
companions are always up even though they may be idle for large amounts of 
time. So if capacity is a concern, you may lose out. Under what circumstances 
would per-node companions be more advantageous in terms of capacity?
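
To make the capacity question concrete, here is a rough back-of-the-envelope 
model. It is only a sketch: the function names, the 512 MB footprint, and the 
assumption that a per-node and a per-app companion cost the same are all made 
up for illustration and are not taken from the design doc.

```python
def per_node_cost_mb(num_nodes, companion_mb):
    # Assumption: one always-on companion per node, idle or not.
    return num_nodes * companion_mb


def per_app_cost_mb(concurrent_apps, companion_mb):
    # Assumption: one companion per running app, released when the app ends.
    return concurrent_apps * companion_mb


def per_node_is_cheaper(num_nodes, concurrent_apps, companion_mb=512):
    # Under this simplified model, per-node only wins when the cluster runs
    # more concurrent apps than it has nodes.
    return (per_node_cost_mb(num_nodes, companion_mb)
            < per_app_cost_mb(concurrent_apps, companion_mb))
```

Under these assumptions, a 1000-node cluster running 200 concurrent apps pays 
more with per-node companions, while a 100-node cluster running 500 concurrent 
apps pays less.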

(2)
I do have a question about the work-preserving aspect of the per-node ATS 
companion. One implication of making this a per-node thing (i.e. long-running) 
is that we need to handle the work-preserving restart. What if we need to 
restart the ATS companion? Since other YARN daemons (RM and NM) allow for 
work-preserving restarts, we cannot have the ATS companion break that. So that 
seems to be a requirement?
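
As a minimal sketch of what work-preserving restart could mean for the 
companion, assume it persists the set of apps it is serving and reloads that 
set on startup. The class name, the JSON file, and the method names below are 
hypothetical and are not an existing YARN API.

```python
import json
import os
import tempfile


class CompanionState:
    """Hypothetical on-disk record of the apps a per-node companion serves."""

    def __init__(self, path):
        self.path = path
        self.active_apps = set()
        # On (re)start, recover whatever was recorded before the restart.
        if os.path.exists(path):
            with open(path) as f:
                self.active_apps = set(json.load(f))

    def _save(self):
        # Write to a temp file and rename so a crash cannot leave a
        # half-written state file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(self.active_apps), f)
        os.replace(tmp, self.path)

    def add_app(self, app_id):
        self.active_apps.add(app_id)
        self._save()

    def remove_app(self, app_id):
        self.active_apps.discard(app_id)
        self._save()
```

A restarted companion that constructs `CompanionState` with the same path sees 
the apps that were active before it went down.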

(3)
We still need to handle the lifecycle management aspects of it. Previously we 
said that when RM allocates an AM it would tell the NM so the NM could spawn 
the special container. With the per-node approach, the RM would *still* need to 
tell the NM so that the NM can talk to the per-node ATS companion to initialize 
the data structure for the given app.
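
The per-node flow described above can be sketched roughly as follows. All 
class and method names are hypothetical stand-ins rather than real YARN 
classes, and the cleanup step on app completion is my assumption about what 
lifecycle management would include.

```python
class PerNodeCompanion:
    """Stand-in for the long-running per-node ATS companion."""

    def __init__(self):
        self.app_data = {}

    def init_app(self, app_id):
        # Initialize the data structure for the given app (idempotent).
        self.app_data.setdefault(app_id, [])

    def put_entity(self, app_id, entity):
        if app_id not in self.app_data:
            raise KeyError("app not initialized: " + app_id)
        self.app_data[app_id].append(entity)

    def finish_app(self, app_id):
        # Assumed cleanup step: drop the app's data and hand it back.
        return self.app_data.pop(app_id, [])


class NodeManager:
    """Stand-in for the NM side of the handshake."""

    def __init__(self, companion):
        self.companion = companion

    def on_am_allocated(self, app_id):
        # Called when the RM tells this NM that an AM was allocated here.
        self.companion.init_app(app_id)

    def on_app_finished(self, app_id):
        return self.companion.finish_app(app_id)
```

The point is that the RM-to-NM notification does not go away: it only changes 
from "spawn a container" to "call `init_app` on the companion".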

These are quick observations. While I do see value in the per-node approach, 
it's not totally clear how much work it would save over the per-app approach 
given these observations. What do you think?


> Application Timeline Server (ATS) next gen: phase 1
> ---------------------------------------------------
>
>                 Key: YARN-2928
>                 URL: https://issues.apache.org/jira/browse/YARN-2928
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Critical
>         Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf
>
>
> We have the application timeline server implemented in YARN per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



