[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276358#comment-14276358 ]
Sangjin Lee commented on YARN-2928:
-----------------------------------
Regarding the per-node approach, I do have some questions (and observations) on
the approach in addition to the aspect of losing the isolation/attribution as
already discussed.
(1)
While it may be faster to allocate with the per-node companions, you would end
up spending more capacity with the per-node approach, since these per-node
companions are always up even though they may be idle for large amounts of
time. So if capacity is a concern you may lose out. Under what circumstances
would per-node companions be more advantageous in terms of capacity?
(2)
I do have a question about the work-preserving aspect of the per-node ATS
companion. One implication of making this a per-node thing (i.e. long-running)
is that we need to handle the work-preserving restart. What if we need to
restart the ATS companion? Since other YARN daemons (RM and NM) allow for
work-preserving restarts, we cannot have the ATS companion break that. So that
seems to be a requirement?
(3)
We still need to handle the lifecycle management aspects of it. Previously we
said that when RM allocates an AM it would tell the NM so the NM could spawn
the special container. With the per-node approach, the RM would *still* need to
tell the NM so that the NM can talk to the per-node ATS companion to initialize
the data structure for the given app.
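To make the lifecycle point in (3) concrete, here is a rough sketch of the notification path as I understand it. This is only an illustration under my own assumptions: every class and method name below (PerNodeCompanion, NodeManager.on_am_allocated, ResourceManager.allocate_am, and so on) is hypothetical and does not correspond to any existing YARN API.

```python
class PerNodeCompanion:
    """Hypothetical long-running per-node ATS companion (not a real YARN class)."""

    def __init__(self):
        # Per-application state, keyed by application id.
        self._apps = {}

    def init_app(self, app_id):
        # Idempotent: initializing an already-known app keeps its existing state.
        self._apps.setdefault(app_id, [])

    def put(self, app_id, event):
        # Reject writes for apps the companion was never told about.
        if app_id not in self._apps:
            raise KeyError("app not initialized on this node: " + app_id)
        self._apps[app_id].append(event)

    def finish_app(self, app_id):
        # Drop the per-app state and hand back whatever was collected.
        return self._apps.pop(app_id, [])


class NodeManager:
    """Hypothetical NM stand-in that relays lifecycle events to the companion."""

    def __init__(self, companion):
        self.companion = companion

    def on_am_allocated(self, app_id):
        # Per-app approach: spawn a special container here.
        # Per-node approach: ask the already-running companion to set up state.
        self.companion.init_app(app_id)

    def on_app_finished(self, app_id):
        return self.companion.finish_app(app_id)


class ResourceManager:
    """Hypothetical RM stand-in that knows which NM hosts each AM."""

    def __init__(self, node_managers):
        self.node_managers = node_managers  # node id -> NodeManager

    def allocate_am(self, app_id, node_id):
        # The RM still has to notify the NM in either approach.
        self.node_managers[node_id].on_am_allocated(app_id)


if __name__ == "__main__":
    companion = PerNodeCompanion()
    rm = ResourceManager({"node1": NodeManager(companion)})
    rm.allocate_am("app_1", "node1")
    companion.put("app_1", "started")
    print(companion.finish_app("app_1"))
```

The only point of the sketch is that the RM-to-NM notification does not go away with the per-node approach; what changes is what the NM does when it receives it.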
These are quick observations. While I do see value in the per-node approach,
it's not totally clear how much work it would save over the per-app approach
given these observations. What do you think?
> Application Timeline Server (ATS) next gen: phase 1
> ---------------------------------------------------
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and
> YARN-321. Although it is a great feature, we have recognized several critical
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those.
> This is phase 1 of this effort.