[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329737#comment-14329737
 ] 

Sangjin Lee commented on YARN-3039:
-----------------------------------

Thanks [~djp] for the doc!

Some high level comments:
- I'm also thinking that option 2 might be more feasible, mainly because it
limits risk. That said, I haven't followed YARN-913 closely enough to know how
close it is to being ready...
- The service discovery needs to work across all these different modes: NM aux
service, standalone per-node daemon, and standalone per-app daemon. That needs
to be one of the primary considerations in this design.
- The failure scenarios need more detail in their own right; for this JIRA, I
think it is sufficient to understand how they may affect service discovery and
to design just enough to handle that.
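To make the "work across all modes" requirement concrete, here is a minimal in-memory sketch, assuming clients only need a per-application REST endpoint and should never branch on how the aggregator is deployed. AggregatorRegistry, Mode, Binding and the http://host:port endpoint shape are all hypothetical names for illustration, not existing YARN APIs.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not an existing YARN API: one lookup path for off-node
// clients that hides how the per-app aggregator happens to be deployed.
class AggregatorRegistry {

    // The three deployment modes listed above.
    enum Mode { NM_AUX_SERVICE, PER_NODE_DAEMON, PER_APP_DAEMON }

    // Where one application's aggregator can currently be reached.
    static final class Binding {
        final Mode mode;
        final String host;
        final int port;

        Binding(Mode mode, String host, int port) {
            this.mode = mode;
            this.host = host;
            this.port = port;
        }

        // Base URL of the aggregator's REST service (scheme and shape are assumptions).
        String restEndpoint() {
            return "http://" + host + ":" + port;
        }
    }

    // appId -> current binding; concurrent because many clients may look up at once.
    private final Map<String, Binding> bindings = new ConcurrentHashMap<>();

    // Called by whichever component started the aggregator, in any mode.
    // Re-registering the same appId replaces the old binding.
    void register(String appId, Mode mode, String host, int port) {
        bindings.put(appId, new Binding(mode, host, port));
    }

    // Called by clients such as the AM or NMs; they never inspect the mode.
    Optional<String> lookup(String appId) {
        return Optional.ofNullable(bindings.get(appId)).map(Binding::restEndpoint);
    }

    // Called when the aggregator is shut down, e.g. after an AM failure.
    void unregister(String appId) {
        bindings.remove(appId);
    }
}
```

Where such a registry would actually live, and how it behaves when it fails, is what this JIRA has to decide; the sketch only shows the client-facing contract.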

{quote}
We need a per-application logical aggregator for ATS which provides aggregator
service in form of REST API to: RM, AM and NMs,
{quote}
The RM will likely not use the service discovery. For example, when the RM
writes the app-started event, the timeline aggregator may not even have been
initialized yet.

{quote}
However, AM container could be rescheduled to another node for some reason
(container failure, etc.), so we cannot guarantee the two are always together.
{quote}
If the AM fails and restarts on another node, the existing per-app aggregator
should be shut down and started on the new node. In fact, that comes most
naturally in the aux service setup, so I think we should preserve this behavior
(the aggregator follows the AM) as much as possible.
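The "aggregator follows the AM" rule can be sketched as a small state machine, assuming some component is told each time an AM attempt starts and on which node. AggregatorFailover, onAmStarted and the recorded "stop"/"start" actions are hypothetical stand-ins for real lifecycle calls, not YARN code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keep each app's aggregator on the same node as its AM.
class AggregatorFailover {

    // appId -> node currently hosting that app's aggregator
    private final Map<String, String> hostNode = new HashMap<>();

    // Log of lifecycle actions, standing in for actually stopping/starting services.
    final List<String> actions = new ArrayList<>();

    // Invoked when an AM attempt for appId starts on amNode.
    synchronized void onAmStarted(String appId, String amNode) {
        String previous = hostNode.get(appId);
        if (amNode.equals(previous)) {
            return; // aggregator is already co-located with the AM
        }
        if (previous != null) {
            // AM moved: shut down the aggregator on the old node first.
            actions.add("stop " + appId + " on " + previous);
        }
        actions.add("start " + appId + " on " + amNode);
        hostNode.put(appId, amNode);
    }

    // Where clients should currently send writes for appId (null if unknown).
    synchronized String nodeFor(String appId) {
        return hostNode.get(appId);
    }
}
```

In the aux service setup, the NM on the AM's node already sees the AM container start, which is presumably why this behavior comes most naturally there; the standalone modes would need an explicit notification like onAmStarted.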

{quote}
Failure Cases: 3. Aggregator failed (only):
{quote}
We're talking about the aggregator failing as a standalone daemon, correct?



> [Aggregator wireup] Implement ATS writer service discovery
> ----------------------------------------------------------
>
>                 Key: YARN-3039
>                 URL: https://issues.apache.org/jira/browse/YARN-3039
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Robert Kanter
>         Attachments: Service Binding for applicationaggregator of ATS (draft).pdf
>
>
> Per design in YARN-2928, implement ATS writer service discovery. This is 
> essential for off-node clients to send writes to the right ATS writer. This 
> should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
