[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344914#comment-14344914
 ] 

Junping Du commented on YARN-3039:
----------------------------------

Thanks for the comments, [~Naganarasimha]!
bq. +1 for this approach. Also if NM uses this new blocking call in AMRMClient 
to get aggregator address then there might not be any race conditions for 
posting AM container's life cycle events by NM immediately after creation of 
appAggregator through Aux service.
I discussed this again offline with [~vinodkv] and [~zjshen]. Making
TimelineClient wrap AMRMClient looks heavyweight. In particular, for security
reasons it makes little sense to have the NM hold AMRMTokens just so that it
can use TimelineClient in the future. To get rid of the race condition you
mentioned above, we propose using the observer pattern so that TimelineClient
can listen for aggregator address updates in the AM or NM (wrapped with retry
logic to tolerate connection failures).
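
To make the proposal a bit more concrete, here is a minimal sketch of the
observer idea with retry. Every name in it (AggregatorAddressListener,
AggregatorAddressPublisher, EntitySender, SketchTimelineClient) is hypothetical
and made up for illustration. None of it is the real TimelineClient API, and
the retry policy (fixed interval, bounded attempts) is only an assumption.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical observer: called whenever an app's aggregator address changes. */
interface AggregatorAddressListener {
  void onAggregatorAddressUpdated(String appId, String address);
}

/** Hypothetical subject living in the AM or NM that publishes address updates. */
class AggregatorAddressPublisher {
  private final List<AggregatorAddressListener> listeners = new CopyOnWriteArrayList<>();

  void register(AggregatorAddressListener listener) {
    listeners.add(listener);
  }

  void publish(String appId, String address) {
    for (AggregatorAddressListener listener : listeners) {
      listener.onAggregatorAddressUpdated(appId, address);
    }
  }
}

/** Hypothetical transport hook; returns false when the connection fails. */
interface EntitySender {
  boolean send(String address, String entity);
}

/** Stand-in for TimelineClient (not the real class). */
class SketchTimelineClient implements AggregatorAddressListener {
  private final String appId;
  private final int maxRetries;
  private final long retryIntervalMs;
  private volatile String aggregatorAddress; // null until the first update arrives

  SketchTimelineClient(String appId, int maxRetries, long retryIntervalMs) {
    this.appId = appId;
    this.maxRetries = maxRetries;
    this.retryIntervalMs = retryIntervalMs;
  }

  @Override
  public void onAggregatorAddressUpdated(String appId, String address) {
    if (this.appId.equals(appId)) {
      this.aggregatorAddress = address; // ignore updates for other applications
    }
  }

  /** Re-reads the address on every attempt so a rebind is picked up between retries. */
  boolean putEntity(String entity, EntitySender sender) {
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      String address = aggregatorAddress;
      if (address != null && sender.send(address, entity)) {
        return true;
      }
      if (attempt < maxRetries) {
        try {
          Thread.sleep(retryIntervalMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
    }
    return false;
  }
}
```

The point of re-reading the address inside the retry loop is that a post issued
before the aggregator is known, or while it is being rebound, can still succeed
once the update is published, without the client talking to the RM itself.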

bq. Are we just adding a method to get the aggregator address? Or what other
APIs are planned?
Per the comments above, we have no plan to add an API to TimelineClient for
talking to the RM directly.

bq. I believe the idea of using AUX service was to decouple NM and Timeline
service. If NM will notify RM about new appAggregator creation (based on AUX
service) then basically NM should be aware that PerNodeAggregatorServer is
configured as AUX service, and if it supports rebinding appAggregator on
failure then it should be able to communicate with this AUX service too. Would
this be a clean approach?
I agree that we want to decouple things here. However, an AUX service is not
the only way to deploy app aggregators. There are other ways (see the diagram
in YARN-3033): app aggregators could be deployed in a separate process or in an
independent container, in which case a protocol between the AUX service and the
RM makes little sense. I think we should now plan to add a protocol between the
aggregator and the NM, and then notify the RM through the NM-RM heartbeat when
an aggregator registers or rebinds.
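
As a rough illustration of that flow, the sketch below has aggregators
register with a node-local registry, and the NM piggybacks pending
registrations on its next heartbeat to the RM. The class and method names
(NodeAggregatorRegistry, SketchRmAggregatorTable, drainForHeartbeat,
onNodeHeartbeat) are invented for this example and are not existing YARN
classes. The real heartbeat protocol records are not modeled here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical NM-side registry that aggregators (however deployed) report to. */
class NodeAggregatorRegistry {
  private final Map<String, String> pending = new HashMap<>();

  /** Called on first registration and again on rebind after a failure. */
  synchronized void registerAggregator(String appId, String address) {
    pending.put(appId, address);
  }

  /** Returns updates not yet reported and clears them, for the next heartbeat. */
  synchronized Map<String, String> drainForHeartbeat() {
    Map<String, String> snapshot = new HashMap<>(pending);
    pending.clear();
    return snapshot;
  }
}

/** Hypothetical RM-side table of appId to aggregator address. */
class SketchRmAggregatorTable {
  private final Map<String, String> addresses = new ConcurrentHashMap<>();

  /** Applies the aggregator updates carried by one NM heartbeat. */
  void onNodeHeartbeat(Map<String, String> aggregatorUpdates) {
    addresses.putAll(aggregatorUpdates);
  }

  String getAggregatorAddress(String appId) {
    return addresses.get(appId);
  }
}
```

With this shape the RM never needs to know whether the aggregator runs as an
AUX service, a separate process, or a container. It only sees the addresses
that arrive through heartbeats.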

bq. I also feel we need to support to start per app aggregator only if app
requests for it (Zhijie also had mentioned about this). If not we can make use
of one default aggregator for all these kinds of apps launched in NM, which is
just used to post container entities from different NM's for these apps.
My 2 cents: an app aggregator should have the logic to consolidate all messages
(events and metrics) for one application into the new, more complex and
flexible data model. If each NM does its aggregation separately, then it is
still a *writer* (like the old timeline service), not an *aggregator*. Thoughts?
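
To show the distinction I have in mind, here is a small sketch of a
per-application aggregator that merges container metrics reported from
different NMs into one application-level value. SketchAppAggregator and its
methods are hypothetical, and summing the latest value per container is only
one assumed form of aggregation, not the planned data model.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-application aggregator (one instance per application). */
class SketchAppAggregator {
  // metric name -> (container id -> latest reported value)
  private final Map<String, Map<String, Long>> latestByMetric = new HashMap<>();

  /** Containers on any NM report here; a newer value replaces the older one. */
  synchronized void putContainerMetric(String containerId, String metric, long value) {
    latestByMetric.computeIfAbsent(metric, k -> new HashMap<>()).put(containerId, value);
  }

  /** Application-level view: sum of the latest values across all containers. */
  synchronized long getAppLevelMetric(String metric) {
    Map<String, Long> perContainer = latestByMetric.get(metric);
    if (perContainer == null) {
      return 0L;
    }
    long sum = 0L;
    for (long value : perContainer.values()) {
      sum += value;
    }
    return sum;
  }
}
```

A plain writer on each NM could only store what its own containers report. The
application-level number exists only because a single component sees the
reports from all nodes.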

bq. Any discussions happened wrt RM having its own Aggregator ? I feel it would 
be better for RM to have it as it need not depend on any NM's to post any 
entities.
Agreed. I think we are on the same page now.
I will update the proposal to reflect all these discussions (on JIRA and
offline).

> [Aggregator wireup] Implement ATS app-appgregator service discovery
> -------------------------------------------------------------------
>
>                 Key: YARN-3039
>                 URL: https://issues.apache.org/jira/browse/YARN-3039
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Junping Du
>         Attachments: Service Binding for applicationaggregator of ATS 
> (draft).pdf, YARN-3039-no-test.patch
>
>
> Per design in YARN-2928, implement ATS writer service discovery. This is 
> essential for off-node clients to send writes to the right ATS writer. This 
> should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
