Naganarasimha G R commented on YARN-3039:

Hi [~djp]
bq. Another idea (from Vinod in offline discussion) is to add a blocking call 
in AMRMClient to get aggregator address directly from RM
+1 for this approach. Also, if the NM uses this new blocking call in AMRMClient 
to get the aggregator address, there should not be any race condition when the 
NM posts the AM container's life cycle events immediately after the 
appAggregator is created through the aux service.
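
To make the ordering concrete, here is a minimal sketch of what such a blocking 
lookup could look like. It is not the AMRMClient API or anything from the patch; 
AggregatorAddressRegistry, publish and awaitAddress are hypothetical names, and 
the address format is only an assumption for illustration.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical registry of per-app aggregator addresses; not part of YARN. */
class AggregatorAddressRegistry {
  // One future per application id, completed when the address becomes known.
  private final ConcurrentMap<String, CompletableFuture<String>> addresses =
      new ConcurrentHashMap<>();

  private CompletableFuture<String> slot(String appId) {
    return addresses.computeIfAbsent(appId, id -> new CompletableFuture<>());
  }

  /** Called once the appAggregator for the given app is up. */
  void publish(String appId, String address) {
    slot(appId).complete(address);
  }

  /** Blocks until the address is published or the timeout elapses. */
  Optional<String> awaitAddress(String appId, long timeout, TimeUnit unit) {
    try {
      return Optional.of(slot(appId).get(timeout, unit));
    } catch (TimeoutException | ExecutionException e) {
      return Optional.empty();
    } catch (InterruptedException e) {
      // Preserve the interrupt status for the caller.
      Thread.currentThread().interrupt();
      return Optional.empty();
    }
  }
}
```

A caller that waits in awaitAddress before posting the AM container's events 
cannot post before the aggregator has been published, whichever side runs first.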

bq. In addition, if adding a new API in AMRMClient can be accepted, NM will use 
TimelineClient too so can handle service discovery automatically.
Are we just adding a method to get the aggregator address, or are other APIs 
planned?

bq. NM will notify RM that this new appAggregator is ready for use in next 
heartbeat to RM (missing in this patch).
bq.  RM verify the out of service for this app aggregator first and kick off 
rebind appAggregator to another NM's perNodeAggregatorService in next heartbeat 
I believe the idea of using an aux service was to decouple the NM from the 
Timeline service. If the NM notifies the RM about new appAggregator creation 
(which is done by the aux service), then the NM has to be aware that 
PerNodeAggregatorServer is configured as an aux service. And if it supports 
rebinding the appAggregator on failure, it also has to be able to communicate 
with this aux service. Would this be a clean approach?

I also feel we need to start a per-app aggregator only if the app requests 
it (Zhijie had also mentioned this). Otherwise, we can use one default 
aggregator for all such apps launched on the NM, used only to post container 
entities from different NMs for these apps.

Has there been any discussion about the RM having its own aggregator? I feel 
it would be better for the RM to have one, as it would then not need to depend 
on any NM to post entities.

> [Aggregator wireup] Implement ATS app-appgregator service discovery
> -------------------------------------------------------------------
>                 Key: YARN-3039
>                 URL: https://issues.apache.org/jira/browse/YARN-3039
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Junping Du
>         Attachments: Service Binding for applicationaggregator of ATS 
> (draft).pdf, YARN-3039-no-test.patch
> Per design in YARN-2928, implement ATS writer service discovery. This is 
> essential for off-node clients to send writes to the right ATS writer. This 
> should also handle the case of AM failures.

This message was sent by Atlassian JIRA
