Zhijie Shen commented on YARN-3039:

bq. that work can be covered in making the per-node aggregator/collector a 
standalone daemon.

I'm okay with the plan.

Some comments about the patch:

1. Aggregator<->NM is a server-side protocol. The related classes and .proto 
file should go in yarn-server-common, like ResourceTracker.

2. I'm not sure we need to change ApplicationMaster. Even today, the client 
retries when it hits a connection problem with the server.

3. IMHO, on the client side we can make TimelineClient a listener of 
AMRMClient by adding 
AMRMClient#registerTimelineAggregatorAddressListener(TimelineClient client). 
The user then needs only one additional call, 
"registerTimelineAggregatorAddressListener", to tie AMRMClient and 
TimelineClient together. TimelineClientImpl doesn't need to wait in a loop for 
the address update; it only needs to override onAggregatorAddressUpdated, which 
AMRMClientImpl calls when it gets the update from the heartbeat response.

4. In TimelineClientImpl, the additional retry logic seems unnecessary, as the 
client already has retry logic in the form of a ClientFilter.
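
To make the suggestion in point 3 concrete, here is a minimal, self-contained 
sketch of the listener wiring. It does not use the real YARN classes: 
SketchTimelineClient and SketchAmRmClient are hypothetical stand-ins for 
TimelineClientImpl and AMRMClientImpl, the AggregatorAddressListener interface 
is assumed for illustration, and the heartbeat is simulated by a plain method 
call. Only the method names registerTimelineAggregatorAddressListener and 
onAggregatorAddressUpdated come from the comment above.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class AggregatorAddressListenerSketch {

    /** Hypothetical callback interface; not an existing YARN API. */
    interface AggregatorAddressListener {
        void onAggregatorAddressUpdated(String newAddress);
    }

    /** Stand-in for TimelineClientImpl: stores the address, no polling loop. */
    static class SketchTimelineClient implements AggregatorAddressListener {
        private volatile String aggregatorAddress;

        @Override
        public void onAggregatorAddressUpdated(String newAddress) {
            this.aggregatorAddress = newAddress;
        }

        String getAggregatorAddress() {
            return aggregatorAddress;
        }
    }

    /** Stand-in for AMRMClientImpl: pushes address changes to listeners. */
    static class SketchAmRmClient {
        private final List<AggregatorAddressListener> listeners =
            new CopyOnWriteArrayList<>();
        private String lastAddress;

        void registerTimelineAggregatorAddressListener(
                AggregatorAddressListener listener) {
            listeners.add(listener);
        }

        /**
         * Simulates processing one heartbeat response. The address is null
         * when the response carries no aggregator address.
         */
        void onHeartbeatResponse(String addressFromResponse) {
            if (addressFromResponse == null
                    || addressFromResponse.equals(lastAddress)) {
                return; // nothing new to report
            }
            lastAddress = addressFromResponse;
            for (AggregatorAddressListener listener : listeners) {
                listener.onAggregatorAddressUpdated(addressFromResponse);
            }
        }
    }

    public static void main(String[] args) {
        SketchTimelineClient timelineClient = new SketchTimelineClient();
        SketchAmRmClient amRmClient = new SketchAmRmClient();

        // The one additional call the user has to make.
        amRmClient.registerTimelineAggregatorAddressListener(timelineClient);

        // Placeholder address; a real one would come from the RM heartbeat.
        amRmClient.onHeartbeatResponse("node1:1234");
        System.out.println(timelineClient.getAggregatorAddress());
    }
}
```

In this sketch the AM-side client only notifies listeners when the address 
actually changes, so the timeline client never has to block or poll for it.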

BTW, the latest patch probably no longer applies cleanly to TimelineClientImpl. 
It needs to be rebased.

> [Aggregator wireup] Implement ATS app-appgregator service discovery
> -------------------------------------------------------------------
>                 Key: YARN-3039
>                 URL: https://issues.apache.org/jira/browse/YARN-3039
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Junping Du
>         Attachments: Service Binding for applicationaggregator of ATS 
> (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, 
> YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, 
> YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch
> Per design in YARN-2928, implement ATS writer service discovery. This is 
> essential for off-node clients to send writes to the right ATS writer. This 
> should also handle the case of AM failures.

This message was sent by Atlassian JIRA
