Junping Du commented on YARN-3334:

Reply to [~sjlee0]'s previous comments:
bq. (ResourceTrackerService.java) Sorry, I might be missing something obvious,
but why was this change necessary?
This is because our previous assumption was wrong. We assumed that
getKeepAliveApplications() in the NodeStatus of NodeHeartbeatRequest contains
the list of all running applications. It does not. This list tells the RM to
keep certain applications alive while they are finishing but their log
aggregation has not completed yet, so that their tokens stay valid. As a
result, the list is non-empty only when both security and log aggregation are
enabled and the applications on it are finishing. Obviously, this is not what
we wanted, and it causes our end-to-end test (TestDistributedShell, so far) to
fail on a multi-node cluster. We didn't catch it earlier because it does not
break a single-node testbed: there, the only NM gets the app collector address
through the aggregator-NM protocol rather than through the NM-RM protocol. If
necessary, we may add a separate field to NodeHeartbeatRequest in the future
to tell the RM which apps are running on the node.
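To make the distinction concrete, here is a minimal, self-contained Java sketch. It does not use the real YARN classes: `AppState`, `keepAliveApplications` and `runningApplications` are hypothetical stand-ins. They model the keep-alive semantics described above (populated only with security and log aggregation enabled, and only for finishing apps whose logs are still being aggregated) alongside the kind of separate running-apps field that might be added to NodeHeartbeatRequest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only: these are NOT the real YARN classes or methods.
final class KeepAliveVsRunningSketch {

    // Simplified NM-side view of one application on the node (hypothetical).
    record AppState(String appId, boolean finishing, boolean logAggregationDone) {}

    // Models the keep-alive semantics: an app is listed only when security and
    // log aggregation are both enabled, and the app is finishing while its logs
    // are still being aggregated (so its token must stay valid).
    static List<String> keepAliveApplications(List<AppState> apps,
                                              boolean securityEnabled,
                                              boolean logAggregationEnabled) {
        List<String> keepAlive = new ArrayList<>();
        if (!securityEnabled || !logAggregationEnabled) {
            return keepAlive; // always empty in this configuration
        }
        for (AppState app : apps) {
            if (app.finishing() && !app.logAggregationDone()) {
                keepAlive.add(app.appId());
            }
        }
        return keepAlive;
    }

    // Models a hypothetical separate heartbeat field that lists every app
    // currently present on the node, regardless of security settings.
    static List<String> runningApplications(List<AppState> apps) {
        List<String> running = new ArrayList<>();
        for (AppState app : apps) {
            running.add(app.appId());
        }
        return running;
    }

    public static void main(String[] args) {
        List<AppState> apps = List.of(
                new AppState("app_1", false, false),  // still running
                new AppState("app_2", true, false));  // finishing, logs pending

        // Security off (e.g. an unsecured test cluster): an RM that reads the
        // keep-alive list as "running apps" would see nothing on this node.
        System.out.println(keepAliveApplications(apps, false, true)); // []
        // Security and log aggregation on: only the finishing app appears.
        System.out.println(keepAliveApplications(apps, true, true));  // [app_2]
        // The separate field reports both apps.
        System.out.println(runningApplications(apps));                // [app_1, app_2]
    }
}
```

The keep-alive list answers a token-lifetime question, not a "which apps live on this node" question, so an RM view of running apps derived from it only works by accident.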

> [Event Producers] NM TimelineClient life cycle handling and container metrics 
> posting to new timeline service.
> --------------------------------------------------------------------------------------------------------------
>                 Key: YARN-3334
>                 URL: https://issues.apache.org/jira/browse/YARN-3334
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: YARN-2928
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
> YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch
> After YARN-3039, we have a service discovery mechanism to pass the
> app-collector service address among collectors, NMs and the RM. In this JIRA,
> we will handle service address setting for TimelineClients in the NodeManager,
> and put container metrics into the backend storage.

This message was sent by Atlassian JIRA
