Sangjin Lee commented on YARN-3045:

The lifecycle management of the app collector is a little tricky here: it gets
registered when the first container (the AM) is launched, but it should not be
unregistered immediately when the AM container stops. Waiting until the
application finish event reaches the NM should work for most cases. In the
corner case where the NM publisher takes too long to publish an event (because
its queue is busy), the write can still fail, but that chance is very low and
should be acceptable here. Later, we will run into a similar issue again when we
do app-level aggregation in the app collector, since the aggregation process
could still be running. In any case, we should pay special attention to
lifecycle management for the collector; we have a separate JIRA to move it out
of the auxiliary service. I think we can discuss this further together with, or
in, that JIRA.

It's a good point. I think some amount of "linger" after the AM container has
completed should be a fine solution. Note that for this to work, not only does
the collector need to stay up, but its mapping must also not be removed from the
RM.
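To make the linger idea concrete, here is a minimal Java sketch, assuming a
single in-memory map stands in for the RM's app-to-collector mapping and that
timestamps are passed in explicitly rather than read from a clock. The class and
method names (`LingeringCollectors`, `onAmContainerCompleted`, `purgeExpired`)
are hypothetical and are not YARN APIs. The point it shows is that completing
the AM container only schedules removal; the mapping stays resolvable until the
linger period has elapsed.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch (not a YARN API): keep the collector mapping alive for a
// "linger" period after the AM container completes, instead of removing it
// immediately. Timestamps are passed in so the logic is deterministic.
class LingeringCollectors {
    private final long lingerMillis;
    // appId -> collector address; stands in for the RM-side mapping (assumption)
    private final Map<String, String> rmMapping = new HashMap<>();
    // appId -> earliest time at which the mapping may be removed
    private final Map<String, Long> removeAfter = new HashMap<>();

    LingeringCollectors(long lingerMillis) {
        this.lingerMillis = lingerMillis;
    }

    // Registering (or re-registering) a collector cancels any pending removal.
    void register(String appId, String collectorAddr) {
        rmMapping.put(appId, collectorAddr);
        removeAfter.remove(appId);
    }

    // AM container completed: schedule removal rather than removing now.
    void onAmContainerCompleted(String appId, long nowMillis) {
        if (rmMapping.containsKey(appId)) {
            removeAfter.put(appId, nowMillis + lingerMillis);
        }
    }

    // Called periodically; drops mappings whose linger period has elapsed.
    void purgeExpired(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = removeAfter.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis >= e.getValue()) {
                rmMapping.remove(e.getKey());
                it.remove();
            }
        }
    }

    String lookup(String appId) {
        return rmMapping.get(appId);
    }
}
```

Passing `nowMillis` in, rather than calling `System.currentTimeMillis()` inside,
keeps the expiry logic deterministic and easy to test; an NM-side implementation
could drive `purgeExpired` from a periodic timer.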

As [~djp] pointed out, having multiple app attempts (AMs) is another case.
Perhaps the same linger can apply in that case, so that the collector can stick
around to handle some writes until the collector belonging to the next AM comes
online and registers itself. We need to hash out the details of the
multiple-AM scenario, preferably in a different JIRA.
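One possible shape for the multiple-attempt handover, sketched in Java under
the assumption that each registration carries its attempt number: the previous
attempt's collector remains the write target after its AM exits, and is
replaced only when a collector for the same or a newer attempt registers.
Rejecting registrations from older attempts is an added assumption, not
something decided in this thread, and `AttemptAwareCollectors`, `register`, and
`writeTarget` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not a YARN API): when an attempt's AM exits, nothing is
// removed here, so its collector keeps receiving writes until a collector for
// a newer attempt of the same application registers itself.
class AttemptAwareCollectors {
    private static final class Entry {
        final int attempt;
        final String addr;

        Entry(int attempt, String addr) {
            this.attempt = attempt;
            this.addr = addr;
        }
    }

    // appId -> collector of the latest attempt that has registered
    private final Map<String, Entry> current = new HashMap<>();

    // Accept a registration only from the same or a newer attempt, so a late
    // registration from an old attempt cannot displace the new collector.
    boolean register(String appId, int attempt, String addr) {
        Entry e = current.get(appId);
        if (e != null && attempt < e.attempt) {
            return false;
        }
        current.put(appId, new Entry(attempt, addr));
        return true;
    }

    // Where writes for this application should currently go, or null if none.
    String writeTarget(String appId) {
        Entry e = current.get(appId);
        return e == null ? null : e.addr;
    }
}
```

Whether the old collector should also be torn down after a bounded linger if no
new attempt ever registers is one of the details the thread defers to a
separate JIRA.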

> [Event producers] Implement NM writing container lifecycle events to ATS
> ------------------------------------------------------------------------
>                 Key: YARN-3045
>                 URL: https://issues.apache.org/jira/browse/YARN-3045
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3045-YARN-2928.002.patch, 
> YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, 
> YARN-3045.20150420-1.patch
> Per design in YARN-2928, implement NM writing container lifecycle events and 
> container system metrics to ATS.

This message was sent by Atlassian JIRA