Junping Du commented on YARN-3045:

Thanks [~Naganarasimha] for updating the patch! I am looking into it now and 
will post comments afterwards.
Some quick thoughts on your questions above.
bq. I prefer to have all the container related events and entities to be 
published by NMTimelinePublisher, so wanted push container usage metrics also 
to NMTimelinePublisher. This will ensure all NM timeline stuff are put in one 
place and remove thread pool handling in ContainerMonitorImpl.
I am generally fine with consolidating the publishing of events and metrics 
in NMTimelinePublisher. However, we may later need to check whether a separate 
event queue is needed, so that a burst of container metrics won't delay events 
from getting published.
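
To make the separate-queue idea concrete, here is a minimal sketch. It is not 
the actual NMTimelinePublisher: the class name SplitQueuePublisher, its methods, 
and the string payloads are all hypothetical, and the timeline write is 
simulated by appending to a list. It only shows that giving events and metrics 
their own single-threaded executor (each with its own queue) keeps a metrics 
backlog from sitting in front of lifecycle events.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; names do not come from the patch.
class SplitQueuePublisher {
    // Each single-thread executor owns its own FIFO work queue, so the two
    // kinds of records never wait behind each other.
    private final ExecutorService eventLane = Executors.newSingleThreadExecutor();
    private final ExecutorService metricLane = Executors.newSingleThreadExecutor();

    // Stand-in for the timeline writer: records what was "published".
    private final List<String> published = new CopyOnWriteArrayList<>();

    void publishEvent(String event) {
        eventLane.execute(() -> published.add("event:" + event));
    }

    void publishMetric(String metric) {
        metricLane.execute(() -> {
            try {
                Thread.sleep(2); // simulate a slow metrics write
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            published.add("metric:" + metric);
        });
    }

    List<String> published() {
        return published;
    }

    // Stops accepting work and waits for both queues to drain.
    boolean shutdownAndWait(long timeoutMs) {
        eventLane.shutdown();
        metricLane.shutdown();
        try {
            return eventLane.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)
                && metricLane.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        SplitQueuePublisher p = new SplitQueuePublisher();
        for (int i = 0; i < 20; i++) {
            p.publishMetric("cpu-" + i); // metrics burst
        }
        p.publishEvent("CONTAINER_FINISHED"); // not queued behind the burst
        p.shutdownAndWait(5000);
        System.out.println(p.published().size() + " records published");
    }
}
```

The trade-off is one more thread per NM; whether that is worth it depends on 
how bursty the container metrics turn out to be in practice.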

bq. When the AM container finishes and removes the collector for the app, still 
there is possibility that all the events published for the app by the current 
NM and other NM are still in pipeline, so was wondering whether we can have 
timer task which periodically cleans up collector after some period and not imm 
remove it when AM container is finished.
The lifecycle management of the app collector is a little tricky here: it gets 
registered when the first container (AM) is launched, but it should not be 
unregistered immediately when the AM container stops. Waiting until the 
application finish event reaches the NM should work for most cases. In the 
corner case where the NM publisher takes too long to publish an event (because 
its queue is busy), it can still fail (a very low chance, which should be 
acceptable here). Later, we will run into a similar issue when we do app-level 
aggregation in the app collector, since the aggregation process could still be 
running. In any case, we should pay special attention to lifecycle management 
for the collector - we have a separate JIRA to move it out of the auxiliary 
service. I think we can discuss this further together with/in that JIRA.
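
As a rough sketch of the lifecycle discussed above, the following combines the 
two ideas: the collector is kept when the AM container stops, and it is removed 
only after the application finish event arrives plus a grace period. Everything 
here is an assumption made for illustration (the CollectorRegistry class, its 
method names, the Object placeholder for a collector, and the grace period); it 
is not the collector manager from the YARN-2928 branch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; names do not come from the patch.
class CollectorRegistry {
    private final Map<String, Object> collectors = new ConcurrentHashMap<>();
    private final ScheduledExecutorService cleaner =
        Executors.newSingleThreadScheduledExecutor();
    private final long graceMs;

    CollectorRegistry(long graceMs) {
        this.graceMs = graceMs;
    }

    // The collector is registered when the first (AM) container is launched.
    void onAmContainerLaunched(String appId) {
        collectors.putIfAbsent(appId, new Object());
    }

    // Deliberately a no-op: events from this NM and other NMs may still be
    // in the pipeline when the AM container stops.
    void onAmContainerStopped(String appId) {
    }

    // Removal is deferred by a grace period after the application finishes,
    // giving a busy publisher queue some time to drain.
    void onApplicationFinished(String appId) {
        cleaner.schedule(() -> { collectors.remove(appId); },
            graceMs, TimeUnit.MILLISECONDS);
    }

    boolean hasCollector(String appId) {
        return collectors.containsKey(appId);
    }

    // Stops scheduling new removals and waits for pending ones to run.
    // By default, already-scheduled delayed tasks still run after shutdown().
    boolean awaitCleanup(long timeoutMs) {
        cleaner.shutdown();
        try {
            return cleaner.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CollectorRegistry registry = new CollectorRegistry(200);
        registry.onAmContainerLaunched("app_1");
        registry.onAmContainerStopped("app_1");
        System.out.println("after AM stop: " + registry.hasCollector("app_1"));
        registry.onApplicationFinished("app_1");
        registry.awaitCleanup(5000);
        System.out.println("after cleanup: " + registry.hasCollector("app_1"));
    }
}
```

A fixed grace period still leaves the low-probability failure mentioned above 
if the publisher queue stays busy longer than the delay, and it does not yet 
account for an app-level aggregation that is still running.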

> [Event producers] Implement NM writing container lifecycle events to ATS
> ------------------------------------------------------------------------
>                 Key: YARN-3045
>                 URL: https://issues.apache.org/jira/browse/YARN-3045
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3045-YARN-2928.002.patch, 
> YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, 
> YARN-3045.20150420-1.patch
> Per design in YARN-2928, implement NM writing container lifecycle events and 
> container system metrics to ATS.

This message was sent by Atlassian JIRA
