Sangjin Lee commented on YARN-3995:

bq. This is true in most cases, unless the AM doesn't wait for the containers 
it launched/requested to go down before it goes down itself.

Are you thinking of cases where the AM crashes? If the app finishes normally, 
this sequence does not happen, right?

bq. Yes, a simple linger should be sufficient. Shall I make this a configurable 
period, so that there is a backup option in case of any issues, and if required 
we can handle it in a better way in the future?

Making it configurable sounds fine to me.
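
To illustrate the linger idea, here is a minimal sketch, not the actual patch. 
The class name, method names, and the lingerMs parameter are all hypothetical, 
and it assumes a single shared scheduler thread rather than one thread per 
collector. Instead of removing the collector as soon as the AM container 
finishes, removal is scheduled after the configured linger period, so events 
still in the pipeline have a chance to arrive.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; these names do not come from the YARN code base.
class LingeringCollectorManager {
  private final Map<String, AutoCloseable> collectors =
      new ConcurrentHashMap<>();
  // Assumption: one shared scheduler thread closes all collectors.
  private final ScheduledExecutorService closer =
      Executors.newSingleThreadScheduledExecutor();
  private final long lingerMs; // would be read from configuration

  LingeringCollectorManager(long lingerMs) {
    this.lingerMs = lingerMs;
  }

  void put(String appId, AutoCloseable collector) {
    collectors.put(appId, collector);
  }

  boolean contains(String appId) {
    return collectors.containsKey(appId);
  }

  // Called when the AM container finishes: defer removal by the linger
  // period instead of removing the collector immediately.
  void removeAfterLinger(String appId) {
    closer.schedule(() -> {
      AutoCloseable c = collectors.remove(appId);
      if (c != null) {
        try {
          c.close();
        } catch (Exception e) {
          // A real implementation would log this.
        }
      }
    }, lingerMs, TimeUnit.MILLISECONDS);
  }

  void shutdown() {
    closer.shutdown();
  }
}
```

With this shape the collector stays registered, and can keep accepting late 
events, until the linger period elapses.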

bq. Also, is launching one thread per collector to close it fine?

I suspect it would be fine. Note that there would be a few collectors per NM at 

> Some of the NM events are not getting published due race condition when AM 
> container finishes in NM 
> ----------------------------------------------------------------------------------------------------
>                 Key: YARN-3995
>                 URL: https://issues.apache.org/jira/browse/YARN-3995
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
> As discussed in YARN-3045: While testing with TestDistributedShell, found 
> that a few of the container metrics events were failing because of a race 
> condition. When the AM container finishes and removes the collector for the 
> app, it is still possible that events published for the app by the current 
> NM and other NMs are still in the pipeline, 
