[ 
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070660#comment-15070660
 ] 

Naganarasimha G R commented on YARN-3995:
-----------------------------------------

Oops, sorry, my mistake.
Thanks [~sjlee0] for correcting me. The current code already waits for a 
second in a separate thread after the AM container finishes (in 
PerNodeTimelineCollectorsAuxService.stopContainer). The issue with that 
approach is that it removes the collector after 1 second even if events are 
still arriving. What I am suggesting is to close/remove the collector only 
after a period of inactivity in the collector. Would that be a good approach, 
considering that metrics are usually delayed?
If that approach is not required, the existing approach already waits for a 
second in a separate thread; does it require any change? (The only drawback I 
can think of is that several such threads will exist if many AMs run on a 
single NM.)
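
To make the inactivity-based suggestion concrete, here is a minimal sketch. It is 
not existing YARN code: the class IdleCollectorTracker and its methods 
(markAmFinished, recordEvent, sweep) are hypothetical names, and the clock is 
injected only so the logic can be exercised without sleeping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical helper (not part of YARN): tracks collectors awaiting removal. */
class IdleCollectorTracker {
  private final long idleTimeoutMs;
  private final LongSupplier clock;
  // appId -> time of the last event; holds only apps whose AM container finished.
  private final Map<String, Long> lastActivity = new ConcurrentHashMap<>();

  IdleCollectorTracker(long idleTimeoutMs, LongSupplier clock) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.clock = clock;
  }

  /** AM container finished: start watching the collector instead of removing it. */
  void markAmFinished(String appId) {
    lastActivity.put(appId, clock.getAsLong());
  }

  /** An event reached the collector: refresh the timestamp if the app is watched. */
  void recordEvent(String appId) {
    lastActivity.computeIfPresent(appId, (id, last) -> clock.getAsLong());
  }

  /** Removes and returns the apps that have been idle for at least the timeout. */
  List<String> sweep() {
    long now = clock.getAsLong();
    List<String> expired = new ArrayList<>();
    for (Map.Entry<String, Long> e : lastActivity.entrySet()) {
      boolean idle = now - e.getValue() >= idleTimeoutMs;
      // The two-argument remove fails if an event refreshed the entry meanwhile.
      if (idle && lastActivity.remove(e.getKey(), e.getValue())) {
        expired.add(e.getKey());
      }
    }
    return expired;
  }
}
```

Under these assumptions, one shared scheduled thread could call sweep() 
periodically and remove the collectors it returns, which would also avoid 
keeping one waiting thread per AM on the NM.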

> Some of the NM events are not getting published due race condition when AM 
> container finishes in NM 
> ----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3995
>                 URL: https://issues.apache.org/jira/browse/YARN-3995
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
>
> As discussed in YARN-3045: while testing with TestDistributedShell, it was 
> found that a few of the container metrics events were failing because of a 
> race condition. When the AM container finishes and the collector for the 
> app is removed, it is still possible that events published for the app by 
> the current NM and other NMs are still in the pipeline, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
