Naganarasimha G R commented on YARN-3995:

Hi [~sjlee0],
As per the discussion in the status call, we planned to stop the collector 
2 seconds after the AM container finishes, but we already have code that 
waits for one second and then closes the collector.
Now, IIUC, the scope of this JIRA is:
# Introduce a configurable wait period.
# Instead of spawning multiple threads, maybe we can have a single thread that 
does this?

Or do we need to introduce something else?
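A minimal Java sketch of the two scope items, assuming the NM can close a collector through a per-application callback. The class `DelayedCollectorRemover`, its method `onAmContainerFinished`, and the `stopCollector` callback are hypothetical names for illustration, not existing YARN APIs. One shared `ScheduledExecutorService` thread handles every delayed removal instead of one thread per finished AM container, and the delay is a constructor parameter rather than a hard-coded one second.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not a YARN class): closes the collector of an
 * application after a configurable delay, using a single scheduler thread.
 */
final class DelayedCollectorRemover implements AutoCloseable {
  private final ScheduledExecutorService scheduler;
  private final long delayMs;
  // Hypothetical callback that actually closes the collector for an app ID.
  private final Consumer<String> stopCollector;
  // Apps whose removal is already scheduled, so duplicates are ignored.
  private final Set<String> pending = ConcurrentHashMap.newKeySet();

  DelayedCollectorRemover(long delayMs, Consumer<String> stopCollector) {
    if (delayMs < 0) {
      throw new IllegalArgumentException("delayMs must be >= 0");
    }
    this.delayMs = delayMs;
    this.stopCollector = stopCollector;
    // One daemon thread shared by all applications on this NM.
    this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "collector-remover");
      t.setDaemon(true);
      return t;
    });
  }

  /** Called when the AM container finishes; schedules removal once per app. */
  void onAmContainerFinished(String appId) {
    if (!pending.add(appId)) {
      return; // removal already scheduled for this app
    }
    // Events still in the pipeline can reach the collector until the delay elapses.
    scheduler.schedule(() -> {
      try {
        stopCollector.accept(appId);
      } finally {
        pending.remove(appId);
      }
    }, delayMs, TimeUnit.MILLISECONDS);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

Where the delay value comes from (which configuration property, and its default) is left open here. The design point is that the thread count stays constant no matter how many AM containers finish on the node.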
bq When RM finishes the attempt then it can send one finish event through 
IMO this also will not guarantee that no event is missed, so I think a 
configurable wait period is better. Thoughts?

> Some of the NM events are not getting published due to race condition when AM 
> container finishes in NM 
> ----------------------------------------------------------------------------------------------------
>                 Key: YARN-3995
>                 URL: https://issues.apache.org/jira/browse/YARN-3995
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
> As discussed in YARN-3045: while testing TestDistributedShell, we found that 
> a few of the container metrics events were failing because of a race 
> condition. When the AM container finishes and removes the collector for the 
> app, there is still a possibility that events published for the app by the 
> current NM and other NMs are still in the pipeline, 

