[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM

Naganarasimha G R (JIRA) Thu, 07 Jan 2016 16:42:07 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088490#comment-15088490
 ]


Naganarasimha G R commented on YARN-3995:
-----------------------------------------

bq. Instead of spawning multiple threads may be we can have single thread which 
does this activity ?
Yes, I wanted to address this, as I was trying to point out earlier: ??Instead of 
spawning multiple threads may be we can have single thread which does this 
activity??

bq. How about creating a long-lived single ScheduledExecutorService and 
schedule removeApplication() with the specified delay?
IIUC, in the approach you mentioned the callable sleeps for the configured 
period for an application and then removes it. But if multiple apps finish at 
the same time, only the first app waits for just the configured period; each 
subsequent app waits a little longer than the earlier ones (its own wait period 
plus the wait periods of the apps ahead of it in the queue). Thoughts?
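
To make the quoted suggestion concrete, here is a minimal sketch, assuming it 
refers to ScheduledExecutorService.schedule(Runnable, long, TimeUnit). The class 
DelayedAppRemover, its method names, and the removeApplication callback are 
hypothetical and not taken from the patch. With this call the executor tracks 
each task's own trigger time, so the task body itself does not sleep; the 
queued-up waiting described above would apply only if the callable slept on the 
single thread.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper for illustration; not part of the YARN code base. */
class DelayedAppRemover {
  // One long-lived scheduler thread shared by all applications.
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final long lingerMs;
  // Stand-in for the real removeApplication() logic (assumption).
  private final Consumer<String> removeApplication;

  DelayedAppRemover(long lingerMs, Consumer<String> removeApplication) {
    this.lingerMs = lingerMs;
    this.removeApplication = removeApplication;
  }

  /** Schedule removal of appId after the linger period, without sleeping. */
  ScheduledFuture<?> amContainerFinished(String appId) {
    return scheduler.schedule(
        () -> removeApplication.accept(appId), lingerMs, TimeUnit.MILLISECONDS);
  }

  /** Stop accepting new removals; already scheduled ones follow executor policy. */
  void shutdown() {
    scheduler.shutdown();
  }
}
```
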
Some approaches I can adopt to avoid the above issue are:
* Record in the callable the timestamp at which *close AM container* was 
called, and have the callable wait only if the elapsed time < configured 
linger time.
* Keep a map<appid, timestamp> and a single thread (either an executor service 
or a timer task) running at a short interval such as 500ms, which checks this 
map and removes all the apps whose elapsed time is > configured linger time.
Thoughts?
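
The second approach could be sketched roughly as follows. This is only an 
illustration under stated assumptions: the class LingeringAppSweeper, its 
method names, the injected clock, and the remover callback are all hypothetical 
and not from the patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical helper for illustration; not part of the YARN code base. */
class LingeringAppSweeper {
  // appId -> time (ms) at which the AM container was closed.
  private final Map<String, Long> closeTimes = new ConcurrentHashMap<>();
  private final long lingerMs;
  private final LongSupplier clock;       // e.g. System::currentTimeMillis
  private final Consumer<String> remover; // stand-in for removeApplication()

  LingeringAppSweeper(long lingerMs, LongSupplier clock, Consumer<String> remover) {
    this.lingerMs = lingerMs;
    this.clock = clock;
    this.remover = remover;
  }

  /** Record when the AM container of appId was closed (first call wins). */
  void amContainerClosed(String appId) {
    closeTimes.putIfAbsent(appId, clock.getAsLong());
  }

  /** One pass: remove every app whose elapsed time exceeds the linger time. */
  List<String> sweep() {
    long now = clock.getAsLong();
    List<String> removed = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = closeTimes.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> e = it.next();
      if (now - e.getValue() > lingerMs) {
        it.remove();
        remover.accept(e.getKey());
        removed.add(e.getKey());
      }
    }
    return removed;
  }

  /** Run sweep() on a single thread at a short interval such as 500 ms. */
  ScheduledExecutorService start(long intervalMs) {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    ses.scheduleWithFixedDelay(this::sweep, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    return ses;
  }
}
```

In this sketch each app lingers for the configured time plus at most one sweep 
interval, regardless of how many other apps finish at the same moment. If the 
remover callback throws, the periodic task stops running, so a real version 
would need to catch exceptions inside sweep().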

> Some of the NM events are not getting published due race condition when AM 
> container finishes in NM 
> ----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3995
>                 URL: https://issues.apache.org/jira/browse/YARN-3995
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-3995-feature-YARN-2928.v1.001.patch
>
>
> As discussed in YARN-3045: while testing with TestDistributedShell, found 
> that a few of the container metrics events were failing because of a race 
> condition. When the AM container finishes and removes the collector for the 
> app, it is still possible that events published for the app by the current 
> NM and other NMs are still in the pipeline, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
