[
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088490#comment-15088490
]
Naganarasimha G R commented on YARN-3995:
-----------------------------------------
bq. Instead of spawning multiple threads, maybe we can have a single thread which
does this activity?
Yes, I wanted to address this, as I was trying to point out earlier: ??Instead of
spawning multiple threads, maybe we can have a single thread which does this
activity??
bq. How about creating a long-lived single ScheduledExecutorService and
schedule removeApplication() with the specified delay?
IIUC, in the approach you mentioned, the callable sleeps for the configured
period for an application and then removes it. But if multiple apps finish at
the same time, only the first apps wait for exactly the configured period;
subsequent apps wait a little longer than the earlier ones (the app's own wait
period + the wait periods of the apps ahead of it in the queue). Thoughts?
Some approaches I can adopt to avoid the above issue are:
* Record the timestamp at which *close AM container* was called and pass it to
the callable, so that the callable waits only if the elapsed time < configured
linger time.
* Keep a map<appid, timestamp> and a single thread (either an executor service
or a timer task) running at a short interval, such as 500 ms, which checks this
map and removes all the apps whose elapsed time is > configured linger time.
Thoughts?
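A minimal sketch of the two options above, not taken from the attached patch.
All names here (LingerSweeper, appFinished, sweep, remainingWaitMs) are
hypothetical, and the Consumer stands in for removeApplication(). The current
time is passed in explicitly so that the logic can be checked without sleeping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper for illustration only; not part of the YARN-3995 patch. */
class LingerSweeper {
  // appId -> time (ms) at which "close AM container" was called for that app
  private final Map<String, Long> finishTimes = new ConcurrentHashMap<>();
  private final long lingerMs;
  private final Consumer<String> remover; // assumed stand-in for removeApplication()

  LingerSweeper(long lingerMs, Consumer<String> remover) {
    this.lingerMs = lingerMs;
    this.remover = remover;
  }

  /** Option 1: how long a callable should still wait, given when the AM closed. */
  static long remainingWaitMs(long closedAtMs, long nowMs, long lingerMs) {
    return Math.max(0L, lingerMs - (nowMs - closedAtMs));
  }

  /** Option 2: record the finish time; the first recorded timestamp wins. */
  void appFinished(String appId, long nowMs) {
    finishTimes.putIfAbsent(appId, nowMs);
  }

  /** Option 2: remove every app whose elapsed time is > the linger time. */
  List<String> sweep(long nowMs) {
    List<String> removed = new ArrayList<>();
    for (Map.Entry<String, Long> e : finishTimes.entrySet()) {
      String appId = e.getKey();
      long finishedAt = e.getValue();
      // remove(key, value) only succeeds if the mapping is unchanged
      if (nowMs - finishedAt > lingerMs && finishTimes.remove(appId, finishedAt)) {
        remover.accept(appId);
        removed.add(appId);
      }
    }
    return removed;
  }

  /** Runs sweep() on one long-lived thread, e.g. every 500 ms. */
  ScheduledExecutorService start(long intervalMs) {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    ses.scheduleWithFixedDelay(
        () -> sweep(System.currentTimeMillis()),
        intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    return ses;
  }
}
```

With this shape each app's wait depends only on its own timestamp, so apps that
finish together are removed in the same sweep, at most one sweep interval after
their linger time expires.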
> Some of the NM events are not getting published due race condition when AM
> container finishes in NM
> ----------------------------------------------------------------------------------------------------
>
> Key: YARN-3995
> URL: https://issues.apache.org/jira/browse/YARN-3995
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, timelineserver
> Affects Versions: YARN-2928
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Labels: yarn-2928-1st-milestone
> Attachments: YARN-3995-feature-YARN-2928.v1.001.patch
>
>
> As discussed in YARN-3045: while testing with TestDistributedShell, we found
> that a few of the container metrics events were failing because of a race
> condition. When the AM container finishes and removes the collector for the
> app, there is still a possibility that events published for the app by the
> current NM and other NMs are still in the pipeline,