[
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068068#comment-15068068
]
Naganarasimha G R commented on YARN-3995:
-----------------------------------------
Hi [~sjlee0],
As per the discussion we had in the status call, we planned to stop the
collector 2 seconds after the AM container finishes, but we already have
code that waits for one second and then closes the collector.
Now, IIUC, the scope of this jira is:
# Introduce a configurable period to wait
# Instead of spawning multiple threads, maybe we can have a single thread that
does this activity?
Or do we need to introduce something else?
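To make the two points above concrete, here is a rough sketch of what I have in mind. It is not the patch and does not use the actual NM collector classes: the class DelayedCollectorStopper, its methods, and the stopDelayMs parameter are all hypothetical names chosen for illustration. The delay would come from a new configuration property, and one scheduler thread handles the delayed stop for every app instead of one thread per app.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration only; not part of the YARN code base. */
class DelayedCollectorStopper {
  // A single shared daemon thread runs every delayed stop.
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "collector-stopper");
        t.setDaemon(true);
        return t;
      });

  // Stand-in for the per-app collectors held by the NM (assumption).
  private final Set<String> activeCollectors = ConcurrentHashMap.newKeySet();

  // Wait period in milliseconds; assumed to be read from configuration.
  private final long stopDelayMs;

  DelayedCollectorStopper(long stopDelayMs) {
    this.stopDelayMs = stopDelayMs;
  }

  void addCollector(String appId) {
    activeCollectors.add(appId);
  }

  boolean hasCollector(String appId) {
    return activeCollectors.contains(appId);
  }

  /** Called when the AM container finishes; the collector is removed after the delay. */
  void amContainerFinished(String appId) {
    scheduler.schedule(() -> {
      activeCollectors.remove(appId);
    }, stopDelayMs, TimeUnit.MILLISECONDS);
  }

  void shutdown() {
    scheduler.shutdownNow();
  }
}
```

With this shape, events that arrive within the wait period still find the collector, but anything arriving later is dropped, so the wait period only narrows the window.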
bq. When RM finishes the attempt then it can send one finish event through
timelineclient
IMO this also will not guarantee that no event is missed. So I think a
configurable wait period is better. Thoughts?
> Some of the NM events are not getting published due race condition when AM
> container finishes in NM
> ----------------------------------------------------------------------------------------------------
>
> Key: YARN-3995
> URL: https://issues.apache.org/jira/browse/YARN-3995
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, timelineserver
> Affects Versions: YARN-2928
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Labels: yarn-2928-1st-milestone
>
> As discussed in YARN-3045: while testing with TestDistributedShell, we found
> that a few of the container metrics events were failing because of a race
> condition. When the AM container finishes and removes the collector for the
> app, it is still possible that events published for the app by
> the current NM and other NMs are still in the pipeline,
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)