[ https://issues.apache.org/jira/browse/YARN-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343871#comment-16343871 ]
Haibo Chen commented on YARN-7835:
----------------------------------
[~rohithsharma] Trying to understand the issue here. It seems that a collector
is added upon APP creation but removed upon an APP attempt finish event.
Ideally, a collector should be bound to either an APP or an APP_ATTEMPT.
Should we make this consistent, that is, tie a collector to either APP
lifecycle events or APP_ATTEMPT lifecycle events?
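
To make the lifecycle mismatch concrete, here is a minimal toy model in Python. It is only a sketch: ToyPublisher, its method names, and the "add only if absent" behaviour are assumptions made for illustration and are not the actual NMTimelinePublisher code. It registers a client per application but removes it when any attempt finishes, then replays the ordering described in the issue.

```python
class ToyPublisher:
    """Hypothetical stand-in for NMTimelinePublisher (illustrative names only)."""

    def __init__(self):
        self.clients = {}  # app_id -> client handle

    def on_app_created(self, app_id):
        # Assumption: registration is keyed by application, so a second
        # attempt on the same node finds an entry and adds nothing.
        if app_id not in self.clients:
            self.clients[app_id] = f"client-for-{app_id}"

    def on_attempt_finished(self, app_id):
        # Removal is triggered per attempt but also keyed by application.
        self.clients.pop(app_id, None)

    def publish(self, app_id, event):
        if app_id not in self.clients:
            raise LookupError(f"Application {app_id} is not found")
        return (self.clients[app_id], event)


def replay_race():
    """Replay: attempt 2 starts before attempt 1's completion is processed."""
    pub = ToyPublisher()
    app = "app_1"
    pub.on_app_created(app)       # attempt 1 starts on this node
    pub.on_app_created(app)       # attempt 2 starts on the same node: no-op
    pub.on_attempt_finished(app)  # late completion event for attempt 1
    try:
        pub.publish(app, "CONTAINER_CREATED")
    except LookupError as exc:
        return str(exc)
    return "published"
```

In this model the add and the remove are keyed at different granularities, so the late completion of the first attempt drops the client that the second attempt still needs. Binding both operations to the same lifecycle (all APP, or all APP_ATTEMPT) would remove that window.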
> [Atsv2] Race condition in NM while publishing events if second attempt
> launched on same node
> --------------------------------------------------------------------------------------------
>
> Key: YARN-7835
> URL: https://issues.apache.org/jira/browse/YARN-7835
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Priority: Critical
> Attachments: YARN-7835.001.patch
>
>
> A race condition is observed: if the master container is killed for some
> reason and relaunched on the same node, NMTimelinePublisher doesn't add a
> timelineClient. But once the completed container for the 1st attempt arrives,
> NMTimelinePublisher removes the timelineClient.
> This causes all subsequent event publishing from different clients to fail
> with the exception "Application is not found".
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)