[ 
https://issues.apache.org/jira/browse/YARN-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343871#comment-16343871
 ] 

Haibo Chen commented on YARN-7835:
----------------------------------

[~rohithsharma] I am trying to understand the issue here. It seems that a 
collector is added when an APP is created, but removed on an APP_ATTEMPT 
finish event. Ideally, a collector should be bound to either an APP or an 
APP_ATTEMPT, not to a mix of the two.

Should we make this consistent, that is, tie a collector either to APP 
lifecycle events or to APP_ATTEMPT lifecycle events?
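
To make the mismatch concrete, here is a minimal, self-contained sketch of the 
event ordering described in the issue. It is not the NMTimelinePublisher code 
and not the attached patch: the class names, method names and the 
attempt-counting variant are all hypothetical, chosen only to illustrate what 
happens when the add is scoped to the APP and the remove is driven by an 
APP_ATTEMPT event.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the lifecycle mismatch; none of these names come from Hadoop.
class CollectorLifecycleSketch {

    /** Adds a client once per application, but removes it on any attempt finish. */
    static class MismatchedPublisher {
        private final Map<String, String> clients = new HashMap<>();

        void onAmContainerStarted(String appId, int attemptId) {
            // A second attempt on the same node finds an entry already present,
            // so nothing new is added for it.
            clients.putIfAbsent(appId, "client-for-" + appId);
        }

        void onAmContainerFinished(String appId, int attemptId) {
            // The removal is keyed by application although the event belongs to one attempt.
            clients.remove(appId);
        }

        boolean canPublish(String appId) {
            return clients.containsKey(appId);
        }
    }

    /** One assumed way to keep add and remove symmetric: count live attempts per app. */
    static class ConsistentPublisher {
        private final Map<String, Integer> liveAttempts = new HashMap<>();

        void onAmContainerStarted(String appId, int attemptId) {
            liveAttempts.merge(appId, 1, Integer::sum);
        }

        void onAmContainerFinished(String appId, int attemptId) {
            // Returning null from the remapping function drops the entry,
            // so the app disappears only when its last attempt finishes.
            liveAttempts.computeIfPresent(appId, (id, n) -> n > 1 ? n - 1 : null);
        }

        boolean canPublish(String appId) {
            return liveAttempts.containsKey(appId);
        }
    }

    /** Replays: attempt 1 starts, attempt 2 starts on the same node, attempt 1 completes. */
    public static boolean replayMismatched() {
        MismatchedPublisher p = new MismatchedPublisher();
        p.onAmContainerStarted("app_1", 1);
        p.onAmContainerStarted("app_1", 2);
        p.onAmContainerFinished("app_1", 1);
        return p.canPublish("app_1");
    }

    /** Same event order against the attempt-counting variant. */
    public static boolean replayConsistent() {
        ConsistentPublisher p = new ConsistentPublisher();
        p.onAmContainerStarted("app_1", 1);
        p.onAmContainerStarted("app_1", 2);
        p.onAmContainerFinished("app_1", 1);
        return p.canPublish("app_1");
    }

    public static void main(String[] args) {
        System.out.println("mismatched lifecycle can still publish: " + replayMismatched());
        System.out.println("consistent lifecycle can still publish: " + replayConsistent());
    }
}
```

In this model the first replay leaves the second attempt without a client, 
which matches the "Application is not found" symptom in the description, 
while the second replay keeps the entry until the last attempt finishes. 
Whether the real fix should count attempts or key everything by APP_ATTEMPT 
is exactly the open question above.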

> [Atsv2] Race condition in NM while publishing events if second attempt 
> launched on same node
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-7835
>                 URL: https://issues.apache.org/jira/browse/YARN-7835
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-7835.001.patch
>
>
> A race condition is observed: if the master container is killed for some 
> reason and relaunched on the same node, NMTimelinePublisher does not add a 
> timelineClient. But once the completed container for the 1st attempt 
> arrives, NMTimelinePublisher removes the timelineClient. 
> As a result, all subsequent event publishing from a different client fails 
> with the exception "Application is not found".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

