[ https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472387#comment-16472387 ]

Haibo Chen commented on YARN-8130:
----------------------------------

+1 pending Jenkins.

> Race condition when container events are published for KILLED applications
> --------------------------------------------------------------------------
>
>                 Key: YARN-8130
>                 URL: https://issues.apache.org/jira/browse/YARN-8130
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: ATSv2
>            Reporter: Charan Hebri
>            Assignee: Rohith Sharma K S
>            Priority: Major
>         Attachments: YARN-8130.01.patch, YARN-8130.02.patch, 
> YARN-8130.03.patch
>
>
> There seems to be a race condition when an application is KILLED while the
> corresponding container event information is being published. For completed
> containers, a YARN_CONTAINER_FINISHED event is generated, but for some
> containers in a KILLED application this event is missing. Below is a
> NodeManager log snippet:
> {code:java}
> 2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver 
> (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application 
> application_1523259757659_0003 removed, cleanupLocalDirs = false
> 2018-04-09 08:44:54,478 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1523259757659_0003 transitioned from 
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher 
> (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been 
> removed before the entity could be published for 
> TimelineEntity[type='YARN_CONTAINER', 
> id='container_1523259757659_0003_01_000002']
> 2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just 
> finished : application_1523259757659_0003
> 2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_000001. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_000002. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager 
> (TimelineCollectorManager.java:remove(192)) - The collector service for 
> application_1523259757659_0003 was removed
> 2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:handle(1572)) - couldn't find application 
> application_1523259757659_0003 while processing FINISH_APPS event. The 
> ResourceManager allocated resources for this application to the NodeManager 
> but no active containers were found to process
> {code}
> The container ID in the log,
> *container_1523259757659_0003_01_000002*, is the one whose finished
> event is missing.
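
The ordering problem in the log above (the application reaches FINISHED and its timeline client is removed before the container's entity is published) can be modeled with a small, single-threaded sketch. Everything in it (TimelinePublishRaceSketch, AppClient, publishContainerFinished, and the queue standing in for an asynchronous dispatcher) is hypothetical. It is not the Hadoop API and not the YARN-8130 patch. It only illustrates why removing the per-application client outside the event queue can drop queued container events, and how routing the removal through the same queue keeps it behind them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/**
 * Hypothetical, simplified model of a NodeManager-side timeline publisher.
 * None of these classes or methods are Hadoop APIs; they only illustrate
 * the ordering problem described in YARN-8130.
 */
public class TimelinePublishRaceSketch {

    /** Hypothetical per-application client; records what it "published". */
    static final class AppClient {
        final List<String> published = new ArrayList<>();
        void put(String entity) { published.add(entity); }
    }

    private final Map<String, AppClient> clients = new HashMap<>();
    // FIFO queue standing in for an asynchronous, single-threaded dispatcher.
    private final Queue<Runnable> queue = new ArrayDeque<>();
    private final List<String> dropped = new ArrayList<>();

    void addApplication(String appId) { clients.put(appId, new AppClient()); }

    /** Container-finished events are queued, not published immediately. */
    void publishContainerFinished(String appId, String containerId) {
        queue.add(() -> {
            AppClient client = clients.get(appId);
            if (client == null) {
                // Analogous to "client has been removed before the entity could be published".
                dropped.add(containerId);
            } else {
                client.put("YARN_CONTAINER_FINISHED:" + containerId);
            }
        });
    }

    /** Racy variant: removes the client at once, ahead of any queued events. */
    void removeApplicationNow(String appId) { clients.remove(appId); }

    /** Ordered variant: removal is queued behind pending container events. */
    void removeApplicationQueued(String appId) { queue.add(() -> clients.remove(appId)); }

    /** Runs queued events in FIFO order (the dispatcher's work, done synchronously here). */
    void drain() {
        Runnable r;
        while ((r = queue.poll()) != null) {
            r.run();
        }
    }

    /** Returns the container IDs whose finished event was dropped. */
    public static List<String> simulate(boolean queuedRemoval) {
        TimelinePublishRaceSketch p = new TimelinePublishRaceSketch();
        String app = "application_1523259757659_0003";
        p.addApplication(app);
        p.publishContainerFinished(app, "container_1523259757659_0003_01_000001");
        p.publishContainerFinished(app, "container_1523259757659_0003_01_000002");
        if (queuedRemoval) {
            p.removeApplicationQueued(app);
        } else {
            p.removeApplicationNow(app);
        }
        p.drain();
        return p.dropped;
    }

    public static void main(String[] args) {
        System.out.println("immediate removal, dropped: " + simulate(false));
        System.out.println("queued removal, dropped: " + simulate(true));
    }
}
```

Running main lists both container IDs as dropped under immediate removal and an empty list under queued removal. Serializing client removal behind pending events is one common way to avoid this kind of race. This thread does not say whether the attached patches take that approach.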



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
