Charan Hebri created YARN-8130:
----------------------------------

             Summary: Race condition when container events are published for 
KILLED applications
                 Key: YARN-8130
                 URL: https://issues.apache.org/jira/browse/YARN-8130
             Project: Hadoop YARN
          Issue Type: Bug
          Components: ATSv2
            Reporter: Charan Hebri


There appears to be a race condition when an application is KILLED while the 
corresponding container event information is being published. For completed 
containers, a YARN_CONTAINER_FINISHED event is generated, but for some 
containers in a KILLED application this event is missing. Below is a 
NodeManager log snippet:
{code:java}
2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver 
(ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application 
application_1523259757659_0003 removed, cleanupLocalDirs = false

2018-04-09 08:44:54,478 INFO  application.ApplicationImpl 
(ApplicationImpl.java:handle(632)) - Application application_1523259757659_0003 
transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED

2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher 
(NMTimelinePublisher.java:putEntity(298)) - Seems like client has been removed 
before the entity could be published for TimelineEntity[type='YARN_CONTAINER', 
id='container_1523259757659_0003_01_000002']

2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl 
(AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just 
finished : application_1523259757659_0003

2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl 
(AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs for 
container container_1523259757659_0003_01_000001. Current good log dirs are 
/grid/0/hadoop/yarn/log

2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl 
(AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs for 
container container_1523259757659_0003_01_000002. Current good log dirs are 
/grid/0/hadoop/yarn/log

2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager 
(TimelineCollectorManager.java:remove(192)) - The collector service for 
application_1523259757659_0003 was removed

2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:handle(1572)) - couldn't find application 
application_1523259757659_0003 while processing FINISH_APPS event. The 
ResourceManager allocated resources for this application to the NodeManager but 
no active containers were found to process{code}
The container ID named in the ERROR line above, 
*container_1523259757659_0003_01_000002*, is the one whose finished event is 
missing.
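
The ordering suggested by the log can be modelled with a minimal sketch. It assumes that publish requests are queued and processed asynchronously, and that the per-application client is removed as soon as the application finishes. The class PublishRaceSketch and all of its methods are hypothetical and written only for illustration; this is not the NodeManager's actual code, and the real cause may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical model of the suspected race; NOT the real NMTimelinePublisher. */
class PublishRaceSketch {
    // appId -> events delivered through that application's (simulated) client
    private final Map<String, List<String>> clients = new HashMap<>();
    // publish requests waiting to be processed asynchronously: {appId, containerId}
    private final Queue<String[]> pending = new ArrayDeque<>();
    // container IDs whose finished event was lost because the client was gone
    private final List<String> dropped = new ArrayList<>();

    void registerApp(String appId) {
        clients.put(appId, new ArrayList<>());
    }

    void enqueueContainerFinished(String appId, String containerId) {
        pending.add(new String[] {appId, containerId});
    }

    /** Processes every queued request, as an asynchronous dispatcher would later on. */
    void drain() {
        String[] request;
        while ((request = pending.poll()) != null) {
            List<String> client = clients.get(request[0]);
            if (client == null) {
                // Corresponds to the "client has been removed" situation in the log.
                dropped.add(request[1]);
            } else {
                client.add(request[1]);
            }
        }
    }

    void removeApp(String appId) {
        clients.remove(appId);
    }

    /** Runs one scenario and returns the container IDs whose event was dropped. */
    static List<String> run(boolean drainBeforeRemoval) {
        String app = "application_1523259757659_0003";
        String container = "container_1523259757659_0003_01_000002";
        PublishRaceSketch nm = new PublishRaceSketch();
        nm.registerApp(app);
        nm.enqueueContainerFinished(app, container);
        if (drainBeforeRemoval) {
            nm.drain(); // publish everything that is queued before the client goes away
        }
        nm.removeApp(app); // application reaches FINISHED, client is removed
        nm.drain();        // late processing of whatever is still queued
        return nm.dropped;
    }

    public static void main(String[] args) {
        System.out.println("removal first, dropped: " + run(false));
        System.out.println("drain first, dropped:   " + run(true));
    }
}
```

In this model, run(false) reports the container as dropped, while run(true) reports nothing dropped. That suggests one possible direction for a fix: flush pending container events before removing the application's client. Whether this applies to the real code path would need to be confirmed.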


