Charan Hebri created YARN-8130:

             Summary: Race condition when container events are published for 
KILLED applications
                 Key: YARN-8130
             Project: Hadoop YARN
          Issue Type: Bug
          Components: ATSv2
            Reporter: Charan Hebri

There appears to be a race condition between an application being KILLED and 
the publishing of its container event information. A YARN_CONTAINER_FINISHED 
event is generated for each completed container, but for some containers in a 
KILLED application this event is missing. Below is a NodeManager log snippet:
{code}
2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver 
( - Application 
application_1523259757659_0003 removed, cleanupLocalDirs = false

2018-04-09 08:44:54,478 INFO  application.ApplicationImpl 
( - Application application_1523259757659_0003 

2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher 
( - Seems like client has been removed 
before the entity could be published for TimelineEntity[type='YARN_CONTAINER', 

2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl 
( - Application just 
finished : application_1523259757659_0003

2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl 
( - Uploading logs for 
container container_1523259757659_0003_01_000001. Current good log dirs are 

2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl 
( - Uploading logs for 
container container_1523259757659_0003_01_000002. Current good log dirs are 

2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager 
( - The collector service for 
application_1523259757659_0003 was removed

2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl 
( - couldn't find application 
application_1523259757659_0003 while processing FINISH_APPS event. The 
ResourceManager allocated resources for this application to the NodeManager but 
no active containers were found to process{code}
The container ID shown in the log, *container_1523259757659_0003_01_000002*, 
is the one whose finished event is missing.
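
The suspected ordering can be illustrated with a minimal Java sketch. This is 
not the actual NMTimelinePublisher code: the class PublisherRaceSketch and all 
of its methods are hypothetical names made up for illustration, and the two 
racing steps are replayed sequentially in a single thread so the outcome is 
deterministic. It only shows how a finished event can be dropped when the 
per-application client is removed before the event is published.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for a per-application timeline publisher.
 * All names here are assumptions; this is not Hadoop code.
 */
public class PublisherRaceSketch {
    // Application id -> events published for it (stands in for a timeline client).
    private final Map<String, List<String>> clients = new HashMap<>();
    // Container ids whose finished event could not be published.
    private final List<String> dropped = new ArrayList<>();

    /** Registers a client when the application starts on this node. */
    public void startApp(String appId) {
        clients.put(appId, new ArrayList<>());
    }

    /** Application teardown: removes the client and returns what it published. */
    public List<String> stopApp(String appId) {
        return clients.remove(appId);
    }

    /** Returns true if the event was published, false if the client was already gone. */
    public boolean publishContainerFinished(String appId, String containerId) {
        List<String> client = clients.get(appId);
        if (client == null) {
            // Corresponds to the "client has been removed" error in the log above.
            dropped.add(containerId);
            return false;
        }
        client.add("YARN_CONTAINER_FINISHED:" + containerId);
        return true;
    }

    public List<String> droppedEvents() {
        return dropped;
    }

    public static void main(String[] args) {
        PublisherRaceSketch publisher = new PublisherRaceSketch();
        String app = "application_1523259757659_0003";
        publisher.startApp(app);

        // The first container finishes before the application is torn down.
        publisher.publishContainerFinished(app, "container_1523259757659_0003_01_000001");

        // The application is KILLED and its client is removed first ...
        List<String> published = publisher.stopApp(app);

        // ... so the finished event of the second container arrives too late.
        publisher.publishContainerFinished(app, "container_1523259757659_0003_01_000002");

        System.out.println("published: " + published);
        System.out.println("dropped:   " + publisher.droppedEvents());
    }
}
```

If this is what happens, one possible direction would be to delay removing 
the client until all pending container events for the application have been 
published. That is only a suggestion based on the sketch, not a verified fix.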

This message was sent by Atlassian JIRA
