Jason Lowe commented on YARN-2510:

Saw an example of this today where the RM seemed to drop a container completion 
event for a MapReduce AM.  The RM log showed it received the container 
completion event from the NM:

2014-09-02 16:03:30,270 [ResourceManager Event Processor] INFO 
capacity.CapacityScheduler: Application attempt 
appattempt_1407992474095_785859_000001 released container 
container_1407992474095_785859_01_010123 on node: host: xx #containers=x 
available=x used=x with event: FINISHED

However, the MapReduce AM never received the container completion event.  The 
AM always logs a "Received completed container " message for every finished 
container event it receives from the RM, but that log message was missing for 
this particular container.  That lost container completion event (along with 8 
others) caused the MapReduce AM to think it still had map containers left to be 
released, so it failed to preempt reducers when the headroom went to zero.
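
For anyone who wants to check their own logs for the same symptom, the comparison described above can be sketched roughly as follows. This is only an illustrative script, not part of YARN: the function name dropped_completions is made up, it assumes each log entry sits on a single unwrapped line (unlike the wrapped quote above), and it assumes the AM prints the container ID immediately after "Received completed container ".

```python
import re

# RM scheduler line as quoted in this comment, assumed to be on one line.
RM_RELEASED = re.compile(
    r"Application attempt (appattempt_\w+) released container "
    r"(container_\w+) on node:.*with event: FINISHED"
)
# Assumption: the AM log prints the container ID right after this prefix.
AM_RECEIVED = re.compile(r"Received completed container\s+(container_\w+)")


def dropped_completions(rm_log_text, am_log_text, attempt_id):
    """Return containers the RM logged as FINISHED for attempt_id
    that never show up in the AM's "Received completed container" lines."""
    finished = {
        container
        for attempt, container in RM_RELEASED.findall(rm_log_text)
        if attempt == attempt_id
    }
    received = set(AM_RECEIVED.findall(am_log_text))
    return sorted(finished - received)
```

Feed it the full text of the RM log and the AM log plus the application attempt ID; any container ID it returns was released with FINISHED on the RM side but never acknowledged by the AM.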

> RM can drop container completion events
> ---------------------------------------
>                 Key: YARN-2510
>                 URL: https://issues.apache.org/jira/browse/YARN-2510
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Priority: Critical
> The RM can drop container completion events and fail to report them to the 
> AM.  Details in the first comment.

