[ 
https://issues.apache.org/jira/browse/YARN-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120544#comment-14120544
 ] 

Jason Lowe commented on YARN-2510:
----------------------------------

Saw an example of this today where the RM seemed to drop a container completion 
event for a MapReduce AM.  The RM log showed it received the container 
completion event from the NM:

{noformat}
2014-09-02 16:03:30,270 [ResourceManager Event Processor] INFO 
capacity.CapacityScheduler: Application attempt 
appattempt_1407992474095_785859_000001 released container 
container_1407992474095_785859_01_010123 on node: host: xx #containers=x 
available=x used=x with event: FINISHED
{noformat}

However, the MapReduce AM never received the container completion event.  The 
AM logs a "Received completed container " message for every finished container 
event it receives from the RM, but that message was missing for this particular 
container.  That lost container completion event (along with 8 others) caused 
the MapReduce AM to think it still had map containers left to be released, so 
it failed to preempt reducers when the headroom went to zero.
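
The comparison described above (RM-side "released container ... with event: 
FINISHED" lines versus AM-side "Received completed container " lines) can be 
automated when checking other applications for the same symptom.  The following 
is only a rough sketch, not a tool that ships with Hadoop.  The function name 
find_dropped_completions is made up for illustration, and the patterns assume 
that each log record is on a single line, that the RM line has the shape shown 
in the excerpt above, and that the AM message is followed directly by the 
container ID.

```python
import re

# Assumed container ID shape, taken from the log excerpt in this comment:
# container_<clusterTimestamp>_<appId>_<attemptId>_<containerNumber>
CONTAINER_ID = r"container_\d+_\d+_\d+_\d+"

# RM side: CapacityScheduler line reporting a released container that FINISHED.
RM_FINISHED = re.compile(
    r"released container (" + CONTAINER_ID + r") .*with event: FINISHED"
)

# AM side: assumes the container ID follows the message text directly.
AM_RECEIVED = re.compile(r"Received completed container (" + CONTAINER_ID + r")")


def find_dropped_completions(rm_log_lines, am_log_lines, app_attempt_id):
    """Return the sorted container IDs that the RM logged as FINISHED for
    app_attempt_id but that never show up in the AM's
    "Received completed container" messages."""
    released = set()
    for line in rm_log_lines:
        # Only consider RM lines that belong to the attempt being checked.
        if app_attempt_id not in line:
            continue
        match = RM_FINISHED.search(line)
        if match:
            released.add(match.group(1))

    received = set()
    for line in am_log_lines:
        match = AM_RECEIVED.search(line)
        if match:
            received.add(match.group(1))

    # Anything the RM released that the AM never acknowledged is a candidate
    # for a dropped completion event.
    return sorted(released - received)
```

To use it, pass the lines of the RM log, the lines of the AM syslog, and the 
application attempt ID (for example appattempt_1407992474095_785859_000001).  A 
non-empty result only indicates candidates.  A container that finished just 
before the AM log was captured could also appear in the list.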

> RM can drop container completion events
> ---------------------------------------
>
>                 Key: YARN-2510
>                 URL: https://issues.apache.org/jira/browse/YARN-2510
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Priority: Critical
>
> The RM can drop container completion events and fail to report them to the 
> AM.  Details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
