[ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707273#comment-16707273
 ] 

BELUGA BEHR edited comment on YARN-8789 at 12/3/18 2:31 PM:
------------------------------------------------------------

[~pbacsko] The {{offer()}} call is wrapped in a 'while' loop, so it will keep 
trying to put the event in the queue for as long as it takes.  Events are not 
lost.
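
As a rough sketch of that retry loop (not the code from the patch): the class 
name {{RetryingEnqueue}}, the helper {{enqueue()}}, the timeout values and the 
String events are all assumptions made for illustration.  It relies only on the 
standard {{BlockingQueue.offer(E, long, TimeUnit)}}, which returns false when 
the timeout elapses without space becoming available.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not taken from AsyncDispatcher or the patch.
public class RetryingEnqueue {

  /**
   * Keeps offering the event until the bounded queue accepts it.
   * Returns the number of offer() attempts that were needed.
   */
  static <T> int enqueue(BlockingQueue<T> queue, T event, long timeoutMs)
      throws InterruptedException {
    int attempts = 1;
    // offer() returns false on timeout, so the loop retries instead of dropping.
    while (!queue.offer(event, timeoutMs, TimeUnit.MILLISECONDS)) {
      attempts++;
    }
    return attempts;
  }

  public static void main(String[] args) throws Exception {
    // Capacity of 1 (an assumed value) so the queue is full right away.
    BlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
    queue.put("first");

    // A slow consumer frees the slot only after a delay.
    Thread consumer = new Thread(() -> {
      try {
        Thread.sleep(200);
        queue.take();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();

    int attempts = enqueue(queue, "second", 50);
    consumer.join();
    System.out.println("accepted after " + attempts + " attempt(s)");
  }
}
```

The producer thread is held inside {{enqueue()}} until the consumer makes room, 
so the event is delayed rather than discarded.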

[~wilfreds] The customer is using a version of CDH from before [MAPREDUCE-5124] 
was introduced.  This queue change is also intended to throttle clients and 
protect the AM.  If the queue is full, the producers will wait (their threads 
will block).  If they wait a long time, I imagine that events coming from 
remote clients, such as a Mapper or Reducer, will simply time out and fail.  
The tasks will have to be retried, but it is better, in my mind, to restart a 
subset of tasks than to kill the AM with an OOM and never complete.
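
The throttling effect can be illustrated with a small, self-contained sketch.  
It is an assumption-laden toy, not the AsyncDispatcher implementation: the 
class name {{BackpressureSketch}}, the method {{run()}}, and the capacity and 
event counts are all made up for the example.  Producers call the standard 
{{BlockingQueue.put()}}, which blocks while the queue is full, so the backlog 
held in memory can never exceed the configured capacity.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of producers being throttled by a bounded queue.
public class BackpressureSketch {

  /**
   * Runs several producers against a single consumer and returns the
   * largest queue size the consumer observed.
   */
  static int run(int capacity, int producers, int eventsPerProducer)
      throws InterruptedException {
    BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
    int total = producers * eventsPerProducer;

    Thread[] threads = new Thread[producers];
    for (int p = 0; p < producers; p++) {
      threads[p] = new Thread(() -> {
        try {
          for (int i = 0; i < eventsPerProducer; i++) {
            queue.put(i); // blocks while the queue is full
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      threads[p].start();
    }

    // The calling thread plays the role of the single event handler.
    int maxSize = 0;
    for (int handled = 0; handled < total; handled++) {
      maxSize = Math.max(maxSize, queue.size());
      queue.take();
    }
    for (Thread t : threads) {
      t.join();
    }
    return maxSize;
  }

  public static void main(String[] args) throws Exception {
    int capacity = 4; // assumed value, chosen only for the demo
    int maxSize = run(capacity, 3, 1000);
    System.out.println("largest backlog seen: " + maxSize
        + " (capacity " + capacity + ")");
  }
}
```

With an unbounded queue the same producers would never block, and the backlog 
would be limited only by available heap.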







> Add BoundedQueue to AsyncDispatcher
> -----------------------------------
>
>                 Key: YARN-8789
>                 URL: https://issues.apache.org/jira/browse/YARN-8789
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: applications
>    Affects Versions: 3.2.0
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Major
>         Attachments: YARN-8789.1.patch, YARN-8789.10.patch, 
> YARN-8789.12.patch, YARN-8789.14.patch, YARN-8789.2.patch, YARN-8789.3.patch, 
> YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch, 
> YARN-8789.7.patch, YARN-8789.8.patch, YARN-8789.9.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  The logs showed that the event-queue of {{AsyncDispatcher}} held a 
> very large number of items and was seemingly never decreasing.
> I started looking at the code and thought it could use some cleanup, 
> simplification, and the ability to specify a bounded queue so that any 
> incoming events are throttled until they can be processed.  This will protect 
> the ApplicationMaster from a flood of events.
> Logging Message:
> Size of event-queue is xxx



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
