[
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707273#comment-16707273
]
BELUGA BEHR commented on YARN-8789:
-----------------------------------
[~pbacsko] The {{offer()}} call is wrapped in a {{while}} loop, so it keeps
trying to put the event in the queue for as long as it takes. Events are not
lost.
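To illustrate the retry behaviour described above, here is a minimal sketch. It is not the patch code: the class name {{OfferRetrySketch}}, the helper {{putWithRetry}}, and the 50 ms offer timeout are assumptions chosen for the example. Only the standard {{java.util.concurrent}} API is used.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class OfferRetrySketch {

    /**
     * Keeps offering the event until the queue accepts it.
     * Returns false only if the calling thread is interrupted while waiting.
     */
    static <T> boolean putWithRetry(BlockingQueue<T> queue, T event) {
        try {
            // offer() returns false if the queue is still full after the timeout,
            // so the loop tries again; the event is never dropped.
            while (!queue.offer(event, 50, TimeUnit.MILLISECONDS)) {
                // Queue still full: retry.
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        queue.add("first"); // the queue is now full

        // A slow consumer frees the single slot after a short delay.
        Thread consumer = new Thread(() -> {
            try {
                Thread.sleep(200);
                queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Blocks (by retrying) until the consumer has made room.
        boolean accepted = putWithRetry(queue, "second");
        System.out.println("accepted=" + accepted + ", head=" + queue.peek());
    }
}
```

The producer pays for a full queue with waiting time rather than with a dropped event, which is the property in question.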
[~wilfreds] The customer is using a version of CDH from before [MAPREDUCE-5124] was
introduced. This queue change is also intended to throttle. If the queue is
full, the producers will wait (their threads will block). If they wait a long
time, I imagine that events coming from remote clients such as a Mapper or
Reducer will simply time out and fail. The tasks will have to be retried, but
it is better, in my mind, to restart a subset of tasks than to kill the
AM with an OOM and never complete.
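The throttling and timeout interaction can be modelled roughly as below. This is a sketch under stated assumptions, not how Hadoop RPC is implemented: {{ThrottleTimeoutSketch}} and {{deliverWithin}} are hypothetical names, the handler thread stands in for the AM-side producer, and the {{Future.get}} timeout stands in for the remote client's own timeout.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ThrottleTimeoutSketch {

    /**
     * Models a remote caller with its own timeout. Returns true if the
     * handler managed to enqueue the event before the caller gave up.
     */
    static boolean deliverWithin(BlockingQueue<String> queue, String event, long timeoutMillis) {
        ExecutorService handler = Executors.newSingleThreadExecutor();
        try {
            Future<Object> call = handler.submit(() -> {
                queue.put(event); // blocks while the bounded queue is full (throttling)
                return null;
            });
            call.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the caller timed out; the task would have to be retried
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        } finally {
            handler.shutdownNow(); // interrupt a still-blocked handler so it does not linger
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        queue.add("e1");
        queue.add("e2"); // the queue is now full

        // Nothing drains the queue, so the caller times out; memory use stays bounded.
        System.out.println("while full: " + deliverWithin(queue, "e3", 100));

        queue.poll(); // the dispatcher catches up and frees a slot
        System.out.println("after drain: " + deliverWithin(queue, "e3", 100));
    }
}
```

In this model the cost of overload is a failed call that can be retried, while the queue never grows past its configured capacity.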
> Add BoundedQueue to AsyncDispatcher
> -----------------------------------
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: applications
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: YARN-8789.1.patch, YARN-8789.10.patch,
> YARN-8789.12.patch, YARN-8789.14.patch, YARN-8789.2.patch, YARN-8789.3.patch,
> YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch,
> YARN-8789.7.patch, YARN-8789.8.patch, YARN-8789.9.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing
> with an OOM exception. It had many thousands of Mappers and thousands of
> Reducers. The logs showed that the event-queue of {{AsyncDispatcher}} held a
> very large number of items and that its size was seemingly never decreasing.
> I started looking at the code and thought it could use some cleanup and
> simplification, as well as the ability to specify a bounded queue so that
> incoming events are throttled until they can be processed. This will protect
> the ApplicationMaster from a flood of events.
> Logging Message:
> Size of event-queue is xxx