[
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707273#comment-16707273
]
BELUGA BEHR edited comment on YARN-8789 at 12/3/18 2:31 PM:
------------------------------------------------------------
[~pbacsko] The {{offer()}} call is wrapped in a {{while}} loop, so it keeps
trying to put the event into the queue for as long as it takes. Events are not
lost.
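A minimal Java sketch of that retry loop, assuming a plain {{ArrayBlockingQueue}} and a hypothetical 10 ms wait per attempt. The class and method names ({{OfferRetrySketch}}, {{putWithRetry}}, {{deliverAll}}) are illustrative only and are not taken from the YARN-8789 patch. A deliberately slow consumer keeps the queue full, yet every event still arrives:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the YARN-8789 patch: a producer retries offer()
// in a loop until the bounded queue accepts the event.
class OfferRetrySketch {

  /** Retries offer() until it succeeds, so the event is never dropped. */
  static void putWithRetry(BlockingQueue<Integer> queue, int event)
      throws InterruptedException {
    // offer() with a timeout returns false if the queue stayed full for the
    // whole wait; looping means the caller tries again instead of losing it.
    while (!queue.offer(event, 10, TimeUnit.MILLISECONDS)) {
      // queue still full: loop and retry
    }
  }

  /** Sends `count` events through a queue of `capacity`; returns how many arrived. */
  static int deliverAll(int count, int capacity) throws InterruptedException {
    BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
    AtomicInteger received = new AtomicInteger();
    Thread consumer = new Thread(() -> {
      try {
        for (int i = 0; i < count; i++) {
          queue.take();
          received.incrementAndGet();
          Thread.sleep(1); // deliberately slow so the queue fills up
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();
    for (int i = 0; i < count; i++) {
      putWithRetry(queue, i);
    }
    consumer.join();
    return received.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(deliverAll(200, 4)); // prints 200
  }
}
```

The trade-off is that the producing thread can block indefinitely; {{BlockingQueue.put()}} would give the same no-loss guarantee, while a timed {{offer()}} in a loop leaves room to log or check for shutdown between attempts.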
[~wilfreds] The customer is using a version of CDH from before [MAPREDUCE-5124]
was introduced. This queue change is also intended to throttle clients and
protect the AM. If the queue is full, the producers will wait (their threads
will block). If they wait a long time, I expect that the events coming from
remote clients such as a Mapper or Reducer will simply time out and fail. Those
tasks will have to be retried, but in my view it is better to restart a subset
of tasks than to kill the AM with an OOM so that the job never completes.
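A minimal Java sketch of the throttling behaviour, under stated assumptions: no consumer is running, the queue capacity is 3, and a remote client is modelled as a caller that gives up after a hypothetical 20 ms deadline (a timed {{offer()}}). The names {{ThrottleSketch}} and {{simulate}} are illustrative, not from the patch. Memory use stays capped at the queue capacity, and the callers that give up stand in for tasks that would have to be retried:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a bounded queue with a stalled consumer. Callers wait,
// and one that gives up after its deadline models a remote call timing out.
class ThrottleSketch {

  /** Returns {accepted, timedOut, finalQueueSize}. */
  static int[] simulate(int capacity, int attempts, long timeoutMs)
      throws InterruptedException {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
    int accepted = 0;
    int timedOut = 0;
    // Nothing drains the queue here, so it fills after `capacity` events.
    for (int i = 0; i < attempts; i++) {
      if (queue.offer("event-" + i, timeoutMs, TimeUnit.MILLISECONDS)) {
        accepted++;
      } else {
        timedOut++; // in the real system the task would be retried later
      }
    }
    // The queue never holds more than `capacity` events, however many arrive.
    return new int[] {accepted, timedOut, queue.size()};
  }

  public static void main(String[] args) throws InterruptedException {
    int[] r = simulate(3, 5, 20);
    System.out.println(r[0] + " accepted, " + r[1] + " timed out, size=" + r[2]);
    // prints "3 accepted, 2 timed out, size=3"
  }
}
```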
> Add BoundedQueue to AsyncDispatcher
> -----------------------------------
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: applications
> Affects Versions: 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: YARN-8789.1.patch, YARN-8789.10.patch,
> YARN-8789.12.patch, YARN-8789.14.patch, YARN-8789.2.patch, YARN-8789.3.patch,
> YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch,
> YARN-8789.7.patch, YARN-8789.8.patch, YARN-8789.9.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing
> with an OOM exception. It had many thousands of Mappers and thousands of
> Reducers. The logging showed that the event-queue of {{AsyncDispatcher}} had
> a very large number of items in it and was seemingly never decreasing.
> I started looking at the code and thought it could use some cleanup,
> simplification, and the ability to specify a bounded queue so that any
> incoming events are throttled until they can be processed. This will protect
> the ApplicationMaster from a flood of events.
> Logging Message:
> Size of event-queue is xxx
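A minimal Java sketch of how that symptom shows up, assuming an unbounded {{LinkedBlockingQueue}} that nothing drains and a hypothetical report every 1000 events. {{QueueGrowthSketch}}, {{enqueueAndReport}} and {{REPORT_INTERVAL}} are illustrative names; this is not the actual {{AsyncDispatcher}} code. The reported size only ever goes up, which is the pattern described above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the symptom: an unbounded queue whose consumer
// cannot keep up keeps growing, and a periodic size report makes it visible.
class QueueGrowthSketch {
  // Assumed reporting interval, chosen for illustration.
  static final int REPORT_INTERVAL = 1000;

  /** Enqueues `events` items with no consumer; returns the sizes that were reported. */
  static List<Integer> enqueueAndReport(int events) {
    BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(); // no capacity bound
    List<Integer> reported = new ArrayList<>();
    for (int i = 0; i < events; i++) {
      queue.add(i); // never blocks on an unbounded queue, so memory keeps growing
      int size = queue.size();
      if (size % REPORT_INTERVAL == 0) {
        System.out.println("Size of event-queue is " + size);
        reported.add(size);
      }
    }
    return reported;
  }

  public static void main(String[] args) {
    // prints the message for sizes 1000, 2000 and 3000
    enqueueAndReport(3500);
  }
}
```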