[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083974#comment-15083974 ]
Jason Lowe commented on YARN-1011:
----------------------------------
bq. Tasks are incorrectly over-allocated. Will never use the resources they ask
for and hence we can safely run additional opportunistic containers. So this
feature is used to compensate for poorly configured applications. Probably a
valid scenario but is it common?
In my experience this is fairly common. Users tend to tweak config values
until something works, then don't bother to revisit them until there's a
problem. It's also easier to over-allocate than to spend the time carefully
tuning the task size. Even if the user is interested in tuning, they can't
always tune optimally. Examples include data skew and other task-specific
issues where a few tasks need a lot of memory but the vast majority do not.
Many frameworks only allow task sizes to be configured as a group, so the user
has to run all the tasks in the group with the worst-case container size even
though most of them don't need it. Pig on MapReduce is another example: a
script can spawn multiple jobs, but the user can only configure the memory
settings once in the script, and they apply to all jobs launched by the
script. The user therefore has to set them to the worst-case size across all
the script's jobs, and all but one of the jobs run with oversized map
containers.
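
To put rough numbers on the group-sizing problem, here is a minimal sketch. The
function name, task counts, and memory figures are all hypothetical and chosen
only for illustration; they are not measurements from any cluster or part of
any YARN API.

```python
def group_allocation_waste(peak_usage_mb):
    """Return (allocated_mb, used_mb, idle_fraction) when every task in a
    group is given a container sized for the group's worst-case task."""
    # The framework allows only one container size for the whole group,
    # so it has to be the largest peak seen by any task.
    container_mb = max(peak_usage_mb)
    allocated_mb = container_mb * len(peak_usage_mb)
    used_mb = sum(peak_usage_mb)
    return allocated_mb, used_mb, 1 - used_mb / allocated_mb


# Hypothetical skewed job: 98 tasks peak at 1 GB, 2 skewed tasks peak at 8 GB.
peaks = [1024] * 98 + [8192] * 2
allocated, used, idle = group_allocation_waste(peaks)
print(f"allocated={allocated} MB, used={used} MB, idle={idle:.0%}")
```

With these made-up numbers, about 86% of the allocated memory is never
touched, which is the kind of headroom opportunistic containers could use.
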
> [Umbrella] Schedule containers based on utilization of currently allocated
> containers
> -------------------------------------------------------------------------------------
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated
> containers and, if appropriate, allocate more (speculative?) containers.