[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083974#comment-15083974
 ] 

Jason Lowe commented on YARN-1011:
----------------------------------

bq. Tasks are incorrectly over-allocated. Will never use the resources they ask 
for and hence we can safely run additional opportunistic containers. So this 
feature is used to compensate for poorly configured applications. Probably a 
valid scenario but is it common?

In my experience this is fairly common.  Users tend to twiddle with config 
values until something works, then they don't bother to revisit them until 
there's a problem.  And it's easier to over-allocate than to spend the time to 
carefully tune the task size.  Even if the user is interested in tuning, they 
can't always tune optimally.  Some examples are data skew or other 
task-specific issues where a few tasks need a lot of memory but the vast 
majority of the others do not.  Many frameworks only allow the task sizes to be 
configured as a group, so the user has to run all the tasks in the group with 
the worst-case container size even though most of them don't need it.  Pig on 
MapReduce is another example: it will spawn multiple jobs, but the user can 
only configure the memory settings once in the script, and they apply to all 
jobs launched by the script.  Therefore the user has to set them to the 
worst-case size across all the script's jobs, and all but one of the jobs run 
with oversized map containers.
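
To put rough numbers on the group-sizing problem, here is a minimal sketch. 
The task counts and peak sizes are made up purely for illustration, and 
wasted_memory_mb is a hypothetical helper, not anything in YARN or MapReduce. 
It assumes every task in the group gets the container size of the hungriest 
task.

```python
def wasted_memory_mb(task_peaks_mb):
    """Return (allocated, used, wasted) in MB when every task in the group
    is given the worst-case container size."""
    container_mb = max(task_peaks_mb)            # group-wide setting = worst case
    allocated = container_mb * len(task_peaks_mb)
    used = sum(task_peaks_mb)                    # what the tasks actually peak at
    return allocated, used, allocated - used


# Hypothetical job: 97 ordinary tasks peak at 1024 MB, 3 skewed tasks at 6144 MB.
peaks = [1024] * 97 + [6144] * 3
allocated, used, wasted = wasted_memory_mb(peaks)
print(f"allocated={allocated} MB used={used} MB wasted={wasted} MB "
      f"({wasted / allocated:.0%} of the allocation)")
```

With these made-up numbers most of the allocation is never touched, which is 
the kind of headroom opportunistic containers could use.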

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-1011
>                 URL: https://issues.apache.org/jira/browse/YARN-1011
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>         Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



