[
https://issues.apache.org/jira/browse/YARN-373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570782#comment-13570782
]
Alejandro Abdelnur commented on YARN-373:
-----------------------------------------
Hitesh,
I didn't dive into the whole approach yet; first I wanted to 'socialize' the idea.
Now let me answer with my current thoughts.
For this use case I was not thinking about resizing 'inflight' containers.
While we could easily resize CPU, resizing memory would be quite difficult.
The use case is about shortcutting getting resources for a container by reusing
the same (or fewer) resources being freed up by a terminating container on the
same node. By doing this you don't have to go all the way to the scheduler
and compete/wait for those resources to become available. In short, recycling
resources the AM already got.
The terminating container would still exit, not changing the notion of
completion of a container. The container using the recycled resources would be
a fresh new container process (otherwise we could not shrink its memory).
Regarding localized resources, a new resource localization would be done.
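To make the flow above concrete, here is a minimal sketch in Python. It is a
toy model of the bookkeeping only, not YARN code: the Resource and Node classes
and the recycle() method are hypothetical names made up for illustration, and
they are not the YARN API. It assumes the new container may ask for at most
what the terminating container held, and that any leftover goes back to the
node's free pool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for a (memory, CPU) allocation."""
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores

    def plus(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb + other.memory_mb, self.vcores + other.vcores)

    def minus(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb - other.memory_mb, self.vcores - other.vcores)


class Node:
    """Toy model of the per-node accounting the RM would keep."""

    def __init__(self, name: str, capacity: Resource):
        self.name = name
        self.available = capacity
        self.containers = {}  # container id -> Resource currently held
        self._next_id = 0

    def _register(self, resource: Resource) -> int:
        container_id = self._next_id
        self._next_id += 1
        self.containers[container_id] = resource
        return container_id

    def allocate(self, wanted: Resource) -> int:
        """Normal path: compete for whatever is free on the node."""
        if not wanted.fits_in(self.available):
            raise ValueError("not enough free resources on " + self.name)
        self.available = self.available.minus(wanted)
        return self._register(wanted)

    def recycle(self, finished_id: int, wanted: Resource) -> int:
        """Hypothetical recycling path: reuse a terminating container's resources.

        The old container still completes (its id is dropped), the new one is a
        fresh container with a new id, and only the unused part is freed.
        """
        held = self.containers[finished_id]
        if not wanted.fits_in(held):
            raise ValueError("can only recycle the same or fewer resources")
        del self.containers[finished_id]
        self.available = self.available.plus(held.minus(wanted))
        return self._register(wanted)


node = Node("node-1", Resource(memory_mb=8192, vcores=8))
old_id = node.allocate(Resource(memory_mb=2048, vcores=2))
new_id = node.recycle(old_id, Resource(memory_mb=512, vcores=1))
```

In this sketch the recycled container never goes through allocate(), so it does
not compete for free resources; only the part it does not need is returned.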
> Allow an AM to reuse the resources allocated to container for a new container
> -----------------------------------------------------------------------------
>
> Key: YARN-373
> URL: https://issues.apache.org/jira/browse/YARN-373
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.0.3-alpha
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
>
> When a container completes, instead of the corresponding resources being freed
> up, it should be possible for the AM to reuse the assigned resources for a
> new container.
> As part of the reallocation, the AM would notify the RM about partial
> resources being freed up and the RM would make the necessary corrections in
> the corresponding node.
> With this functionality, an AM can ensure it gets a container on the same
> node where previous containers ran.
> This will allow getting rid of the ShuffleHandler as a service in the NMs and
> running it as a regular container task of the corresponding AM. In this case,
> the reallocation would reduce the CPU/MEM obtained for the original container
> to what is needed for serving the shuffle. Note that in this example the MR
> AM would only do this reallocation for one of the many tasks that may have
> run on a particular node (as a single shuffle task could serve all the map
> outputs from all map tasks run on that node).