[ https://issues.apache.org/jira/browse/YARN-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799970#comment-13799970 ]
Bikas Saha commented on YARN-1324:
----------------------------------
When does MR use multiple disks in the same task/container? Isn't the map output
written to a single indexed partition file?
Requiring apps to specify the number of disks for a container is also a viable
solution. It can be done in a backward-compatible manner by changing MR to request
multiple disks and leaving the default at 1 for apps that don't care.
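A minimal sketch of that backward-compatible request, assuming a hypothetical `num_disks` field on the container request that defaults to 1. `ContainerDiskRequest` and `assign_disks` are illustrative names, not part of the YARN API, and "least-loaded first" is just one assumed selection rule. An app that never sets the field gets exactly one disk; MR could ask for more.

```python
from dataclasses import dataclass


@dataclass
class ContainerDiskRequest:
    """Hypothetical per-container disk request; not a real YARN class."""
    container_id: str
    # Apps that don't care leave this at 1, so old clients keep working.
    num_disks: int = 1


def assign_disks(request, disk_load):
    """Pick request.num_disks distinct disks, least-loaded first.

    disk_load maps a local dir to the number of containers already using it;
    the load of each chosen disk is incremented in place.
    """
    if request.num_disks < 1 or request.num_disks > len(disk_load):
        raise ValueError("num_disks must be between 1 and the number of disks")
    # Sort by current load, breaking ties by path so the result is deterministic.
    chosen = sorted(disk_load, key=lambda d: (disk_load[d], d))[: request.num_disks]
    for disk in chosen:
        disk_load[disk] += 1
    return chosen
```

For example, with loads `{"/grid/0": 2, "/grid/1": 0, "/grid/2": 1}`, a default request is given `["/grid/1"]`, and a following request with `num_disks=2` is given `["/grid/1", "/grid/2"]`.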
> NodeManager potentially causes unnecessary operations on all its disks
> ----------------------------------------------------------------------
>
> Key: YARN-1324
> URL: https://issues.apache.org/jira/browse/YARN-1324
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.2.0
> Reporter: Bikas Saha
>
> Currently, for every container, the NM creates a directory on every disk and
> expects the container task to choose one of them and to load-balance disk use
> across all containers.
> 1) This may have worked fine in the MR world, where MR tasks would randomly
> choose dirs, but in general we cannot expect every app/task writer to
> understand these nuances and pick disks randomly. We could end up
> overloading the first disk if most app writers decide to use it.
> 2) This forces a number of NM operations to scan every disk (causing random
> seeks on each one) to locate the dir the task actually chose for its files.
> That makes these operations expensive for the NM and disruptive for users
> of disks that did not hold the task's real working dirs.
> I propose that the NM decide up front which disk it assigns to each task.
> It could choose randomly, or weighted-randomly by looking at the space and
> load on each disk, and so do a better job of load balancing. It would then
> associate the chosen working directory with the container context so that
> subsequent NM operations can go directly to the correct location instead of
> seeking on every disk.
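A minimal sketch of the proposal, under stated assumptions: `Disk` and `DiskAssigner` are hypothetical classes, not YARN code, and the weight `free_bytes / (1 + active_containers)` is only one possible way to combine space and load. The NM draws one dir per container up front with a weighted random choice and records it in a per-container map, so a later operation reads one entry instead of probing every disk.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Disk:
    path: str
    free_bytes: int
    active_containers: int = 0


@dataclass
class DiskAssigner:
    """Hypothetical NM-side helper; not part of YARN."""
    disks: list
    rng: random.Random = field(default_factory=random.Random)
    # Container context: container id -> the single working dir chosen for it.
    context: dict = field(default_factory=dict)

    def weight(self, disk):
        # Assumed heuristic: more free space and fewer active containers make
        # a disk more attractive; a full disk gets weight 0 and is never drawn.
        return disk.free_bytes / (1 + disk.active_containers)

    def assign(self, container_id):
        weights = [self.weight(d) for d in self.disks]
        if not any(w > 0 for w in weights):
            raise RuntimeError("no disk has free space")
        # Weighted random draw of exactly one disk.
        disk = self.rng.choices(self.disks, weights=weights, k=1)[0]
        disk.active_containers += 1
        self.context[container_id] = disk.path
        return disk.path

    def working_dir(self, container_id):
        # Later NM operations read the recorded dir instead of scanning disks.
        return self.context[container_id]

    def release(self, container_id):
        # Forget the container and give back the load it added to its disk.
        path = self.context.pop(container_id)
        for d in self.disks:
            if d.path == path:
                d.active_containers -= 1
                break
```

Drawing weighted-randomly rather than always taking the single best disk is a choice made in this sketch: when free-space figures are only refreshed periodically, a strict "best disk" rule could send a burst of container launches to the same disk before its numbers catch up.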
--
This message was sent by Atlassian JIRA
(v6.1#6144)