[
https://issues.apache.org/jira/browse/YARN-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809885#comment-13809885
]
Chris Douglas commented on YARN-1324:
-------------------------------------
bq. When does MR use multiple disks in the same task/container? Isnt the map
output written to a single indexed partition file?
Spills are spread across all volumes, but merged into a single file at the end.
Would randomizing the order of disks be a reasonable short-term workaround for
(1)? Future changes could weight/elide directories based on other criteria, but
that's a simple change. So would changing the "random" selection to bias its
search order using a hash of the task id (instead of disk usage when creating
the spill), so the ShuffleHandler could search fewer directories on average. I
agree with Vinod, it would be hard to prevent the search altogether...
bq. Requiring apps to specify the number of disks for a container is also a
viable solution and can be done in a back-compatible manner by changing MR to
specify multiple disks and leaving the default to 1 for apps that dont care.
This makes sense as a hint, but some users might interpret it as a constraint
and be confused when a NM schedules them on a node the reports fewer local dirs
(due to failure, heterogeneous config).
> NodeManager potentially causes unnecessary operations on all its disks
> ----------------------------------------------------------------------
>
> Key: YARN-1324
> URL: https://issues.apache.org/jira/browse/YARN-1324
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.2.0
> Reporter: Bikas Saha
>
> Currently, for every container, the NM creates a directory on every disk and
> expects the container-task to choose 1 of them and load balance the use of
> the disks across all containers.
> 1) This may have worked fine in the MR world where MR tasks would randomly
> choose dirs but in general we cannot expect every app/task writer to
> understand these nuances and randomly pick disks. So we could end up
> overloading the first disk if most people decide to use the first disk.
> 2) This makes a number of NM operations to scan every disk (thus randomizing
> that disk) to locate the dir which the task has actually chosen to use for
> its files. Makes all these operations expensive for the NM as well as
> disruptive for users of disks that did not have the real task working dirs.
> I propose that NM should up-front decide the disk it is assigning to tasks.
> It could choose to do so randomly or weighted-randomly by looking at space
> and load on each disk. So it could do a better job of load balancing. Then,
> it would associate the chosen working directory with the container context so
> that subsequent operations on the NM can directly seek to the correct
> location instead of having to seek on every disk.
--
This message was sent by Atlassian JIRA
(v6.1#6144)