[ 
https://issues.apache.org/jira/browse/YARN-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809885#comment-13809885
 ] 

Chris Douglas commented on YARN-1324:
-------------------------------------

bq. When does MR use multiple disks in the same task/container? Isnt the map 
output written to a single indexed partition file?

Spills are spread across all volumes, but merged into a single file at the end.

Would randomizing the order of disks be a reasonable short-term workaround for 
(1)? Future changes could weight/elide directories based on other criteria, but 
that's a simple change. So would changing the "random" selection to bias its 
search order using a hash of the task id (instead of disk usage when creating 
the spill), so the ShuffleHandler could search fewer directories on average. I 
agree with Vinod, it would be hard to prevent the search altogether...

bq. Requiring apps to specify the number of disks for a container is also a 
viable solution and can be done in a back-compatible manner by changing MR to 
specify multiple disks and leaving the default to 1 for apps that dont care.

This makes sense as a hint, but some users might interpret it as a constraint 
and be confused when a NM schedules them on a node the reports fewer local dirs 
(due to failure, heterogeneous config).

> NodeManager potentially causes unnecessary operations on all its disks
> ----------------------------------------------------------------------
>
>                 Key: YARN-1324
>                 URL: https://issues.apache.org/jira/browse/YARN-1324
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.2.0
>            Reporter: Bikas Saha
>
> Currently, for every container, the NM creates a directory on every disk and 
> expects the container-task to choose 1 of them and load balance the use of 
> the disks across all containers. 
> 1) This may have worked fine in the MR world where MR tasks would randomly 
> choose dirs but in general we cannot expect every app/task writer to 
> understand these nuances and randomly pick disks. So we could end up 
> overloading the first disk if most people decide to use the first disk.
> 2) This makes a number of NM operations to scan every disk (thus randomizing 
> that disk) to locate the dir which the task has actually chosen to use for 
> its files. Makes all these operations expensive for the NM as well as 
> disruptive for users of disks that did not have the real task working dirs.
> I propose that NM should up-front decide the disk it is assigning to tasks. 
> It could choose to do so randomly or weighted-randomly by looking at space 
> and load on each disk. So it could do a better job of load balancing. Then, 
> it would associate the chosen working directory with the container context so 
> that subsequent operations on the NM can directly seek to the correct 
> location instead of having to seek on every disk.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to