Bikas Saha commented on YARN-2139:

So, to be clear, vdisks currently counts the number of physical drives 
present on the box.

Something to keep in mind is whether this also entails a change in the NM 
policy of providing a directory on every local dir (which typically maps to 
every disk) to every task. Tasks are free to choose one or more of those 
dirs (disks) to write to. This puts the spinning disk head under contention and 
affects the performance of all writers on that disk, because seeks are 
expensive. The rule of thumb tends to be to allocate as many tasks to a machine 
as it has disks (maybe 2x) so as to keep this seek cost low. Should we 
consider evaluating a change in this policy that gives 1 local dir to a 
container with 1 vdisk? This way, a machine with 6 disks (and 6 vdisks) 
would have 6 tasks running, each with its own "dedicated" disk. Offhand, it's 
hard to say how this would compare with all 6 disks allocated to all 6 tasks 
and letting cgroups enforce sharing. If multiple tasks end up choosing the same 
disk for their writes, then they may not end up getting the "allocation" that 
they thought they would get.
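
To make the last point concrete, here is a rough sketch (not NM code) that 
compares the two policies by counting writers per disk. It assumes each task 
writes to exactly one dir, chosen uniformly at random from the dirs it was 
given. All names (dedicated_assignment, shared_assignment, the /data/N paths) 
are hypothetical, and the sketch does not model seeks or cgroups at all.

```python
import math
import random
from collections import Counter


def dedicated_assignment(num_tasks, local_dirs):
    """Hypothetical policy: each task (1 vdisk) gets exactly one local dir."""
    return {t: [local_dirs[t % len(local_dirs)]] for t in range(num_tasks)}


def shared_assignment(num_tasks, local_dirs):
    """Policy described above: every task gets a dir on every disk."""
    return {t: list(local_dirs) for t in range(num_tasks)}


def writers_per_disk(assignment, rng):
    """Assumption: each task writes to one dir picked uniformly at random."""
    return Counter(rng.choice(dirs) for dirs in assignment.values())


def prob_no_collision(num_tasks, num_disks):
    """Chance that independent uniform choices all land on distinct disks."""
    if num_tasks > num_disks:
        return 0.0
    return math.perm(num_disks, num_tasks) / num_disks ** num_tasks


if __name__ == "__main__":
    dirs = [f"/data/{i}" for i in range(6)]  # 6 disks, 6 vdisks
    rng = random.Random()
    print("dedicated:", dict(writers_per_disk(dedicated_assignment(6, dirs), rng)))
    print("shared:   ", dict(writers_per_disk(shared_assignment(6, dirs), rng)))
    print("P(no two tasks share a disk):", prob_no_collision(6, 6))
```

Under these assumptions, 6 tasks choosing among 6 disks all land on distinct 
disks with probability 6!/6^6, roughly 1.5%, so some disk almost always ends up 
with more than one writer. Real tasks may pick dirs differently (for example, 
round-robin or by free space), so this is only an upper-level intuition.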

> [Umbrella] Support for Disk as a Resource in YARN 
> --------------------------------------------------
>                 Key: YARN-2139
>                 URL: https://issues.apache.org/jira/browse/YARN-2139
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wei Yan
>         Attachments: Disk_IO_Isolation_Scheduling_3.pdf, 
> Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, 
> YARN-2139-prototype-2.patch, YARN-2139-prototype.patch
> YARN should consider disk as another resource for (1) scheduling tasks on 
> nodes, (2) isolation at runtime, (3) spindle locality. 

This message was sent by Atlassian JIRA
