[
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233341#comment-14233341
]
Bikas Saha commented on YARN-2139:
----------------------------------
So to be clear, vdisks currently counts the number of physical drives
present on the box.
Something to keep in mind is whether this also entails a change in the NM
policy of providing a directory on every local dir (which typically maps to
every disk) to every task. Tasks are free to choose one or more of those
dirs (disks) to write to. This puts the spinning disk head under contention and
affects the performance of all writers on that disk, because seeks are expensive.
The rule of thumb tends to be to allocate as many tasks to a machine as it has
disks (maybe 2x) so as to keep this seek cost low. Should we consider
evaluating a change in this policy so that a container with 1 vdisk is given
1 local dir? This way a machine with 6 disks (and 6 vdisks) would have 6 tasks
running, each with its own "dedicated" disk. Offhand it's hard to say how this
would compare with all 6 disks allocated to all 6 tasks and letting cgroups
enforce sharing. If multiple tasks end up choosing the same disk for their
writes, then they may not end up getting the "allocation" that they thought
they would get.
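To make the two policies concrete, here is a rough sketch in Python. It is not
NM code: the function names, container names and directory paths are all
hypothetical, and it only models which local dirs a container is handed, not
cgroups enforcement or actual I/O.

```python
from collections import Counter


def assign_dedicated(local_dirs, requests):
    """Proposed policy (sketch): hand each container one distinct local dir
    per requested vdisk, so no two containers share a disk."""
    free = list(local_dirs)
    assignment = {}
    for container, vdisks in requests.items():
        if vdisks > len(free):
            raise ValueError(f"not enough free disks for {container}")
        assignment[container] = free[:vdisks]
        free = free[vdisks:]
    return assignment


def assign_shared(local_dirs, requests):
    """Current policy as described above: every container is given a
    directory on every local dir and picks where to write on its own."""
    return {container: list(local_dirs) for container in requests}


def writers_per_disk(chosen_dirs):
    """Count how many containers write to each dir, given the dirs each
    container actually chose to use."""
    return Counter(d for dirs in chosen_dirs.values() for d in dirs)


# Hypothetical 6-disk node with 6 containers asking for 1 vdisk each.
local_dirs = [f"/data/disk{i}/nm-local-dir" for i in range(6)]
requests = {f"container_{i}": 1 for i in range(6)}

dedicated = assign_dedicated(local_dirs, requests)
# Every disk ends up with exactly one writer.
print(writers_per_disk(dedicated))

shared = assign_shared(local_dirs, requests)
# Worst case under sharing: every container happens to pick the same disk.
worst_case = {c: [dirs[0]] for c, dirs in shared.items()}
print(writers_per_disk(worst_case))
```

In the dedicated case the writer count per disk is fixed by the assignment.
In the shared case it depends on what the tasks happen to choose, which is
the scenario where a task may not get the "allocation" it expected.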
> [Umbrella] Support for Disk as a Resource in YARN
> --------------------------------------------------
>
> Key: YARN-2139
> URL: https://issues.apache.org/jira/browse/YARN-2139
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Wei Yan
> Attachments: Disk_IO_Isolation_Scheduling_3.pdf,
> Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf,
> YARN-2139-prototype-2.patch, YARN-2139-prototype.patch
>
>
> YARN should consider disk as another resource for (1) scheduling tasks on
> nodes, (2) isolation at runtime, (3) spindle locality.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)