[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233341#comment-14233341 ]
Bikas Saha commented on YARN-2139:
----------------------------------

So to be clear, vdisks currently counts the number of physical drives present on the box.

Something to keep in mind is whether this also entails a change in the NM policy of providing a directory on every local dir (which typically maps to every disk) to every task. Tasks are free to choose one or more of those dirs (disks) to write to. This puts the spinning disk head under contention and hurts the performance of every writer on that disk, because seeks are expensive. The rule of thumb tends to be to allocate as many tasks to a machine as there are disks (maybe 2x that), so as to keep this seek cost low.

Should we consider evaluating a change in this policy, so that a container with 1 vdisk is given 1 local dir? That way, a machine with 6 disks (and 6 vdisks) would have 6 tasks running, each with its own "dedicated" disk. Offhand it's hard to say how this would compare with allocating all 6 disks to all 6 tasks and letting cgroups enforce sharing. If multiple tasks end up choosing the same disk for their writes, they may not get the "allocation" they thought they would get.

> [Umbrella] Support for Disk as a Resource in YARN
> --------------------------------------------------
>
>                 Key: YARN-2139
>                 URL: https://issues.apache.org/jira/browse/YARN-2139
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wei Yan
>         Attachments: Disk_IO_Isolation_Scheduling_3.pdf,
> Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf,
> YARN-2139-prototype-2.patch, YARN-2139-prototype.patch
>
>
> YARN should consider disk as another resource for (1) scheduling tasks on
> nodes, (2) isolation at runtime, (3) spindle locality.
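The "one dedicated local dir per vdisk" policy floated in the comment can be sketched as a small allocator. This is a hypothetical illustration, not NodeManager code. `DedicatedDiskAllocator`, its methods and the `/grid/N/yarn/local` paths are invented names. The sketch assumes each local dir sits on its own physical disk, so 1 dir = 1 vdisk. A container asking for k vdisks receives k dirs that no other running container holds, and a request that cannot be met is rejected rather than shared.

```python
from dataclasses import dataclass, field


@dataclass
class DedicatedDiskAllocator:
    """Hypothetical sketch: hand out local dirs so that no two running
    containers share a disk. Assumes one local dir per physical disk."""
    local_dirs: list[str]
    # container id -> local dirs currently dedicated to that container
    assignments: dict[str, list[str]] = field(default_factory=dict)

    def free_dirs(self) -> list[str]:
        """Dirs (disks) not held by any running container, in configured order."""
        in_use = {d for dirs in self.assignments.values() for d in dirs}
        return [d for d in self.local_dirs if d not in in_use]

    def allocate(self, container_id: str, vdisks: int) -> list[str]:
        """Dedicate `vdisks` free dirs to the container, or refuse."""
        if container_id in self.assignments:
            raise ValueError(f"{container_id} already holds disks")
        free = self.free_dirs()
        if vdisks < 1 or vdisks > len(free):
            raise ValueError(f"cannot dedicate {vdisks} disk(s); {len(free)} free")
        self.assignments[container_id] = free[:vdisks]
        return self.assignments[container_id]

    def release(self, container_id: str) -> None:
        """Return the container's dirs to the free pool when it finishes."""
        self.assignments.pop(container_id, None)


if __name__ == "__main__":
    # Hypothetical 6-disk node: six 1-vdisk containers each get their own disk.
    alloc = DedicatedDiskAllocator([f"/grid/{i}/yarn/local" for i in range(6)])
    for i in range(6):
        print(f"container_{i}", alloc.allocate(f"container_{i}", 1))
```

With the allocator full, a seventh 1-vdisk container is refused until one of the six releases its disk. Under the current policy, tasks choose their own dirs, so isolation depends on where each task writes. Suppose each of 6 tasks independently picked one of 6 disks uniformly at random (an assumption, since real tasks may spread writes differently). The expected number of untouched disks would then be 6 · (5/6)^6 ≈ 2. That leaves roughly 4 disks absorbing all 6 writers, which is the mismatch with the "allocation" the comment warns about.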