[
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221556#comment-14221556
]
Wangda Tan commented on YARN-2139:
----------------------------------
Thanks [~ywskycn] for the design doc and prototype.
I have a similar feeling to what [~acmurthy] commented: the disk resource is a
little different from vcores. CPU is a shared resource; processes/threads can
occupy CPU cores and can also easily be switched to other cores. Disks are
different (RAID aside): if a process writes to a file on a local disk (as
Kafka does), you cannot easily move the file being written to another disk.
We also need to consider that if multiple containers are scheduled onto the
same physical disk, the total bandwidth of these containers may drop very
quickly.
So I think scheduling for disks is more like *affinity* to disks (e.g. giving
disks #1, #2 and #4 to the container) than just limiting the number of
processes on each node.
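To make the affinity idea a bit more concrete, here is a minimal sketch in
Python. It is not YARN code and is not taken from the attached prototype; the
Node class, the allocate/release methods and the max_per_disk parameter are
all hypothetical names used only for illustration. It shows a node that pins
each container to specific disks rather than only counting containers per node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node model: disk id -> ids of containers pinned to that disk."""
    disks: dict = field(default_factory=dict)

    def allocate(self, container_id, num_disks, max_per_disk=1):
        """Pin a container to the `num_disks` least-loaded disks.

        Returns the sorted list of chosen disk ids, or None if fewer than
        `num_disks` disks have room under `max_per_disk` (an assumed limit).
        """
        # Disks that can still accept another container.
        free = [d for d, owners in self.disks.items() if len(owners) < max_per_disk]
        if len(free) < num_disks:
            return None
        # Prefer the least-loaded disks; break ties by disk id for determinism.
        free.sort(key=lambda d: (len(self.disks[d]), d))
        chosen = sorted(free[:num_disks])
        for d in chosen:
            self.disks[d].append(container_id)
        return chosen

    def release(self, container_id):
        """Remove a finished container from every disk it was pinned to."""
        for owners in self.disks.values():
            if container_id in owners:
                owners.remove(container_id)


node = Node(disks={1: [], 2: [], 3: [], 4: []})
first = node.allocate("container_a", num_disks=3)
second = node.allocate("container_b", num_disks=2)
print(first)
print(second)
```

With max_per_disk=1 the second request is refused because only one disk is
still unassigned, even though a simple per-node container count would have
admitted it. That is the difference between affinity and a plain limit.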
Any thoughts? Please feel free to correct me if I am wrong.
Thanks,
Wangda
> [Umbrella] Support for Disk as a Resource in YARN
> --------------------------------------------------
>
> Key: YARN-2139
> URL: https://issues.apache.org/jira/browse/YARN-2139
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Wei Yan
> Attachments: Disk_IO_Scheduling_Design_1.pdf,
> Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch,
> YARN-2139-prototype.patch
>
>
> YARN should consider disk as another resource for (1) scheduling tasks on
> nodes, (2) isolation at runtime, (3) spindle locality.