[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221556#comment-14221556
 ] 

Wangda Tan commented on YARN-2139:
----------------------------------

Thanks [~ywskycn] for the design doc and prototype.

I have a similar feeling to what [~acmurthy] commented: the disk resource is a 
little different from vcores. CPU is a shared resource; processes/threads can 
occupy CPU cores and can also easily be switched to other cores. Disks are 
different (RAID aside): if a process writes to a file on a local disk (as 
Kafka does), you cannot easily move the file being written to another disk.

We also need to consider that if multiple containers are scheduled onto the 
same physical disk, the total bandwidth available to these containers can drop 
sharply.

So I think scheduling for disks is more like *affinity* to disks (e.g. giving 
disks #1, #2 and #4 to the container) rather than just limiting the number of 
processes on each node.
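
To make the distinction concrete, here is a rough sketch of what affinity-style 
bookkeeping could look like on a single node. This is only an illustration, not 
code from the design doc or the prototype patch: the class NodeDisks, its 
methods, and the container names are all hypothetical, and it assumes each disk 
is given exclusively to one container at a time.

```python
from typing import Dict, List, Optional


class NodeDisks:
    """Hypothetical per-node table of which container owns each local disk."""

    def __init__(self, disk_ids: List[int]) -> None:
        # None means the disk is currently not assigned to any container.
        self.owner: Dict[int, Optional[str]] = {d: None for d in disk_ids}

    def allocate(self, container_id: str, count: int) -> Optional[List[int]]:
        """Pin `count` free disks to the container; None if too few are free."""
        free = sorted(d for d, c in self.owner.items() if c is None)
        if len(free) < count:
            return None
        chosen = free[:count]
        for d in chosen:
            self.owner[d] = container_id
        return chosen

    def release(self, container_id: str) -> None:
        """Return every disk held by the container to the free pool."""
        for d, c in self.owner.items():
            if c == container_id:
                self.owner[d] = None


node = NodeDisks([1, 2, 3, 4])
first = node.allocate("container_a", 3)   # gets three specific disks
second = node.allocate("container_b", 2)  # rejected: only one disk is left
print(first, second)
```

The point of the sketch is that the allocation result is a set of specific disk 
ids rather than a counter, so two containers are never silently placed on the 
same spindle.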

Any thoughts? Please feel free to correct me if I am wrong.

Thanks,
Wangda

> [Umbrella] Support for Disk as a Resource in YARN 
> --------------------------------------------------
>
>                 Key: YARN-2139
>                 URL: https://issues.apache.org/jira/browse/YARN-2139
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wei Yan
>         Attachments: Disk_IO_Scheduling_Design_1.pdf, 
> Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, 
> YARN-2139-prototype.patch
>
>
> YARN should consider disk as another resource for (1) scheduling tasks on 
> nodes, (2) isolation at runtime, (3) spindle locality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
