Karthik Kambatla commented on YARN-2139:

bq. this is really disk io bandwidth, so it should use an option, like 
vlocaldiskIObandwidth. This will avoid confusion (make clear its not HDFS), and 
add scope for the addition of future options: IOPs and actual allocation of 
entire disks to containers

Good point. The document should probably discuss this in more detail. I think 
we should separate out the resource model used for requests and scheduling from 
the way we enforce it. 

For the former, I believe vdisks is a good candidate. Users find it hard to 
specify disk IO requirements in terms of IOPS and bandwidth; e.g. my MR task 
*needs* 200 MBps. vdisks, on the other hand, represent a share of the node and 
the IO parallelism (in a somewhat vague sense) the task can make use of. 
Furthermore, it is hard to guarantee a particular bandwidth or level of 
performance, since both depend on how parallel and how random the disk 
accesses are. 
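
To make the "share of the node" reading of vdisks concrete, here is a rough 
sketch of how a vdisks request might be turned into a proportional weight. 
This is only an illustration under my own assumptions: the class and method 
names are hypothetical (not existing YARN APIs), the linear mapping is just one 
possible choice, and the 10-1000 range assumes enforcement through the cgroups 
v1 blkio.weight knob.

```java
// Hypothetical sketch: none of these names are existing YARN APIs.
final class VdiskShare {
    // Assumed enforcement backend: cgroups v1 blkio.weight, which accepts 10..1000.
    static final int MIN_WEIGHT = 10;
    static final int MAX_WEIGHT = 1000;

    /** Fraction of the node's disk IO a container requesting `vdisks` is entitled to. */
    static double share(int vdisks, int nodeVdisks) {
        if (nodeVdisks <= 0 || vdisks < 0 || vdisks > nodeVdisks) {
            throw new IllegalArgumentException(
                "invalid request: " + vdisks + " of " + nodeVdisks + " vdisks");
        }
        return (double) vdisks / nodeVdisks;
    }

    /** Maps the share linearly onto a proportional weight, clamped to the valid range. */
    static int toWeight(int vdisks, int nodeVdisks) {
        long weight = Math.round(share(vdisks, nodeVdisks) * MAX_WEIGHT);
        return (int) Math.max(MIN_WEIGHT, Math.min(MAX_WEIGHT, weight));
    }

    public static void main(String[] args) {
        // A container asking for 2 of the node's 8 vdisks.
        System.out.println(toWeight(2, 8));
    }
}
```

The point is that the request stays in relative units (vdisks), and only the 
enforcement side needs to know about weights or any other mechanism.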

That said, I see value in making the enforcement pluggable. This JIRA could add 
the cgroups-based disk-share enforcement. In the future, we could explore other 

> Add support for disk IO isolation/scheduling for containers
> -----------------------------------------------------------
>                 Key: YARN-2139
>                 URL: https://issues.apache.org/jira/browse/YARN-2139
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>         Attachments: Disk_IO_Scheduling_Design_1.pdf

This message was sent by Atlassian JIRA
