[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200385#comment-14200385
 ] 

Karthik Kambatla edited comment on YARN-2139 at 11/6/14 4:27 PM:
-----------------------------------------------------------------

Thanks for chiming in, Arun.

This JIRA focuses on adding disk scheduling and isolation for local disk read 
I/O. HDFS short-circuit reads happen to be local-disk reads, so they are 
handled automatically as well. 

bq. We shouldn't embed Linux or blkio specific semantics such as proportional 
weight division into YARN.
The Linux aspects are only for isolation, and this needs to be pluggable. 
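For illustration only (the constants and method names here are assumptions, not the patch's actual implementation), a pluggable Linux isolation backend could translate a container's relative disk share into a blkio proportional weight, which the cgroup v1 blkio controller constrains to the range 100-1000:

```java
public class BlkioWeights {
    // cgroup v1 blkio proportional weights must fall in [100, 1000].
    static final int MIN_WEIGHT = 100;
    static final int MAX_WEIGHT = 1000;

    // Map a container's relative disk share (0.0 to 1.0) onto the blkio range,
    // clamping to stay within what the kernel accepts.
    static int toBlkioWeight(double share) {
        int w = (int) Math.round(MIN_WEIGHT + share * (MAX_WEIGHT - MIN_WEIGHT));
        return Math.max(MIN_WEIGHT, Math.min(MAX_WEIGHT, w));
    }

    public static void main(String[] args) {
        // A container entitled to half the node's disk share.
        System.out.println(toBlkioWeight(0.5));
    }
}
```

The point of keeping this translation behind a pluggable interface is that only this last step is Linux-specific; the scheduler itself deals in shares.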

Wei and I are more familiar with FairScheduler, and talk about weighted 
division between queues from that standpoint. We are eager to hear your 
thoughts on how we should do this with CapacityScheduler, and to augment the 
configs etc. if need be. I was thinking we would handle disk similarly to how 
CapacityScheduler handles CPU today (more on that later).
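As a sketch of what the weighted division could look like (the class and names below are illustrative, not actual FairScheduler or CapacityScheduler code), each queue's disk share would simply be its weight over the sum of all weights, the same arithmetic the schedulers already apply to CPU and memory:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QueueDiskShares {
    // Split a node's total disk resource across queues in proportion
    // to their configured weights.
    static Map<String, Double> fairShares(Map<String, Double> weights,
                                          double totalDisk) {
        double sum = 0.0;
        for (double w : weights.values()) {
            sum += w;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            shares.put(e.getKey(), totalDisk * e.getValue() / sum);
        }
        return shares;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("prod", 3.0);   // weight 3
        weights.put("adhoc", 1.0);  // weight 1
        // With 8 disk units: prod gets 6.0, adhoc gets 2.0.
        System.out.println(fairShares(weights, 8.0));
    }
}
```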

bq. We need something generic such as bandwidth which can be understood by 
users, supportable on heterogenous nodes in the same cluster
Our initial thinking was along these lines. However, as with CPU, it is very 
hard for a user to specify a bandwidth requirement: it is hard to figure out 
that my container *needs* 200 MBps (or 2 GHz of CPU). Furthermore, bandwidth 
isolation is hard to enforce. When multiple processes access a disk 
concurrently, its aggregate bandwidth can drop significantly. To *guarantee* 
bandwidth, I believe the scheduler would have to be extremely conservative 
with its allocations. 

Given all this, we thought we should probably handle it the way we did CPU. 
Each process asks for 'n' vdisks to capture the number of disks it needs. To 
avoid floating-point computations, we added an NM config for the number of 
available vdisks. Heterogeneity in the number of disks per node is easily 
handled with the vdisks-per-node knob. Heterogeneity in each disk's capacity 
or bandwidth is not handled, similar to our CPU story. I propose we work on 
this heterogeneity as one of the follow-up items. 
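A minimal sketch of the NM-side bookkeeping described above (the class and knob names are made up for illustration): the node advertises an integer vdisk count from its config, and allocation is pure integer arithmetic, so no floating point is involved:

```java
public class VdiskTracker {
    private final int totalVdisks; // from an NM config, e.g. a vdisks-per-node knob
    private int usedVdisks = 0;

    VdiskTracker(int totalVdisks) {
        this.totalVdisks = totalVdisks;
    }

    // Grant a container's request for n vdisks only if enough remain.
    synchronized boolean allocate(int n) {
        if (usedVdisks + n > totalVdisks) {
            return false;
        }
        usedVdisks += n;
        return true;
    }

    // Return a finished container's vdisks to the pool.
    synchronized void release(int n) {
        usedVdisks = Math.max(0, usedVdisks - n);
    }

    synchronized int available() {
        return totalVdisks - usedVdisks;
    }

    public static void main(String[] args) {
        VdiskTracker node = new VdiskTracker(8);
        System.out.println(node.allocate(5)); // fits: 3 vdisks left
        System.out.println(node.allocate(4)); // rejected: only 3 left
    }
}
```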

bq. Spindle locality or I/O parallelism is a real concern
Agree. Is it okay if we finish this work and follow up with spindle locality? 
We have some thoughts on how to handle it, but left it out of the doc to keep 
the design focused. 



> Add support for disk IO isolation/scheduling for containers
> -----------------------------------------------------------
>
>                 Key: YARN-2139
>                 URL: https://issues.apache.org/jira/browse/YARN-2139
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>         Attachments: Disk_IO_Scheduling_Design_1.pdf, 
> Disk_IO_Scheduling_Design_2.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
