[Devel] Performance numbers with IO throttling patches (Was: Re: IO scheduler based IO controller V10)
On Thu, Sep 24, 2009 at 02:33:15PM -0700, Andrew Morton wrote:
[..]
> > Environment
> > ===========
> > A 7200 RPM SATA drive with queue depth of 31. Ext3 filesystem.
>
> That's a bit of a toy.
>
> Do we have testing results for more enterprisey hardware? Big storage
> arrays? SSD? Infiniband? iscsi? nfs? (lol, gotcha)

Hi All,

Couple of days back I posted some performance numbers for the IO scheduler
controller and dm-ioband here:

http://lkml.org/lkml/2009/10/8/9

Now I have run similar tests with Andrea Righi's IO throttling approach of
max bandwidth control. This is an exercise to understand the pros/cons of
each approach and to see how we can take things forward.

Environment
===========
Software
--------
- 2.6.31 kernel
- IO scheduler controller V10 on top of 2.6.31
- IO throttling patch on top of 2.6.31, available here:
  http://www.develer.com/~arighi/linux/patches/io-throttle/old/cgroup-io-throttle-2.6.31.patch

Hardware
--------
A storage array of 5 striped disks of 500GB each.

Used fio jobs for 30 seconds in various configurations. Most of the IO is
direct IO, to eliminate the effects of caches.

I have run three sets of each test. Blindly reporting the results of set2
from each test, otherwise it is too much data to report.

Had a LUN of 2500GB capacity. Used a 200G partition with an ext3 filesystem
for my testing.

For IO scheduler controller testing, I created two cgroups of weight 100
each, so that the disk can effectively be divided half/half between the two
groups.

For the IO throttling patches I also created two cgroups. Now the tricky
part is that it is a max-BW controller and not a proportional-weight
controller, so dividing the disk capacity half/half between the two cgroups
is tricky: I just don't know what the BW capacity of the underlying storage
is, and throughput varies so much with the type of workload. For example,
on my array, this is how throughput looks with different workloads:
8 sequential buffered readers           115 MB/s
8 direct sequential readers bs=64K       64 MB/s
8 direct sequential readers bs=4K        14 MB/s
8 buffered random readers bs=64K          3 MB/s
8 direct random readers bs=64K           15 MB/s
8 direct random readers bs=4K           1.5 MB/s

So throughput varies from 1.5 MB/s to 115 MB/s depending on the workload.
What should the BW limits per cgroup be to divide the disk BW half/half
between the two groups?

So I took a conservative estimate, divided the max bandwidth by 2, treating
the array capacity as 60 MB/s, and assigned each cgroup 30 MB/s. In some
cases I assigned even 10 MB/s or 5 MB/s to each cgroup to see the effects
of throttling. I am using the leaky bucket policy for all the tests.

As the themes of the two controllers are different, at some places this
might sound like an apples vs. oranges comparison. But it still does
help...

Multiple Random Reader vs Sequential Reader
===========================================
Generally random readers bring down the throughput of others in the
system. I ran a test to see the impact of an increasing number of random
readers on a single sequential reader in a different group.
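For reference, the leaky-bucket policy mentioned above can be modeled roughly as follows. This is an illustrative userspace sketch, not the kernel implementation from the patch; the 30 MB/s limit is the per-cgroup figure derived above, while the 1 MB bucket size and the class/method names are made up for the example.

```python
import time

class LeakyBucket:
    """Illustrative model of leaky-bucket IO throttling: a cgroup
    accumulates "debt" as it submits IO, and the debt drains at the
    configured bandwidth limit. Once debt exceeds the bucket size,
    the submitter must sleep until enough has drained."""

    def __init__(self, limit_bytes_per_sec, bucket_bytes):
        self.limit = limit_bytes_per_sec
        self.bucket = bucket_bytes
        self.debt = 0.0
        self.last = time.monotonic()

    def account(self, io_bytes):
        """Charge an IO; return the seconds the caller should sleep."""
        now = time.monotonic()
        # Drain debt at the configured rate since the last charge.
        self.debt = max(0.0, self.debt - (now - self.last) * self.limit)
        self.last = now
        self.debt += io_bytes
        if self.debt <= self.bucket:
            return 0.0
        # Sleep long enough for the excess over the bucket to drain.
        return (self.debt - self.bucket) / self.limit

# Example: 30 MB/s limit (the per-cgroup figure above), 1 MB bucket.
# A 4 MB burst leaves 3 MB of excess debt, i.e. roughly a 0.1 s sleep.
bw = LeakyBucket(30 * 1024 * 1024, 1 * 1024 * 1024)
delay = bw.account(4 * 1024 * 1024)
```

The key property this models is that small bursts pass through undelayed (the bucket absorbs them), while sustained submission is forced down to the configured rate.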
Vanilla CFQ
-----------
[Multiple Random Reader]                            [Sequential Reader]
nr  Max-bandw  Min-bandw  Agg-bandw  Max-latency    nr  Agg-bandw  Max-latency
1   23KB/s     23KB/s     22KB/s     691 msec       1   13519KB/s  468K usec
2   152KB/s    152KB/s    297KB/s    244K usec      1   12380KB/s  31675 usec
4   174KB/s    156KB/s    638KB/s    249K usec      1   10860KB/s  36715 usec
8   49KB/s     11KB/s     310KB/s    1856 msec      1   1292KB/s   990K usec
16  63KB/s     48KB/s     877KB/s    762K usec      1   3905KB/s   506K usec
32  35KB/s     27KB/s     951KB/s    2655 msec      1   1109KB/s   1910K usec

IO scheduler controller + CFQ
-----------------------------
[Multiple Random Reader]                            [Sequential Reader]
nr  Max-bandw  Min-bandw  Agg-bandw  Max-latency    nr  Agg-bandw  Max-latency
1   228KB/s    228KB/s    223KB/s    132K usec      1   5551KB/s   129K usec
2   97KB/s     97KB/s     190KB/s    154K usec      1   5718KB/s   122K usec
4   115KB/s    110KB/s    445KB/s    208K usec      1   5909KB/s   116K usec
8   23KB/s     12KB/s     158KB/s    2820 msec      1   5445KB/s   168K usec
16  11KB/s     3KB/s      145KB/s    5963 msec      1   5418KB/s   164K usec
32  6KB/s      2KB/s      139KB/s    12762 msec     1   5398KB/s   175K usec

Notes:
- The sequential reader in group2 seems to be well isolated from the random
  readers in group1. Throughput and latency of the sequential reader are
  stable and don't drop as the number of random readers in the system
  increases.

io-throttle + CFQ
-----------------
BW limit group1=10 MB/s, BW limit group2=10 MB/s

[Multiple Random Reader]                            [Sequential Reader]
nr  Max-bandw  Min-bandw  Agg-bandw  Max-latency    nr  Agg-bandw  Max-latency
1   37KB/s     37KB/s     36KB/s     218K usec      1   8006KB/s   20529 usec
2   185KB/s
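One way to sanity-check the limit choice in the tables above, with a bit of illustrative arithmetic (the inference that the cap never engages for random IO is my reading of the numbers, not a claim from the report):

```python
# Figures taken from the vanilla CFQ table above, in KB/s.
limit_kb = 10 * 1024        # the 10 MB/s per-cgroup max-BW limit
random_agg_kb = 951         # best aggregate of 32 random readers
seq_agg_kb = 13519          # single sequential reader, unthrottled

# The random-reader group achieves only a small fraction of its cap,
# so a max-BW throttle cannot constrain it; any isolation for the
# sequential reader has to come from the IO scheduler underneath.
random_headroom = limit_kb / random_agg_kb

# The sequential reader, by contrast, can exceed the cap unthrottled,
# so the limit only ever slows down the well-behaved workload.
seq_is_capped = seq_agg_kb > limit_kb
```

This asymmetry is the core of the "what limit divides the disk half/half?" problem stated earlier: a single bytes-per-second number cannot represent half of a device whose capacity spans 1.5-115 MB/s depending on the access pattern.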
[Devel] Re: Performance numbers with IO throttling patches (Was: Re: IO scheduler based IO controller V10)
On Sat, Oct 10, 2009 at 03:53:16PM -0400, Vivek Goyal wrote:
> On Thu, Sep 24, 2009 at 02:33:15PM -0700, Andrew Morton wrote:
> [..]
> > > Environment
> > > ===========
> > > A 7200 RPM SATA drive with queue depth of 31. Ext3 filesystem.
> >
> > That's a bit of a toy.
> >
> > Do we have testing results for more enterprisey hardware? Big storage
> > arrays? SSD? Infiniband? iscsi? nfs? (lol, gotcha)
>
> Hi All,

Hi Vivek,

thanks for posting this detailed report first of all. A few comments below.

[..]