Re: [patch 0/3 v2] raid5: make stripe handling multi-threading
On Mon, 12 Aug 2013 10:18:03 +0800 Shaohua Li wrote:

> Neil,
>
> This is another attempt to make raid5 stripe handling multi-threaded.
> The recent workqueue improvements for unbound workqueues look very
> promising for the raid5 usage. Details are in the first patch.
>
> The patches are against your tree with the patches 'raid5: make
> release_stripe lockless' and 'raid5: fix stripe release order', but
> without 'raid5: create multiple threads to handle stripes'.
>
> My test setup has 7 PCIe SSDs, chunksize 8k, stripe_cache_size 2048.
> When multi-threading is enabled, group_thread_cnt is set to 8.
>
> randwrite   throughput (unpatched/patched)   request size in sectors (unpatched/patched)
> 4k          1/5.9                            8/8
> 8k          1/5.5                            16/13
> 16k         1/4.8                            16/13
> 32k         1/4.6                            18/14
> 64k         1/4.7                            17/13
> 128k        1/4.2                            23/16
> 256k        1/3.5                            41/21
> 512k        1/3                              75/28
> 1M          1/2.6                            134/34
>
> For example, in the 1M randwrite test the patched kernel is 2.6x faster,
> but the average request sent to each disk drops from 134 sectors to 34
> sectors.
>
> Currently the biggest problem is that the request size drops, because
> multiple threads are dispatching requests. This suggests multi-threading
> may not be appropriate for every setup, so it is disabled by default in
> this version. But since throughput is largely improved, I don't think
> this is a blocking issue. I'm still working on improving it, perhaps by
> scheduling the stripes from one block plug as a whole.
>
> Thanks,
> Shaohua

Thanks.  I like this a lot more than the previous version.

It doesn't seem to apply exactly to my current 'for-next' - probably
because I have moved things around and have a different set of patches
applied :-(
If you could rebase it on my current for-next, I'll apply it and probably
submit it for the next merge window.

A couple of little changes I'd like made:

1/ alloc_thread_groups needs to use GFP_NOIO, at least when called from
   raid5_store_group_thread_cnt.  At this point IO to the RAID5 is
   stalled, so if the malloc needs to free memory it might wait for
   writeout to the RAID5 and so deadlock.  GFP_NOIO prevents that.

2/ could we move the

+	if (!cpu_online(cpu)) {
+		cpu = cpumask_any(cpu_online_mask);
+		sh->cpu = cpu;
+	}

   inside raid5_wakeup_stripe_thread()?  It isn't a perfect fit, but I
   think it is a better place for it.  It could check list_empty(&sh->lru)
   and, if it is empty, add the stripe to the appropriate
   group->handle_list.  The code in do_release_stripe would become

 		else {
 			clear_bit(STRIPE_DELAYED, &sh->state);
 			clear_bit(STRIPE_BIT_DELAY, &sh->state);
+			if (conf->worker_cnt_per_group == 0) {
+				list_add_tail(&sh->lru, &conf->handle_list);
+			} else {
+				raid5_wakeup_stripe_thread(sh);
+				return;
+			}
 		}
 		md_wakeup_thread(conf->mddev->thread);

??

NeilBrown
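[Editor's note: the GFP_NOIO change asked for in point 1 above is presumably a one-flag swap in the allocation call. A hypothetical sketch of the diff follows; the kzalloc line is an assumption about what alloc_thread_groups looks like in the patch under review, not a quote from it:

-	workers = kzalloc(sizeof(struct r5worker) * cnt, GFP_KERNEL);
+	workers = kzalloc(sizeof(struct r5worker) * cnt, GFP_NOIO);

GFP_NOIO lets the allocator reclaim memory but forbids it from initiating new IO to do so, which is exactly what breaks the deadlock described above: with IO to the array stalled, a GFP_KERNEL allocation could block indefinitely waiting for writeout to that same array.]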
[patch 0/3 v2] raid5: make stripe handling multi-threading
Neil,

This is another attempt to make raid5 stripe handling multi-threaded.
The recent workqueue improvements for unbound workqueues look very
promising for the raid5 usage. Details are in the first patch.

The patches are against your tree with the patches 'raid5: make
release_stripe lockless' and 'raid5: fix stripe release order', but
without 'raid5: create multiple threads to handle stripes'.

My test setup has 7 PCIe SSDs, chunksize 8k, stripe_cache_size 2048.
When multi-threading is enabled, group_thread_cnt is set to 8.

randwrite   throughput (unpatched/patched)   request size in sectors (unpatched/patched)
4k          1/5.9                            8/8
8k          1/5.5                            16/13
16k         1/4.8                            16/13
32k         1/4.6                            18/14
64k         1/4.7                            17/13
128k        1/4.2                            23/16
256k        1/3.5                            41/21
512k        1/3                              75/28
1M          1/2.6                            134/34

For example, in the 1M randwrite test the patched kernel is 2.6x faster,
but the average request sent to each disk drops from 134 sectors to 34
sectors.

Currently the biggest problem is that the request size drops, because
multiple threads are dispatching requests. This suggests multi-threading
may not be appropriate for every setup, so it is disabled by default in
this version. But since throughput is largely improved, I don't think
this is a blocking issue. I'm still working on improving it, perhaps by
scheduling the stripes from one block plug as a whole.

Thanks,
Shaohua
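[Editor's note: the request-size regression described above can be seen in a toy model. If consecutive stripe numbers are dealt out to several workers, and adjacent stripes only merge into one disk request when the same worker submits them back-to-back, the merged request length collapses as worker count grows. This is pure illustration; the function name and the round-robin hand-off are assumptions, not the patch's actual scheduling:

```python
def merged_request_lengths(stripes, nr_workers):
    """Toy model: deal consecutive stripe numbers round-robin to workers.

    Adjacent stripe numbers merge into one disk request only when the
    same worker ends up submitting both, so more workers means shorter
    merged runs.  Returns the list of merged request lengths.
    """
    lengths = []
    for w in range(nr_workers):
        seq = stripes[w::nr_workers]  # this worker's share of the stream
        if not seq:
            continue
        run = 1
        for prev, cur in zip(seq, seq[1:]):
            if cur == prev + 1:
                run += 1              # still mergeable with the previous stripe
            else:
                lengths.append(run)   # merge window closed; start a new request
                run = 1
        lengths.append(run)
    return lengths

# One dispatcher keeps a 64-stripe sequential run as one request;
# eight dispatchers shatter it into single-stripe requests.
print(merged_request_lengths(list(range(64)), 1))       # [64]
print(max(merged_request_lengths(list(range(64)), 8)))  # 1
```

In this model every added dispatcher destroys mergeability completely; the measured numbers above (134 down to 34 sectors, not to 1) are less extreme because each thread still batches some adjacent stripes, which is why handling all stripes from one block plug in a single worker, as suggested above, would help.]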