RE: [PATCH 00/10] mpt3sas: full mq support
> > - Later we can explore if nr_hw_queue more than one really add benefit.
> > From current limited testing, I don't see major performance boost if
> > we have nr_hw_queue more than one.
> >
> Well, the _actual_ code to support mq is rather trivial, and really
> serves as a good testbed for scsi-mq.
> I would prefer to leave it in, and disable it via a module parameter.

I am thinking that the extra code for more than one nr_hw_queue will add
maintenance and support overhead. In particular, the IO error handling
code becomes complex in the nr_hw_queues > 1 case.

If we really want to see the performance boost, we should attempt it and
bear the other side effects. For the time being, my choice would be to
drop the nr_hw_queue > 1 support (not even module-parameter based).

>
> But in either case, I can rebase the patches to leave any notions of
> 'nr_hw_queues' to patch 8 for implementing full mq support.

Thanks Hannes. It was just a heads up... We are not sure when we can
submit the upcoming patch set from Broadcom. Maybe we can sync up with
you offline in case any rebase is required.

>
> And we need to discuss how to handle MPI2_FUNCTION_SCSI_IO_REQUEST;
> the current method doesn't work with blk-mq.
> I really would like to see that go, especially as sg/bsg supports the
> same functionality ...
Re: [PATCH 00/10] mpt3sas: full mq support
On 02/16/2017 10:48 AM, Kashyap Desai wrote: >> -Original Message- >> From: Hannes Reinecke [mailto:h...@suse.de] >> Sent: Wednesday, February 15, 2017 3:35 PM >> To: Kashyap Desai; Sreekanth Reddy >> Cc: Christoph Hellwig; Martin K. Petersen; James Bottomley; linux- >> s...@vger.kernel.org; Sathya Prakash Veerichetty; PDL-MPT-FUSIONLINUX >> Subject: Re: [PATCH 00/10] mpt3sas: full mq support >> >> On 02/15/2017 10:18 AM, Kashyap Desai wrote: >>>> >>>> >>>> Hannes, >>>> >>>> Result I have posted last time is with merge operation enabled in >>>> block layer. If I disable merge operation then I don't see much >>>> improvement with multiple hw request queues. Here is the result, >>>> >>>> fio results when nr_hw_queues=1, >>>> 4k read when numjobs=24: io=248387MB, bw=1655.1MB/s, iops=423905, >>>> runt=150003msec >>>> >>>> fio results when nr_hw_queues=24, >>>> 4k read when numjobs=24: io=263904MB, bw=1759.4MB/s, iops=450393, >>>> runt=150001msec >>> >>> Hannes - >>> >>> I worked with Sreekanth and also understand pros/cons of Patch #10. >>> " [PATCH 10/10] mpt3sas: scsi-mq interrupt steering" >>> >>> In above patch, can_queue of HBA is divided based on logic CPU, it >>> means we want to mimic as if mpt3sas HBA support multi queue >>> distributing actual resources which is single Submission H/W Queue. >>> This approach badly impact many performance areas. >>> >>> nr_hw_queues = 1 is what I observe as best performance approach since >>> it never throttle IO if sdev->queue_depth is set to HBA queue depth. >>> In case of nr_hw_queues = "CPUs" throttle IO at SCSI level since we >>> never allow more than "updated can_queue" in LLD. >>> >> True. >> And this was actually one of the things I wanted to demonstrate with this >> patchset :-) ATM blk-mq really works best when having a distinct tag space >> per port/device. 
As soon as the hardware provides a _shared_ tag space you >> end up with tag starvation issues as blk-mq only allows you to do a static >> split of the available tagspace. >> While this patchset demonstrates that the HBA itself _does_ benefit from >> using block-mq (especially on highly parallel loads), it also demonstrates >> that >> _block-mq_ has issues with singlethreaded loads on this HBA (or, rather, >> type of HBA, as I doubt this issue is affecting mpt3sas only). >> >>> Below code bring actual HBA can_queue very low ( Ea on 96 logical core >>> CPU new can_queue goes to 42, if HBA queue depth is 4K). It means we >>> will see lots of IO throttling in scsi mid layer due to >>> shost->can_queue reach the limit very soon if you have jobs with >> higher QD. >>> >>> if (ioc->shost->nr_hw_queues > 1) { >>> ioc->shost->nr_hw_queues = ioc->msix_vector_count; >>> ioc->shost->can_queue /= ioc->msix_vector_count; >>> } >>> I observe negative performance if I have 8 SSD drives attached to >>> Ventura (latest IT controller). 16 fio jobs at QD=128 gives ~1600K >>> IOPs and the moment I switch to nr_hw_queues = "CPUs", it gave hardly >>> ~850K IOPs. This is mainly because of host_busy stuck at very low ~169 >>> on >> my setup. >>> >> Which actually might be an issue with the way scsi is hooked into blk-mq. >> The SCSI stack is using 'can_queue' as a check for 'host_busy', ie if the >> host is >> capable of accepting more commands. >> As we're limiting can_queue (to get the per-queue command depth >> correctly) we should be using the _overall_ command depth for the >> can_queue value itself to make the host_busy check work correctly. >> >> I've attached a patch for that; can you test if it makes a difference? > Hannes - > Attached patch works fine for me. FYI - We need to set device queue depth > to can_queue as we are currently not doing in mpt3sas driver. > > With attached patch when I tried, I see ~2-3% improvement running multiple > jobs. 
Single job profile no difference. > > So looks like we are good to reach performance with single nr_hw_queues. > Whee, cool. > We have some patches to be send so want to know how to rebase this patch > series as few patches coming from Broadcom. Can we consider below as plan ? > Sure, can do. > - Patches from 1-7 will be reposted. Also Sreekanth will complete review on > exi
RE: [PATCH 00/10] mpt3sas: full mq support
> -Original Message- > From: Hannes Reinecke [mailto:h...@suse.de] > Sent: Wednesday, February 15, 2017 3:35 PM > To: Kashyap Desai; Sreekanth Reddy > Cc: Christoph Hellwig; Martin K. Petersen; James Bottomley; linux- > s...@vger.kernel.org; Sathya Prakash Veerichetty; PDL-MPT-FUSIONLINUX > Subject: Re: [PATCH 00/10] mpt3sas: full mq support > > On 02/15/2017 10:18 AM, Kashyap Desai wrote: > >> > >> > >> Hannes, > >> > >> Result I have posted last time is with merge operation enabled in > >> block layer. If I disable merge operation then I don't see much > >> improvement with multiple hw request queues. Here is the result, > >> > >> fio results when nr_hw_queues=1, > >> 4k read when numjobs=24: io=248387MB, bw=1655.1MB/s, iops=423905, > >> runt=150003msec > >> > >> fio results when nr_hw_queues=24, > >> 4k read when numjobs=24: io=263904MB, bw=1759.4MB/s, iops=450393, > >> runt=150001msec > > > > Hannes - > > > > I worked with Sreekanth and also understand pros/cons of Patch #10. > > " [PATCH 10/10] mpt3sas: scsi-mq interrupt steering" > > > > In above patch, can_queue of HBA is divided based on logic CPU, it > > means we want to mimic as if mpt3sas HBA support multi queue > > distributing actual resources which is single Submission H/W Queue. > > This approach badly impact many performance areas. > > > > nr_hw_queues = 1 is what I observe as best performance approach since > > it never throttle IO if sdev->queue_depth is set to HBA queue depth. > > In case of nr_hw_queues = "CPUs" throttle IO at SCSI level since we > > never allow more than "updated can_queue" in LLD. > > > True. > And this was actually one of the things I wanted to demonstrate with this > patchset :-) ATM blk-mq really works best when having a distinct tag space > per port/device. As soon as the hardware provides a _shared_ tag space you > end up with tag starvation issues as blk-mq only allows you to do a static > split of the available tagspace. 
> While this patchset demonstrates that the HBA itself _does_ benefit from > using block-mq (especially on highly parallel loads), it also demonstrates > that > _block-mq_ has issues with singlethreaded loads on this HBA (or, rather, > type of HBA, as I doubt this issue is affecting mpt3sas only). > > > Below code bring actual HBA can_queue very low ( Ea on 96 logical core > > CPU new can_queue goes to 42, if HBA queue depth is 4K). It means we > > will see lots of IO throttling in scsi mid layer due to > > shost->can_queue reach the limit very soon if you have jobs with > higher QD. > > > > if (ioc->shost->nr_hw_queues > 1) { > > ioc->shost->nr_hw_queues = ioc->msix_vector_count; > > ioc->shost->can_queue /= ioc->msix_vector_count; > > } > > I observe negative performance if I have 8 SSD drives attached to > > Ventura (latest IT controller). 16 fio jobs at QD=128 gives ~1600K > > IOPs and the moment I switch to nr_hw_queues = "CPUs", it gave hardly > > ~850K IOPs. This is mainly because of host_busy stuck at very low ~169 > > on > my setup. > > > Which actually might be an issue with the way scsi is hooked into blk-mq. > The SCSI stack is using 'can_queue' as a check for 'host_busy', ie if the > host is > capable of accepting more commands. > As we're limiting can_queue (to get the per-queue command depth > correctly) we should be using the _overall_ command depth for the > can_queue value itself to make the host_busy check work correctly. > > I've attached a patch for that; can you test if it makes a difference? Hannes - Attached patch works fine for me. FYI - We need to set device queue depth to can_queue as we are currently not doing in mpt3sas driver. With attached patch when I tried, I see ~2-3% improvement running multiple jobs. Single job profile no difference. So looks like we are good to reach performance with single nr_hw_queues. We have some patches to be send so want to know how to rebase this patch series as few patches coming from Broadcom. 
Can we consider below as the plan?

- Patches 1-7 will be reposted. Also, Sreekanth will complete review on
  the existing patches 1-7.
- We need blk_tag support only for nr_hw_queue = 1. With that said, we
  will have many code changes/functions without the "shost_use_blk_mq"
  check, and can assume it is a single-nr_hw_queue driver. E.g. the
  below function can be simplified - just refer to the tag from
  scmd->request, and no check of shost_use_blk_mq + nr_hw_queue etc. is
  needed.

u16 mpt3sas_base_get_smid_scsiio(struct MPT3SAS
RE: [PATCH 00/10] mpt3sas: full mq support
>
>
> Hannes,
>
> Result I have posted last time is with merge operation enabled in block
> layer. If I disable merge operation then I don't see much improvement
> with multiple hw request queues. Here is the result,
>
> fio results when nr_hw_queues=1,
> 4k read when numjobs=24: io=248387MB, bw=1655.1MB/s, iops=423905,
> runt=150003msec
>
> fio results when nr_hw_queues=24,
> 4k read when numjobs=24: io=263904MB, bw=1759.4MB/s, iops=450393,
> runt=150001msec

Hannes -

I worked with Sreekanth and also understand the pros/cons of Patch #10,
"[PATCH 10/10] mpt3sas: scsi-mq interrupt steering".

In the above patch, can_queue of the HBA is divided based on logical
CPUs; that is, we mimic the mpt3sas HBA supporting multiqueue by
distributing the actual resource, which is a single Submission H/W
Queue. This approach badly impacts many performance areas.

nr_hw_queues = 1 is what I observe as the best-performing approach,
since it never throttles IO if sdev->queue_depth is set to the HBA queue
depth. The nr_hw_queues = "CPUs" case throttles IO at the SCSI level,
since we never allow more than the "updated can_queue" in the LLD.

The below code brings the actual HBA can_queue very low (e.g. on a
96-logical-core CPU the new can_queue goes to 42, if the HBA queue depth
is 4K). It means we will see lots of IO throttling in the SCSI mid layer
due to shost->can_queue reaching the limit very soon if you have jobs
with higher QD.

	if (ioc->shost->nr_hw_queues > 1) {
		ioc->shost->nr_hw_queues = ioc->msix_vector_count;
		ioc->shost->can_queue /= ioc->msix_vector_count;
	}

I observe negative performance if I have 8 SSD drives attached to
Ventura (latest IT controller). 16 fio jobs at QD=128 give ~1600K IOPs,
and the moment I switch to nr_hw_queues = "CPUs" it gives hardly ~850K
IOPs. This is mainly because host_busy is stuck at a very low ~169 on my
setup.

Maybe, as Sreekanth mentioned, the performance improvement you have
observed is because nomerges=2 is not set and the OS will attempt soft
back/front merges.

I debugged a live machine and understood that we never see the parallel
instances of "scsi_dispatch_cmd" that we expect, because can_queue is
low. If we really had a *very* large HBA QD, this patch #10 exposing
multiple SQs might be useful.

For now, we are looking for an updated version of the patch which will
only keep the IT HBA in SQ mode (like we are doing in the driver) and
add an interface to use blk_tag in both scsi.mq and !scsi.mq mode.
Sreekanth has already started working on it, but we may need a full
performance test run before posting the actual patch.

Maybe we can cherry-pick a few patches from this series and get the
blk_tag support that improves performance, without making nr_hw_queue a
user tunable.

Thanks,
Kashyap

>
> Thanks,
> Sreekanth
Re: [PATCH 00/10] mpt3sas: full mq support
On Mon, Feb 13, 2017 at 6:41 PM, Hannes Reineckewrote: > On 02/13/2017 07:15 AM, Sreekanth Reddy wrote: >> On Fri, Feb 10, 2017 at 12:29 PM, Hannes Reinecke wrote: >>> On 02/10/2017 05:43 AM, Sreekanth Reddy wrote: On Thu, Feb 9, 2017 at 6:42 PM, Hannes Reinecke wrote: > On 02/09/2017 02:03 PM, Sreekanth Reddy wrote: >>> [ .. ] >> >> >> Hannes, >> >> I have created a md raid0 with 4 SAS SSD drives using below command, >> #mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdg /dev/sdh >> /dev/sdi /dev/sdj >> >> And here is 'mdadm --detail /dev/md0' command output, >> -- >> /dev/md0: >> Version : 1.2 >> Creation Time : Thu Feb 9 14:38:47 2017 >> Raid Level : raid0 >> Array Size : 780918784 (744.74 GiB 799.66 GB) >>Raid Devices : 4 >> Total Devices : 4 >> Persistence : Superblock is persistent >> >> Update Time : Thu Feb 9 14:38:47 2017 >> State : clean >> Active Devices : 4 >> Working Devices : 4 >> Failed Devices : 0 >> Spare Devices : 0 >> >> Chunk Size : 512K >> >>Name : host_name >>UUID : b63f9da7:b7de9a25:6a46ca00:42214e22 >> Events : 0 >> >> Number Major Minor RaidDevice State >>0 8 960 active sync /dev/sdg >>1 8 1121 active sync /dev/sdh >>2 8 1442 active sync /dev/sdj >>3 8 1283 active sync /dev/sdi >> -- >> >> Then I have used below fio profile to run 4K sequence read operations >> with nr_hw_queues=1 driver and with nr_hw_queues=24 driver (as my >> system has two numa node and each with 12 cpus). >> - >> global] >> ioengine=libaio >> group_reporting >> direct=1 >> rw=read >> bs=4k >> allow_mounted_write=0 >> iodepth=128 >> runtime=150s >> >> [job1] >> filename=/dev/md0 >> - >> >> Here are the fio results when nr_hw_queues=1 (i.e. 
single request >> queue) with various number of job counts >> 1JOB 4k read : io=213268MB, bw=1421.8MB/s, iops=363975, runt=150001msec >> 2JOBs 4k read : io=309605MB, bw=2064.2MB/s, iops=528389, runt=150001msec >> 4JOBs 4k read : io=281001MB, bw=1873.4MB/s, iops=479569, runt=150002msec >> 8JOBs 4k read : io=236297MB, bw=1575.2MB/s, iops=403236, runt=150016msec >> >> Here are the fio results when nr_hw_queues=24 (i.e. multiple request >> queue) with various number of job counts >> 1JOB 4k read : io=95194MB, bw=649852KB/s, iops=162463, runt=150001msec >> 2JOBs 4k read : io=189343MB, bw=1262.3MB/s, iops=323142, runt=150001msec >> 4JOBs 4k read : io=314832MB, bw=2098.9MB/s, iops=537309, runt=150001msec >> 8JOBs 4k read : io=277015MB, bw=1846.8MB/s, iops=472769, runt=150001msec >> >> Here we can see that on less number of jobs count, single request >> queue (nr_hw_queues=1) is giving more IOPs than multi request >> queues(nr_hw_queues=24). >> >> Can you please share your fio profile, so that I can try same thing on >> my system. >> > Have you tried with the latest git update from Jens for-4.11/block (or > for-4.11/next) branch? I am using below git repo, https://git.kernel.org/cgit/linux/kernel/git/mkp/scsi.git/log/?h=4.11/scsi-queue Today I will try with Jens for-4.11/block. >>> By all means, do. >>> > I've found that using the mq-deadline scheduler has a noticeable > performance boost. > > The fio job I'm using is essentially the same; you just should make sure > to specify a 'numjob=' statement in there. > Otherwise fio will just use a single CPU, which of course leads to > averse effects in the multiqueue case. Yes I am providing 'numjob=' on fio command line as shown below, # fio md_fio_profile --numjobs=8 --output=fio_results.txt >>> Still, it looks as if you'd be using less jobs than you have CPUs. >>> Which means you'll be running into a tag starvation scenario on those >>> CPUs, especially for the small blocksizes. 
>>> What are the results if you set 'numjobs' to the number of CPUs? >>> >> >> Hannes, >> >> Tried on Jens for-4.11/block kernel repo and also set each block PD's >> scheduler as 'mq-deadline', and here is my results for 4K SR on md0 >> (raid0 with 4 drives). I have 24 CPUs and so tried even with setting >> numjobs=24. >>
Re: [PATCH 00/10] mpt3sas: full mq support
On 02/15/2017 09:15 AM, Christoph Hellwig wrote:
> On Tue, Feb 07, 2017 at 02:19:09PM +0100, Christoph Hellwig wrote:
>> Patch 1-7 look fine to me with minor fixups, and I'd love to see
>> them go into 4.11.
>
> Any chance to see a resend of these?
>
Sure. Will do shortly.

Cheers,

Hannes
--
Dr. Hannes Reinecke		   Teamlead Storage & Networking
h...@suse.de			   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH 00/10] mpt3sas: full mq support
On Tue, Feb 07, 2017 at 02:19:09PM +0100, Christoph Hellwig wrote: > Patch 1-7 look fine to me with minor fixups, and I'd love to see > them go into 4.11. Any chance to see a resend of these?
Re: [PATCH 00/10] mpt3sas: full mq support
On 02/13/2017 07:15 AM, Sreekanth Reddy wrote: > On Fri, Feb 10, 2017 at 12:29 PM, Hannes Reineckewrote: >> On 02/10/2017 05:43 AM, Sreekanth Reddy wrote: >>> On Thu, Feb 9, 2017 at 6:42 PM, Hannes Reinecke wrote: On 02/09/2017 02:03 PM, Sreekanth Reddy wrote: >> [ .. ] > > > Hannes, > > I have created a md raid0 with 4 SAS SSD drives using below command, > #mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdg /dev/sdh > /dev/sdi /dev/sdj > > And here is 'mdadm --detail /dev/md0' command output, > -- > /dev/md0: > Version : 1.2 > Creation Time : Thu Feb 9 14:38:47 2017 > Raid Level : raid0 > Array Size : 780918784 (744.74 GiB 799.66 GB) >Raid Devices : 4 > Total Devices : 4 > Persistence : Superblock is persistent > > Update Time : Thu Feb 9 14:38:47 2017 > State : clean > Active Devices : 4 > Working Devices : 4 > Failed Devices : 0 > Spare Devices : 0 > > Chunk Size : 512K > >Name : host_name >UUID : b63f9da7:b7de9a25:6a46ca00:42214e22 > Events : 0 > > Number Major Minor RaidDevice State >0 8 960 active sync /dev/sdg >1 8 1121 active sync /dev/sdh >2 8 1442 active sync /dev/sdj >3 8 1283 active sync /dev/sdi > -- > > Then I have used below fio profile to run 4K sequence read operations > with nr_hw_queues=1 driver and with nr_hw_queues=24 driver (as my > system has two numa node and each with 12 cpus). > - > global] > ioengine=libaio > group_reporting > direct=1 > rw=read > bs=4k > allow_mounted_write=0 > iodepth=128 > runtime=150s > > [job1] > filename=/dev/md0 > - > > Here are the fio results when nr_hw_queues=1 (i.e. single request > queue) with various number of job counts > 1JOB 4k read : io=213268MB, bw=1421.8MB/s, iops=363975, runt=150001msec > 2JOBs 4k read : io=309605MB, bw=2064.2MB/s, iops=528389, runt=150001msec > 4JOBs 4k read : io=281001MB, bw=1873.4MB/s, iops=479569, runt=150002msec > 8JOBs 4k read : io=236297MB, bw=1575.2MB/s, iops=403236, runt=150016msec > > Here are the fio results when nr_hw_queues=24 (i.e. 
multiple request > queue) with various number of job counts > 1JOB 4k read : io=95194MB, bw=649852KB/s, iops=162463, runt=150001msec > 2JOBs 4k read : io=189343MB, bw=1262.3MB/s, iops=323142, runt=150001msec > 4JOBs 4k read : io=314832MB, bw=2098.9MB/s, iops=537309, runt=150001msec > 8JOBs 4k read : io=277015MB, bw=1846.8MB/s, iops=472769, runt=150001msec > > Here we can see that on less number of jobs count, single request > queue (nr_hw_queues=1) is giving more IOPs than multi request > queues(nr_hw_queues=24). > > Can you please share your fio profile, so that I can try same thing on > my system. > Have you tried with the latest git update from Jens for-4.11/block (or for-4.11/next) branch? >>> >>> I am using below git repo, >>> >>> https://git.kernel.org/cgit/linux/kernel/git/mkp/scsi.git/log/?h=4.11/scsi-queue >>> >>> Today I will try with Jens for-4.11/block. >>> >> By all means, do. >> I've found that using the mq-deadline scheduler has a noticeable performance boost. The fio job I'm using is essentially the same; you just should make sure to specify a 'numjob=' statement in there. Otherwise fio will just use a single CPU, which of course leads to averse effects in the multiqueue case. >>> >>> Yes I am providing 'numjob=' on fio command line as shown below, >>> >>> # fio md_fio_profile --numjobs=8 --output=fio_results.txt >>> >> Still, it looks as if you'd be using less jobs than you have CPUs. >> Which means you'll be running into a tag starvation scenario on those >> CPUs, especially for the small blocksizes. >> What are the results if you set 'numjobs' to the number of CPUs? >> > > Hannes, > > Tried on Jens for-4.11/block kernel repo and also set each block PD's > scheduler as 'mq-deadline', and here is my results for 4K SR on md0 > (raid0 with 4 drives). I have 24 CPUs and so tried even with setting > numjobs=24. > > fio results when nr_hw_queues=1 (i.e. 
single request queue) with > various number of job counts > > 4k read when numjobs=1 : io=215553MB, bw=1437.9MB/s, iops=367874, > runt=150001msec >
Re: [PATCH 00/10] mpt3sas: full mq support
On Fri, Feb 10, 2017 at 12:29 PM, Hannes Reineckewrote: > On 02/10/2017 05:43 AM, Sreekanth Reddy wrote: >> On Thu, Feb 9, 2017 at 6:42 PM, Hannes Reinecke wrote: >>> On 02/09/2017 02:03 PM, Sreekanth Reddy wrote: > [ .. ] Hannes, I have created a md raid0 with 4 SAS SSD drives using below command, #mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdg /dev/sdh /dev/sdi /dev/sdj And here is 'mdadm --detail /dev/md0' command output, -- /dev/md0: Version : 1.2 Creation Time : Thu Feb 9 14:38:47 2017 Raid Level : raid0 Array Size : 780918784 (744.74 GiB 799.66 GB) Raid Devices : 4 Total Devices : 4 Persistence : Superblock is persistent Update Time : Thu Feb 9 14:38:47 2017 State : clean Active Devices : 4 Working Devices : 4 Failed Devices : 0 Spare Devices : 0 Chunk Size : 512K Name : host_name UUID : b63f9da7:b7de9a25:6a46ca00:42214e22 Events : 0 Number Major Minor RaidDevice State 0 8 960 active sync /dev/sdg 1 8 1121 active sync /dev/sdh 2 8 1442 active sync /dev/sdj 3 8 1283 active sync /dev/sdi -- Then I have used below fio profile to run 4K sequence read operations with nr_hw_queues=1 driver and with nr_hw_queues=24 driver (as my system has two numa node and each with 12 cpus). - global] ioengine=libaio group_reporting direct=1 rw=read bs=4k allow_mounted_write=0 iodepth=128 runtime=150s [job1] filename=/dev/md0 - Here are the fio results when nr_hw_queues=1 (i.e. single request queue) with various number of job counts 1JOB 4k read : io=213268MB, bw=1421.8MB/s, iops=363975, runt=150001msec 2JOBs 4k read : io=309605MB, bw=2064.2MB/s, iops=528389, runt=150001msec 4JOBs 4k read : io=281001MB, bw=1873.4MB/s, iops=479569, runt=150002msec 8JOBs 4k read : io=236297MB, bw=1575.2MB/s, iops=403236, runt=150016msec Here are the fio results when nr_hw_queues=24 (i.e. 
multiple request queue) with various number of job counts 1JOB 4k read : io=95194MB, bw=649852KB/s, iops=162463, runt=150001msec 2JOBs 4k read : io=189343MB, bw=1262.3MB/s, iops=323142, runt=150001msec 4JOBs 4k read : io=314832MB, bw=2098.9MB/s, iops=537309, runt=150001msec 8JOBs 4k read : io=277015MB, bw=1846.8MB/s, iops=472769, runt=150001msec Here we can see that on less number of jobs count, single request queue (nr_hw_queues=1) is giving more IOPs than multi request queues(nr_hw_queues=24). Can you please share your fio profile, so that I can try same thing on my system. >>> Have you tried with the latest git update from Jens for-4.11/block (or >>> for-4.11/next) branch? >> >> I am using below git repo, >> >> https://git.kernel.org/cgit/linux/kernel/git/mkp/scsi.git/log/?h=4.11/scsi-queue >> >> Today I will try with Jens for-4.11/block. >> > By all means, do. > >>> I've found that using the mq-deadline scheduler has a noticeable >>> performance boost. >>> >>> The fio job I'm using is essentially the same; you just should make sure >>> to specify a 'numjob=' statement in there. >>> Otherwise fio will just use a single CPU, which of course leads to >>> averse effects in the multiqueue case. >> >> Yes I am providing 'numjob=' on fio command line as shown below, >> >> # fio md_fio_profile --numjobs=8 --output=fio_results.txt >> > Still, it looks as if you'd be using less jobs than you have CPUs. > Which means you'll be running into a tag starvation scenario on those > CPUs, especially for the small blocksizes. > What are the results if you set 'numjobs' to the number of CPUs? > Hannes, Tried on Jens for-4.11/block kernel repo and also set each block PD's scheduler as 'mq-deadline', and here is my results for 4K SR on md0 (raid0 with 4 drives). I have 24 CPUs and so tried even with setting numjobs=24. fio results when nr_hw_queues=1 (i.e. 
single request queue) with various number of job counts 4k read when numjobs=1 : io=215553MB, bw=1437.9MB/s, iops=367874, runt=150001msec 4k read when numjobs=2 : io=307771MB, bw=2051.9MB/s, iops=525258, runt=150001msec 4k read when numjobs=4 : io=300382MB, bw=2002.6MB/s, iops=512644, runt=150002msec 4k read when numjobs=8 :
Re: [PATCH 00/10] mpt3sas: full mq support
On 02/10/2017 05:43 AM, Sreekanth Reddy wrote: > On Thu, Feb 9, 2017 at 6:42 PM, Hannes Reineckewrote: >> On 02/09/2017 02:03 PM, Sreekanth Reddy wrote: [ .. ] >>> >>> >>> Hannes, >>> >>> I have created a md raid0 with 4 SAS SSD drives using below command, >>> #mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdg /dev/sdh >>> /dev/sdi /dev/sdj >>> >>> And here is 'mdadm --detail /dev/md0' command output, >>> -- >>> /dev/md0: >>> Version : 1.2 >>> Creation Time : Thu Feb 9 14:38:47 2017 >>> Raid Level : raid0 >>> Array Size : 780918784 (744.74 GiB 799.66 GB) >>>Raid Devices : 4 >>> Total Devices : 4 >>> Persistence : Superblock is persistent >>> >>> Update Time : Thu Feb 9 14:38:47 2017 >>> State : clean >>> Active Devices : 4 >>> Working Devices : 4 >>> Failed Devices : 0 >>> Spare Devices : 0 >>> >>> Chunk Size : 512K >>> >>>Name : host_name >>>UUID : b63f9da7:b7de9a25:6a46ca00:42214e22 >>> Events : 0 >>> >>> Number Major Minor RaidDevice State >>>0 8 960 active sync /dev/sdg >>>1 8 1121 active sync /dev/sdh >>>2 8 1442 active sync /dev/sdj >>>3 8 1283 active sync /dev/sdi >>> -- >>> >>> Then I have used below fio profile to run 4K sequence read operations >>> with nr_hw_queues=1 driver and with nr_hw_queues=24 driver (as my >>> system has two numa node and each with 12 cpus). >>> - >>> global] >>> ioengine=libaio >>> group_reporting >>> direct=1 >>> rw=read >>> bs=4k >>> allow_mounted_write=0 >>> iodepth=128 >>> runtime=150s >>> >>> [job1] >>> filename=/dev/md0 >>> - >>> >>> Here are the fio results when nr_hw_queues=1 (i.e. single request >>> queue) with various number of job counts >>> 1JOB 4k read : io=213268MB, bw=1421.8MB/s, iops=363975, runt=150001msec >>> 2JOBs 4k read : io=309605MB, bw=2064.2MB/s, iops=528389, runt=150001msec >>> 4JOBs 4k read : io=281001MB, bw=1873.4MB/s, iops=479569, runt=150002msec >>> 8JOBs 4k read : io=236297MB, bw=1575.2MB/s, iops=403236, runt=150016msec >>> >>> Here are the fio results when nr_hw_queues=24 (i.e. 
multiple request >>> queue) with various number of job counts >>> 1JOB 4k read : io=95194MB, bw=649852KB/s, iops=162463, runt=150001msec >>> 2JOBs 4k read : io=189343MB, bw=1262.3MB/s, iops=323142, runt=150001msec >>> 4JOBs 4k read : io=314832MB, bw=2098.9MB/s, iops=537309, runt=150001msec >>> 8JOBs 4k read : io=277015MB, bw=1846.8MB/s, iops=472769, runt=150001msec >>> >>> Here we can see that on less number of jobs count, single request >>> queue (nr_hw_queues=1) is giving more IOPs than multi request >>> queues(nr_hw_queues=24). >>> >>> Can you please share your fio profile, so that I can try same thing on >>> my system. >>> >> Have you tried with the latest git update from Jens for-4.11/block (or >> for-4.11/next) branch? > > I am using below git repo, > > https://git.kernel.org/cgit/linux/kernel/git/mkp/scsi.git/log/?h=4.11/scsi-queue > > Today I will try with Jens for-4.11/block. > By all means, do. >> I've found that using the mq-deadline scheduler has a noticeable >> performance boost. >> >> The fio job I'm using is essentially the same; you just should make sure >> to specify a 'numjob=' statement in there. >> Otherwise fio will just use a single CPU, which of course leads to >> averse effects in the multiqueue case. > > Yes I am providing 'numjob=' on fio command line as shown below, > > # fio md_fio_profile --numjobs=8 --output=fio_results.txt > Still, it looks as if you'd be using less jobs than you have CPUs. Which means you'll be running into a tag starvation scenario on those CPUs, especially for the small blocksizes. What are the results if you set 'numjobs' to the number of CPUs? Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
Re: [PATCH 00/10] mpt3sas: full mq support
On Thu, Feb 9, 2017 at 6:42 PM, Hannes Reinecke <h...@suse.de> wrote: > On 02/09/2017 02:03 PM, Sreekanth Reddy wrote: >> On Wed, Feb 1, 2017 at 1:13 PM, Hannes Reinecke <h...@suse.de> wrote: >>> >>> On 02/01/2017 08:07 AM, Kashyap Desai wrote: >>>>> >>>>> -Original Message- >>>>> From: Hannes Reinecke [mailto:h...@suse.de] >>>>> Sent: Wednesday, February 01, 2017 12:21 PM >>>>> To: Kashyap Desai; Christoph Hellwig >>>>> Cc: Martin K. Petersen; James Bottomley; linux-scsi@vger.kernel.org; >>>>> Sathya >>>>> Prakash Veerichetty; PDL-MPT-FUSIONLINUX; Sreekanth Reddy >>>>> Subject: Re: [PATCH 00/10] mpt3sas: full mq support >>>>> >>>>> On 01/31/2017 06:54 PM, Kashyap Desai wrote: >>>>>>> >>>>>>> -Original Message- >>>>>>> From: Hannes Reinecke [mailto:h...@suse.de] >>>>>>> Sent: Tuesday, January 31, 2017 4:47 PM >>>>>>> To: Christoph Hellwig >>>>>>> Cc: Martin K. Petersen; James Bottomley; linux-scsi@vger.kernel.org; >>>>>> >>>>>> Sathya >>>>>>> >>>>>>> Prakash; Kashyap Desai; mpt-fusionlinux@broadcom.com >>>>>>> Subject: Re: [PATCH 00/10] mpt3sas: full mq support >>>>>>> >>>>>>> On 01/31/2017 11:02 AM, Christoph Hellwig wrote: >>>>>>>> >>>>>>>> On Tue, Jan 31, 2017 at 10:25:50AM +0100, Hannes Reinecke wrote: >>>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> this is a patchset to enable full multiqueue support for the >>>>>>>>> mpt3sas >>>>>>> >>>>>>> driver. >>>>>>>>> >>>>>>>>> While the HBA only has a single mailbox register for submitting >>>>>>>>> commands, it does have individual receive queues per MSI-X >>>>>>>>> interrupt and as such does benefit from converting it to full >>>>>>>>> multiqueue >>>>>> >>>>>> support. >>>>>>>> >>>>>>>> >>>>>>>> Explanation and numbers on why this would be beneficial, please. >>>>>>>> We should not need multiple submissions queues for a single register >>>>>>>> to benefit from multiple completion queues. >>>>>>>> >>>>>>> Well, the actual throughput very strongly depends on the blk-mq-sched >>>>>>> patches from Jens. 
>>>>>>> As this is barely finished I didn't post any numbers yet. >>>>>>> >>>>>>> However: >>>>>>> With multiqueue support: >>>>>>> 4k seq read : io=60573MB, bw=1009.2MB/s, iops=258353, runt= >>>>> >>>>> 60021msec >>>>>>> >>>>>>> With scsi-mq on 1 queue: >>>>>>> 4k seq read : io=17369MB, bw=296291KB/s, iops=74072, runt= 60028msec >>>>>>> So yes, there _is_ a benefit. >> >> >> Hannes, >> >> I have created a md raid0 with 4 SAS SSD drives using below command, >> #mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdg /dev/sdh >> /dev/sdi /dev/sdj >> >> And here is 'mdadm --detail /dev/md0' command output, >> -- >> /dev/md0: >> Version : 1.2 >> Creation Time : Thu Feb 9 14:38:47 2017 >> Raid Level : raid0 >> Array Size : 780918784 (744.74 GiB 799.66 GB) >>Raid Devices : 4 >> Total Devices : 4 >> Persistence : Superblock is persistent >> >> Update Time : Thu Feb 9 14:38:47 2017 >> State : clean >> Active Devices : 4 >> Working Devices : 4 >> Failed Devices : 0 >> Spare Devices : 0 >> >> Chunk Size : 512K >> >>Name : host_name >>UUID : b63f9da7:b7de9a25:6a46ca00:42214e22 >> Events : 0 >> >> Number Major Minor RaidDevice State >>0 8 960 active sync /dev/sdg >>1 8 1121 active sync /dev/sdh >>2
Re: [PATCH 00/10] mpt3sas: full mq support
On 02/09/2017 02:03 PM, Sreekanth Reddy wrote:
> On Wed, Feb 1, 2017 at 1:13 PM, Hannes Reinecke <h...@suse.de> wrote:
>>
>> On 02/01/2017 08:07 AM, Kashyap Desai wrote:
>>>> -----Original Message-----
>>>> From: Hannes Reinecke [mailto:h...@suse.de]
>>>> Sent: Wednesday, February 01, 2017 12:21 PM
>>>> To: Kashyap Desai; Christoph Hellwig
>>>> Cc: Martin K. Petersen; James Bottomley; linux-scsi@vger.kernel.org;
>>>> Sathya Prakash Veerichetty; PDL-MPT-FUSIONLINUX; Sreekanth Reddy
>>>> Subject: Re: [PATCH 00/10] mpt3sas: full mq support
>>>>
>>>> On 01/31/2017 06:54 PM, Kashyap Desai wrote:
>>>>>> -----Original Message-----
>>>>>> From: Hannes Reinecke [mailto:h...@suse.de]
>>>>>> Sent: Tuesday, January 31, 2017 4:47 PM
>>>>>> To: Christoph Hellwig
>>>>>> Cc: Martin K. Petersen; James Bottomley; linux-scsi@vger.kernel.org;
>>>>>> Sathya Prakash; Kashyap Desai; mpt-fusionlinux@broadcom.com
>>>>>> Subject: Re: [PATCH 00/10] mpt3sas: full mq support
>>>>>>
>>>>>> On 01/31/2017 11:02 AM, Christoph Hellwig wrote:
>>>>>>> On Tue, Jan 31, 2017 at 10:25:50AM +0100, Hannes Reinecke wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> this is a patchset to enable full multiqueue support for the
>>>>>>>> mpt3sas driver.
>>>>>>>>
>>>>>>>> While the HBA only has a single mailbox register for submitting
>>>>>>>> commands, it does have individual receive queues per MSI-X
>>>>>>>> interrupt and as such does benefit from converting it to full
>>>>>>>> multiqueue support.
>>>>>>>
>>>>>>> Explanation and numbers on why this would be beneficial, please.
>>>>>>> We should not need multiple submissions queues for a single register
>>>>>>> to benefit from multiple completion queues.
>>>>>>>
>>>>>> Well, the actual throughput very strongly depends on the blk-mq-sched
>>>>>> patches from Jens.
>>>>>> As this is barely finished I didn't post any numbers yet.
>>>>>>
>>>>>> However:
>>>>>> With multiqueue support:
>>>>>> 4k seq read : io=60573MB, bw=1009.2MB/s, iops=258353, runt= 60021msec
>>>>>>
>>>>>> With scsi-mq on 1 queue:
>>>>>> 4k seq read : io=17369MB, bw=296291KB/s, iops=74072, runt= 60028msec
>>>>>> So yes, there _is_ a benefit.
>
> Hannes,
>
> I have created a md raid0 with 4 SAS SSD drives using below command,
> #mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdg /dev/sdh
> /dev/sdi /dev/sdj
>
> And here is 'mdadm --detail /dev/md0' command output,
> --------
> /dev/md0:
>         Version : 1.2
>   Creation Time : Thu Feb  9 14:38:47 2017
>      Raid Level : raid0
>      Array Size : 780918784 (744.74 GiB 799.66 GB)
>    Raid Devices : 4
>   Total Devices : 4
>     Persistence : Superblock is persistent
>
>     Update Time : Thu Feb  9 14:38:47 2017
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
>
>      Chunk Size : 512K
>
>            Name : host_name
>            UUID : b63f9da7:b7de9a25:6a46ca00:42214e22
>          Events : 0
>
>     Number   Major   Minor   RaidDevice State
>        0       8       96        0      active sync   /dev/sdg
>        1       8      112        1      active sync   /dev/sdh
>        2       8      144        2      active sync   /dev/sdj
>        3       8      128        3      active sync   /dev/sdi
> --------
>
> Then I have used the below fio profile to run 4K sequential read
> operations with the nr_hw_queues=1 driver and with the nr_hw_queues=24
> driver (as my system has two NUMA nodes, each with 12 CPUs).
>
Re: [PATCH 00/10] mpt3sas: full mq support
On Wed, Feb 1, 2017 at 1:13 PM, Hannes Reinecke <h...@suse.de> wrote: > > On 02/01/2017 08:07 AM, Kashyap Desai wrote: >>> >>> -Original Message- >>> From: Hannes Reinecke [mailto:h...@suse.de] >>> Sent: Wednesday, February 01, 2017 12:21 PM >>> To: Kashyap Desai; Christoph Hellwig >>> Cc: Martin K. Petersen; James Bottomley; linux-scsi@vger.kernel.org; >>> Sathya >>> Prakash Veerichetty; PDL-MPT-FUSIONLINUX; Sreekanth Reddy >>> Subject: Re: [PATCH 00/10] mpt3sas: full mq support >>> >>> On 01/31/2017 06:54 PM, Kashyap Desai wrote: >>>>> >>>>> -Original Message- >>>>> From: Hannes Reinecke [mailto:h...@suse.de] >>>>> Sent: Tuesday, January 31, 2017 4:47 PM >>>>> To: Christoph Hellwig >>>>> Cc: Martin K. Petersen; James Bottomley; linux-scsi@vger.kernel.org; >>>> >>>> Sathya >>>>> >>>>> Prakash; Kashyap Desai; mpt-fusionlinux@broadcom.com >>>>> Subject: Re: [PATCH 00/10] mpt3sas: full mq support >>>>> >>>>> On 01/31/2017 11:02 AM, Christoph Hellwig wrote: >>>>>> >>>>>> On Tue, Jan 31, 2017 at 10:25:50AM +0100, Hannes Reinecke wrote: >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> this is a patchset to enable full multiqueue support for the >>>>>>> mpt3sas >>>>> >>>>> driver. >>>>>>> >>>>>>> While the HBA only has a single mailbox register for submitting >>>>>>> commands, it does have individual receive queues per MSI-X >>>>>>> interrupt and as such does benefit from converting it to full >>>>>>> multiqueue >>>> >>>> support. >>>>>> >>>>>> >>>>>> Explanation and numbers on why this would be beneficial, please. >>>>>> We should not need multiple submissions queues for a single register >>>>>> to benefit from multiple completion queues. >>>>>> >>>>> Well, the actual throughput very strongly depends on the blk-mq-sched >>>>> patches from Jens. >>>>> As this is barely finished I didn't post any numbers yet. 
>>>>>
>>>>> However:
>>>>> With multiqueue support:
>>>>> 4k seq read : io=60573MB, bw=1009.2MB/s, iops=258353, runt= 60021msec
>>>>>
>>>>> With scsi-mq on 1 queue:
>>>>> 4k seq read : io=17369MB, bw=296291KB/s, iops=74072, runt= 60028msec
>>>>> So yes, there _is_ a benefit.

Hannes,

I have created a md raid0 with 4 SAS SSD drives using below command,
#mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdg /dev/sdh /dev/sdi /dev/sdj

And here is 'mdadm --detail /dev/md0' command output,
--------
/dev/md0:
        Version : 1.2
  Creation Time : Thu Feb  9 14:38:47 2017
     Raid Level : raid0
     Array Size : 780918784 (744.74 GiB 799.66 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Thu Feb  9 14:38:47 2017
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : host_name
           UUID : b63f9da7:b7de9a25:6a46ca00:42214e22
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       96        0      active sync   /dev/sdg
       1       8      112        1      active sync   /dev/sdh
       2       8      144        2      active sync   /dev/sdj
       3       8      128        3      active sync   /dev/sdi
--------

Then I have used the below fio profile to run 4K sequential read
operations with the nr_hw_queues=1 driver and with the nr_hw_queues=24
driver (as my system has two NUMA nodes, each with 12 CPUs).

--------
[global]
ioengine=libaio
group_reporting
direct=1
rw=read
bs=4k
allow_mounted_write=0
iodepth=128
runtime=150s

[job1]
filename=/dev/md0
--------

Here are the fio results when nr_hw_queues=1 (i.e. a single request
queue) with various job counts,

1JOB  4k read : io=213268MB, bw=1421.8MB/s, iops=363975, runt=150001msec
2JOBs 4k read : io=309605MB, bw=2064.2MB/s, iops=528389, runt=150001msec
4JOBs 4k
Re: [PATCH 00/10] mpt3sas: full mq support
On 02/07/2017 04:40 PM, Christoph Hellwig wrote:
> On Tue, Feb 07, 2017 at 04:39:01PM +0100, Hannes Reinecke wrote:
>> But we do; we're getting the index/tag/smid from the high-priority list,
>> which is separated from the normal SCSI I/O tag space.
>> (which reminds me; there's another cleanup patch to be had in
>> _ctl_do_mpt_command(), but that's beside the point).
>
> The calls to blk_mq_tagset_busy_iter added in patch 8 indicate the
> contrary.
>
Right. Now I see what you mean.

We should have used reserved_tags here.
Sadly we still don't have an interface to actually _allocate_ reserved
tags, have we?

Cheers,

Hannes
--
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de                   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
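[Editor's note] The reserved-tag scheme being discussed — holding back a small region of the shared tag space so passthrough-style commands never compete with normal I/O for tags — can be sketched as a toy model. This is an illustration only, not the blk-mq implementation; all names here are made up:

```python
class TagSet:
    """Toy tag allocator: tags [0, reserved) are held back for
    passthrough-style commands; normal I/O uses [reserved, depth)."""

    def __init__(self, depth, reserved):
        self.depth = depth
        self.reserved = reserved
        self.in_use = set()

    def get_tag(self):
        """Allocate a normal I/O tag, or None if that region is exhausted."""
        for tag in range(self.reserved, self.depth):
            if tag not in self.in_use:
                self.in_use.add(tag)
                return tag
        return None

    def get_reserved_tag(self):
        """Allocate from the reserved region (e.g. for an MPT passthrough
        command), independent of normal I/O pressure."""
        for tag in range(self.reserved):
            if tag not in self.in_use:
                self.in_use.add(tag)
                return tag
        return None

    def put_tag(self, tag):
        self.in_use.discard(tag)


tags = TagSet(depth=64, reserved=4)
# Saturate the normal tag space ...
while tags.get_tag() is not None:
    pass
# ... and a passthrough command can still allocate a tag.
pt = tags.get_reserved_tag()
print("reserved tag after saturation:", pt)  # -> reserved tag after saturation: 0
```

The point of the separation is exactly what the thread describes: the passthrough path stays out of the normal I/O tag map, so iterating or draining one never touches the other.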
Re: [PATCH 00/10] mpt3sas: full mq support
On Tue, Feb 07, 2017 at 04:39:01PM +0100, Hannes Reinecke wrote:
> But we do; we're getting the index/tag/smid from the high-priority list,
> which is separated from the normal SCSI I/O tag space.
> (which reminds me; there's another cleanup patch to be had in
> _ctl_do_mpt_command(), but that's beside the point).

The calls to blk_mq_tagset_busy_iter added in patch 8 indicate the
contrary.
Re: [PATCH 00/10] mpt3sas: full mq support
On 02/07/2017 04:34 PM, Christoph Hellwig wrote:
> On Tue, Feb 07, 2017 at 03:38:51PM +0100, Hannes Reinecke wrote:
>> The SCSI passthrough commands pass in pre-formatted SGLs, so the driver
>> just has to map them.
>> If we were converting that we first have to re-format the
>> (driver-specific) SGLs into linux sg lists, only to have them converted
>> back into driver-specific ones once queuecommand is called.
>> You sure it's worth the effort?
>>
>> The driver already reserves some tags for precisely this use-case, so it
>> won't conflict with normal I/O operation.
>> So where's the problem with that?
>
> If it was an entirely separate path that would be easy, but it's
> not - see all the poking into the tag maps that your patch 8
> includes. If it was just a few tags on the side not interacting
> with the scsi or blk-mq it wouldn't be such a problem.
>
But we do; we're getting the index/tag/smid from the high-priority list,
which is separated from the normal SCSI I/O tag space.
(which reminds me; there's another cleanup patch to be had in
_ctl_do_mpt_command(), but that's beside the point).

Cheers,

Hannes
--
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de                   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH 00/10] mpt3sas: full mq support
On Tue, Feb 07, 2017 at 03:38:51PM +0100, Hannes Reinecke wrote:
> The SCSI passthrough commands pass in pre-formatted SGLs, so the driver
> just has to map them.
> If we were converting that we first have to re-format the
> (driver-specific) SGLs into linux sg lists, only to have them converted
> back into driver-specific ones once queuecommand is called.
> You sure it's worth the effort?
>
> The driver already reserves some tags for precisely this use-case, so it
> won't conflict with normal I/O operation.
> So where's the problem with that?

If it was an entirely separate path that would be easy, but it's not -
see all the poking into the tag maps that your patch 8 includes. If it
was just a few tags on the side not interacting with the scsi or blk-mq
it wouldn't be such a problem.
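[Editor's note] The double conversion Hannes objects to can be made concrete with a toy model: a hypothetical driver-specific SGE format versus a generic scatterlist entry. Routing passthrough through the block layer would mean translating driver SGEs into generic entries on submission, only for queuecommand to translate them straight back. The structures and flag below are illustrative, not the mpt3sas definitions:

```python
from collections import namedtuple

# Hypothetical driver-specific SGE: address, length, plus hardware flags.
DriverSGE = namedtuple("DriverSGE", ["addr", "length", "flags"])
# Generic scatterlist entry: just address and length.
SGEntry = namedtuple("SGEntry", ["addr", "length"])

LAST_ELEMENT = 0x80  # hypothetical "end of list" hardware flag


def driver_to_sglist(sges):
    """What a block-layer passthrough path would have to do on submission:
    strip the hardware-specific flags to get a generic sg list."""
    return [SGEntry(s.addr, s.length) for s in sges]


def sglist_to_driver(sgl):
    """...and what queuecommand would immediately undo again: rebuild the
    driver format and re-mark the final element."""
    out = [DriverSGE(e.addr, e.length, 0) for e in sgl]
    if out:
        out[-1] = out[-1]._replace(flags=LAST_ELEMENT)
    return out


# A pre-formatted SGL as the ioctl path hands it to the driver:
preformatted = [
    DriverSGE(0x1000, 4096, 0),
    DriverSGE(0x3000, 4096, LAST_ELEMENT),
]
round_tripped = sglist_to_driver(driver_to_sglist(preformatted))
print(round_tripped == preformatted)  # -> True
```

The round trip is pure overhead: the output is identical to the input, which is the "very little gain" Hannes is weighing against the conversion effort.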
Re: [PATCH 00/10] mpt3sas: full mq support
On 02/07/2017 02:19 PM, Christoph Hellwig wrote:
> Patch 1-7 look fine to me with minor fixups, and I'd love to see
> them go into 4.11. The last one looks really questionable,
> and 8 and 9 will need some work so that the MPT passthrough ioctls
> either go away or make use of struct request and the block layer
> and SCSI infrastructure.
>
Hmm. Which is quite a bit of effort for very little gain.

The SCSI passthrough commands pass in pre-formatted SGLs, so the driver
just has to map them.
If we were converting that we first have to re-format the
(driver-specific) SGLs into linux sg lists, only to have them converted
back into driver-specific ones once queuecommand is called.
You sure it's worth the effort?

The driver already reserves some tags for precisely this use-case, so it
won't conflict with normal I/O operation.
So where's the problem with that?

I know the SCSI passthrough operations are decidedly ugly, but if I were
to change them I'd rather move them over to bsg once we converted bsg to
operate without a request queue.
But for now ... not sure.

Cheers,

Hannes
--
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de                   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH 00/10] mpt3sas: full mq support
Patch 1-7 look fine to me with minor fixups, and I'd love to see them go into 4.11. The last one looks really questionable, and 8 and 9 will need some work so that the MPT passthrough ioctls either go away or make use of struct request and the block layer and SCSI infrastructure.
Re: [PATCH 00/10] mpt3sas: full mq support
On 01/31/2017 06:54 PM, Kashyap Desai wrote:
>> -----Original Message-----
>> From: Hannes Reinecke [mailto:h...@suse.de]
>> Sent: Tuesday, January 31, 2017 4:47 PM
>> To: Christoph Hellwig
>> Cc: Martin K. Petersen; James Bottomley; linux-scsi@vger.kernel.org;
>> Sathya Prakash; Kashyap Desai; mpt-fusionlinux@broadcom.com
>> Subject: Re: [PATCH 00/10] mpt3sas: full mq support
>>
>> On 01/31/2017 11:02 AM, Christoph Hellwig wrote:
>>> On Tue, Jan 31, 2017 at 10:25:50AM +0100, Hannes Reinecke wrote:
>>>> Hi all,
>>>>
>>>> this is a patchset to enable full multiqueue support for the mpt3sas
>>>> driver.
>>>>
>>>> While the HBA only has a single mailbox register for submitting
>>>> commands, it does have individual receive queues per MSI-X interrupt
>>>> and as such does benefit from converting it to full multiqueue
>>>> support.
>>>
>>> Explanation and numbers on why this would be beneficial, please.
>>> We should not need multiple submissions queues for a single register
>>> to benefit from multiple completion queues.
>>>
>> Well, the actual throughput very strongly depends on the blk-mq-sched
>> patches from Jens.
>> As this is barely finished I didn't post any numbers yet.
>>
>> However:
>> With multiqueue support:
>> 4k seq read : io=60573MB, bw=1009.2MB/s, iops=258353, runt= 60021msec
>>
>> With scsi-mq on 1 queue:
>> 4k seq read : io=17369MB, bw=296291KB/s, iops=74072, runt= 60028msec
>> So yes, there _is_ a benefit.
>> (Which is actually quite cool, as these tests were done on a SAS3 HBA,
>> so we're getting close to the theoretical maximum of 1.2GB/s).
>> (Unlike the single-queue case :-)
>
> Hannes -
> Can you share details about the setup? How many drives do you have, and
> how are they connected (enclosure -> drives)?
> To me it looks like the current mpt3sas driver might be taking a bigger
> hit on spinlock operations (the penalty on a NUMA arch is higher
> compared to a single-socket server), unlike the shared blk tag we use
> in the megaraid_sas driver.
>
The tests were done with a single LSI SAS3008 connected to a NetApp
E-series (2660), using 4 LUNs under MD-RAID0.
Megaraid_sas is even worse here; due to the odd nature of the 'fusion'
implementation we're ending up having _two_ sets of tags, making it
really hard to use scsi-mq here.
(Not that I didn't try; but lacking a proper backend it's really hard to
evaluate the benefit of those ... spinning HDDs simply don't cut it here)

> I mean the "[PATCH 08/10] mpt3sas: lockless command submission for
> scsi-mq" patch is improving performance by removing spinlock overhead
> and attempting to get the request using blk_tags.
> Are you seeing a performance improvement if you hard-code
> nr_hw_queues = 1 in the code changes below, part of "[PATCH 10/10]
> mpt3sas: scsi-mq interrupt steering"?
>
No. The numbers posted above are generated with exactly that patch; the
first line is running with nr_hw_queues=32 and the second line with
nr_hw_queues=1.

Curiously, though, patch 8/10 also reduces the 'can_queue' value by
dividing it by the number of CPUs (required for blk tag space scaling).
If I _increase_ can_queue back to the original value after setting up
the tagspace, performance _drops_ again. Most unexpected; I'll be doing
more experimenting there.

Full results will be presented at VAULT, btw :-)

Cheers,

Hannes
--
Dr. Hannes Reinecke            zSeries & Storage
h...@suse.de                   +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
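[Editor's note] The can_queue division discussed in the thread is just integer arithmetic, but it makes the throttling point visible: once the HBA's depth is split across per-CPU hw queues, a single busy queue can only ever use its slice of the tag space. A toy calculation (the HBA depth below is a hypothetical round number, not a measured mpt3sas value):

```python
def per_queue_depth(can_queue, nr_hw_queues):
    """The tag-space split described for patch 8/10: the HBA's can_queue
    is divided evenly across the hw queues."""
    return can_queue // nr_hw_queues

hba_can_queue = 9600  # hypothetical HBA queue depth
for queues in (1, 24):
    depth = per_queue_depth(hba_can_queue, queues)
    print(f"nr_hw_queues={queues}: {depth} tags/queue, "
          f"{depth * queues} usable in total")
# -> nr_hw_queues=1: 9600 tags/queue, 9600 usable in total
# -> nr_hw_queues=24: 400 tags/queue, 9600 usable in total
```

With 24 queues, an I/O pattern that lands mostly on one queue is capped at 400 outstanding commands even though the HBA could take 9600, which is the throttling concern raised earlier in the thread for the nr_hw_queues > 1 case.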