Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-16 Thread Jens Axboe
On 01/16/2017 08:16 AM, Jens Axboe wrote:
> On 01/16/2017 08:12 AM, Jens Axboe wrote:
>> On 01/16/2017 01:11 AM, Hannes Reinecke wrote:
>>> On 01/13/2017 05:02 PM, Jens Axboe wrote:
 On 01/13/2017 09:00 AM, Jens Axboe wrote:
> On 01/13/2017 08:59 AM, Hannes Reinecke wrote:
>> On 01/13/2017 04:34 PM, Jens Axboe wrote:
>>> On 01/13/2017 08:33 AM, Hannes Reinecke wrote:
>> [ .. ]
 Ah, indeed.
 There is an ominous udev rule here, trying to switch to 'deadline'.

 # cat 60-ssd-scheduler.rules
 # do not edit this file, it will be overwritten on update

 ACTION!="add", GOTO="ssd_scheduler_end"
 SUBSYSTEM!="block", GOTO="ssd_scheduler_end"

 IMPORT{cmdline}="elevator"
 ENV{elevator}=="*?", GOTO="ssd_scheduler_end"

 KERNEL=="sd*[!0-9]", ATTR{queue/rotational}=="0",
 ATTR{queue/scheduler}="deadline"

 LABEL="ssd_scheduler_end"

 Still shouldn't crash the kernel, though ...
>>>
>>> Of course not, and it's not a given that it does, it could just be
>>> triggering after the device load and failing as expected. But just in
>>> case, can you try and disable that rule and see if it still crashes with
>>> MQ_DEADLINE set as the default?
>>>
>> Yes, it does.
>> Same stacktrace as before.
>
> Alright, that's as expected. I've tried with your rule and making
> everything modular, but it still boots fine for me. Very odd. Can you
> send me your .config? And are all the SCSI disks hanging off ahci? Or
> sdb specifically, is that ahci or something else?

 Also, would be great if you could pull:

 git://git.kernel.dk/linux-block blk-mq-sched

 into current 'master' and see if it still reproduces. I expect that it
 will, but just want to ensure that it's a problem in the current code
 base as well.

>>> Actually, it doesn't. Seems to have resolved itself with the latest drop.
>>>
>>> However, now I've got a lockdep splat:
>>>
>>> Jan 16 09:05:02 lammermuir kernel: [ cut here ]
>>> Jan 16 09:05:02 lammermuir kernel: WARNING: CPU: 29 PID: 5860 at
>>> kernel/locking/lockdep.c:3514 lock_release+0x2a7/0x490
>>> Jan 16 09:05:02 lammermuir kernel: DEBUG_LOCKS_WARN_ON(depth <= 0)
>>> Jan 16 09:05:02 lammermuir kernel: Modules linked in: raid0 mpt3sas
>>> raid_class rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache e
>>> Jan 16 09:05:02 lammermuir kernel:  fb_sys_fops ahci uhci_hcd ttm
>>> ehci_pci libahci ehci_hcd serio_raw crc32c_intel drm libata usbcore hpsa
>>> Jan 16 09:05:02 lammermuir kernel: CPU: 29 PID: 5860 Comm: fio Not
>>> tainted 4.10.0-rc3+ #540
>>> Jan 16 09:05:02 lammermuir kernel: Hardware name: HP ProLiant ML350p
>>> Gen8, BIOS P72 09/08/2013
>>> Jan 16 09:05:02 lammermuir kernel: Call Trace:
>>> Jan 16 09:05:02 lammermuir kernel:  dump_stack+0x85/0xc9
>>> Jan 16 09:05:02 lammermuir kernel:  __warn+0xd1/0xf0
>>> Jan 16 09:05:02 lammermuir kernel:  ? aio_write+0x118/0x170
>>> Jan 16 09:05:02 lammermuir kernel:  warn_slowpath_fmt+0x4f/0x60
>>> Jan 16 09:05:02 lammermuir kernel:  lock_release+0x2a7/0x490
>>> Jan 16 09:05:02 lammermuir kernel:  ? blkdev_write_iter+0x89/0xd0
>>> Jan 16 09:05:02 lammermuir kernel:  aio_write+0x138/0x170
>>> Jan 16 09:05:02 lammermuir kernel:  do_io_submit+0x4d2/0x8f0
>>> Jan 16 09:05:02 lammermuir kernel:  ? do_io_submit+0x413/0x8f0
>>> Jan 16 09:05:02 lammermuir kernel:  SyS_io_submit+0x10/0x20
>>> Jan 16 09:05:02 lammermuir kernel:  entry_SYSCALL_64_fastpath+0x23/0xc6
>>
>> Odd, not sure that's me. What did you pull my branch into? And what is the
>> sha of the stuff you pulled in?
> 
> Forgot to ask, please send me the fio job you ran here.

Nevermind, it's a mainline bug that's fixed in -rc4:

commit a12f1ae61c489076a9aeb90bddca7722bf330df3
Author: Shaohua Li 
Date:   Tue Dec 13 12:09:56 2016 -0800

aio: fix lock dep warning
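
Whether a given tree already contains that fix is quick to check (a
sketch, assuming a git checkout of mainline):

# git merge-base --is-ancestor a12f1ae61c489076a9aeb90bddca7722bf330df3 HEAD \
      && echo "aio fix present" || echo "aio fix missing, update to -rc4"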

-- 
Jens Axboe



Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-16 Thread Jens Axboe
On 01/16/2017 08:12 AM, Jens Axboe wrote:
> On 01/16/2017 01:11 AM, Hannes Reinecke wrote:
>> On 01/13/2017 05:02 PM, Jens Axboe wrote:
>>> On 01/13/2017 09:00 AM, Jens Axboe wrote:
 On 01/13/2017 08:59 AM, Hannes Reinecke wrote:
> On 01/13/2017 04:34 PM, Jens Axboe wrote:
>> On 01/13/2017 08:33 AM, Hannes Reinecke wrote:
> [ .. ]
>>> Ah, indeed.
>>> There is an ominous udev rule here, trying to switch to 'deadline'.
>>>
>>> # cat 60-ssd-scheduler.rules
>>> # do not edit this file, it will be overwritten on update
>>>
>>> ACTION!="add", GOTO="ssd_scheduler_end"
>>> SUBSYSTEM!="block", GOTO="ssd_scheduler_end"
>>>
>>> IMPORT{cmdline}="elevator"
>>> ENV{elevator}=="*?", GOTO="ssd_scheduler_end"
>>>
>>> KERNEL=="sd*[!0-9]", ATTR{queue/rotational}=="0",
>>> ATTR{queue/scheduler}="deadline"
>>>
>>> LABEL="ssd_scheduler_end"
>>>
>>> Still shouldn't crash the kernel, though ...
>>
>> Of course not, and it's not a given that it does, it could just be
>> triggering after the device load and failing as expected. But just in
>> case, can you try and disable that rule and see if it still crashes with
>> MQ_DEADLINE set as the default?
>>
> Yes, it does.
> Same stacktrace as before.

 Alright, that's as expected. I've tried with your rule and making
 everything modular, but it still boots fine for me. Very odd. Can you
 send me your .config? And are all the SCSI disks hanging off ahci? Or
 sdb specifically, is that ahci or something else?
>>>
>>> Also, would be great if you could pull:
>>>
>>> git://git.kernel.dk/linux-block blk-mq-sched
>>>
>>> into current 'master' and see if it still reproduces. I expect that it
>>> will, but just want to ensure that it's a problem in the current code
>>> base as well.
>>>
>> Actually, it doesn't. Seems to have resolved itself with the latest drop.
>>
>> However, now I've got a lockdep splat:
>>
>> Jan 16 09:05:02 lammermuir kernel: [ cut here ]
>> Jan 16 09:05:02 lammermuir kernel: WARNING: CPU: 29 PID: 5860 at
>> kernel/locking/lockdep.c:3514 lock_release+0x2a7/0x490
>> Jan 16 09:05:02 lammermuir kernel: DEBUG_LOCKS_WARN_ON(depth <= 0)
>> Jan 16 09:05:02 lammermuir kernel: Modules linked in: raid0 mpt3sas
>> raid_class rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache e
>> Jan 16 09:05:02 lammermuir kernel:  fb_sys_fops ahci uhci_hcd ttm
>> ehci_pci libahci ehci_hcd serio_raw crc32c_intel drm libata usbcore hpsa
>> Jan 16 09:05:02 lammermuir kernel: CPU: 29 PID: 5860 Comm: fio Not
>> tainted 4.10.0-rc3+ #540
>> Jan 16 09:05:02 lammermuir kernel: Hardware name: HP ProLiant ML350p
>> Gen8, BIOS P72 09/08/2013
>> Jan 16 09:05:02 lammermuir kernel: Call Trace:
>> Jan 16 09:05:02 lammermuir kernel:  dump_stack+0x85/0xc9
>> Jan 16 09:05:02 lammermuir kernel:  __warn+0xd1/0xf0
>> Jan 16 09:05:02 lammermuir kernel:  ? aio_write+0x118/0x170
>> Jan 16 09:05:02 lammermuir kernel:  warn_slowpath_fmt+0x4f/0x60
>> Jan 16 09:05:02 lammermuir kernel:  lock_release+0x2a7/0x490
>> Jan 16 09:05:02 lammermuir kernel:  ? blkdev_write_iter+0x89/0xd0
>> Jan 16 09:05:02 lammermuir kernel:  aio_write+0x138/0x170
>> Jan 16 09:05:02 lammermuir kernel:  do_io_submit+0x4d2/0x8f0
>> Jan 16 09:05:02 lammermuir kernel:  ? do_io_submit+0x413/0x8f0
>> Jan 16 09:05:02 lammermuir kernel:  SyS_io_submit+0x10/0x20
>> Jan 16 09:05:02 lammermuir kernel:  entry_SYSCALL_64_fastpath+0x23/0xc6
> 
> Odd, not sure that's me. What did you pull my branch into? And what is the
> sha of the stuff you pulled in?

Forgot to ask, please send me the fio job you ran here.
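
The job file itself is not in the thread; a hypothetical invocation of
the shape the trace implies (libaio writes submitted via io_submit()
directly against the block device) would be something like:

# fio --name=aiowrite --ioengine=libaio --direct=1 --rw=randwrite \
      --bs=4k --iodepth=32 --size=1g --filename=/dev/sdb

(/dev/sdb, block size and queue depth are placeholders.)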

-- 
Jens Axboe



Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-16 Thread Jens Axboe
On 01/16/2017 01:11 AM, Hannes Reinecke wrote:
> On 01/13/2017 05:02 PM, Jens Axboe wrote:
>> On 01/13/2017 09:00 AM, Jens Axboe wrote:
>>> On 01/13/2017 08:59 AM, Hannes Reinecke wrote:
 On 01/13/2017 04:34 PM, Jens Axboe wrote:
> On 01/13/2017 08:33 AM, Hannes Reinecke wrote:
 [ .. ]
>> Ah, indeed.
>> There is an ominous udev rule here, trying to switch to 'deadline'.
>>
>> # cat 60-ssd-scheduler.rules
>> # do not edit this file, it will be overwritten on update
>>
>> ACTION!="add", GOTO="ssd_scheduler_end"
>> SUBSYSTEM!="block", GOTO="ssd_scheduler_end"
>>
>> IMPORT{cmdline}="elevator"
>> ENV{elevator}=="*?", GOTO="ssd_scheduler_end"
>>
>> KERNEL=="sd*[!0-9]", ATTR{queue/rotational}=="0",
>> ATTR{queue/scheduler}="deadline"
>>
>> LABEL="ssd_scheduler_end"
>>
>> Still shouldn't crash the kernel, though ...
>
> Of course not, and it's not a given that it does, it could just be
> triggering after the device load and failing as expected. But just in
> case, can you try and disable that rule and see if it still crashes with
> MQ_DEADLINE set as the default?
>
 Yes, it does.
 Same stacktrace as before.
>>>
>>> Alright, that's as expected. I've tried with your rule and making
>>> everything modular, but it still boots fine for me. Very odd. Can you
>>> send me your .config? And are all the SCSI disks hanging off ahci? Or
>>> sdb specifically, is that ahci or something else?
>>
>> Also, would be great if you could pull:
>>
>> git://git.kernel.dk/linux-block blk-mq-sched
>>
>> into current 'master' and see if it still reproduces. I expect that it
>> will, but just want to ensure that it's a problem in the current code
>> base as well.
>>
> Actually, it doesn't. Seems to have resolved itself with the latest drop.
> 
> However, now I've got a lockdep splat:
> 
> Jan 16 09:05:02 lammermuir kernel: [ cut here ]
> Jan 16 09:05:02 lammermuir kernel: WARNING: CPU: 29 PID: 5860 at
> kernel/locking/lockdep.c:3514 lock_release+0x2a7/0x490
> Jan 16 09:05:02 lammermuir kernel: DEBUG_LOCKS_WARN_ON(depth <= 0)
> Jan 16 09:05:02 lammermuir kernel: Modules linked in: raid0 mpt3sas
> raid_class rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache e
> Jan 16 09:05:02 lammermuir kernel:  fb_sys_fops ahci uhci_hcd ttm
> ehci_pci libahci ehci_hcd serio_raw crc32c_intel drm libata usbcore hpsa
> Jan 16 09:05:02 lammermuir kernel: CPU: 29 PID: 5860 Comm: fio Not
> tainted 4.10.0-rc3+ #540
> Jan 16 09:05:02 lammermuir kernel: Hardware name: HP ProLiant ML350p
> Gen8, BIOS P72 09/08/2013
> Jan 16 09:05:02 lammermuir kernel: Call Trace:
> Jan 16 09:05:02 lammermuir kernel:  dump_stack+0x85/0xc9
> Jan 16 09:05:02 lammermuir kernel:  __warn+0xd1/0xf0
> Jan 16 09:05:02 lammermuir kernel:  ? aio_write+0x118/0x170
> Jan 16 09:05:02 lammermuir kernel:  warn_slowpath_fmt+0x4f/0x60
> Jan 16 09:05:02 lammermuir kernel:  lock_release+0x2a7/0x490
> Jan 16 09:05:02 lammermuir kernel:  ? blkdev_write_iter+0x89/0xd0
> Jan 16 09:05:02 lammermuir kernel:  aio_write+0x138/0x170
> Jan 16 09:05:02 lammermuir kernel:  do_io_submit+0x4d2/0x8f0
> Jan 16 09:05:02 lammermuir kernel:  ? do_io_submit+0x413/0x8f0
> Jan 16 09:05:02 lammermuir kernel:  SyS_io_submit+0x10/0x20
> Jan 16 09:05:02 lammermuir kernel:  entry_SYSCALL_64_fastpath+0x23/0xc6

Odd, not sure that's me. What did you pull my branch into? And what is the
sha of the stuff you pulled in?

-- 
Jens Axboe



Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-16 Thread Hannes Reinecke
On 01/13/2017 05:02 PM, Jens Axboe wrote:
> On 01/13/2017 09:00 AM, Jens Axboe wrote:
>> On 01/13/2017 08:59 AM, Hannes Reinecke wrote:
>>> On 01/13/2017 04:34 PM, Jens Axboe wrote:
 On 01/13/2017 08:33 AM, Hannes Reinecke wrote:
>>> [ .. ]
> Ah, indeed.
> There is an ominous udev rule here, trying to switch to 'deadline'.
>
> # cat 60-ssd-scheduler.rules
> # do not edit this file, it will be overwritten on update
>
> ACTION!="add", GOTO="ssd_scheduler_end"
> SUBSYSTEM!="block", GOTO="ssd_scheduler_end"
>
> IMPORT{cmdline}="elevator"
> ENV{elevator}=="*?", GOTO="ssd_scheduler_end"
>
> KERNEL=="sd*[!0-9]", ATTR{queue/rotational}=="0",
> ATTR{queue/scheduler}="deadline"
>
> LABEL="ssd_scheduler_end"
>
> Still shouldn't crash the kernel, though ...

 Of course not, and it's not a given that it does, it could just be
 triggering after the device load and failing as expected. But just in
 case, can you try and disable that rule and see if it still crashes with
 MQ_DEADLINE set as the default?

>>> Yes, it does.
>>> Same stacktrace as before.
>>
>> Alright, that's as expected. I've tried with your rule and making
>> everything modular, but it still boots fine for me. Very odd. Can you
>> send me your .config? And are all the SCSI disks hanging off ahci? Or
>> sdb specifically, is that ahci or something else?
> 
> Also, would be great if you could pull:
> 
> git://git.kernel.dk/linux-block blk-mq-sched
> 
> into current 'master' and see if it still reproduces. I expect that it
> will, but just want to ensure that it's a problem in the current code
> base as well.
> 
Actually, it doesn't. Seems to have resolved itself with the latest drop.

However, now I've got a lockdep splat:

Jan 16 09:05:02 lammermuir kernel: [ cut here ]
Jan 16 09:05:02 lammermuir kernel: WARNING: CPU: 29 PID: 5860 at
kernel/locking/lockdep.c:3514 lock_release+0x2a7/0x490
Jan 16 09:05:02 lammermuir kernel: DEBUG_LOCKS_WARN_ON(depth <= 0)
Jan 16 09:05:02 lammermuir kernel: Modules linked in: raid0 mpt3sas
raid_class rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache e
Jan 16 09:05:02 lammermuir kernel:  fb_sys_fops ahci uhci_hcd ttm
ehci_pci libahci ehci_hcd serio_raw crc32c_intel drm libata usbcore hpsa
Jan 16 09:05:02 lammermuir kernel: CPU: 29 PID: 5860 Comm: fio Not
tainted 4.10.0-rc3+ #540
Jan 16 09:05:02 lammermuir kernel: Hardware name: HP ProLiant ML350p
Gen8, BIOS P72 09/08/2013
Jan 16 09:05:02 lammermuir kernel: Call Trace:
Jan 16 09:05:02 lammermuir kernel:  dump_stack+0x85/0xc9
Jan 16 09:05:02 lammermuir kernel:  __warn+0xd1/0xf0
Jan 16 09:05:02 lammermuir kernel:  ? aio_write+0x118/0x170
Jan 16 09:05:02 lammermuir kernel:  warn_slowpath_fmt+0x4f/0x60
Jan 16 09:05:02 lammermuir kernel:  lock_release+0x2a7/0x490
Jan 16 09:05:02 lammermuir kernel:  ? blkdev_write_iter+0x89/0xd0
Jan 16 09:05:02 lammermuir kernel:  aio_write+0x138/0x170
Jan 16 09:05:02 lammermuir kernel:  do_io_submit+0x4d2/0x8f0
Jan 16 09:05:02 lammermuir kernel:  ? do_io_submit+0x413/0x8f0
Jan 16 09:05:02 lammermuir kernel:  SyS_io_submit+0x10/0x20
Jan 16 09:05:02 lammermuir kernel:  entry_SYSCALL_64_fastpath+0x23/0xc6

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-15 Thread Jens Axboe
On 01/15/2017 03:12 AM, Paolo Valente wrote:
> 
>> On 11 Jan 2017, at 22:39, Jens Axboe wrote:
>>
>> Another year, another posting of this patchset. The previous posting
>> was here:
>>
>> https://www.spinics.net/lists/kernel/msg2406106.html
>>
>> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
>>
>> I've reworked bits of this to get rid of the shadow requests, thanks
>> to Bart for the inspiration. The missing piece, for me, was the fact
>> that we have the tags->rqs[] indirection array already. I've done this
>> somewhat differently, though, by having the internal scheduler tag
>> map be allocated/torn down when an IO scheduler is attached or
>> detached. This also means that when we run without a scheduler, we
>> don't have to do double tag allocations, it'll work like before.
>>
>> The patchset applies on top of 4.10-rc3, or can be pulled here:
>>
>> git://git.kernel.dk/linux-block blk-mq-sched.6
>>
>>
> 
> Hi Jens,
> I have checked this new version to find solutions to the apparent
> errors, mistakes or just unclear parts (to me) that I have pointed out
> before Christmas last year.  But I have found no changes related to
> these problems.
> 
> As I have already written, I'm willing to try to fix those errors
> myself, if they really are errors, but I would first need at least
> some minimal initial feedback and guidance.  If needed, tell me how I
> can help you get in sync again with these issues (sending my reports
> again, sending a digest of them, ...).

Sorry Paolo, but focus has been on getting the framework in both
a mergeable and stable state, which it is now. I'll tend to BFQ-specific
issues next week, so we can get those resolved as well.

Do you have a place where you have posted your in-progress
conversion?

-- 
Jens Axboe



Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-15 Thread Paolo Valente

> On 11 Jan 2017, at 22:39, Jens Axboe wrote:
> 
> Another year, another posting of this patchset. The previous posting
> was here:
> 
> https://www.spinics.net/lists/kernel/msg2406106.html
> 
> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
> 
> I've reworked bits of this to get rid of the shadow requests, thanks
> to Bart for the inspiration. The missing piece, for me, was the fact
> that we have the tags->rqs[] indirection array already. I've done this
> somewhat differently, though, by having the internal scheduler tag
> map be allocated/torn down when an IO scheduler is attached or
> detached. This also means that when we run without a scheduler, we
> don't have to do double tag allocations, it'll work like before.
> 
> The patchset applies on top of 4.10-rc3, or can be pulled here:
> 
> git://git.kernel.dk/linux-block blk-mq-sched.6
> 
> 

Hi Jens,
I have checked this new version to find solutions to the apparent
errors, mistakes or just unclear parts (to me) that I have pointed out
before Christmas last year.  But I have found no changes related to
these problems.

As I have already written, I'm willing to try to fix those errors
myself, if they really are errors, but I would first need at least
some minimal initial feedback and guidance.  If needed, tell me how I
can help you get in sync again with these issues (sending my reports
again, sending a digest of them, ...).

Thanks,
Paolo

> block/Kconfig.iosched|   50 
> block/Makefile   |3 
> block/blk-core.c |   19 -
> block/blk-exec.c |3 
> block/blk-flush.c|   15 -
> block/blk-ioc.c  |   12 
> block/blk-merge.c|4 
> block/blk-mq-sched.c |  354 +
> block/blk-mq-sched.h |  157 
> block/blk-mq-sysfs.c |   13 +
> block/blk-mq-tag.c   |   58 ++--
> block/blk-mq-tag.h   |4 
> block/blk-mq.c   |  413 +++---
> block/blk-mq.h   |   40 +++
> block/blk-tag.c  |1 
> block/blk.h  |   26 +-
> block/cfq-iosched.c  |2 
> block/deadline-iosched.c |2 
> block/elevator.c |  247 +++-
> block/mq-deadline.c  |  569 +++
> block/noop-iosched.c |2 
> drivers/nvme/host/pci.c  |1 
> include/linux/blk-mq.h   |9 
> include/linux/blkdev.h   |6 
> include/linux/elevator.h |   36 ++
> 25 files changed, 1732 insertions(+), 314 deletions(-)
> 
> -- 
> Jens Axboe
> 




Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Jens Axboe
On 01/13/2017 09:02 AM, Jens Axboe wrote:
> Also, would be great if you could pull:
> 
> git://git.kernel.dk/linux-block blk-mq-sched
> 
> into current 'master' and see if it still reproduces. I expect that it
> will, but just want to ensure that it's a problem in the current code
> base as well.

Hannes, can you try the current branch? I believe your problem should be
fixed now, would be great if you could verify.

-- 
Jens Axboe



Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Jens Axboe
On 01/13/2017 09:00 AM, Jens Axboe wrote:
> On 01/13/2017 08:59 AM, Hannes Reinecke wrote:
>> On 01/13/2017 04:34 PM, Jens Axboe wrote:
>>> On 01/13/2017 08:33 AM, Hannes Reinecke wrote:
>> [ .. ]
 Ah, indeed.
 There is an ominous udev rule here, trying to switch to 'deadline'.

 # cat 60-ssd-scheduler.rules
 # do not edit this file, it will be overwritten on update

 ACTION!="add", GOTO="ssd_scheduler_end"
 SUBSYSTEM!="block", GOTO="ssd_scheduler_end"

 IMPORT{cmdline}="elevator"
 ENV{elevator}=="*?", GOTO="ssd_scheduler_end"

 KERNEL=="sd*[!0-9]", ATTR{queue/rotational}=="0",
 ATTR{queue/scheduler}="deadline"

 LABEL="ssd_scheduler_end"

 Still shouldn't crash the kernel, though ...
>>>
>>> Of course not, and it's not a given that it does, it could just be
>>> triggering after the device load and failing as expected. But just in
>>> case, can you try and disable that rule and see if it still crashes with
>>> MQ_DEADLINE set as the default?
>>>
>> Yes, it does.
>> Same stacktrace as before.
> 
> Alright, that's as expected. I've tried with your rule and making
> everything modular, but it still boots fine for me. Very odd. Can you
> send me your .config? And are all the SCSI disks hanging off ahci? Or
> sdb specifically, is that ahci or something else?

Also, would be great if you could pull:

git://git.kernel.dk/linux-block blk-mq-sched

into current 'master' and see if it still reproduces. I expect that it
will, but just want to ensure that it's a problem in the current code
base as well.
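
The pull itself amounts to (a sketch, assuming a clean mainline checkout):

# git checkout master
# git pull git://git.kernel.dk/linux-block blk-mq-sched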

-- 
Jens Axboe



Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Jens Axboe
On 01/13/2017 08:59 AM, Hannes Reinecke wrote:
> On 01/13/2017 04:34 PM, Jens Axboe wrote:
>> On 01/13/2017 08:33 AM, Hannes Reinecke wrote:
> [ .. ]
>>> Ah, indeed.
>>> There is an ominous udev rule here, trying to switch to 'deadline'.
>>>
>>> # cat 60-ssd-scheduler.rules
>>> # do not edit this file, it will be overwritten on update
>>>
>>> ACTION!="add", GOTO="ssd_scheduler_end"
>>> SUBSYSTEM!="block", GOTO="ssd_scheduler_end"
>>>
>>> IMPORT{cmdline}="elevator"
>>> ENV{elevator}=="*?", GOTO="ssd_scheduler_end"
>>>
>>> KERNEL=="sd*[!0-9]", ATTR{queue/rotational}=="0",
>>> ATTR{queue/scheduler}="deadline"
>>>
>>> LABEL="ssd_scheduler_end"
>>>
>>> Still shouldn't crash the kernel, though ...
>>
>> Of course not, and it's not a given that it does, it could just be
>> triggering after the device load and failing as expected. But just in
>> case, can you try and disable that rule and see if it still crashes with
>> MQ_DEADLINE set as the default?
>>
> Yes, it does.
> Same stacktrace as before.

Alright, that's as expected. I've tried with your rule and making
everything modular, but it still boots fine for me. Very odd. Can you
send me your .config? And are all the SCSI disks hanging off ahci? Or
sdb specifically, is that ahci or something else?

-- 
Jens Axboe



Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Hannes Reinecke
On 01/13/2017 04:34 PM, Jens Axboe wrote:
> On 01/13/2017 08:33 AM, Hannes Reinecke wrote:
[ .. ]
>> Ah, indeed.
>> There is an ominous udev rule here, trying to switch to 'deadline'.
>>
>> # cat 60-ssd-scheduler.rules
>> # do not edit this file, it will be overwritten on update
>>
>> ACTION!="add", GOTO="ssd_scheduler_end"
>> SUBSYSTEM!="block", GOTO="ssd_scheduler_end"
>>
>> IMPORT{cmdline}="elevator"
>> ENV{elevator}=="*?", GOTO="ssd_scheduler_end"
>>
>> KERNEL=="sd*[!0-9]", ATTR{queue/rotational}=="0",
>> ATTR{queue/scheduler}="deadline"
>>
>> LABEL="ssd_scheduler_end"
>>
>> Still shouldn't crash the kernel, though ...
> 
> Of course not, and it's not a given that it does, it could just be
> triggering after the device load and failing as expected. But just in
> case, can you try and disable that rule and see if it still crashes with
> MQ_DEADLINE set as the default?
> 
Yes, it does.
Same stacktrace as before.
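
For reference, what the rule attempts can be replayed by hand through
sysfs (a sketch; sdb as in the report, output values assumed -- cf. the
'elevator: switch to deadline failed' line elsewhere in this thread):

# cat /sys/block/sdb/queue/rotational
0
# echo deadline > /sys/block/sdb/queue/scheduler
-bash: echo: write error: Invalid argument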

Cheers

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Jens Axboe
On 01/13/2017 08:33 AM, Hannes Reinecke wrote:
> On 01/13/2017 04:23 PM, Jens Axboe wrote:
>> On 01/13/2017 04:04 AM, Hannes Reinecke wrote:
>>> On 01/13/2017 09:15 AM, Hannes Reinecke wrote:
 On 01/11/2017 10:39 PM, Jens Axboe wrote:
> Another year, another posting of this patchset. The previous posting
> was here:
>
> https://www.spinics.net/lists/kernel/msg2406106.html
>
> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
>
> I've reworked bits of this to get rid of the shadow requests, thanks
> to Bart for the inspiration. The missing piece, for me, was the fact
> that we have the tags->rqs[] indirection array already. I've done this
> somewhat differently, though, by having the internal scheduler tag
> map be allocated/torn down when an IO scheduler is attached or
> detached. This also means that when we run without a scheduler, we
> don't have to do double tag allocations, it'll work like before.
>
> The patchset applies on top of 4.10-rc3, or can be pulled here:
>
> git://git.kernel.dk/linux-block blk-mq-sched.6
>
 Well ... something's wrong here on my machine:

 [   39.886886] [ cut here ]
 [   39.886895] WARNING: CPU: 9 PID: 62 at block/blk-mq.c:342
 __blk_mq_finish_request+0x124/0x140
 [   39.886895] Modules linked in: sd_mod ahci uhci_hcd ehci_pci
 mpt3sas(+) libahci ehci_hcd serio_raw crc32c_intel raid_class drm libata
 usbcore hpsa usb_common scsi_transport_sas sg dm_multipath dm_mod
 scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
 [   39.886910] CPU: 9 PID: 62 Comm: kworker/u130:0 Not tainted
 4.10.0-rc3+ #528
 [   39.886911] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
 [   39.886917] Workqueue: events_unbound async_run_entry_fn
 [   39.886918] Call Trace:
 [   39.886923]  dump_stack+0x85/0xc9
 [   39.886927]  __warn+0xd1/0xf0
 [   39.886928]  warn_slowpath_null+0x1d/0x20
 [   39.886930]  __blk_mq_finish_request+0x124/0x140
 [   39.886932]  blk_mq_finish_request+0x55/0x60
 [   39.886934]  blk_mq_sched_put_request+0x78/0x80
 [   39.886936]  blk_mq_free_request+0xe/0x10
 [   39.886938]  blk_put_request+0x25/0x60
 [   39.886944]  __scsi_execute.isra.24+0x104/0x160
 [   39.886946]  scsi_execute_req_flags+0x94/0x100
 [   39.886948]  scsi_report_opcode+0xab/0x100

 checking ...

>>> Ah.
>>> Seems like the elevator switch races with device setup:
>>>
>>> [ 1188.490326] [ cut here ]
>>> [ 1188.490334] WARNING: CPU: 9 PID: 30155 at block/blk-mq.c:342
>>> __blk_mq_finish_request+0x172/0x180
>>> [ 1188.490335] Modules linked in: mpt3sas(+) raid_class rpcsec_gss_krb5
>>> auth_rpcgss nfsv4 nfs lockd grace fscache ebtable_filter ebtables
>>> ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet
>>> br_netfilter bridge stp llc iscsi_ibft iscsi_boot_sysfs sb_edac edac_core
>>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>>> crct10dif_pclmul crc32_pclmul tg3 ixgbe ghash_clmulni_intel pcbc ptp
>>> aesni_intel pps_core aes_x86_64 ipmi_ssif hpilo hpwdt mdio libphy
>>> pcc_cpufreq crypto_simd glue_helper iTCO_wdt iTCO_vendor_support
>>> acpi_cpufreq tpm_tis ipmi_si ipmi_devintf cryptd lpc_ich pcspkr ioatdma
>>> tpm_tis_core thermal wmi shpchp dca ipmi_msghandler tpm fjes button
>>> sunrpc btrfs xor sr_mod raid6_pq cdrom ehci_pci mgag200 i2c_algo_bit
>>> drm_kms_helper syscopyarea sysfillrect uhci_hcd
>>> [ 1188.490399]  sysimgblt fb_sys_fops sd_mod ahci ehci_hcd ttm libahci
>>> crc32c_intel serio_raw drm libata usbcore usb_common hpsa
>>> scsi_transport_sas sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc
>>> scsi_dh_alua autofs4
>>> [ 1188.490411] CPU: 9 PID: 30155 Comm: kworker/u130:6 Not tainted
>>> 4.10.0-rc3+ #535
>>> [ 1188.490411] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
>>> [ 1188.490425] Workqueue: events_unbound async_run_entry_fn
>>> [ 1188.490427] Call Trace:
>>> [ 1188.490433]  dump_stack+0x85/0xc9
>>> [ 1188.490436]  __warn+0xd1/0xf0
>>> [ 1188.490438]  warn_slowpath_null+0x1d/0x20
>>> [ 1188.490440]  __blk_mq_finish_request+0x172/0x180
>>> [ 1188.490442]  blk_mq_finish_request+0x55/0x60
>>> [ 1188.490443]  blk_mq_sched_put_request+0x78/0x80
>>> [ 1188.490445]  blk_mq_free_request+0xe/0x10
>>> [ 1188.490448]  blk_put_request+0x25/0x60
>>> [ 1188.490453]  __scsi_execute.isra.24+0x104/0x160
>>> [ 1188.490455]  scsi_execute_req_flags+0x94/0x100
>>> [ 1188.490457]  scsi_report_opcode+0xab/0x100
>>> [ 1188.490461]  sd_revalidate_disk+0xaef/0x1450 [sd_mod]
>>> [ 1188.490464]  sd_probe_async+0xd1/0x1d0 [sd_mod]
>>> [ 1188.490466]  async_run_entry_fn+0x37/0x150
>>> [ 1188.490470]  process_one_work+0x1d0/0x660
>>> [ 1188.490472]  ? process_one_work+0x151/0x660
>>> [ 1188.490474]  worker_thread+0x12b/0x4a0
>>> [ 1188.490475]  kthread+0x10c/0x140
>>> [ 1188.490477]  ? proc

Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Hannes Reinecke
On 01/13/2017 04:23 PM, Jens Axboe wrote:
> On 01/13/2017 04:04 AM, Hannes Reinecke wrote:
>> On 01/13/2017 09:15 AM, Hannes Reinecke wrote:
>>> On 01/11/2017 10:39 PM, Jens Axboe wrote:
 Another year, another posting of this patchset. The previous posting
 was here:

 https://www.spinics.net/lists/kernel/msg2406106.html

 (yes, I've skipped v5, it was fixes on top of v4, not the rework).

 I've reworked bits of this to get rid of the shadow requests, thanks
 to Bart for the inspiration. The missing piece, for me, was the fact
 that we have the tags->rqs[] indirection array already. I've done this
 somewhat differently, though, by having the internal scheduler tag
 map be allocated/torn down when an IO scheduler is attached or
 detached. This also means that when we run without a scheduler, we
 don't have to do double tag allocations, it'll work like before.

 The patchset applies on top of 4.10-rc3, or can be pulled here:

 git://git.kernel.dk/linux-block blk-mq-sched.6

>>> Well ... something's wrong here on my machine:
>>>
>>> [   39.886886] [ cut here ]
>>> [   39.886895] WARNING: CPU: 9 PID: 62 at block/blk-mq.c:342
>>> __blk_mq_finish_request+0x124/0x140
>>> [   39.886895] Modules linked in: sd_mod ahci uhci_hcd ehci_pci
>>> mpt3sas(+) libahci ehci_hcd serio_raw crc32c_intel raid_class drm libata
>>> usbcore hpsa usb_common scsi_transport_sas sg dm_multipath dm_mod
>>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
>>> [   39.886910] CPU: 9 PID: 62 Comm: kworker/u130:0 Not tainted
>>> 4.10.0-rc3+ #528
>>> [   39.886911] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
>>> [   39.886917] Workqueue: events_unbound async_run_entry_fn
>>> [   39.886918] Call Trace:
>>> [   39.886923]  dump_stack+0x85/0xc9
>>> [   39.886927]  __warn+0xd1/0xf0
>>> [   39.886928]  warn_slowpath_null+0x1d/0x20
>>> [   39.886930]  __blk_mq_finish_request+0x124/0x140
>>> [   39.886932]  blk_mq_finish_request+0x55/0x60
>>> [   39.886934]  blk_mq_sched_put_request+0x78/0x80
>>> [   39.886936]  blk_mq_free_request+0xe/0x10
>>> [   39.886938]  blk_put_request+0x25/0x60
>>> [   39.886944]  __scsi_execute.isra.24+0x104/0x160
>>> [   39.886946]  scsi_execute_req_flags+0x94/0x100
>>> [   39.886948]  scsi_report_opcode+0xab/0x100
>>>
>>> checking ...
>>>
>> Ah.
>> Seems like the elevator switch races with device setup:
>>
>> [ 1188.490326] [ cut here ]
>> [ 1188.490334] WARNING: CPU: 9 PID: 30155 at block/blk-mq.c:342
>> __blk_mq_finish_request+0x172/0x180
>> [ 1188.490335] Modules linked in: mpt3sas(+) raid_class rpcsec_gss_krb5
>> auth_rpcgss nfsv4 nfs lockd grace fscache ebtable_filter ebtables
>> ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet
>> br_netfilter bridge stp llc iscsi_ibft iscsi_boot_sysfs sb_edac edac_core
>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>> crct10dif_pclmul crc32_pclmul tg3 ixgbe ghash_clmulni_intel pcbc ptp
>> aesni_intel pps_core aes_x86_64 ipmi_ssif hpilo hpwdt mdio libphy
>> pcc_cpufreq crypto_simd glue_helper iTCO_wdt iTCO_vendor_support
>> acpi_cpufreq tpm_tis ipmi_si ipmi_devintf cryptd lpc_ich pcspkr ioatdma
>> tpm_tis_core thermal wmi shpchp dca ipmi_msghandler tpm fjes button
>> sunrpc btrfs xor sr_mod raid6_pq cdrom ehci_pci mgag200 i2c_algo_bit
>> drm_kms_helper syscopyarea sysfillrect uhci_hcd
>> [ 1188.490399]  sysimgblt fb_sys_fops sd_mod ahci ehci_hcd ttm libahci
>> crc32c_intel serio_raw drm libata usbcore usb_common hpsa
>> scsi_transport_sas sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc
>> scsi_dh_alua autofs4
>> [ 1188.490411] CPU: 9 PID: 30155 Comm: kworker/u130:6 Not tainted
>> 4.10.0-rc3+ #535
>> [ 1188.490411] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
>> [ 1188.490425] Workqueue: events_unbound async_run_entry_fn
>> [ 1188.490427] Call Trace:
>> [ 1188.490433]  dump_stack+0x85/0xc9
>> [ 1188.490436]  __warn+0xd1/0xf0
>> [ 1188.490438]  warn_slowpath_null+0x1d/0x20
>> [ 1188.490440]  __blk_mq_finish_request+0x172/0x180
>> [ 1188.490442]  blk_mq_finish_request+0x55/0x60
>> [ 1188.490443]  blk_mq_sched_put_request+0x78/0x80
>> [ 1188.490445]  blk_mq_free_request+0xe/0x10
>> [ 1188.490448]  blk_put_request+0x25/0x60
>> [ 1188.490453]  __scsi_execute.isra.24+0x104/0x160
>> [ 1188.490455]  scsi_execute_req_flags+0x94/0x100
>> [ 1188.490457]  scsi_report_opcode+0xab/0x100
>> [ 1188.490461]  sd_revalidate_disk+0xaef/0x1450 [sd_mod]
>> [ 1188.490464]  sd_probe_async+0xd1/0x1d0 [sd_mod]
>> [ 1188.490466]  async_run_entry_fn+0x37/0x150
>> [ 1188.490470]  process_one_work+0x1d0/0x660
>> [ 1188.490472]  ? process_one_work+0x151/0x660
>> [ 1188.490474]  worker_thread+0x12b/0x4a0
>> [ 1188.490475]  kthread+0x10c/0x140
>> [ 1188.490477]  ? process_one_work+0x660/0x660
>> [ 1188.490478]  ? kthread_create_on_node+0x40/0x40
>> [ 1188.490483]  ret_from_fork+0x2a/0x40
>> [ 1188.490484] ---[ e

Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Jens Axboe
On 01/13/2017 04:04 AM, Hannes Reinecke wrote:
> On 01/13/2017 09:15 AM, Hannes Reinecke wrote:
>> On 01/11/2017 10:39 PM, Jens Axboe wrote:
>>> Another year, another posting of this patchset. The previous posting
>>> was here:
>>>
>>> https://www.spinics.net/lists/kernel/msg2406106.html
>>>
>>> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
>>>
>>> I've reworked bits of this to get rid of the shadow requests, thanks
>>> to Bart for the inspiration. The missing piece, for me, was the fact
>>> that we have the tags->rqs[] indirection array already. I've done this
>>> somewhat differently, though, by having the internal scheduler tag
>>> map be allocated/torn down when an IO scheduler is attached or
>>> detached. This also means that when we run without a scheduler, we
>>> don't have to do double tag allocations, it'll work like before.
>>>
>>> The patchset applies on top of 4.10-rc3, or can be pulled here:
>>>
>>> git://git.kernel.dk/linux-block blk-mq-sched.6
>>>
>> Well ... something's wrong here on my machine:
>>
>> [   39.886886] [ cut here ]
>> [   39.886895] WARNING: CPU: 9 PID: 62 at block/blk-mq.c:342
>> __blk_mq_finish_request+0x124/0x140
>> [   39.886895] Modules linked in: sd_mod ahci uhci_hcd ehci_pci
>> mpt3sas(+) libahci ehci_hcd serio_raw crc32c_intel raid_class drm libata
>> usbcore hpsa usb_common scsi_transport_sas sg dm_multipath dm_mod
>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
>> [   39.886910] CPU: 9 PID: 62 Comm: kworker/u130:0 Not tainted
>> 4.10.0-rc3+ #528
>> [   39.886911] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
>> [   39.886917] Workqueue: events_unbound async_run_entry_fn
>> [   39.886918] Call Trace:
>> [   39.886923]  dump_stack+0x85/0xc9
>> [   39.886927]  __warn+0xd1/0xf0
>> [   39.886928]  warn_slowpath_null+0x1d/0x20
>> [   39.886930]  __blk_mq_finish_request+0x124/0x140
>> [   39.886932]  blk_mq_finish_request+0x55/0x60
>> [   39.886934]  blk_mq_sched_put_request+0x78/0x80
>> [   39.886936]  blk_mq_free_request+0xe/0x10
>> [   39.886938]  blk_put_request+0x25/0x60
>> [   39.886944]  __scsi_execute.isra.24+0x104/0x160
>> [   39.886946]  scsi_execute_req_flags+0x94/0x100
>> [   39.886948]  scsi_report_opcode+0xab/0x100
>>
>> checking ...
>>
> Ah.
> Seems like the elevator switch races with device setup:
> 
> [ 1188.490326] [ cut here ]
> [ 1188.490334] WARNING: CPU: 9 PID: 30155 at block/blk-mq.c:342
> __blk_mq_finish_request+0x172/0x180
> [ 1188.490335] Modules linked in: mpt3sas(+) raid_class rpcsec_gss_krb5
> auth_rpcgss nfsv4 nfs lockd grace fscache ebtable_filter ebtables
> ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet
> br_netfilter bridge stp llc iscsi_ibft iscsi_boot_sysfs sb_edac edac_core
> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
> crct10dif_pclmul crc32_pclmul tg3 ixgbe ghash_clmulni_intel pcbc ptp
> aesni_intel pps_core aes_x86_64 ipmi_ssif hpilo hpwdt mdio libphy
> pcc_cpufreq crypto_simd glue_helper iTCO_wdt iTCO_vendor_support
> acpi_cpufreq tpm_tis ipmi_si ipmi_devintf cryptd lpc_ich pcspkr ioatdma
> tpm_tis_core thermal wmi shpchp dca ipmi_msghandler tpm fjes button
> sunrpc btrfs xor sr_mod raid6_pq cdrom ehci_pci mgag200 i2c_algo_bit
> drm_kms_helper syscopyarea sysfillrect uhci_hcd
> [ 1188.490399]  sysimgblt fb_sys_fops sd_mod ahci ehci_hcd ttm libahci
> crc32c_intel serio_raw drm libata usbcore usb_common hpsa
> scsi_transport_sas sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc
> scsi_dh_alua autofs4
> [ 1188.490411] CPU: 9 PID: 30155 Comm: kworker/u130:6 Not tainted
> 4.10.0-rc3+ #535
> [ 1188.490411] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
> [ 1188.490425] Workqueue: events_unbound async_run_entry_fn
> [ 1188.490427] Call Trace:
> [ 1188.490433]  dump_stack+0x85/0xc9
> [ 1188.490436]  __warn+0xd1/0xf0
> [ 1188.490438]  warn_slowpath_null+0x1d/0x20
> [ 1188.490440]  __blk_mq_finish_request+0x172/0x180
> [ 1188.490442]  blk_mq_finish_request+0x55/0x60
> [ 1188.490443]  blk_mq_sched_put_request+0x78/0x80
> [ 1188.490445]  blk_mq_free_request+0xe/0x10
> [ 1188.490448]  blk_put_request+0x25/0x60
> [ 1188.490453]  __scsi_execute.isra.24+0x104/0x160
> [ 1188.490455]  scsi_execute_req_flags+0x94/0x100
> [ 1188.490457]  scsi_report_opcode+0xab/0x100
> [ 1188.490461]  sd_revalidate_disk+0xaef/0x1450 [sd_mod]
> [ 1188.490464]  sd_probe_async+0xd1/0x1d0 [sd_mod]
> [ 1188.490466]  async_run_entry_fn+0x37/0x150
> [ 1188.490470]  process_one_work+0x1d0/0x660
> [ 1188.490472]  ? process_one_work+0x151/0x660
> [ 1188.490474]  worker_thread+0x12b/0x4a0
> [ 1188.490475]  kthread+0x10c/0x140
> [ 1188.490477]  ? process_one_work+0x660/0x660
> [ 1188.490478]  ? kthread_create_on_node+0x40/0x40
> [ 1188.490483]  ret_from_fork+0x2a/0x40
> [ 1188.490484] ---[ end trace d5e3a32ac269fc2a ]---
> [ 1188.490485] rq (487/52) rqs (-1/-1)
> [ 1188.523518] sd 7:0:0:0: [sdb] Attached SCSI disk
> [ 1188.540954]

Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Jens Axboe
On Fri, Jan 13 2017, Hannes Reinecke wrote:
> On 01/13/2017 12:04 PM, Hannes Reinecke wrote:
> > On 01/13/2017 09:15 AM, Hannes Reinecke wrote:
> >> On 01/11/2017 10:39 PM, Jens Axboe wrote:
> >>> Another year, another posting of this patchset. The previous posting
> >>> was here:
> >>>
> >>> https://www.spinics.net/lists/kernel/msg2406106.html
> >>>
> >>> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
> >>>
> >>> I've reworked bits of this to get rid of the shadow requests, thanks
> >>> to Bart for the inspiration. The missing piece, for me, was the fact
> >>> that we have the tags->rqs[] indirection array already. I've done this
> >>> somewhat differently, though, by having the internal scheduler tag
> >>> map be allocated/torn down when an IO scheduler is attached or
> >>> detached. This also means that when we run without a scheduler, we
> >>> don't have to do double tag allocations, it'll work like before.
> >>>
> >>> The patchset applies on top of 4.10-rc3, or can be pulled here:
> >>>
> >>> git://git.kernel.dk/linux-block blk-mq-sched.6
> >>>
> >> Well ... something's wrong here on my machine:
> >>
> [ .. ]
> 
> Turns out that selecting CONFIG_DEFAULT_MQ_DEADLINE is the culprit;
> switching to CONFIG_DEFAULT_MQ_NONE and selecting mq-deadline manually
> after booting makes the problem go away.
> 
> So there is a race condition during device init and switching the I/O
> scheduler.
> 
> But the results from using mq-deadline are promising; the performance
> drop I've seen on older hardware seems to be resolved:
> 
> mq iosched:
>  seq read : io=13383MB, bw=228349KB/s, iops=57087
>  rand read : io=12876MB, bw=219709KB/s, iops=54927
>  seq write: io=14532MB, bw=247987KB/s, iops=61996
>  rand write: io=13779MB, bw=235127KB/s, iops=58781
> mq default:
>  seq read : io=13056MB, bw=222588KB/s, iops=55647
>  rand read : io=12908MB, bw=220069KB/s, iops=55017
>  seq write: io=13986MB, bw=238444KB/s, iops=59611
>  rand write: io=13733MB, bw=234128KB/s, iops=58532
> sq default:
>  seq read : io=10240MB, bw=194787KB/s, iops=48696
>  rand read : io=10240MB, bw=191374KB/s, iops=47843
>  seq write: io=10240MB, bw=245333KB/s, iops=61333
>  rand write: io=10240MB, bw=228239KB/s, iops=57059
> 
> measured on mpt2sas with SSD devices.

Perfect! Straight on the path of killing off non-scsi-mq, then.

I'll fix up the async scan issue. The new mq schedulers don't really
behave differently in this regard, so I'm a bit puzzled. Hopefully it
reproduces here.

-- 
Jens Axboe



Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Jens Axboe
On Fri, Jan 13 2017, Hannes Reinecke wrote:
> On 01/13/2017 09:15 AM, Hannes Reinecke wrote:
> > On 01/11/2017 10:39 PM, Jens Axboe wrote:
> >> Another year, another posting of this patchset. The previous posting
> >> was here:
> >>
> >> https://www.spinics.net/lists/kernel/msg2406106.html
> >>
> >> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
> >>
> >> I've reworked bits of this to get rid of the shadow requests, thanks
> >> to Bart for the inspiration. The missing piece, for me, was the fact
> >> that we have the tags->rqs[] indirection array already. I've done this
> >> somewhat differently, though, by having the internal scheduler tag
> >> map be allocated/torn down when an IO scheduler is attached or
> >> detached. This also means that when we run without a scheduler, we
> >> don't have to do double tag allocations, it'll work like before.
> >>
> >> The patchset applies on top of 4.10-rc3, or can be pulled here:
> >>
> >> git://git.kernel.dk/linux-block blk-mq-sched.6
> >>
> > Well ... something's wrong here on my machine:
> > 
> > [   39.886886] [ cut here ]
> > [   39.886895] WARNING: CPU: 9 PID: 62 at block/blk-mq.c:342
> > __blk_mq_finish_request+0x124/0x140
> > [   39.886895] Modules linked in: sd_mod ahci uhci_hcd ehci_pci
> > mpt3sas(+) libahci ehci_hcd serio_raw crc32c_intel raid_class drm libata
> > usbcore hpsa usb_common scsi_transport_sas sg dm_multipath dm_mod
> > scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
> > [   39.886910] CPU: 9 PID: 62 Comm: kworker/u130:0 Not tainted
> > 4.10.0-rc3+ #528
> > [   39.886911] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
> > [   39.886917] Workqueue: events_unbound async_run_entry_fn
> > [   39.886918] Call Trace:
> > [   39.886923]  dump_stack+0x85/0xc9
> > [   39.886927]  __warn+0xd1/0xf0
> > [   39.886928]  warn_slowpath_null+0x1d/0x20
> > [   39.886930]  __blk_mq_finish_request+0x124/0x140
> > [   39.886932]  blk_mq_finish_request+0x55/0x60
> > [   39.886934]  blk_mq_sched_put_request+0x78/0x80
> > [   39.886936]  blk_mq_free_request+0xe/0x10
> > [   39.886938]  blk_put_request+0x25/0x60
> > [   39.886944]  __scsi_execute.isra.24+0x104/0x160
> > [   39.886946]  scsi_execute_req_flags+0x94/0x100
> > [   39.886948]  scsi_report_opcode+0xab/0x100
> > 
> > checking ...
> > 
> Ah.
> Seems like the elevator switch races with device setup:

Huh, funky, haven't seen that. I'll see if I can reproduce it here. I
don't have SCAN_ASYNC turned on, on my test box.
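
The knob in question is CONFIG_SCSI_SCAN_ASYNC, so comparing the two
configs is the obvious first check (a sketch):

# grep SCSI_SCAN_ASYNC .config
CONFIG_SCSI_SCAN_ASYNC=y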

-- 
Jens Axboe



Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Hannes Reinecke
On 01/13/2017 12:04 PM, Hannes Reinecke wrote:
> On 01/13/2017 09:15 AM, Hannes Reinecke wrote:
>> On 01/11/2017 10:39 PM, Jens Axboe wrote:
>>> Another year, another posting of this patchset. The previous posting
>>> was here:
>>>
>>> https://www.spinics.net/lists/kernel/msg2406106.html
>>>
>>> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
>>>
>>> I've reworked bits of this to get rid of the shadow requests, thanks
>>> to Bart for the inspiration. The missing piece, for me, was the fact
>>> that we have the tags->rqs[] indirection array already. I've done this
>>> somewhat differently, though, by having the internal scheduler tag
>>> map be allocated/torn down when an IO scheduler is attached or
>>> detached. This also means that when we run without a scheduler, we
>>> don't have to do double tag allocations, it'll work like before.
>>>
>>> The patchset applies on top of 4.10-rc3, or can be pulled here:
>>>
>>> git://git.kernel.dk/linux-block blk-mq-sched.6
>>>
>> Well ... something's wrong here on my machine:
>>
[ .. ]

Turns out that selecting CONFIG_DEFAULT_MQ_DEADLINE is the culprit;
switching to CONFIG_DEFAULT_MQ_NONE and selecting mq-deadline manually
after booting makes the problem go away.

So there is a race condition during device init and switching the I/O
scheduler.

But the results from using mq-deadline are promising; the performance
drop I've seen on older hardware seems to be resolved:

mq iosched:
 seq read : io=13383MB, bw=228349KB/s, iops=57087
 rand read : io=12876MB, bw=219709KB/s, iops=54927
 seq write: io=14532MB, bw=247987KB/s, iops=61996
 rand write: io=13779MB, bw=235127KB/s, iops=58781
mq default:
 seq read : io=13056MB, bw=222588KB/s, iops=55647
 rand read : io=12908MB, bw=220069KB/s, iops=55017
 seq write: io=13986MB, bw=238444KB/s, iops=59611
 rand write: io=13733MB, bw=234128KB/s, iops=58532
sq default:
 seq read : io=10240MB, bw=194787KB/s, iops=48696
 rand read : io=10240MB, bw=191374KB/s, iops=47843
 seq write: io=10240MB, bw=245333KB/s, iops=61333
 rand write: io=10240MB, bw=228239KB/s, iops=57059

measured on mpt2sas with SSD devices.
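
For reference, the working combination described above corresponds to
the following config state (symbol names as quoted in this thread; a
sketch):

# grep DEFAULT_MQ .config
# CONFIG_DEFAULT_MQ_DEADLINE is not set
CONFIG_DEFAULT_MQ_NONE=y

and, after boot, attaching the scheduler manually:

# echo mq-deadline > /sys/block/sdb/queue/scheduler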

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Hannes Reinecke
On 01/13/2017 09:15 AM, Hannes Reinecke wrote:
> On 01/11/2017 10:39 PM, Jens Axboe wrote:
>> Another year, another posting of this patchset. The previous posting
>> was here:
>>
>> https://www.spinics.net/lists/kernel/msg2406106.html
>>
>> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
>>
>> I've reworked bits of this to get rid of the shadow requests, thanks
>> to Bart for the inspiration. The missing piece, for me, was the fact
>> that we have the tags->rqs[] indirection array already. I've done this
>> somewhat differently, though, by having the internal scheduler tag
>> map be allocated/torn down when an IO scheduler is attached or
>> detached. This also means that when we run without a scheduler, we
>> don't have to do double tag allocations, it'll work like before.
>>
>> The patchset applies on top of 4.10-rc3, or can be pulled here:
>>
>> git://git.kernel.dk/linux-block blk-mq-sched.6
>>
> Well ... something's wrong here on my machine:
> 
> [   39.886886] [ cut here ]
> [   39.886895] WARNING: CPU: 9 PID: 62 at block/blk-mq.c:342
> __blk_mq_finish_request+0x124/0x140
> [   39.886895] Modules linked in: sd_mod ahci uhci_hcd ehci_pci
> mpt3sas(+) libahci ehci_hcd serio_raw crc32c_intel raid_class drm libata
> usbcore hpsa usb_common scsi_transport_sas sg dm_multipath dm_mod
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
> [   39.886910] CPU: 9 PID: 62 Comm: kworker/u130:0 Not tainted
> 4.10.0-rc3+ #528
> [   39.886911] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
> [   39.886917] Workqueue: events_unbound async_run_entry_fn
> [   39.886918] Call Trace:
> [   39.886923]  dump_stack+0x85/0xc9
> [   39.886927]  __warn+0xd1/0xf0
> [   39.886928]  warn_slowpath_null+0x1d/0x20
> [   39.886930]  __blk_mq_finish_request+0x124/0x140
> [   39.886932]  blk_mq_finish_request+0x55/0x60
> [   39.886934]  blk_mq_sched_put_request+0x78/0x80
> [   39.886936]  blk_mq_free_request+0xe/0x10
> [   39.886938]  blk_put_request+0x25/0x60
> [   39.886944]  __scsi_execute.isra.24+0x104/0x160
> [   39.886946]  scsi_execute_req_flags+0x94/0x100
> [   39.886948]  scsi_report_opcode+0xab/0x100
> 
> checking ...
> 
Ah.
Seems like the elevator switch races with device setup:

[ 1188.490326] [ cut here ]
[ 1188.490334] WARNING: CPU: 9 PID: 30155 at block/blk-mq.c:342
__blk_mq_finish_request+0x172/0x180
[ 1188.490335] Modules linked in: mpt3sas(+) raid_class rpcsec_gss_krb5
auth_rpcgss nfsv4 nfs lockd grace fscache ebtable_filter ebtables
ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet
br_netfilter bridge stp llc iscsi_ibft iscsi_boot_sysfs sb_edac edac_core
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul tg3 ixgbe ghash_clmulni_intel pcbc ptp
aesni_intel pps_core aes_x86_64 ipmi_ssif hpilo hpwdt mdio libphy
pcc_cpufreq crypto_simd glue_helper iTCO_wdt iTCO_vendor_support
acpi_cpufreq tpm_tis ipmi_si ipmi_devintf cryptd lpc_ich pcspkr ioatdma
tpm_tis_core thermal wmi shpchp dca ipmi_msghandler tpm fjes button
sunrpc btrfs xor sr_mod raid6_pq cdrom ehci_pci mgag200 i2c_algo_bit
drm_kms_helper syscopyarea sysfillrect uhci_hcd
[ 1188.490399]  sysimgblt fb_sys_fops sd_mod ahci ehci_hcd ttm libahci
crc32c_intel serio_raw drm libata usbcore usb_common hpsa
scsi_transport_sas sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc
scsi_dh_alua autofs4
[ 1188.490411] CPU: 9 PID: 30155 Comm: kworker/u130:6 Not tainted
4.10.0-rc3+ #535
[ 1188.490411] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
[ 1188.490425] Workqueue: events_unbound async_run_entry_fn
[ 1188.490427] Call Trace:
[ 1188.490433]  dump_stack+0x85/0xc9
[ 1188.490436]  __warn+0xd1/0xf0
[ 1188.490438]  warn_slowpath_null+0x1d/0x20
[ 1188.490440]  __blk_mq_finish_request+0x172/0x180
[ 1188.490442]  blk_mq_finish_request+0x55/0x60
[ 1188.490443]  blk_mq_sched_put_request+0x78/0x80
[ 1188.490445]  blk_mq_free_request+0xe/0x10
[ 1188.490448]  blk_put_request+0x25/0x60
[ 1188.490453]  __scsi_execute.isra.24+0x104/0x160
[ 1188.490455]  scsi_execute_req_flags+0x94/0x100
[ 1188.490457]  scsi_report_opcode+0xab/0x100
[ 1188.490461]  sd_revalidate_disk+0xaef/0x1450 [sd_mod]
[ 1188.490464]  sd_probe_async+0xd1/0x1d0 [sd_mod]
[ 1188.490466]  async_run_entry_fn+0x37/0x150
[ 1188.490470]  process_one_work+0x1d0/0x660
[ 1188.490472]  ? process_one_work+0x151/0x660
[ 1188.490474]  worker_thread+0x12b/0x4a0
[ 1188.490475]  kthread+0x10c/0x140
[ 1188.490477]  ? process_one_work+0x660/0x660
[ 1188.490478]  ? kthread_create_on_node+0x40/0x40
[ 1188.490483]  ret_from_fork+0x2a/0x40
[ 1188.490484] ---[ end trace d5e3a32ac269fc2a ]---
[ 1188.490485] rq (487/52) rqs (-1/-1)
[ 1188.523518] sd 7:0:0:0: [sdb] Attached SCSI disk
[ 1188.540954] elevator: switch to deadline failed

(The 'rqs' line is a debug output from me:

struct request *rqs_rq = hctx->tags->rqs[rq->tag];

Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Hannes Reinecke
On 01/11/2017 10:39 PM, Jens Axboe wrote:
> Another year, another posting of this patchset. The previous posting
> was here:
> 
> https://www.spinics.net/lists/kernel/msg2406106.html
> 
> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
> 
> I've reworked bits of this to get rid of the shadow requests, thanks
> to Bart for the inspiration. The missing piece, for me, was the fact
> that we have the tags->rqs[] indirection array already. I've done this
> somewhat differently, though, by having the internal scheduler tag
> map be allocated/torn down when an IO scheduler is attached or
> detached. This also means that when we run without a scheduler, we
> don't have to do double tag allocations, it'll work like before.
> 
> The patchset applies on top of 4.10-rc3, or can be pulled here:
> 
> git://git.kernel.dk/linux-block blk-mq-sched.6
> 
Fun continues:

[   28.976708] ata3.00: configured for UDMA/100
[   28.987625] BUG: unable to handle kernel NULL pointer dereference at
0048
[   28.987632] IP: deadline_add_request+0x15/0x70
[   28.987633] PGD 0
[   28.987634]
[   28.987636] Oops:  [#1] SMP
[   28.987638] Modules linked in: ahci libahci libata uhci_hcd(+)
mgag200(+) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops ttm drm tg3 libphy ehci_pci ehci_hcd usbcore usb_common
ixgbe mdio hpsa(+) dca ptp pps_core scsi_transport_sas fjes(+) sg
dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
[   28.987654] CPU: 0 PID: 268 Comm: kworker/u2:2 Not tainted
4.10.0-rc3+ #535
[   28.987655] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
[   28.987660] Workqueue: events_unbound async_run_entry_fn
[   28.987661] task: 880029391600 task.stack: c938c000
[   28.987663] RIP: 0010:deadline_add_request+0x15/0x70
[   28.987664] RSP: 0018:c938fb00 EFLAGS: 00010286
[   28.987665] RAX: 88003260c400 RBX:  RCX:

[   28.987666] RDX: c938fb68 RSI:  RDI:
8800293b9040
[   28.987666] RBP: c938fb18 R08: 0087668] R13:
88003260c400 R14:  R15: 
[   28.987670] FS:  () GS:880035c0()
knlGS:
[   28.987670] CS:  0010 DS:  ES:  CR0: 80050033
[   28.987671] CR2: 0048 CR3: 32b64000 CR4:
000406f0
[   28.987672] Call Trace:
[   28.987677]  blk_mq_sched_get_request+0x12e/0x310
[   28.987678]  ? blk_mq_sched_get_request+0x5/0x310
[   28.987681]  blk_mq_alloc_request+0x40/0x90
[   28.987684]  blk_get_request+0x35/0x110
[   28.987689]  __scsi_execute.isra.24+0x3c/0x160
[   28.987691]  scsi_execute_req_flags+0x94/0x100
[   28.987694]  scsi_probe_and_add_lun+0x207/0xd60
[   28.987699]  ? __pm_runtime_resume+0x5c/0x80
[   28.987701]  __scsi_add_device+0x103/0x120
[   28.987709]  ata_scsi_scan_host+0xa3/0x1d0 [libata]
[   28.987716]  async_port_probe+0x43/0x60 [libata]
[   28.987718]  async_run_entry_fn+0x37/0x150
[   28.987722]  process_one_work+0x1d0/0x660
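
The faulting instruction (IP: deadline_add_request+0x15/0x70, reading
offset 0x48 of a NULL pointer) can be mapped back to a source line with
a debug-info vmlinux (a sketch):

# gdb -batch -ex 'info line *(deadline_add_request+0x15)' vmlinux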

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-13 Thread Hannes Reinecke
On 01/11/2017 10:39 PM, Jens Axboe wrote:
> Another year, another posting of this patchset. The previous posting
> was here:
> 
> https://www.spinics.net/lists/kernel/msg2406106.html
> 
> (yes, I've skipped v5, it was fixes on top of v4, not the rework).
> 
> I've reworked bits of this to get rid of the shadow requests, thanks
> to Bart for the inspiration. The missing piece, for me, was the fact
> that we have the tags->rqs[] indirection array already. I've done this
> somewhat differently, though, by having the internal scheduler tag
> map be allocated/torn down when an IO scheduler is attached or
> detached. This also means that when we run without a scheduler, we
> don't have to do double tag allocations, it'll work like before.
> 
> The patchset applies on top of 4.10-rc3, or can be pulled here:
> 
> git://git.kernel.dk/linux-block blk-mq-sched.6
> 
Well ... something's wrong here on my machine:

[   39.886886] [ cut here ]
[   39.886895] WARNING: CPU: 9 PID: 62 at block/blk-mq.c:342
__blk_mq_finish_request+0x124/0x140
[   39.886895] Modules linked in: sd_mod ahci uhci_hcd ehci_pci
mpt3sas(+) libahci ehci_hcd serio_raw crc32c_intel raid_class drm libata
usbcore hpsa usb_common scsi_transport_sas sg dm_multipath dm_mod
scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
[   39.886910] CPU: 9 PID: 62 Comm: kworker/u130:0 Not tainted
4.10.0-rc3+ #528
[   39.886911] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 09/08/2013
[   39.886917] Workqueue: events_unbound async_run_entry_fn
[   39.886918] Call Trace:
[   39.886923]  dump_stack+0x85/0xc9
[   39.886927]  __warn+0xd1/0xf0
[   39.886928]  warn_slowpath_null+0x1d/0x20
[   39.886930]  __blk_mq_finish_request+0x124/0x140
[   39.886932]  blk_mq_finish_request+0x55/0x60
[   39.886934]  blk_mq_sched_put_request+0x78/0x80
[   39.886936]  blk_mq_free_request+0xe/0x10
[   39.886938]  blk_put_request+0x25/0x60
[   39.886944]  __scsi_execute.isra.24+0x104/0x160
[   39.886946]  scsi_execute_req_flags+0x94/0x100
[   39.886948]  scsi_report_opcode+0xab/0x100

checking ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCHSET v6] blk-mq scheduling framework

2017-01-12 Thread Bart Van Assche
On Wed, 2017-01-11 at 14:39 -0700, Jens Axboe wrote:
> I've reworked bits of this to get rid of the shadow requests, thanks
> to Bart for the inspiration. The missing piece, for me, was the fact
> that we have the tags->rqs[] indirection array already. I've done this
> somewhat differently, though, by having the internal scheduler tag
> map be allocated/torn down when an IO scheduler is attached or
> detached. This also means that when we run without a scheduler, we
> don't have to do double tag allocations, it'll work like before.

Hello Jens,

Thanks for having done the rework! This series looks great to me. I have a
few small comments though. I will post these as replies to the individual
patches.

Bart.