Re: v4.16-rc1 + dm-mpath + BFQ
> On 10 May 2018, at 18:12, Bart Van Assche wrote:
>
> On Fri, 2018-05-04 at 22:11 +0200, Paolo Valente wrote:
>>> On 30 Mar 2018, at 18:57, Bart Van Assche wrote:
>>>
>>> On Fri, 2018-03-30 at 10:23 +0200, Paolo Valente wrote:
>>>> Still 4.16-rc1, being that the version for which you reported this
>>>> issue in the first place.
>>>
>>> A vanilla v4.16-rc1 kernel is not sufficient to run the srp-test software
>>> since RDMA/CM support for the SRP target driver is missing from that kernel.
>>> That's why I asked you to use the for-next branch from my github repository
>>> in a previous e-mail. Anyway, since the necessary patches are now in
>>> linux-next, the srp-test software can also be run against linux-next. Here
>>> are the results that I obtained with label next-20180329 and the kernel
>>> config attached to your previous e-mail:
>>>
>>> # while ./srp-test/run_tests -c -d -r 10 -e bfq; do :; done
>>>
>>> BUG: unable to handle kernel NULL pointer dereference at 0200
>>> PGD 0 P4D 0
>>> Oops: 0002 [#1] SMP PTI
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
>>> RIP: 0010:rb_erase+0x284/0x380
>>> Call Trace:
>>> elv_rb_del+0x24/0x30
>>> bfq_remove_request+0x9a/0x2e0 [bfq]
>>> ? rcu_read_lock_sched_held+0x64/0x70
>>> ? update_load_avg+0x72b/0x760
>>> bfq_finish_requeue_request+0x2e1/0x3b0 [bfq]
>>> ? __lock_is_held+0x5a/0xa0
>>> blk_mq_free_request+0x5f/0x1a0
>>> blk_put_request+0x23/0x60
>>> multipath_release_clone+0xe/0x10
>>> dm_softirq_done+0xe3/0x270
>>> __blk_mq_complete_request_remote+0x18/0x20
>>> flush_smp_call_function_queue+0xa1/0x150
>>> generic_smp_call_function_single_interrupt+0x13/0x30
>>> smp_call_function_single_interrupt+0x4d/0x220
>>> call_function_single_interrupt+0xf/0x20
>>
>> I suspect my recent fix [1] might fix your failure too.
>>
>> [1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1682264.html
>
> Hello Paolo,
>
> With patch [1] applied I can't reproduce the aforementioned crash. I will add
> my Tested-by.
>

Great, thanks!

Paolo

> Thanks,
>
> Bart.
Re: v4.16-rc1 + dm-mpath + BFQ
On Fri, 2018-05-04 at 22:11 +0200, Paolo Valente wrote:
> > On 30 Mar 2018, at 18:57, Bart Van Assche wrote:
> >
> > On Fri, 2018-03-30 at 10:23 +0200, Paolo Valente wrote:
> > > Still 4.16-rc1, being that the version for which you reported this
> > > issue in the first place.
> >
> > A vanilla v4.16-rc1 kernel is not sufficient to run the srp-test software
> > since RDMA/CM support for the SRP target driver is missing from that kernel.
> > That's why I asked you to use the for-next branch from my github repository
> > in a previous e-mail. Anyway, since the necessary patches are now in
> > linux-next, the srp-test software can also be run against linux-next. Here
> > are the results that I obtained with label next-20180329 and the kernel
> > config attached to your previous e-mail:
> >
> > # while ./srp-test/run_tests -c -d -r 10 -e bfq; do :; done
> >
> > BUG: unable to handle kernel NULL pointer dereference at 0200
> > PGD 0 P4D 0
> > Oops: 0002 [#1] SMP PTI
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> > RIP: 0010:rb_erase+0x284/0x380
> > Call Trace:
> > elv_rb_del+0x24/0x30
> > bfq_remove_request+0x9a/0x2e0 [bfq]
> > ? rcu_read_lock_sched_held+0x64/0x70
> > ? update_load_avg+0x72b/0x760
> > bfq_finish_requeue_request+0x2e1/0x3b0 [bfq]
> > ? __lock_is_held+0x5a/0xa0
> > blk_mq_free_request+0x5f/0x1a0
> > blk_put_request+0x23/0x60
> > multipath_release_clone+0xe/0x10
> > dm_softirq_done+0xe3/0x270
> > __blk_mq_complete_request_remote+0x18/0x20
> > flush_smp_call_function_queue+0xa1/0x150
> > generic_smp_call_function_single_interrupt+0x13/0x30
> > smp_call_function_single_interrupt+0x4d/0x220
> > call_function_single_interrupt+0xf/0x20
>
> I suspect my recent fix [1] might fix your failure too.
>
> [1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1682264.html

Hello Paolo,

With patch [1] applied I can't reproduce the aforementioned crash. I will add
my Tested-by.

Thanks,

Bart.
Re: v4.16-rc1 + dm-mpath + BFQ
On Thu, 2018-05-10 at 15:16 +0000, Bart Van Assche wrote:
> On Fri, 2018-05-04 at 16:42 -0400, Laurence Oberman wrote:
> > I was never able to reproduce Bart's original issue using his tree
> > and actual mlx5/cx4 hardware and ibsrp.
> > I enabled BFQ with no other special tuning for the mpath and
> > subpaths.
> > I was waiting for him to come back from vacation to check with him.
>
> (back in the office)
>
> Hello Laurence,
>
> What I understood from off-list communication is that you tried to
> find a way to reproduce what I reported without using the srp-test
> software. My understanding is that both Paolo and I can reproduce the
> reported issue with the srp-test software.
>
> Bart.

Hello Bart,

Using your kernel
4.17.0-rc2.bart+

CONFIG_IOSCHED_BFQ=y
CONFIG_BFQ_GROUP_IOSCHED=y

These are all SRP LUNs:

36001405b2b5c6c24c084b6fa4d55da2f dm-27 LIO-ORG ,block-10
size=3.9G features='2 queue_mode mq' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 2:0:0:9  sdap 66:144 active ready running
  `- 1:0:0:9  sdaz 67:48  active ready running
36001405b26ebe76dcb94a489f6f245f8 dm-18 LIO-ORG ,block-21
size=3.9G features='2 queue_mode mq' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 2:0:0:20 sdx  65:112 active ready running
  `- 1:0:0:20 sdaa 65:160 active ready running

[root@ibclient ~]# cd /sys/block
[root@ibclient block]# cat /sys/block/dm-18/queue/scheduler
mq-deadline kyber [bfq] none
[root@ibclient block]# cat /sys/block/sdaa/queue/scheduler
mq-deadline kyber [bfq] none
[root@ibclient block]# cat /sys/block/sdx/queue/scheduler
mq-deadline kyber [bfq] none

Without the test software, just exercising the LUNs via my own tests, I
am unable to trigger the oops.

I guess something in the srp-test software triggers it then.

Doing plenty of I/O to 5 mpath devices (1.3 GB/s):

#Time    cpu sys inter ctxsw Free Buff Cach Inac Slab  Map KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut
12:08:32   0   0  1437  1107  88G   5M   1G 902M 300M 178M  1380K   345      0      0    6    74     0      4

Thanks
Laurence
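The scheduler check in the message above can be scripted when several multipath devices and their paths are involved. What follows is only a minimal sketch, assuming the sysfs convention visible in the output above, where the active scheduler is the bracketed entry of /sys/block/<dev>/queue/scheduler. The function names active_scheduler and report_schedulers are hypothetical and not part of srp-test.

```shell
# Print the active I/O scheduler from one sysfs "scheduler" line.
# The active entry is the one in square brackets,
# e.g. "mq-deadline kyber [bfq] none".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: report the active scheduler for each named
# block device, silently skipping devices that do not exist here.
report_schedulers() {
    for dev in "$@"; do
        f="/sys/block/$dev/queue/scheduler"
        if [ -r "$f" ]; then
            printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
        fi
    done
}
```

With the devices from the listing above, this would be invoked as `report_schedulers dm-18 sdx sdaa`.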
Re: v4.16-rc1 + dm-mpath + BFQ
> On 10 May 2018, at 17:16, Bart Van Assche wrote:
>
> On Fri, 2018-05-04 at 16:42 -0400, Laurence Oberman wrote:
>> I was never able to reproduce Bart's original issue using his tree and
>> actual mlx5/cx4 hardware and ibsrp.
>> I enabled BFQ with no other special tuning for the mpath and subpaths.
>> I was waiting for him to come back from vacation to check with him.
>
> (back in the office)
>
> Hello Laurence,
>
> What I understood from off-list communication is that you tried to find
> a way to reproduce what I reported without using the srp-test software.
> My understanding is that both Paolo and I can reproduce the reported issue
> with the srp-test software.
>

Thanks for chiming in, Bart. Above all, with my fix [1] it should be gone.

Looking forward to your feedback,
Paolo

[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1682264.html

> Bart.
Re: v4.16-rc1 + dm-mpath + BFQ
On Fri, 2018-05-04 at 16:42 -0400, Laurence Oberman wrote:
> I was never able to reproduce Bart's original issue using his tree and
> actual mlx5/cx4 hardware and ibsrp.
> I enabled BFQ with no other special tuning for the mpath and subpaths.
> I was waiting for him to come back from vacation to check with him.

(back in the office)

Hello Laurence,

What I understood from off-list communication is that you tried to find
a way to reproduce what I reported without using the srp-test software.
My understanding is that both Paolo and I can reproduce the reported issue
with the srp-test software.

Bart.
Re: v4.16-rc1 + dm-mpath + BFQ
On Fri, 2018-05-04 at 22:11 +0200, Paolo Valente wrote:
> > On 30 Mar 2018, at 18:57, Bart Van Assche wrote:
> >
> > On Fri, 2018-03-30 at 10:23 +0200, Paolo Valente wrote:
> > > Still 4.16-rc1, being that the version for which you reported this
> > > issue in the first place.
> >
> > A vanilla v4.16-rc1 kernel is not sufficient to run the srp-test software
> > since RDMA/CM support for the SRP target driver is missing from that kernel.
> > That's why I asked you to use the for-next branch from my github repository
> > in a previous e-mail. Anyway, since the necessary patches are now in
> > linux-next, the srp-test software can also be run against linux-next. Here
> > are the results that I obtained with label next-20180329 and the kernel
> > config attached to your previous e-mail:
> >
> > # while ./srp-test/run_tests -c -d -r 10 -e bfq; do :; done
> >
> > BUG: unable to handle kernel NULL pointer dereference at 0200
> > PGD 0 P4D 0
> > Oops: 0002 [#1] SMP PTI
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> > RIP: 0010:rb_erase+0x284/0x380
> > Call Trace:
> > elv_rb_del+0x24/0x30
> > bfq_remove_request+0x9a/0x2e0 [bfq]
> > ? rcu_read_lock_sched_held+0x64/0x70
> > ? update_load_avg+0x72b/0x760
> > bfq_finish_requeue_request+0x2e1/0x3b0 [bfq]
> > ? __lock_is_held+0x5a/0xa0
> > blk_mq_free_request+0x5f/0x1a0
> > blk_put_request+0x23/0x60
> > multipath_release_clone+0xe/0x10
> > dm_softirq_done+0xe3/0x270
> > __blk_mq_complete_request_remote+0x18/0x20
> > flush_smp_call_function_queue+0xa1/0x150
> > generic_smp_call_function_single_interrupt+0x13/0x30
> > smp_call_function_single_interrupt+0x4d/0x220
> > call_function_single_interrupt+0xf/0x20
>
> Hi Bart,
> I suspect my recent fix [1] might fix your failure too.
>
> Thanks,
> Paolo
>
> [1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1682264.html
>
> > Bart.

I was never able to reproduce Bart's original issue using his tree and
actual mlx5/cx4 hardware and ibsrp.
I enabled BFQ with no other special tuning for the mpath and subpaths.
I was waiting for him to come back from vacation to check with him.

Thanks
Laurence
Re: v4.16-rc1 + dm-mpath + BFQ
> On 30 Mar 2018, at 18:57, Bart Van Assche wrote:
>
> On Fri, 2018-03-30 at 10:23 +0200, Paolo Valente wrote:
>> Still 4.16-rc1, being that the version for which you reported this
>> issue in the first place.
>
> A vanilla v4.16-rc1 kernel is not sufficient to run the srp-test software
> since RDMA/CM support for the SRP target driver is missing from that kernel.
> That's why I asked you to use the for-next branch from my github repository
> in a previous e-mail. Anyway, since the necessary patches are now in
> linux-next, the srp-test software can also be run against linux-next. Here
> are the results that I obtained with label next-20180329 and the kernel
> config attached to your previous e-mail:
>
> # while ./srp-test/run_tests -c -d -r 10 -e bfq; do :; done
>
> BUG: unable to handle kernel NULL pointer dereference at 0200
> PGD 0 P4D 0
> Oops: 0002 [#1] SMP PTI
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> RIP: 0010:rb_erase+0x284/0x380
> Call Trace:
> elv_rb_del+0x24/0x30
> bfq_remove_request+0x9a/0x2e0 [bfq]
> ? rcu_read_lock_sched_held+0x64/0x70
> ? update_load_avg+0x72b/0x760
> bfq_finish_requeue_request+0x2e1/0x3b0 [bfq]
> ? __lock_is_held+0x5a/0xa0
> blk_mq_free_request+0x5f/0x1a0
> blk_put_request+0x23/0x60
> multipath_release_clone+0xe/0x10
> dm_softirq_done+0xe3/0x270
> __blk_mq_complete_request_remote+0x18/0x20
> flush_smp_call_function_queue+0xa1/0x150
> generic_smp_call_function_single_interrupt+0x13/0x30
> smp_call_function_single_interrupt+0x4d/0x220
> call_function_single_interrupt+0xf/0x20
>

Hi Bart,
I suspect my recent fix [1] might fix your failure too.

Thanks,
Paolo

[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1682264.html

> Bart.
Re: v4.16-rc1 + dm-mpath + BFQ
> On 01 Apr 2018, at 10:56, Paolo Valente wrote:
>
>> On 30 Mar 2018, at 18:57, Bart Van Assche wrote:
>>
>> On Fri, 2018-03-30 at 10:23 +0200, Paolo Valente wrote:
>>> Still 4.16-rc1, being that the version for which you reported this
>>> issue in the first place.
>>
>> A vanilla v4.16-rc1 kernel is not sufficient to run the srp-test software
>> since RDMA/CM support for the SRP target driver is missing from that kernel.
>> That's why I asked you to use the for-next branch from my github repository
>> in a previous e-mail.
>
> Yep, that's the branch/top commit I used (as you suggested):
> 190943ce1824 [bvanassche/for-next] scsi: mpt3sas: fix oops in error handlers after shutdown/unload
> with
> bvanassche https://github.com/bvanassche/linux.git
>
> The kernel in that branch presents itself as 4.16-rc1, but, as you
> point out, it should contain the needed support.
>
>> Anyway, since the necessary patches are now in
>> linux-next, the srp-test software can also be run against linux-next. Here
>> are the results that I obtained with label next-20180329 and the kernel
>> config attached to your previous e-mail:
>>
>> # while ./srp-test/run_tests -c -d -r 10 -e bfq; do :; done
>>
>> BUG: unable to handle kernel NULL pointer dereference at 0200
>> PGD 0 P4D 0
>> Oops: 0002 [#1] SMP PTI
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
>> RIP: 0010:rb_erase+0x284/0x380
>> Call Trace:
>> elv_rb_del+0x24/0x30
>> bfq_remove_request+0x9a/0x2e0 [bfq]
>> ? rcu_read_lock_sched_held+0x64/0x70
>> ? update_load_avg+0x72b/0x760
>> bfq_finish_requeue_request+0x2e1/0x3b0 [bfq]
>> ? __lock_is_held+0x5a/0xa0
>> blk_mq_free_request+0x5f/0x1a0
>> blk_put_request+0x23/0x60
>> multipath_release_clone+0xe/0x10
>> dm_softirq_done+0xe3/0x270
>> __blk_mq_complete_request_remote+0x18/0x20
>> flush_smp_call_function_queue+0xa1/0x150
>> generic_smp_call_function_single_interrupt+0x13/0x30
>> smp_call_function_single_interrupt+0x4d/0x220
>> call_function_single_interrupt+0xf/0x20
>>
>
> This new trace just confirms my suspicions. Looking forward to some
> feedback from Mike or Jens. Otherwise I'll try to look into it
> myself, although I don't think I am the right person to suggest the
> best cure for this cloning issue.
>

Hi Bart,
I tried to investigate this further, but the corruption of a cloned
request (or some other mishap) that then causes this failure occurs
somewhere earlier, in the cloning phase. As I feared, I was not able
to spot the mistake in that part of the code, especially because I'm
not able to reproduce the failure itself.

I might have more luck after some hints from knowledgeable people.
Otherwise, if, in your test, this failure occurs immediately after you
start the test, and if you are willing to repeat this test with my
development version of bfq, then we may hope to get a detailed trace
of what happens under the hood.

Thanks,
Paolo

> Thanks,
> Paolo
>
>> Bart.
Re: v4.16-rc1 + dm-mpath + BFQ
> On 30 Mar 2018, at 18:57, Bart Van Assche wrote:
>
> On Fri, 2018-03-30 at 10:23 +0200, Paolo Valente wrote:
>> Still 4.16-rc1, being that the version for which you reported this
>> issue in the first place.
>
> A vanilla v4.16-rc1 kernel is not sufficient to run the srp-test software
> since RDMA/CM support for the SRP target driver is missing from that kernel.
> That's why I asked you to use the for-next branch from my github repository
> in a previous e-mail.

Yep, that's the branch/top commit I used (as you suggested):
190943ce1824 [bvanassche/for-next] scsi: mpt3sas: fix oops in error handlers after shutdown/unload
with
bvanassche https://github.com/bvanassche/linux.git

The kernel in that branch presents itself as 4.16-rc1, but, as you
point out, it should contain the needed support.

> Anyway, since the necessary patches are now in
> linux-next, the srp-test software can also be run against linux-next. Here
> are the results that I obtained with label next-20180329 and the kernel
> config attached to your previous e-mail:
>
> # while ./srp-test/run_tests -c -d -r 10 -e bfq; do :; done
>
> BUG: unable to handle kernel NULL pointer dereference at 0200
> PGD 0 P4D 0
> Oops: 0002 [#1] SMP PTI
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> RIP: 0010:rb_erase+0x284/0x380
> Call Trace:
> elv_rb_del+0x24/0x30
> bfq_remove_request+0x9a/0x2e0 [bfq]
> ? rcu_read_lock_sched_held+0x64/0x70
> ? update_load_avg+0x72b/0x760
> bfq_finish_requeue_request+0x2e1/0x3b0 [bfq]
> ? __lock_is_held+0x5a/0xa0
> blk_mq_free_request+0x5f/0x1a0
> blk_put_request+0x23/0x60
> multipath_release_clone+0xe/0x10
> dm_softirq_done+0xe3/0x270
> __blk_mq_complete_request_remote+0x18/0x20
> flush_smp_call_function_queue+0xa1/0x150
> generic_smp_call_function_single_interrupt+0x13/0x30
> smp_call_function_single_interrupt+0x4d/0x220
> call_function_single_interrupt+0xf/0x20
>

This new trace just confirms my suspicions. Looking forward to some
feedback from Mike or Jens. Otherwise I'll try to look into it
myself, although I don't think I am the right person to suggest the
best cure for this cloning issue.

Thanks,
Paolo

> Bart.
Re: v4.16-rc1 + dm-mpath + BFQ
On Fri, 2018-03-30 at 10:23 +0200, Paolo Valente wrote:
> Still 4.16-rc1, being that the version for which you reported this
> issue in the first place.

A vanilla v4.16-rc1 kernel is not sufficient to run the srp-test software
since RDMA/CM support for the SRP target driver is missing from that kernel.
That's why I asked you to use the for-next branch from my github repository
in a previous e-mail. Anyway, since the necessary patches are now in
linux-next, the srp-test software can also be run against linux-next. Here
are the results that I obtained with label next-20180329 and the kernel
config attached to your previous e-mail:

# while ./srp-test/run_tests -c -d -r 10 -e bfq; do :; done

BUG: unable to handle kernel NULL pointer dereference at 0200
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:rb_erase+0x284/0x380
Call Trace:
elv_rb_del+0x24/0x30
bfq_remove_request+0x9a/0x2e0 [bfq]
? rcu_read_lock_sched_held+0x64/0x70
? update_load_avg+0x72b/0x760
bfq_finish_requeue_request+0x2e1/0x3b0 [bfq]
? __lock_is_held+0x5a/0xa0
blk_mq_free_request+0x5f/0x1a0
blk_put_request+0x23/0x60
multipath_release_clone+0xe/0x10
dm_softirq_done+0xe3/0x270
__blk_mq_complete_request_remote+0x18/0x20
flush_smp_call_function_queue+0xa1/0x150
generic_smp_call_function_single_interrupt+0x13/0x30
smp_call_function_single_interrupt+0x4d/0x220
call_function_single_interrupt+0xf/0x20

Bart.
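The `while ...; do :; done` one-liner in the message above reruns the test suite until a run exits non-zero. A small sketch of the same idiom that also counts the successful runs is given here. The function name run_until_failure is hypothetical, and the sketch only helps when a run fails without taking the machine down: after a kernel panic like the one reported, nothing would be printed.

```shell
# Run the given command repeatedly until it exits non-zero, then
# report how many runs succeeded before the first failure.
# Like the original one-liner, this loops forever if the command
# never fails.
run_until_failure() {
    passes=0
    while "$@"; do
        passes=$((passes + 1))
    done
    printf 'command failed after %d successful run(s)\n' "$passes"
}
```

Mirroring the command line from the message, it would be called as `run_until_failure ./srp-test/run_tests -c -d -r 10 -e bfq`.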
Re: v4.16-rc1 + dm-mpath + BFQ
+Jens, Mike

> On 30 Mar 2018, at 01:16, Bart Van Assche wrote:
>
> On Thu, 2018-03-29 at 11:02 +0200, Paolo Valente wrote:
>>> On 01 Mar 2018, at 02:35, Bart Van Assche wrote:
>>> Thank you for having shared your kernel config off-list. After having
>>> made the following changes to your kernel config I was able to run the
>>> srp-test software:
>>> * Enable CONFIG_DM_MULTIPATH_QL, CONFIG_DM_MULTIPATH_ST,
>>>   CONFIG_SCSI_DH_RDAC, CONFIG_SCSI_DH_EMC and CONFIG_SCSI_DH_ALUA.
>>> * Disable CONFIG_KASAN. Apparently there is an incompatibility between the
>>>   rdma_rxe driver and KASAN. I'm still analyzing this.
>>>
>>> Please let me know whether these changes also allow you to run the srp-test
>>> software and whether you can reproduce what I reported at the start of this
>>> e-mail thread.
>>>
>>
>> Thanks for these new directives and sorry for my long delay. I've
>> modified the config as per your suggestions (you can find my new
>> config attached), and retried.
>>
>> Unfortunately, same failure:
>> $ sudo ./run_tests -c -d -r 10 -t 02-mq -e bfq
>> Unloaded the ib_srpt kernel module
>> Unloaded the rdma_rxe kernel module
>> SoftRoCE network interfaces: rxe0
>> Zero-initializing /dev/ram0 ... done
>> Zero-initializing /dev/ram1 ... done
>> mkdir: cannot create directory "021c:42ff:fe4c:fac9": Invalid argument
>> Retrying with old port name format
>> mkdir: cannot create directory "0xfe80021c42fffe4cfac9": Invalid argument
>
> Hello Paolo,
>

Hi

> With your kernel config and I/O scheduler "none" srp-test runs reliably
> on my test setup.

I tried with none too, but:
$ sudo ./run_tests -c -d -r 10 -t 02-mq -e none
[sudo] password for paolo:
Unloaded the ib_srpt kernel module
Unloaded the rdma_rxe kernel module
SoftRoCE network interfaces: rxe0
insmod: ERROR: could not insert module /lib/modules/4.16.0-rc1+/kernel/drivers/infiniband/ulp/srpt/ib_srpt.ko: File exists

> The result for the BFQ scheduler is available below.

Thanks for pasting it. According to the stack trace, the cause of the
problem may still be some missing initialization in request cloning,
like the one I reported in [1], a thread that you initiated as a
consequence of a failure rather similar to the present one. Mike and
Jens took care of solving that issue (which had more general
implications than just driving BFQ crazy). Unfortunately I can't
remember how that story ended, and I got somewhat lost among threads
while trying to reconstruct it.

Mike, Jens, I guess you ended up making a fix; if so, do you have any
idea how your fix relates to this new (?) issue? This one occurs
after an end_clone_request, instead of a dm_mq_queue_rq, like the
previous one did. Or, more generally, does this issue ring a bell?

[1] https://www.spinics.net/lists/dm-devel/msg32088.html

> If
> the srp-test software did not start on your setup I assume that you are
> using another kernel version? Which kernel version did you use?
>

Still 4.16-rc1, being that the version for which you reported this
issue in the first place.

Thanks,
Paolo

> Thanks,
>
> Bart.
>
> BUG: unable to handle kernel NULL pointer dereference at 0200
> IP: rb_erase+0x284/0x380
> PGD 0 P4D 0
> Oops: 0002 [#1] SMP PTI
> Modules linked in: ib_srp libcrc32c scsi_transport_srp ib_srpt
> target_core_iblock target_core_mod rdma_cm iw_cm ib_cm scsi_debug brd
> rdma_rxe ip6_udp_tunnel udp_tunnel ib_umad ib_uverbs ib_core
> kyber_iosched bfq crct10dif_pclmul crc32_pclmul ghash_clmulni_intel serio_raw
> virtio_balloon virtio_console multipath virtio_net virtio_blk virtio_scsi
> ata_generic crc32c_intel virtio_pci virtio_ring
> virtio pata_acpi [last unloaded: ip6_udp_tunnel]
> CPU: 3 PID: 28 Comm: ksoftirqd/3 Not tainted 4.16.0-rc7-dbg+ #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> RIP: 0010:rb_erase+0x284/0x380
> RSP: :a5ad0040f908 EFLAGS: 00010206
> RAX: de9f81e9b700 RBX: 9445775b1380 RCX:
> RDX: de9f81e9b700 RSI: 9445652e1380 RDI: 9445775b13e0
> RBP: a5ad0040f908 R08: 0200 R09: 0002
> R10: 0001 R11: af25f020 R12: 9445775b13e0
> R13: 944564376800 R14: 944576328000 R15: 0001
> FS: () GS:94457fd8() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 0200 CR3: 6b210001 CR4: 003606e0
> DR0: DR1: DR2:
> DR3: DR6: fffe0ff0 DR7: 0400
> Call Trace:
> elv_rb_del+0x24/0x30
> bfq_remove_request+0x9a/0x2e0 [bfq]
> bfq_finish_requeue_request+0x2e1/0x3b0 [bfq]
> blk_mq_free_request+0x5f/0x1a0
> blk_put_request+0x23/0x60
> multipath_release_clone+0xe/0x10
> dm_softirq_done+0xe3/0x270
> __blk_mq_complete_request+0xfd/0x190
> blk_
Re: v4.16-rc1 + dm-mpath + BFQ
On Thu, 2018-03-29 at 11:02 +0200, Paolo Valente wrote:
> > On 01 Mar 2018, at 02:35, Bart Van Assche wrote:
> > Thank you for having shared your kernel config off-list. After having
> > made the following changes to your kernel config I was able to run the
> > srp-test software:
> > * Enable CONFIG_DM_MULTIPATH_QL, CONFIG_DM_MULTIPATH_ST,
> >   CONFIG_SCSI_DH_RDAC, CONFIG_SCSI_DH_EMC and CONFIG_SCSI_DH_ALUA.
> > * Disable CONFIG_KASAN. Apparently there is an incompatibility between the
> >   rdma_rxe driver and KASAN. I'm still analyzing this.
> >
> > Please let me know whether these changes also allow you to run the srp-test
> > software and whether you can reproduce what I reported at the start of this
> > e-mail thread.
> >
>
> Thanks for these new directives and sorry for my long delay. I've
> modified the config as per your suggestions (you can find my new
> config attached), and retried.
>
> Unfortunately, same failure:
> $ sudo ./run_tests -c -d -r 10 -t 02-mq -e bfq
> Unloaded the ib_srpt kernel module
> Unloaded the rdma_rxe kernel module
> SoftRoCE network interfaces: rxe0
> Zero-initializing /dev/ram0 ... done
> Zero-initializing /dev/ram1 ... done
> mkdir: cannot create directory "021c:42ff:fe4c:fac9": Invalid argument
> Retrying with old port name format
> mkdir: cannot create directory "0xfe80021c42fffe4cfac9": Invalid argument

Hello Paolo,

With your kernel config and I/O scheduler "none" srp-test runs reliably
on my test setup. The result for the BFQ scheduler is available below. If
the srp-test software did not start on your setup I assume that you are
using another kernel version? Which kernel version did you use?

Thanks,

Bart.

BUG: unable to handle kernel NULL pointer dereference at 0200
IP: rb_erase+0x284/0x380
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
Modules linked in: ib_srp libcrc32c scsi_transport_srp ib_srpt target_core_iblock target_core_mod rdma_cm iw_cm ib_cm scsi_debug brd rdma_rxe ip6_udp_tunnel udp_tunnel ib_umad ib_uverbs ib_core kyber_iosched bfq crct10dif_pclmul crc32_pclmul ghash_clmulni_intel serio_raw virtio_balloon virtio_console multipath virtio_net virtio_blk virtio_scsi ata_generic crc32c_intel virtio_pci virtio_ring virtio pata_acpi [last unloaded: ip6_udp_tunnel]
CPU: 3 PID: 28 Comm: ksoftirqd/3 Not tainted 4.16.0-rc7-dbg+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:rb_erase+0x284/0x380
RSP: :a5ad0040f908 EFLAGS: 00010206
RAX: de9f81e9b700 RBX: 9445775b1380 RCX:
RDX: de9f81e9b700 RSI: 9445652e1380 RDI: 9445775b13e0
RBP: a5ad0040f908 R08: 0200 R09: 0002
R10: 0001 R11: af25f020 R12: 9445775b13e0
R13: 944564376800 R14: 944576328000 R15: 0001
FS: () GS:94457fd8() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0200 CR3: 6b210001 CR4: 003606e0
DR0: DR1: DR2:
DR3: DR6: fffe0ff0 DR7: 0400
Call Trace:
elv_rb_del+0x24/0x30
bfq_remove_request+0x9a/0x2e0 [bfq]
bfq_finish_requeue_request+0x2e1/0x3b0 [bfq]
blk_mq_free_request+0x5f/0x1a0
blk_put_request+0x23/0x60
multipath_release_clone+0xe/0x10
dm_softirq_done+0xe3/0x270
__blk_mq_complete_request+0xfd/0x190
blk_mq_complete_request+0x69/0xa0
dm_complete_request+0x22/0x30
end_clone_request+0x1d/0x20
__blk_mq_end_request+0x5b/0x70
scsi_end_request+0xba/0x220
scsi_io_completion+0x4f1/0x700
? scsi_dec_host_busy+0xa6/0x130
scsi_finish_command+0xef/0x140
scsi_softirq_done+0x11f/0x170
__blk_mq_complete_request+0xfd/0x190
blk_mq_complete_request+0x69/0xa0
scsi_mq_done+0x34/0x100
srp_recv_done+0x2f6/0xa40 [ib_srp]
? rxe_poll_cq+0x13a/0x150 [rdma_rxe]
__ib_process_cq+0x83/0xc0 [ib_core]
ib_poll_handler+0x2b/0x80 [ib_core]
irq_poll_softirq+0x90/0x140
__do_softirq+0xcf/0x4b1
run_ksoftirqd+0x33/0x50
smpboot_thread_fn+0xfc/0x170
kthread+0x121/0x140
? sort_range+0x30/0x30
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x3a/0x50
Code: 83 e2 01 0f 85 45 fe ff ff 5d c3 4c 89 0e 4d 85 d2 0f 84 28 fe ff ff 48 83 c8 01 48 89 0a 49 89 02 5d c3 4d 85 c0 4c 89 06 74 9c <49> 89 10 5d c3 48 89 0e 5d c3 4d 89 48 10 eb d3 4d 8b 50 08 4c
RIP: rb_erase+0x284/0x380 RSP: a5ad0040f908
CR2: 0200
---[ end trace 29e2f703ddaa3232 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x2d00 from 0x8100 (relocation range: 0x8000-0xbfff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt
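When comparing a trace like the one above with an earlier report (for instance, to see whether the crash is reached via end_clone_request or via dm_mq_queue_rq), it can help to reduce each trace to its bare symbol names first. The following is a rough sketch under that assumption. The function name trace_symbols is hypothetical, and the pattern only covers the "symbol+0xOFF/0xSIZE" form seen in this thread. Frames marked with "?" are skipped because the kernel flags them as unreliable.

```shell
# Reduce kernel call-trace lines read from stdin to bare symbol
# names, one per line. Lines that do not start with
# "symbol+0xOFF/0xSIZE" (register dumps, "? symbol" frames, headers)
# are dropped; a trailing "[module]" tag is removed.
trace_symbols() {
    sed -nE 's/^[[:space:]]*([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-f]+\/0x[0-9a-f]+.*$/\1/p'
}
```

Two saved traces (the file names here are illustrative) could then be compared by running `trace_symbols < old.txt > old.sym` and `trace_symbols < new.txt > new.sym`, followed by `diff old.sym new.sym`.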
Re: v4.16-rc1 + dm-mpath + BFQ
On Fri, 2018-02-16 at 08:39 +0100, Paolo Valente wrote:
> after enabling the options in your list, and a few other
> related options, such as iblock support, I get this:
>
> $ sudo ./run_tests -c -d -r 10 -t 02-mq -e bfq
> Unloaded the ib_srpt kernel module
> Unloaded the rdma_rxe kernel module
> SoftRoCE network interfaces: rxe0
> Zero-initializing /dev/ram0 ... done
> Zero-initializing /dev/ram1 ... done
> mkdir: cannot create directory "021c:42ff:fe4c:fac9": Invalid argument
> Retrying with old port name format
> mkdir: cannot create directory "0xfe80021c42fffe4cfac9": Invalid argument

Hello Paolo,

Thank you for having shared your kernel config off-list. After having
made the following changes to your kernel config I was able to run the
srp-test software:
* Enable CONFIG_DM_MULTIPATH_QL, CONFIG_DM_MULTIPATH_ST,
  CONFIG_SCSI_DH_RDAC, CONFIG_SCSI_DH_EMC and CONFIG_SCSI_DH_ALUA.
* Disable CONFIG_KASAN. Apparently there is an incompatibility between the
  rdma_rxe driver and KASAN. I'm still analyzing this.

Please let me know whether these changes also allow you to run the srp-test
software and whether you can reproduce what I reported at the start of this
e-mail thread.

Thanks,

Bart.
Re: v4.16-rc1 + dm-mpath + BFQ
On Fri, 2018-02-16 at 08:39 +0100, Paolo Valente wrote:
> after enabling the options in your list, and a few other
> related options, such as iblock support, I get this:
>
> $ sudo ./run_tests -c -d -r 10 -t 02-mq -e bfq
> Unloaded the ib_srpt kernel module
> Unloaded the rdma_rxe kernel module
> SoftRoCE network interfaces: rxe0
> Zero-initializing /dev/ram0 ... done
> Zero-initializing /dev/ram1 ... done
> mkdir: cannot create directory "021c:42ff:fe4c:fac9": Invalid argument
> Retrying with old port name format
> mkdir: cannot create directory "0xfe80021c42fffe4cfac9": Invalid argument

Hello Paolo,

That probably means that there is still something missing from the kernel
config that you are using. Please send that kernel config to me (off-list)
so that I can have a look at it.

Thanks,

Bart.
Re: v4.16-rc1 + dm-mpath + BFQ
> On 14 Feb 2018, at 19:11, Bart Van Assche wrote:
>
> On 02/14/18 09:55, Paolo Valente wrote:
>> After following all of them (and taking a few other necessary steps), I
>> invoked:
>> sudo ./run_tests -c -d -r 10 -t 02-mq -e bfq
>> But I got the following:
>> ./lib/functions: line 34: /sys/class/block/ram0/size: No such file or directory
>> ./lib/functions: line 34: * 512: syntax error: operand expected (error token is "* 512")
>> Unloaded the ib_srpt kernel module
>> Unloaded the rdma_rxe kernel module
>> modprobe: FATAL: Module ib_uverbs not found in directory /lib/modules/4.16.0-rc1+
>> modprobe: FATAL: Module ib_umad not found in directory /lib/modules/4.16.0-rc1+
>> SoftRoCE network interfaces: rxe0
>> modprobe: FATAL: Module target_core_iblock not found in directory /lib/modules/4.16.0-rc1+
>> So I think I need a little more help to get this working in a
>> reasonable amount of time. In particular, could you tell me everything
>> that is missing?
>
> Hello Paolo,
>
> Can you check whether CONFIG_BLK_DEV_RAM, CONFIG_INFINIBAND,
> CONFIG_INFINIBAND_USER_MAD, CONFIG_INFINIBAND_USER_ACCESS,
> CONFIG_INFINIBAND_USER_MEM, CONFIG_INFINIBAND_IPOIB, CONFIG_INFINIBAND_SRP,
> CONFIG_INFINIBAND_SRPT and CONFIG_RDMA_RXE were enabled in your kernel config?
>

(+Linus, Ulf)

Hi Bart,
after enabling the options in your list, and a few other
related options, such as iblock support, I get this:

$ sudo ./run_tests -c -d -r 10 -t 02-mq -e bfq
Unloaded the ib_srpt kernel module
Unloaded the rdma_rxe kernel module
SoftRoCE network interfaces: rxe0
Zero-initializing /dev/ram0 ... done
Zero-initializing /dev/ram1 ... done
mkdir: cannot create directory "021c:42ff:fe4c:fac9": Invalid argument
Retrying with old port name format
mkdir: cannot create directory "0xfe80021c42fffe4cfac9": Invalid argument

Thanks for your patience and collaboration,
Paolo

> Thanks,
>
> Bart.
Re: v4.16-rc1 + dm-mpath + BFQ
On 02/14/18 09:55, Paolo Valente wrote:
> After following all of them (and taking a few other necessary steps), I
> invoked:
> sudo ./run_tests -c -d -r 10 -t 02-mq -e bfq
> But I got the following:
> ./lib/functions: line 34: /sys/class/block/ram0/size: No such file or directory
> ./lib/functions: line 34: * 512: syntax error: operand expected (error token is "* 512")
> Unloaded the ib_srpt kernel module
> Unloaded the rdma_rxe kernel module
> modprobe: FATAL: Module ib_uverbs not found in directory /lib/modules/4.16.0-rc1+
> modprobe: FATAL: Module ib_umad not found in directory /lib/modules/4.16.0-rc1+
> SoftRoCE network interfaces: rxe0
> modprobe: FATAL: Module target_core_iblock not found in directory /lib/modules/4.16.0-rc1+
> So I think I need a little more help to get this working in a
> reasonable amount of time. In particular, could you tell me everything
> that is missing?

Hello Paolo,

Can you check whether CONFIG_BLK_DEV_RAM, CONFIG_INFINIBAND,
CONFIG_INFINIBAND_USER_MAD, CONFIG_INFINIBAND_USER_ACCESS,
CONFIG_INFINIBAND_USER_MEM, CONFIG_INFINIBAND_IPOIB, CONFIG_INFINIBAND_SRP,
CONFIG_INFINIBAND_SRPT and CONFIG_RDMA_RXE were enabled in your kernel config?

Thanks,

Bart.
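The check requested above can be done with grep against the kernel config file. This is a minimal sketch, assuming the usual ".config" syntax in which an enabled option appears as CONFIG_FOO=y (built in) or CONFIG_FOO=m (module). The function name missing_config_options and the config path in the usage note are hypothetical.

```shell
# Print every option from the argument list that is NOT enabled
# (neither =y nor =m) in the given kernel config file.
# Usage: missing_config_options CONFIG_FILE CONFIG_A CONFIG_B ...
missing_config_options() {
    config_file=$1
    shift
    for opt in "$@"; do
        grep -Eq "^${opt}=(y|m)\$" "$config_file" || printf '%s\n' "$opt"
    done
}
```

For the list in the message above, this might be run as `missing_config_options .config CONFIG_BLK_DEV_RAM CONFIG_INFINIBAND CONFIG_INFINIBAND_USER_MAD CONFIG_INFINIBAND_USER_ACCESS CONFIG_INFINIBAND_USER_MEM CONFIG_INFINIBAND_IPOIB CONFIG_INFINIBAND_SRP CONFIG_INFINIBAND_SRPT CONFIG_RDMA_RXE`; empty output means all of them are enabled.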
Re: v4.16-rc1 + dm-mpath + BFQ
> On 13 Feb 2018, at 19:47, Bart Van Assche wrote:
>
> On Tue, 2018-02-13 at 19:38 +0100, Paolo Valente wrote:
>> as a first attempt, I've followed your steps, but got:
>> Error: could not find sg_reset
>
> Please install the sg3_utils package. Every Linux distro I know of
> supports that package.

I happened to run this test on Fedora.

> And in case you would like to install it from source, the source code of
> that package is available from http://sg.danny.cz/sg/sg3_utils.html.
>
>> For ib_srp-backport, I get a lot of warnings like the following one,
>> at "make install" (preceded by corresponding warnings at the end of
>> the compilation):
>> depmod: WARNING: /lib/modules/4.16.0-rc1+/extra/ib_srp.ko needs unknown symbol rdma_resolve_addr
>>
>> Unfortunately, it gets worse while executing "make scst srpt":
>
> Please neither install the ib_srp-backport driver nor SCST. These drivers
> have not yet been tested against kernel v4.16-rc1. I provided you a kernel
> tree in which both the SRP initiator and target drivers support RoCE such
> that you don't need to install these out-of-tree drivers. I think all that
> you need from the srp-test README document are the instructions to
> configure /etc/multipath.conf and the instructions for installing the
> required packages. From that README document:
>
> Install the following software packages if these have not yet been installed:
> fio, gcc-c++, make, multipath-tools or device-mapper-multipath, sg3_utils,
> srptools, e2fsprogs and xfsprogs.

Thank you very much for these instructions Bart.

After following all of them (and taking some other step needed), I
invoked:

sudo ./run_tests -c -d -r 10 -t 02-mq -e bfq

But I got the following:

./lib/functions: line 34: /sys/class/block/ram0/size: No such file or directory
./lib/functions: line 34: * 512: syntax error: operand expected (error token is "* 512")
Unloaded the ib_srpt kernel module
Unloaded the rdma_rxe kernel module
modprobe: FATAL: Module ib_uverbs not found in directory /lib/modules/4.16.0-rc1+
modprobe: FATAL: Module ib_umad not found in directory /lib/modules/4.16.0-rc1+
SoftRoCE network interfaces: rxe0
modprobe: FATAL: Module target_core_iblock not found in directory /lib/modules/4.16.0-rc1+

So I think I need a little more help, to have this working in a
reasonable amount of time. In particular, could you tell me all that
is missing?

Thanks,
Paolo

> Thanks,
>
> Bart.
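The two messages from ./lib/functions are consistent with an arithmetic expansion being fed an empty value because /sys/class/block/ram0/size does not exist. The actual code at line 34 of lib/functions is not shown in this thread, so the following is only a guess at the pattern, written as a guarded variant. The function name block_dev_bytes and its second parameter are hypothetical; the one fact relied on is that the sysfs size attribute counts 512-byte sectors.

```shell
#!/usr/bin/env bash
# Print the size in bytes of a block device given its name (e.g. ram0).
# The sysfs "size" attribute is expressed in 512-byte sectors.
# sysfs_root exists only so the function can be tried on a fake directory
# tree; it defaults to /sys. (Hypothetical helper, not srp-test code.)
block_dev_bytes() {
    local dev=$1 sysfs_root=${2:-/sys}
    local size_file="$sysfs_root/class/block/$dev/size"
    local sectors
    if [ ! -r "$size_file" ]; then
        # Fail with a readable message instead of a shell syntax error.
        echo "block device $dev not found (is CONFIG_BLK_DEV_RAM enabled?)" >&2
        return 1
    fi
    sectors=$(<"$size_file")
    echo $((sectors * 512))
}
```

With this shape, a missing ramdisk produces one explicit error and a non-zero return status rather than a "syntax error: operand expected" message further down.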
Re: v4.16-rc1 + dm-mpath + BFQ
On Tue, 2018-02-13 at 19:38 +0100, Paolo Valente wrote:
> as a first attempt, I've followed your steps, but got:
> Error: could not find sg_reset

Please install the sg3_utils package. Every Linux distro I know of supports
that package. And in case you would like to install it from source, the
source code of that package is available from
http://sg.danny.cz/sg/sg3_utils.html.

> For ib_srp-backport, I get a lot of warnings like the following one,
> at "make install" (preceded by corresponding warnings at the end of
> the compilation):
> depmod: WARNING: /lib/modules/4.16.0-rc1+/extra/ib_srp.ko needs unknown symbol rdma_resolve_addr
>
> Unfortunately, it gets worse while executing "make scst srpt":

Please neither install the ib_srp-backport driver nor SCST. These drivers
have not yet been tested against kernel v4.16-rc1. I provided you a kernel
tree in which both the SRP initiator and target drivers support RoCE such
that you don't need to install these out-of-tree drivers. I think all that
you need from the srp-test README document are the instructions to configure
/etc/multipath.conf and the instructions for installing the required
packages. From that README document:

Install the following software packages if these have not yet been installed:
fio, gcc-c++, make, multipath-tools or device-mapper-multipath, sg3_utils,
srptools, e2fsprogs and xfsprogs.

Thanks,

Bart.
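A quick way to catch errors such as "could not find sg_reset" before starting a test run is to check for the needed commands up front. This is a minimal sketch; the function name missing_commands is made up, and apart from sg_reset (named in the error above) the mapping from the listed packages to command names is an assumption that may differ between distributions.

```shell
#!/usr/bin/env bash
# Print the names of the given commands that cannot be found in PATH,
# one per line. (Hypothetical helper, not part of srp-test.)
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

A possible invocation is `missing_commands fio g++ make multipath sg_reset mkfs.ext4 mkfs.xfs`. Anything it prints points at a package that still needs to be installed.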
Re: v4.16-rc1 + dm-mpath + BFQ
> On 12 Feb 2018, at 17:31, Bart Van Assche wrote:
>
> On 02/11/18 23:35, Paolo Valente wrote:
>> Also this smells a little bit like some spurious elevator call.
>> Unfortunately I have no clue on the cause. To go on, I need at least
>> to reproduce it. In this respect: Bart, could you please tell me how
>> to set up the offending configuration, and to cause the failure?
>> Possibly with just one, or at most two PCs. I don't have fancier hw
>> at the moment.
>
> Hello Paolo,
>
> Although I expect that it is possible to reproduce this with an unmodified
> v4.16-rc1 kernel, this is how I ran into this issue:
> * Clone the for-next branch of https://github.com/bvanassche/linux.
> * Build and install that kernel in a virtual machine.
> * Clone https://github.com/bvanassche/srp-test.
> * Run the following command:
>   srp-test/run_tests -c -d -r 10 -t 02-mq -e bfq

Hi Bart,
as a first attempt, I've followed your steps, but got:

Error: could not find sg_reset

as expected, because of dependencies that your steps imply. So, I have
followed the instructions in the srp-test README for the case "Running
the Tests on an Ethernet Setup", directly on a 4.16-rc1.

For ib_srp-backport, I get a lot of warnings like the following one,
at "make install" (preceded by corresponding warnings at the end of
the compilation):

depmod: WARNING: /lib/modules/4.16.0-rc1+/extra/ib_srp.ko needs unknown symbol rdma_resolve_addr

Unfortunately, it gets worse while executing "make scst srpt":

  CC [M]  /home/paolo/scst/srpt/src/ib_srpt.o
In file included from /home/paolo/scst/srpt/src/ib_srpt.c:62:0:
/home/paolo/scst/srpt/src/ib_srpt.h:481:8: error: redefinition of ‘struct srp_login_req_rdma’
 struct srp_login_req_rdma {
        ^~
In file included from /home/paolo/scst/srpt/src/ib_srpt.h:44:0,
                 from /home/paolo/scst/srpt/src/ib_srpt.c:62:
/mnt/linux-dev/linux/include/scsi/srp.h:139:8: note: originally defined here
 struct srp_login_req_rdma {
        ^~

Could you please give me some help, so as to not get lost among these
issues?

Thanks,
Paolo

> Thanks,
>
> Bart.
Re: v4.16-rc1 + dm-mpath + BFQ
On 02/11/18 23:35, Paolo Valente wrote:
> Also this smells a little bit like some spurious elevator call.
> Unfortunately I have no clue on the cause. To go on, I need at least
> to reproduce it. In this respect: Bart, could you please tell me how
> to set up the offending configuration, and to cause the failure?
> Possibly with just one, or at most two PCs. I don't have fancier hw
> at the moment.

Hello Paolo,

Although I expect that it is possible to reproduce this with an unmodified
v4.16-rc1 kernel, this is how I ran into this issue:
* Clone the for-next branch of https://github.com/bvanassche/linux.
* Build and install that kernel in a virtual machine.
* Clone https://github.com/bvanassche/srp-test.
* Run the following command:
  srp-test/run_tests -c -d -r 10 -t 02-mq -e bfq

Thanks,

Bart.
Re: v4.16-rc1 + dm-mpath + BFQ
> On 09 Feb 2018, at 20:18, Jens Axboe wrote:
>
> On 2/9/18 12:14 PM, Bart Van Assche wrote:
>> On 02/09/18 10:58, Jens Axboe wrote:
>>> On 2/9/18 11:54 AM, Bart Van Assche wrote:
>>>> Hello Paolo,
>>>>
>>>> If I enable the BFQ scheduler for a dm-mpath device then a kernel oops
>>>> appears (see also below). This happens systematically with Linus' tree
>>>> from this morning (commit 54ce685cae30) merged with Jens' for-linus
>>>> branch (commit a78773906147 ("block, bfq: add requeue-request hook"))
>>>> and for-next branch (commit 88455ad7f928). Is this a known issue?
>>>
>>> Does it happen on Linus -git as well, or just with my for-linus merged in?
>>> What I'm getting at is if a78773906147 caused this or not.
>>
>> Hello Jens,
>>
>> Thanks for chiming in. After having reverted commit a78773906147, after
>> having rebuilt the BFQ scheduler, after having rebooted and after having
>> repeated the test I see the same kernel oops being reported. I think
>> that means that this regression is not caused by commit a78773906147. In
>> case it would be useful, here is how gdb translates the crash address:
>>
>> $ gdb block/bfq*ko
>> (gdb) list *(bfq_remove_request+0x8d)
>> 0x280d is in bfq_remove_request (block/bfq-iosched.c:1760).
>> 1755        list_del_init(&rq->queuelist);
>> 1756        bfqq->queued[sync]--;
>> 1757        bfqd->queued--;
>> 1758        elv_rb_del(&bfqq->sort_list, rq);
>> 1759
>> 1760        elv_rqhash_del(q, rq);
>> 1761        if (q->last_merge == rq)
>> 1762                q->last_merge = NULL;
>> 1763
>> 1764        if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
>
> Looks very odd. So clearly RQF_HASHED is set, but we're blowing up on
> the hash list pointers. I'll let Paolo take a look at this one. Thanks
> for testing without that commit, I want to push out my pending fixes
> today and this would have thrown a wrench in the works.

Also this smells a little bit like some spurious elevator call.
Unfortunately I have no clue on the cause. To go on, I need at least
to reproduce it. In this respect: Bart, could you please tell me how
to set up the offending configuration, and to cause the failure?
Possibly with just one, or at most two PCs. I don't have fancier hw
at the moment.

Thanks,
Paolo

> --
> Jens Axboe
Re: v4.16-rc1 + dm-mpath + BFQ
On 2/9/18 12:14 PM, Bart Van Assche wrote:
> On 02/09/18 10:58, Jens Axboe wrote:
>> On 2/9/18 11:54 AM, Bart Van Assche wrote:
>>> Hello Paolo,
>>>
>>> If I enable the BFQ scheduler for a dm-mpath device then a kernel oops
>>> appears (see also below). This happens systematically with Linus' tree
>>> from this morning (commit 54ce685cae30) merged with Jens' for-linus
>>> branch (commit a78773906147 ("block, bfq: add requeue-request hook"))
>>> and for-next branch (commit 88455ad7f928). Is this a known issue?
>>
>> Does it happen on Linus -git as well, or just with my for-linus merged in?
>> What I'm getting at is if a78773906147 caused this or not.
>
> Hello Jens,
>
> Thanks for chiming in. After having reverted commit a78773906147, after
> having rebuilt the BFQ scheduler, after having rebooted and after having
> repeated the test I see the same kernel oops being reported. I think
> that means that this regression is not caused by commit a78773906147. In
> case it would be useful, here is how gdb translates the crash address:
>
> $ gdb block/bfq*ko
> (gdb) list *(bfq_remove_request+0x8d)
> 0x280d is in bfq_remove_request (block/bfq-iosched.c:1760).
> 1755        list_del_init(&rq->queuelist);
> 1756        bfqq->queued[sync]--;
> 1757        bfqd->queued--;
> 1758        elv_rb_del(&bfqq->sort_list, rq);
> 1759
> 1760        elv_rqhash_del(q, rq);
> 1761        if (q->last_merge == rq)
> 1762                q->last_merge = NULL;
> 1763
> 1764        if (RB_EMPTY_ROOT(&bfqq->sort_list)) {

Looks very odd. So clearly RQF_HASHED is set, but we're blowing up on
the hash list pointers. I'll let Paolo take a look at this one. Thanks
for testing without that commit, I want to push out my pending fixes
today and this would have thrown a wrench in the works.

--
Jens Axboe
Re: v4.16-rc1 + dm-mpath + BFQ
On 02/09/18 10:58, Jens Axboe wrote:
> On 2/9/18 11:54 AM, Bart Van Assche wrote:
>> Hello Paolo,
>>
>> If I enable the BFQ scheduler for a dm-mpath device then a kernel oops
>> appears (see also below). This happens systematically with Linus' tree
>> from this morning (commit 54ce685cae30) merged with Jens' for-linus
>> branch (commit a78773906147 ("block, bfq: add requeue-request hook"))
>> and for-next branch (commit 88455ad7f928). Is this a known issue?
>
> Does it happen on Linus -git as well, or just with my for-linus merged in?
> What I'm getting at is if a78773906147 caused this or not.

Hello Jens,

Thanks for chiming in. After having reverted commit a78773906147, after
having rebuilt the BFQ scheduler, after having rebooted and after having
repeated the test I see the same kernel oops being reported. I think
that means that this regression is not caused by commit a78773906147. In
case it would be useful, here is how gdb translates the crash address:

$ gdb block/bfq*ko
(gdb) list *(bfq_remove_request+0x8d)
0x280d is in bfq_remove_request (block/bfq-iosched.c:1760).
1755        list_del_init(&rq->queuelist);
1756        bfqq->queued[sync]--;
1757        bfqd->queued--;
1758        elv_rb_del(&bfqq->sort_list, rq);
1759
1760        elv_rqhash_del(q, rq);
1761        if (q->last_merge == rq)
1762                q->last_merge = NULL;
1763
1764        if (RB_EMPTY_ROOT(&bfqq->sort_list)) {

Bart.
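The interactive gdb session above can also be expressed as a one-shot command, which is convenient when translating several addresses from an oops. The sketch below only builds and prints the command line; the helper name gdb_list_cmd is made up, and it assumes the module was built with debug info and is available as block/bfq.ko.

```shell
#!/usr/bin/env bash
# Print a gdb command line that lists the source around <symbol>+<offset>
# in a kernel module. Printing instead of running keeps the sketch free of
# side effects. (Hypothetical helper for illustration only.)
gdb_list_cmd() {
    local module=$1 symbol=$2 offset=$3
    printf "gdb -batch -ex 'list *(%s+%s)' %s\n" "$symbol" "$offset" "$module"
}
```

For the address in this report, `gdb_list_cmd block/bfq.ko bfq_remove_request 0x8d` prints a command that can be pasted into a shell at the top of the kernel build tree.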
Re: v4.16-rc1 + dm-mpath + BFQ
On 2/9/18 11:54 AM, Bart Van Assche wrote:
> Hello Paolo,
>
> If I enable the BFQ scheduler for a dm-mpath device then a kernel oops
> appears (see also below). This happens systematically with Linus' tree
> from this morning (commit 54ce685cae30) merged with Jens' for-linus
> branch (commit a78773906147 ("block, bfq: add requeue-request hook"))
> and for-next branch (commit 88455ad7f928). Is this a known issue?

Does it happen on Linus -git as well, or just with my for-linus merged in?
What I'm getting at is if a78773906147 caused this or not.

--
Jens Axboe
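For readers who want to try the configuration described in the report, the I/O scheduler of a block device is selected through its queue/scheduler sysfs attribute, where the active scheduler is shown in square brackets. The helpers below are an illustrative sketch: the function names are made up, and the device name dm-0 in the usage note is a placeholder, since the thread does not say which dm device was used.

```shell
#!/usr/bin/env bash
# Print the active scheduler from a queue/scheduler sysfs file, whose
# contents look like "mq-deadline [bfq] none" (active entry in brackets).
active_scheduler() {
    local sched_file=$1
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$sched_file"
}

# Select a scheduler by writing its name to the sysfs file.
# On a real device this needs root privileges.
set_scheduler() {
    local sched_file=$1 name=$2
    echo "$name" > "$sched_file"
}
```

For example, `set_scheduler /sys/block/dm-0/queue/scheduler bfq` followed by `active_scheduler /sys/block/dm-0/queue/scheduler` should report bfq if the BFQ module is available.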