Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On Thu, Jan 11, 2018 at 06:46:54PM +0100, Christoph Hellwig wrote:
> Thanks for looking into this Ming, I had missed it in my current
> work overload. Can you send the updated series to Jens?

OK, I will post it out soon.

Thanks,
Ming
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
Thanks for looking into this Ming, I had missed it in my current work overload. Can you send the updated series to Jens?
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 11.01.2018 12:44, Christian Borntraeger wrote:
> On 01/11/2018 10:13 AM, Ming Lei wrote:
>> [...]
>> Looks there is an issue in the fourth patch ("blk-mq: only select online
>> CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree, and
>> the other 3 patches are the same as Christoph's:
>>
>> https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix
>>
>> gitweb:
>>
>> https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
>>
>> Could you test it and provide feedback?
>
> That kernel seems to boot fine on my system with DASD disks.

I did some regression testing and it works quite well. Boot works, attaching
CPUs during runtime on z/VM and enabling them in Linux works as well. I also
did some DASD online/offline CPU enable/disable loops.

Regards,
Stefan
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 01/11/2018 10:13 AM, Ming Lei wrote:
> [...]
> Looks there is an issue in the fourth patch ("blk-mq: only select online
> CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree, and
> the other 3 patches are the same as Christoph's:
>
> https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix
>
> gitweb:
>
> https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
>
> Could you test it and provide feedback?
>
> BTW, if it doesn't help with this issue, could you boot from a normal disk
> first and dump the blk-mq debugfs state of the DASD later?

That kernel seems to boot fine on my system with DASD disks.
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 11.01.2018 10:13, Ming Lei wrote:
> [...]
> Looks there is an issue in the fourth patch ("blk-mq: only select online
> CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree, and
> the other 3 patches are the same as Christoph's:
>
> https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix
>
> gitweb:
>
> https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
>
> Could you test it and provide feedback?

Hi,

thanks for the patch. I had pretty much the same place under suspicion.
I will test it asap.

Regards,
Stefan
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
> On 12/18/2017 02:56 PM, Stefan Haberland wrote:
>> [...]
>> Do you have anything I could have a look at?
>
> Jens, Christoph, so what do we do about this?
>
> To summarize:
> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke
>   CPU hotplug.
> - Jens' quick revert did fix the issue and did not break DASD support, but
>   has some issues with interrupt affinity.
> - Christoph's patch set fixes the hotplug issue for virtio-blk but causes
>   I/O hangs on DASDs (even without hotplug).

Hello,

This one is a valid use case for VM, I think we need to fix that.

Looks there is an issue in the fourth patch ("blk-mq: only select online
CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree, and
the other 3 patches are the same as Christoph's:

https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix

gitweb:

https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix

Could you test it and provide feedback?

BTW, if it doesn't help with this issue, could you boot from a normal disk
first and dump the blk-mq debugfs state of the DASD later?

Thanks,
Ming
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 12/18/2017 02:56 PM, Stefan Haberland wrote:
> On 07.12.2017 00:29, Christoph Hellwig wrote:
>> [...]
>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>
> I tried this on my system and the blk-mq-hotplug-fix branch does not boot
> for me either.
> The disks get up and running and I/O works fine. At least the partition
> detection and EXT4-fs mount works.
>
> But at some point in time the disks do not get any requests.
>
> I currently have no clue why.
> I took a dump and had a look at the disk states and they are fine. No error
> in the logs or in our debug entries. Just empty DASD devices waiting to be
> called for I/O requests.
>
> Do you have anything I could have a look at?

Jens, Christoph, so what do we do about this?

To summarize:
- commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke
  CPU hotplug.
- Jens' quick revert did fix the issue and did not break DASD support, but
  has some issues with interrupt affinity.
- Christoph's patch set fixes the hotplug issue for virtio-blk but causes
  I/O hangs on DASDs (even without hotplug).

Christian
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 07.12.2017 00:29, Christoph Hellwig wrote:
> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>>   blk-mq: create a blk_mq_ctx for each possible CPU
>> does not boot on DASD and
>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>>   genirq/affinity: assign vectors to all possible CPUs
>> does boot with DASD disks.
> [...]
> I wonder if there is some weird thing about nr_cpu_ids in s390?

I tried this on my system and the blk-mq-hotplug-fix branch does not boot
for me either.
The disks get up and running and I/O works fine. At least the partition
detection and EXT4-fs mount works.

But at some point in time the disks do not get any requests.

I currently have no clue why.
I took a dump and had a look at the disk states and they are fine. No error
in the logs or in our debug entries. Just empty DASD devices waiting to be
called for I/O requests.

Do you have anything I could have a look at?
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
Independent from the issues with the DASD disks, this also seems to not enable
additional hardware queues. With cpus 0,1 (and 248 cpus max) I get CPU 0 and
CPUs 2-247 attached to hardware context 0, and CPU 1 for hardware context 1.
If I now add a CPU this does not change anything: hardware contexts 2, 3, 4
etc. all have no CPU, and hardware context 0 keeps sitting on all CPUs
(except 1).

On 12/07/2017 10:20 AM, Christian Borntraeger wrote:
> On 12/07/2017 12:29 AM, Christoph Hellwig wrote:
>> [...]
>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>
> The problem starts as soon as NR_CPUS is larger than the number
> of real CPUs.
>
> [...]
>
> it does not fix the issue, though (and it would be pretty inefficient for
> large NR_CPUS)
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 12/07/2017 12:29 AM, Christoph Hellwig wrote:
> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>>   blk-mq: create a blk_mq_ctx for each possible CPU
>> does not boot on DASD and
>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>>   genirq/affinity: assign vectors to all possible CPUs
>> does boot with DASD disks.
>>
>> Also adding Stefan Haberland if he has an idea why this fails on DASD and
>> adding Martin (for the s390 irq handling code).
>
> That is interesting as it really isn't related to interrupts at all,
> it just ensures that possible CPUs are set in ->cpumask.
>
> I guess we'd really want:
>
> e005655c389e3d25bf3e43f71611ec12f3012de0
> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>
> before this commit, but it seems like the whole stack didn't work for
> you either.
>
> I wonder if there is some weird thing about nr_cpu_ids in s390?

The problem starts as soon as NR_CPUS is larger than the number
of real CPUs.

A question: wouldn't your change in blk_mq_hctx_next_cpu fail if there is
more than 1 non-online cpu? E.g. don't we need something like (whitespace
and indent damaged):

@@ -1241,11 +1241,11 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 	if (--hctx->next_cpu_batch <= 0) {
 		int next_cpu;

+		do {
 		next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
-		if (!cpu_online(next_cpu))
-			next_cpu = cpumask_next(next_cpu, hctx->cpumask);
 		if (next_cpu >= nr_cpu_ids)
 			next_cpu = cpumask_first(hctx->cpumask);
+		} while (!cpu_online(next_cpu));

 		hctx->next_cpu = next_cpu;
 		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;

it does not fix the issue, though (and it would be pretty inefficient for
large NR_CPUS)
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>   blk-mq: create a blk_mq_ctx for each possible CPU
> does not boot on DASD and
> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>   genirq/affinity: assign vectors to all possible CPUs
> does boot with DASD disks.
>
> Also adding Stefan Haberland if he has an idea why this fails on DASD and
> adding Martin (for the s390 irq handling code).

That is interesting as it really isn't related to interrupts at all,
it just ensures that possible CPUs are set in ->cpumask.

I guess we'd really want:

e005655c389e3d25bf3e43f71611ec12f3012de0
"blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"

before this commit, but it seems like the whole stack didn't work for
you either.

I wonder if there is some weird thing about nr_cpu_ids in s390?
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 12/04/2017 05:21 PM, Christoph Hellwig wrote:
> On Wed, Nov 29, 2017 at 08:18:09PM +0100, Christian Borntraeger wrote:
>> Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
>> FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
>> Seems that this is the place where the system stops. (see the sysrq-t output
>> at the bottom).
>
> Can you check which of the patches in the tree is the culprit?

From this branch

git://git.infradead.org/users/hch/block.git blk-mq-hotplug-fix

commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
  blk-mq: create a blk_mq_ctx for each possible CPU
does not boot on DASD and
commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
  genirq/affinity: assign vectors to all possible CPUs
does boot with DASD disks.

Also adding Stefan Haberland if he has an idea why this fails on DASD and
adding Martin (for the s390 irq handling code).

Some history: I have been getting the warning "WARNING: CPU: 4 PID: 2895 at
block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)" since 4.13 (and also
in 4.12 stable) on CPU hotplug of previously unavailable CPUs (real hotplug,
not offline/online).

This was introduced with

  blk-mq: Create hctx for each present CPU
  commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08

and Christoph is currently working on a fix. The fixed kernel does boot with
virtio-blk and it fixes the warning, but it hangs (outstanding I/O) with
DASD disks.
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On Wed, Nov 29, 2017 at 08:18:09PM +0100, Christian Borntraeger wrote: > Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR. > FWIW, the system not only has scsi disks via fcp but also DASDs as a boot > disk. > Seems that this is the place where the system stops. (see the sysrq-t output > at the bottom). Can you check which of the patches in the tree is the culprit?
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 11/29/2017 08:18 PM, Christian Borntraeger wrote:
> Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
> FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
> Seems that this is the place where the system stops. (see the sysrq-t output
> at the bottom).

FWIW, the failing kernel had CONFIG_NR_CPUS=256 and 32 CPUs (with SMT2) == 64 threads. With CONFIG_NR_CPUS=16 the system booted fine.
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
Seems that this is the place where the system stops. (see the sysrq-t output at the bottom).

[0.247484] Linux version 4.15.0-rc1+ (cborntra@s38lp08) (gcc version 6.3.1 20161221 (Red Hat 6.3.1-1.0.ibm) (GCC)) #229 SMP Wed Nov 29 20:05:35 CET 2017
[0.247489] setup: Linux is running natively in 64-bit mode
[0.247661] setup: The maximum memory size is 1048576MB
[0.247670] setup: Reserving 1024MB of memory at 1047552MB for crashkernel (System RAM: 1047552MB)
[0.247688] numa: NUMA mode: plain
[0.247794] cpu: 64 configured CPUs, 0 standby CPUs
[0.247834] cpu: The CPU configuration topology of the machine is: 0 0 4 2 3 8 / 4
[0.248279] Write protected kernel read-only data: 12456k
[0.265131] Zone ranges:
[0.265134]   DMA [mem 0x-0x7fff]
[0.265136]   Normal [mem 0x8000-0x00ff]
[0.265137] Movable zone start for each node
[0.265138] Early memory node ranges
[0.265139]   node 0: [mem 0x-0x00ff]
[0.265141] Initmem setup node 0 [mem 0x-0x00ff]
[7.445561] random: fast init done
[7.449194] percpu: Embedded 23 pages/cpu @00fbbe60 s56064 r8192 d29952 u94208
[7.449380] Built 1 zonelists, mobility grouping on. Total pages: 264241152
[7.449381] Policy zone: Normal
[7.449384] Kernel command line: elevator=deadline audit_enable=0 audit=0 audit_debug=0 selinux=0 crashkernel=1024M printk.time=1 zfcp.dbfsize=100 dasd=241c,241d,241e,241f root=/dev/dasda1 kvm.nested=1 BOOT_IMAGE=0
[7.449420] audit: disabled (until reboot)
[7.450513] log_buf_len individual max cpu contribution: 4096 bytes
[7.450514] log_buf_len total cpu_extra contributions: 1044480 bytes
[7.450515] log_buf_len min size: 131072 bytes
[7.450788] log_buf_len: 2097152 bytes
[7.450789] early log buf free: 125076(95%)
[ 11.040620] Memory: 1055873868K/1073741824K available (8248K kernel code, 1078K rwdata, 4204K rodata, 812K init, 700K bss, 17867956K reserved, 0K cma-reserved)
[ 11.040938] SLUB: HWalign=256, Order=0-3, MinObjects=0, CPUs=256, Nodes=1
[ 11.040969] ftrace: allocating 26506 entries in 104 pages
[ 11.051476] Hierarchical RCU implementation.
[ 11.051476] RCU event tracing is enabled.
[ 11.051478] RCU debug extended QS entry/exit.
[ 11.053263] NR_IRQS: 3, nr_irqs: 3, preallocated irqs: 3
[ 11.053444] clocksource: tod: mask: 0x max_cycles: 0x3b0a9be803b0a9, max_idle_ns: 1805497147909793 ns
[ 11.160192] console [ttyS0] enabled
[ 11.308228] pid_max: default: 262144 minimum: 2048
[ 11.308298] Security Framework initialized
[ 11.308300] SELinux: Disabled at boot.
[ 11.354028] Dentry cache hash table entries: 33554432 (order: 16, 268435456 bytes)
[ 11.376945] Inode-cache hash table entries: 16777216 (order: 15, 134217728 bytes)
[ 11.377685] Mount-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 11.378401] Mountpoint-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 11.378984] Hierarchical SRCU implementation.
[ 11.380032] smp: Bringing up secondary CPUs ...
[ 11.393634] smp: Brought up 1 node, 64 CPUs
[ 11.585458] devtmpfs: initialized
[ 11.588589] clocksource: jiffies: mask: 0x max_cycles: 0x, max_idle_ns: 1911260446275 ns
[ 11.588998] futex hash table entries: 65536 (order: 12, 16777216 bytes)
[ 11.591926] NET: Registered protocol family 16
[ 11.596413] HugeTLB registered 1.00 MiB page size, pre-allocated 0 pages
[ 11.597604] SCSI subsystem initialized
[ 11.597611] pps_core: LinuxPPS API ver. 1 registered
[ 11.597612] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti
[ 11.597614] PTP clock support registered
[ 11.599088] NetLabel: Initializing
[ 11.599089] NetLabel: domain hash size = 128
[ 11.599090] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
[ 11.599101] NetLabel: unlabeled traffic allowed by default
[ 11.612542] PCI host bridge to bus :00
[ 11.612546] pci_bus :00: root bus resource [mem 0x8000-0x8000007f 64bit pref]
[ 11.612548] pci_bus :00: No busn resource found for root bus, will use [bus 00-ff]
[ 11.616458] iommu: Adding device :00:00.0 to group 0
[ 12.291894] VFS: Disk quotas dquot_6.6.0
[ 12.291942] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 12.292226] NET: Registered protocol family 2
[ 12.292662] TCP established hash table entries: 524288 (order: 10, 4194304 byt
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
Can you try this git branch: git://git.infradead.org/users/hch/block.git blk-mq-hotplug-fix Gitweb: http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-mq-hotplug-fix
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 11/23/2017 07:59 PM, Christian Borntraeger wrote: > > > On 11/23/2017 07:32 PM, Christoph Hellwig wrote: >> On Thu, Nov 23, 2017 at 07:28:31PM +0100, Christian Borntraeger wrote: >>> zfcp on s390. >> >> Ok, so it can't be the interrupt code, but probably is the blk-mq-cpumap.c >> changes. Can you try to revert just those for a quick test? > > > Hmm, I get further in boot, but the system seems very sluggish and it does not > seem to be able to access the scsi disks (get data from them) > FWIW, just having the changes in irq_affinity.c is indeed fine.
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 11/23/2017 07:32 PM, Christoph Hellwig wrote: > On Thu, Nov 23, 2017 at 07:28:31PM +0100, Christian Borntraeger wrote: >> zfcp on s390. > > Ok, so it can't be the interrupt code, but probably is the blk-mq-cpumap.c > changes. Can you try to revert just those for a quick test? Hmm, I get further in boot, but the system seems very sluggish and it does not seem to be able to access the scsi disks (get data from them)
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On Thu, Nov 23, 2017 at 07:28:31PM +0100, Christian Borntraeger wrote: > zfcp on s390. Ok, so it can't be the interrupt code, but probably is the blk-mq-cpumap.c changes. Can you try to revert just those for a quick test?
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
zfcp on s390. On 11/23/2017 07:25 PM, Christoph Hellwig wrote: > What HBA driver do you use in the host? >
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
What HBA driver do you use in the host?
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 11/23/2017 03:34 PM, Christoph Hellwig wrote:
> FYI, the patch below changes both the irq and block mappings to
> always use the cpu possible map (should be split in two in due time).
>
> I think this is the right way forward. For every normal machine
> those two are the same, but for VMs with maxcpus above their normal
> count or some big iron that can grow more cpus it means we waste
> a few more resources for the not present but reserved cpus. It
> fixes the reported issue for me:

While it fixes the hotplug issue under KVM, the same kernel no longer
boots in the host; it seems stuck early at boot, just before detecting
the SCSI disks. I have not yet looked into that.

Christian

[fullquote of the patch deleted]
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
Yes, it seems to fix the bug.

On 11/23/2017 03:34 PM, Christoph Hellwig wrote:
> FYI, the patch below changes both the irq and block mappings to
> always use the cpu possible map (should be split in two in due time).

[rest of fullquote deleted]
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
[fullquote deleted]

> What will happen for the CPU hotplug case?
> Wouldn't we route I/O to a disabled CPU with this patch?

Why would we route I/O to a disabled CPU? We generally route I/O to
devices to start with. How would including possible but not present
CPUs change anything?
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 11/23/2017 03:34 PM, Christoph Hellwig wrote:
> FYI, the patch below changes both the irq and block mappings to
> always use the cpu possible map (should be split in two in due time).

[rest of fullquote deleted]
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
FYI, the patch below changes both the irq and block mappings to always
use the cpu possible map (should be split in two in due time).

I think this is the right way forward. For every normal machine those
two are the same, but for VMs with maxcpus above their normal count, or
some big iron that can grow more cpus, it means we waste a few more
resources for the not present but reserved cpus. It fixes the reported
issue for me:

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9f8cffc8a701..3eb169f15842 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -16,11 +16,6 @@

 static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
 {
-	/*
-	 * Non present CPU will be mapped to queue index 0.
-	 */
-	if (!cpu_present(cpu))
-		return 0;
 	return cpu % nr_queues;
 }

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11097477eeab..612ce1fb7c4e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2114,16 +2114,11 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 		INIT_LIST_HEAD(&__ctx->rq_list);
 		__ctx->queue = q;

-		/* If the cpu isn't present, the cpu is mapped to first hctx */
-		if (!cpu_present(i))
-			continue;
-
-		hctx = blk_mq_map_queue(q, i);
-
 		/*
 		 * Set local node, IFF we have more than one hw queue. If
 		 * not, we remain on the home node of the device
 		 */
+		hctx = blk_mq_map_queue(q, i);
 		if (nr_hw_queues > 1 && hctx->numa_node == NUMA_NO_NODE)
 			hctx->numa_node = local_memory_node(cpu_to_node(i));
 	}
@@ -2180,7 +2175,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	 *
 	 * If the cpu isn't present, the cpu is mapped to first hctx.
 	 */
-	for_each_present_cpu(i) {
+	for_each_possible_cpu(i) {
 		hctx_idx = q->mq_map[i];
 		/* unmapped hw queue can be remapped after CPU topo changed */
 		if (!set->tags[hctx_idx] &&
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index e12d35108225..a37a3b4b6342 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -39,7 +39,7 @@ static void irq_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 	}
 }

-static cpumask_var_t *alloc_node_to_present_cpumask(void)
+static cpumask_var_t *alloc_node_to_possible_cpumask(void)
 {
 	cpumask_var_t *masks;
 	int node;
@@ -62,7 +62,7 @@ static cpumask_var_t *alloc_node_to_present_cpumask(void)
 	return NULL;
 }

-static void free_node_to_present_cpumask(cpumask_var_t *masks)
+static void free_node_to_possible_cpumask(cpumask_var_t *masks)
 {
 	int node;

@@ -71,22 +71,22 @@ static void free_node_to_present_cpumask(cpumask_var_t *masks)
 	kfree(masks);
 }

-static void build_node_to_present_cpumask(cpumask_var_t *masks)
+static void build_node_to_possible_cpumask(cpumask_var_t *masks)
 {
 	int cpu;

-	for_each_present_cpu(cpu)
+	for_each_possible_cpu(cpu)
 		cpumask_set_cpu(cpu, masks[cpu_to_node(cpu)]);
 }

-static int get_nodes_in_cpumask(cpumask_var_t *node_to_present_cpumask,
+static int get_nodes_in_cpumask(cpumask_var_t *node_to_possible_cpumask,
 				const struct cpumask *mask, nodemask_t *nodemsk)
 {
 	int n, nodes = 0;

 	/* Calculate the number of nodes in the supplied affinity mask */
 	for_each_node(n) {
-		if (cpumask_intersects(mask, node_to_present_cpumask[n])) {
+		if (cpumask_intersects(mask, node_to_possible_cpumask[n])) {
 			node_set(n, *nodemsk);
 			nodes++;
 		}
@@ -109,7 +109,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	int last_affv = affv + affd->pre_vectors;
 	nodemask_t nodemsk = NODE_MASK_NONE;
 	struct cpumask *masks;
-	cpumask_var_t nmsk, *node_to_present_cpumask;
+	cpumask_var_t nmsk, *node_to_possible_cpumask;

 	/*
 	 * If there aren't any vectors left after applying the pre/post
@@ -125,8 +125,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	if (!masks)
 		goto out;

-	node_to_present_cpumask = alloc_node_to_present_cpumask();
-	if (!node_to_present_cpumask)
+	node_to_possible_cpumask = alloc_node_to_possible_cpumask();
+	if (!node_to_possible_cpumask)
 		goto out;

 	/* Fill out vectors at the beginning that don't need affinity */
@@ -135,8 +135,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)

 	/* Stabilize the cpumasks */
 	get_online_cpus();
-	build_node_to_present_cpumask(node_to_present_cpumask);
-	nodes = get_nodes_in_cpumask(node_to_present_cpumask, cpu
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
OK, it helps to make sure we're actually doing I/O from the CPU; I've reproduced it now.
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
I can't reproduce it in my VM by adding a new CPU. Do you have any interesting blk-mq setup, like actually using multiple queues? I'll give that a spin next.
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 11/22/2017 12:28 AM, Christoph Hellwig wrote:
> Jens, please don't just revert the commit in your for-linus tree.
>
> On its own this will totally mess up the interrupt assignments. Give
> me a bit of time to sort this out properly.

I wasn't going to push it until I heard otherwise. I'll just pop it off;
for-linus isn't a stable branch.

--
Jens Axboe
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
Jens, please don't just revert the commit in your for-linus tree. On its own this will totally mess up the interrupt assignments. Give me a bit of time to sort this out properly.
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 11/21/2017 01:31 PM, Christian Borntraeger wrote:

[deeply nested fullquote deleted]
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 11/21/2017 09:21 PM, Jens Axboe wrote:
> On 11/21/2017 01:19 PM, Christian Borntraeger wrote:

[deeply nested fullquote deleted]
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 11/21/2017 01:19 PM, Christian Borntraeger wrote: > > On 11/21/2017 09:14 PM, Jens Axboe wrote: >> On 11/21/2017 01:12 PM, Christian Borntraeger wrote: >>> >>> >>> On 11/21/2017 08:30 PM, Jens Axboe wrote: On 11/21/2017 12:15 PM, Christian Borntraeger wrote: > > > On 11/21/2017 07:39 PM, Jens Axboe wrote: >> On 11/21/2017 11:27 AM, Jens Axboe wrote: >>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote: On 11/21/2017 07:09 PM, Jens Axboe wrote: > On 11/21/2017 10:27 AM, Jens Axboe wrote: >> On 11/21/2017 03:14 AM, Christian Borntraeger wrote: >>> Bisect points to >>> >>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit >>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1 >>> Author: Christoph Hellwig >>> Date: Mon Jun 26 12:20:57 2017 +0200 >>> >>> blk-mq: Create hctx for each present CPU >>> >>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream. >>> >>> Currently we only create hctx for online CPUs, which can lead >>> to a lot >>> of churn due to frequent soft offline / online operations. >>> Instead >>> allocate one for each present CPU to avoid this and >>> dramatically simplify >>> the code. >>> >>> Signed-off-by: Christoph Hellwig >>> Reviewed-by: Jens Axboe >>> Cc: Keith Busch >>> Cc: linux-block@vger.kernel.org >>> Cc: linux-n...@lists.infradead.org >>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de >>> Signed-off-by: Thomas Gleixner >>> Cc: Oleksandr Natalenko >>> Cc: Mike Galbraith >>> Signed-off-by: Greg Kroah-Hartman >> >> I wonder if we're simply not getting the masks updated correctly. >> I'll >> take a look. > > Can't make it trigger here. We do init for each present CPU, which > means > that if I offline a few CPUs here and register a queue, those still > show > up as present (just offline) and get mapped accordingly. > > From the looks of it, your setup is different. If the CPU doesn't show > up as present and it gets hotplugged, then I can see how this > condition > would trigger. 
What environment are you running this in? We might have > to re-introduce the cpu hotplug notifier, right now we just monitor > for a dead cpu and handle that. I am not doing a hot unplug and the replug, I use KVM and add a previously not available CPU. in libvirt/virsh speak: 4 >>> >>> So that's why we run into problems. It's not present when we load the >>> device, >>> but becomes present and online afterwards. >>> >>> Christoph, we used to handle this just fine, your patch broke it. >>> >>> I'll see if I can come up with an appropriate fix. >> >> Can you try the below? > > > It does prevent the crash but it seems that the new CPU is not "used " > after the hotplug for mq: > > > output with 2 cpus: > /sys/kernel/debug/block/vda > /sys/kernel/debug/block/vda/hctx0 > /sys/kernel/debug/block/vda/hctx0/cpu0 > /sys/kernel/debug/block/vda/hctx0/cpu0/completed > /sys/kernel/debug/block/vda/hctx0/cpu0/merged > /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched > /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list > /sys/kernel/debug/block/vda/hctx0/active > /sys/kernel/debug/block/vda/hctx0/run > /sys/kernel/debug/block/vda/hctx0/queued > /sys/kernel/debug/block/vda/hctx0/dispatched > /sys/kernel/debug/block/vda/hctx0/io_poll > /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap > /sys/kernel/debug/block/vda/hctx0/sched_tags > /sys/kernel/debug/block/vda/hctx0/tags_bitmap > /sys/kernel/debug/block/vda/hctx0/tags > /sys/kernel/debug/block/vda/hctx0/ctx_map > /sys/kernel/debug/block/vda/hctx0/busy > /sys/kernel/debug/block/vda/hctx0/dispatch > /sys/kernel/debug/block/vda/hctx0/flags > /sys/kernel/debug/block/vda/hctx0/state > /sys/kernel/debug/block/vda/sched > /sys/kernel/debug/block/vda/sched/dispatch > /sys/kernel/debug/block/vda/sched/starved > /sys/kernel/debug/block/vda/sched/batching > /sys/kernel/debug/block/vda/sched/write_next_rq > /sys/kernel/debug/block/vda/sched/write_fifo_list > /sys/kernel/debug/block/vda/sched/read_next_rq > 
> /sys/kernel/debug/block/vda/sched/read_fifo_list
> /sys/kernel/debug/block/vda/write_hints
> /sys/kernel/debug/block/vda/state
> /sys/kernel/debug/block/vda/requeue_list
> /sys/kernel/debug/block/vda/poll_stat

Try this, basically just a revert.
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 11/21/2017 09:14 PM, Jens Axboe wrote:
> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>> [...]
>>> Try this, basically just a revert.
>>
>> Yes, seems to work.
>>
>> Tested-by: Christian Borntraeger
>
> Great, thanks for testing.
>
>> Do you know why the original commit made it into 4.12 stable? After all
>> it has no Fixes tag and
On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>> [...]
>> Try this, basically just a revert.
>
> Yes, seems to work.
>
> Tested-by: Christian Borntraeger

Great, thanks for testing.

> Do you know why the original commit made it into 4.12 stable? After all
> it has no Fixes tag and no cc stable-

I was wondering the same thing when you said it was in 4.12.stable and not
in 4.12 release. That patch should absolutely not have gone into stable,
it's not marked as such a
On 11/21/2017 08:30 PM, Jens Axboe wrote:
> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>> [...]
>
> Try this, basically just a revert.

Yes, seems to work.

Tested-by: Christian Borntraeger

Do you know why the original commit made it into 4.12 stable? After all
it has no Fixes tag and no cc stable-

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 11097477eeab..bc1950fa9ef6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -37,6 +37,9 @@
>  #include "blk-wbt.h"
>  #include "blk-mq-sched.h"
>
> +static DEFINE_MUTEX(all_q_mutex);
> +static LIST_HEAD(all_q_list);
> +
>  static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
>  static void blk_mq_poll_stats_start
On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>> [...]
>> Can you try the below?
>
> It does prevent the crash but it seems that the new CPU is not "used"
> after the hotplug for mq:
>
> [...]

Try this, basically just a revert.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11097477eeab..bc1950fa9ef6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -37,6 +37,9 @@
 #include "blk-wbt.h"
 #include "blk-mq-sched.h"
 
+static DEFINE_MUTEX(all_q_mutex);
+static LIST_HEAD(all_q_list);
+
 static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
@@ -2114,8 +2117,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 		INIT_LIST_HEAD(&__ctx->rq_list);
 		__ctx->queue = q;
 
-		/* If the cpu isn't present, the cpu is mapped to first hctx */
-
On 11/21/2017 07:39 PM, Jens Axboe wrote:
> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>> [...]
>> I'll see if I can come up with an appropriate fix.
>
> Can you try the below?

It does prevent the crash but it seems that the new CPU is not "used"
after the hotplug for mq:

output with 2 cpus:
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active
/sys/kernel/debug/block/vda/hctx0/run
/sys/kernel/debug/block/vda/hctx0/queued
/sys/kernel/debug/block/vda/hctx0/dispatched
/sys/kernel/debug/block/vda/hctx0/io_poll
/sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
/sys/kernel/debug/block/vda/hctx0/sched_tags
/sys/kernel/debug/block/vda/hctx0/tags_bitmap
/sys/kernel/debug/block/vda/hctx0/tags
/sys/kernel/debug/block/vda/hctx0/ctx_map
/sys/kernel/debug/block/vda/hctx0/busy
/sys/kernel/debug/block/vda/hctx0/dispatch
/sys/kernel/debug/block/vda/hctx0/flags
/sys/kernel/debug/block/vda/hctx0/state
/sys/kernel/debug/block/vda/sched
/sys/kernel/debug/block/vda/sched/dispatch
/sys/kernel/debug/block/vda/sched/starved
/sys/kernel/debug/block/vda/sched/batching
/sys/kernel/debug/block/vda/sched/write_next_rq
/sys/kernel/debug/block/vda/sched/write_fifo_list
/sys/kernel/debug/block/vda/sched/read_next_rq
/sys/kernel/debug/block/vda/sched/read_fifo_list
/sys/kernel/debug/block/vda/write_hints
/sys/kernel/debug/block/vda/state
/sys/kernel/debug/block/vda/requeue_list
/sys/kernel/debug/block/vda/poll_stat

> [...]
On 11/21/2017 11:27 AM, Jens Axboe wrote:
> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>> [...]
>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>> not available CPU.
>>
>> in libvirt/virsh speak:
>> 4
>
> So that's why we run into problems. It's not present when we load the
> device, but becomes present and online afterwards.
>
> Christoph, we used to handle this just fine, your patch broke it.
>
> I'll see if I can come up with an appropriate fix.

Can you try the below?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b600463791ec..ab3a66e7bd03 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -40,6 +40,7 @@
 static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
+static void blk_mq_map_swqueue(struct request_queue *q);
 
 static int blk_mq_poll_stats_bkt(const struct request *rq)
 {
@@ -1947,6 +1950,15 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	return -ENOMEM;
 }
 
+static int blk_mq_hctx_notify_prepare(unsigned int cpu, struct hlist_node *node)
+{
+	struct blk_mq_hw_ctx *hctx;
+
+	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp);
+	blk_mq_map_swqueue(hctx->queue);
+	return 0;
+}
+
 /*
  * 'cpu' is going away. splice any existing rq_list entries from this
  * software queue to the hw queue dispatch list, and ensure that it
@@ -1958,7 +1970,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	struct blk_mq_ctx *ctx;
 	LIST_HEAD(tmp);
 
-	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
+	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp);
 	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
 
 	spin_lock(&ctx->lock);
@@ -1981,8 +1993,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 
 static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
 {
-	cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD,
-					    &hctx->cpuhp_dead);
+	cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_PREPARE, &hctx->cpuhp);
 }
 
 /* hctx->ctxs will be freed in queue's release handler */
@@ -2039,7 +2050,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
 	hctx->queue = q;
 	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
 
-	cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead);
+	cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_PREPARE, &hctx->cpuhp);
 
 	hctx->tags = set->tags[hctx_idx];
 
@@ -2974,7 +2987,8 @@ static int __init blk_mq_init(void)
 	BUILD_BUG_ON((REQ_ATOM_STARTED / BITS_PER_BYTE) !=
 			(REQ_ATOM_COMPLETE / BITS_PER_BYTE));
 
-	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
+	cpuhp_setup_state_multi(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
+				blk_mq_hctx_notify_prepare,
On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>> [...]
>> What environment are you running this in? We might have
>> to re-introduce the cpu hotplug notifier, right now we just monitor
>> for a dead cpu and handle that.
>
> I am not doing a hot unplug and the replug, I use KVM and add a previously
> not available CPU.
>
> in libvirt/virsh speak:
> 4

So that's why we run into problems. It's not present when we load the
device, but becomes present and online afterwards.

Christoph, we used to handle this just fine, your patch broke it.

I'll see if I can come up with an appropriate fix.

-- 
Jens Axboe
On 11/21/2017 07:09 PM, Jens Axboe wrote:
> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>> [...]
>> I wonder if we're simply not getting the masks updated correctly. I'll
>> take a look.
>
> Can't make it trigger here. We do init for each present CPU, which means
> that if I offline a few CPUs here and register a queue, those still show
> up as present (just offline) and get mapped accordingly.
>
> From the looks of it, your setup is different. If the CPU doesn't show
> up as present and it gets hotplugged, then I can see how this condition
> would trigger. What environment are you running this in? We might have
> to re-introduce the cpu hotplug notifier, right now we just monitor
> for a dead cpu and handle that.

I am not doing a hot unplug and the replug, I use KVM and add a previously
not available CPU.

in libvirt/virsh speak:
4
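(The libvirt snippet above appears to have lost its XML markup in the archive; only the "4" survived. For illustration only, a hypothetical domain definition that boots a guest with 2 vCPUs while allowing hotplug up to 4 would typically look like this in libvirt's domain XML; the `current='2'` value here is an assumption based on the "output with 2 cpus" elsewhere in the thread.)

```xml
<!-- hypothetical libvirt domain XML fragment: 2 vCPUs online at boot,
     up to 4 pluggable at runtime (e.g. via "virsh setvcpus <dom> 3 --live");
     a vCPU added this way becomes present+online only after the virtio-blk
     driver has already built its queue mappings -->
<vcpu placement='static' current='2'>4</vcpu>
```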
On 11/21/2017 10:27 AM, Jens Axboe wrote:
> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>> Bisect points to
>>
>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>> [...]
>
> I wonder if we're simply not getting the masks updated correctly. I'll
> take a look.

Can't make it trigger here. We do init for each present CPU, which means
that if I offline a few CPUs here and register a queue, those still show
up as present (just offline) and get mapped accordingly.

From the looks of it, your setup is different. If the CPU doesn't show
up as present and it gets hotplugged, then I can see how this condition
would trigger. What environment are you running this in? We might have
to re-introduce the cpu hotplug notifier, right now we just monitor
for a dead cpu and handle that.

-- 
Jens Axboe
On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
> Bisect points to
>
> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
> Author: Christoph Hellwig
> Date:   Mon Jun 26 12:20:57 2017 +0200
>
>     blk-mq: Create hctx for each present CPU
>
>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
> [...]

I wonder if we're simply not getting the masks updated correctly. I'll
take a look.

-- 
Jens Axboe
On 11/21/2017 10:50 AM, Christian Borntraeger wrote:
> On 11/21/2017 09:35 AM, Christian Borntraeger wrote:
>> On 11/20/2017 09:52 PM, Jens Axboe wrote:
>>> On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
>>>> On 11/20/2017 08:42 PM, Jens Axboe wrote:
>>>>> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>>>>>> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>>>>>>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>>>>>>> This is
>>>>>>>> b7a71e66d (Jens Axboe       2017-08-01 09:28:24 -0600 1141)  * are mapped to it.
>>>>>>>> b7a71e66d (Jens Axboe       2017-08-01 09:28:24 -0600 1142)  */
>>>>>>>> 6a83e74d2 (Bart Van Assche  2016-11-02 10:09:51 -0600 1143) WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>>>>>>> 6a83e74d2 (Bart Van Assche  2016-11-02 10:09:51 -0600 1144) 	cpu_online(hctx->next_cpu));
>>>>>>>> 6a83e74d2 (Bart Van Assche  2016-11-02 10:09:51 -0600 1145)
>>>>>>>> b7a71e66d (Jens Axboe       2017-08-01 09:28:24 -0600 1146)  /*
>>>>>>>
>>>>>>> Did you really try to figure out when the code that reported the warning
>>>>>>> was introduced? I think that warning was introduced through the
>>>>>>> following commit:
>>>>>>
>>>>>> This was more a cut'n'paste to show which warning triggered since line
>>>>>> numbers are somewhat volatile.
>>>>>>
>>>>>>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>>>>>>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>>>>>>
>>>>>>>     blk-mq: don't use preempt_count() to check for right CPU
>>>>>>>
>>>>>>>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>>>>>>     want to check is whether or not we are on the right CPU.
>>>>>>>     So don't make PREEMPT part of this, just test the CPU in
>>>>>>>     the mask directly.
>>>>>>>
>>>>>>> Anyway, I think that warning is appropriate and useful. So the next step
>>>>>>> is to figure out what work item was involved and why that work item got
>>>>>>> executed on the wrong CPU.
>>>>>>
>>>>>> It seems to be related to virtio-blk (is triggered by fio on such
>>>>>> disks). Your comment basically says: "no this is not a known issue"
>>>>>> then :-) I will try to take a dump to find out the work item
>>>>>
>>>>> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
>>>>> and we reconfigure the mappings. So I don't think the above is unexpected,
>>>>> if you are doing CPU hot unplug while running a fio job.
>>>>
>>>> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.
>>>
>>> OK, that's different, we should not be triggering a warning for that.
>>> What does your machine/virtblk topology look like in terms of CPUS,
>>> nr of queues for virtblk, etc?
>>
>> FWIW, 4.11 does work, 4.12 and later is broken.
>
> In fact: 4.12 is fine, 4.12.14 is broken.

Bisect points to

1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
Author: Christoph Hellwig
Date:   Mon Jun 26 12:20:57 2017 +0200

    blk-mq: Create hctx for each present CPU

    commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.

    Currently we only create hctx for online CPUs, which can lead to a lot
    of churn due to frequent soft offline / online operations. Instead
    allocate one for each present CPU to avoid this and dramatically simplify
    the code.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jens Axboe
    Cc: Keith Busch
    Cc: linux-block@vger.kernel.org
    Cc: linux-n...@lists.infradead.org
    Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
    Signed-off-by: Thomas Gleixner
    Cc: Oleksandr Natalenko
    Cc: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

:04 04 a61cb023014a7b7a6b9f24ea04fe8ab22299e706 059ba6dc3290c74e0468937348e580cd53f963e7 M	block
:04 04 432e719d7e738ffcddfb8fc964544d3b3e0a68f7 f4572aa21b249a851a1b604c148eea109e93b30d M	include

adding Christoph

FWIW, your patch triggers the following on 4.14 when doing a cpu hotplug
(adding a CPU) and then accessing a virtio-blk device.
747.652408] [ cut here ] [ 747.652410] WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 __blk_mq_run_hw_queue+0xd4/0x100 [ 747.652410] Modules linked in: dm_multipath [ 747.652412] CPU: 4 PID: 2895 Comm: kworker/4:1H Tainted: GW 4.14.0+ #191 [ 747.652412] Hardware name: IBM 2964 NC9 704 (KVM/Linux) [ 747.652414] Workqueue: kblockd blk_mq_run_work_fn [ 747.652414] task: 6068 task.stack: 5ea3 [ 747.652415] Krnl PSW : 0704f0018000 00505864 (__blk_mq_run_hw_queue+0xd4/0x100) [ 747.652417]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3 [ 747.652417] Krnl GPRS: 0010 00ff 5cbec400 00
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
On 11/21/2017 09:35 AM, Christian Borntraeger wrote:
> On 11/20/2017 09:52 PM, Jens Axboe wrote:
>> On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
>>> [...]
>>> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.
>>
>> OK, that's different, we should not be triggering a warning for that.
>> What does your machine/virtblk topology look like in terms of CPUS,
>> nr of queues for virtblk, etc?
>
> FWIW, 4.11 does work, 4.12 and later is broken.

In fact: 4.12 is fine, 4.12.14 is broken.
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
On 11/20/2017 09:52 PM, Jens Axboe wrote:
> On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
>> [...]
>> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.
>
> OK, that's different, we should not be triggering a warning for that.
> What does your machine/virtblk topology look like in terms of CPUS,
> nr of queues for virtblk, etc?

FWIW, 4.11 does work, 4.12 and later is broken.

> You can probably get this info the easiest by just doing a:
>
> # find /sys/kernel/debug/block/virtX
>
> replace virtX with your virtblk device name. Generate this info both
> before and after the hotplug event.

It happens in all variants (1 cpu to 2, or 16 to 17, and independent of the number of disks). What I can see is that the block layer does not yet see the new CPU:

[root@zhyp137 ~]# find /sys/kernel/debug/block/vd*
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active
/sys/kernel/debug/block/vda/hctx0/run
/sys/kernel/debug/block/vda/hctx0/queued
/sys/kernel/debug/block/vda/hctx0/dispatched
/sys/kernel/debug/block/vda/hctx0/io_poll
/sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
/sys/kernel/debug/block/vda/hctx0/sched_tags
/sys/kernel/debug/block/vda/hctx0/tags_bitmap
/sys/kernel/debug/block/vda/hctx0/tags
/sys/kernel/debug/block/vda/hctx0/ctx_map
/sys/kernel/debug/block/vda/hctx0/busy
/sys/kernel/debug/block/vda/hctx0/dispatch
/sys/kernel/debug/block/vda/hctx0/flags
/sys/kernel/debug/block/vda/hctx0/state
/sys/kernel/debug/block/vda/sched
/sys/kernel/debug/block/vda/sched/dispatch
/sys/kernel/debug/block/vda/sched/starved
/sys/kernel/debug/block/vda/sched/batching
/sys/kernel/debug/block/vda/sched/write_next_rq
/sys/kernel/debug/block/vda/sched/write_fifo_list
/sys/kernel/debug/block/vda/sched/read_next_rq
/sys/kernel/debug/block/vda/sched/read_fifo_list
/sys/kernel/debug/block/vda/write_hints
/sys/kernel/debug/block/vda/state
/sys/kernel/debug/block/vda/requeue_list
/sys/kernel/debug/block/vda/poll_stat

--> in host: virsh setvcpu zhyp137 2

[root@zhyp137 ~]# chcpu -e 1
CPU 1 enabled
[root@zhyp137 ~]# find /sys/kernel/debug/block/vd*
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active
/sys/kernel/debug/block/vda/hct
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
> On 11/20/2017 08:42 PM, Jens Axboe wrote:
>> [...]
>> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
>> and we reconfigure the mappings. So I don't think the above is unexpected,
>> if you are doing CPU hot unplug while running a fio job.
>
> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.

OK, that's different, we should not be triggering a warning for that. What does your machine/virtblk topology look like in terms of CPUS, nr of queues for virtblk, etc?

You can probably get this info the easiest by just doing a:

# find /sys/kernel/debug/block/virtX

replace virtX with your virtblk device name. Generate this info both before and after the hotplug event.

>> While it's a bit annoying that we trigger the WARN_ON() for a condition
>> that can happen, we're basically interested in it if it triggers for
>> normal operations.
>
> I think we should never trigger a WARN_ON on conditions that can
> happen. I know some folks enable panic_on_warn to detect/avoid data
> integrity issues. FWIW, this also seems to happen with 4.13 and 4.12.

It's not supposed to happen for your case, so I'd say it's been useful. It's not a critical thing, but it is something that should not trigger, and we need to look into why it did and fix it up.

-- 
Jens Axboe
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
On 11/20/2017 08:42 PM, Jens Axboe wrote:
> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>> [...]
>> It seems to be related to virtio-blk (is triggered by fio on such disks).
>> Your comment basically says: "no this is not a known issue" then :-)
>> I will try to take a dump to find out the work item
>
> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
> and we reconfigure the mappings. So I don't think the above is unexpected,
> if you are doing CPU hot unplug while running a fio job.

I did a cpu hot plug (adding a CPU) and I started fio AFTER that.

> While it's a bit annoying that we trigger the WARN_ON() for a condition
> that can happen, we're basically interested in it if it triggers for
> normal operations.

I think we should never trigger a WARN_ON on conditions that can happen. I know some folks enable panic_on_warn to detect/avoid data integrity issues.

FWIW, this also seems to happen with 4.13 and 4.12.
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>> [...]
>> Anyway, I think that warning is appropriate and useful. So the next step
>> is to figure out what work item was involved and why that work item got
>> executed on the wrong CPU.
>
> It seems to be related to virtio-blk (is triggered by fio on such disks).
> Your comment basically says: "no this is not a known issue" then :-)
> I will try to take a dump to find out the work item

blk-mq does not attempt to freeze/sync existing work if a CPU goes away, and we reconfigure the mappings. So I don't think the above is unexpected, if you are doing CPU hot unplug while running a fio job.

While it's a bit annoying that we trigger the WARN_ON() for a condition that can happen, we're basically interested in it if it triggers for normal operations.

-- 
Jens Axboe
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
On 11/20/2017 08:20 PM, Bart Van Assche wrote:
> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>> [...]
>
> Did you really try to figure out when the code that reported the warning
> was introduced? I think that warning was introduced through the following
> commit:

This was more a cut'n'paste to show which warning triggered, since line numbers are somewhat volatile.

> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
> Date:   Wed Apr 16 09:23:48 2014 -0600
>
>     blk-mq: don't use preempt_count() to check for right CPU
>
>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>     want to check is whether or not we are on the right CPU.
>     So don't make PREEMPT part of this, just test the CPU in
>     the mask directly.
>
> Anyway, I think that warning is appropriate and useful. So the next step
> is to figure out what work item was involved and why that work item got
> executed on the wrong CPU.

It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically says: "no this is not a known issue" then :-)

I will try to take a dump to find out the work item.
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
> This is
>
> b7a71e66d (Jens Axboe          2017-08-01 09:28:24 -0600 1141)	 * are mapped to it.
> b7a71e66d (Jens Axboe          2017-08-01 09:28:24 -0600 1142)	 */
> 6a83e74d2 (Bart Van Assche     2016-11-02 10:09:51 -0600 1143)	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> 6a83e74d2 (Bart Van Assche     2016-11-02 10:09:51 -0600 1144)		cpu_online(hctx->next_cpu));
> 6a83e74d2 (Bart Van Assche     2016-11-02 10:09:51 -0600 1145)
> b7a71e66d (Jens Axboe          2017-08-01 09:28:24 -0600 1146)	/*

Did you really try to figure out when the code that reported the warning was introduced? I think that warning was introduced through the following commit:

commit fd1270d5df6a005e1248e87042159a799cc4b2c9
Date:   Wed Apr 16 09:23:48 2014 -0600

    blk-mq: don't use preempt_count() to check for right CPU

    UP or CONFIG_PREEMPT_NONE will return 0, and what we really
    want to check is whether or not we are on the right CPU.
    So don't make PREEMPT part of this, just test the CPU in
    the mask directly.

Anyway, I think that warning is appropriate and useful. So the next step is to figure out what work item was involved and why that work item got executed on the wrong CPU.

Bart.