Re: [PATCH] Revert "blk-mq: remove code for dealing with remapping queue"
On 24.04.2018 22:01, Ming Lei wrote:
> This reverts commit 37c7c6c76d431dd7ef9c29d95f6052bd425f004c.
>
> Turns out some drivers (most of them FC drivers) may not use managed IRQ affinity and have their own customized .map_queues at the same time, so keep this code to avoid regressions.
>
> Reported-by: Laurence Oberman <lober...@redhat.com>
> Cc: Ewan Milne <emi...@redhat.com>
> Cc: Stefan Haberland <s...@linux.vnet.ibm.com>
> Cc: Christian Borntraeger <borntrae...@de.ibm.com>
> Cc: Christoph Hellwig <h...@lst.de>
> Cc: Sagi Grimberg <s...@grimberg.me>
> Signed-off-by: Ming Lei <ming....@redhat.com>
> ---

Tested-by: Stefan Haberland <s...@linux.vnet.ibm.com>
Re: [PATCH] Revert "blk-mq: remove code for dealing with remapping queue"
On 25.04.2018 09:41, Christian Borntraeger wrote:
> On 04/24/2018 10:01 PM, Ming Lei wrote:
>> This reverts commit 37c7c6c76d431dd7ef9c29d95f6052bd425f004c.
>>
>> Turns out some drivers (most of them FC drivers) may not use managed IRQ affinity and have their own customized .map_queues at the same time, so keep this code to avoid regressions.
>>
>> Reported-by: Laurence Oberman <lober...@redhat.com>
>> Cc: Ewan Milne <emi...@redhat.com>
>> Cc: Stefan Haberland <s...@linux.vnet.ibm.com>
>> Cc: Christian Borntraeger <borntrae...@de.ibm.com>
>> Cc: Christoph Hellwig <h...@lst.de>
>> Cc: Sagi Grimberg <s...@grimberg.me>
>> Signed-off-by: Ming Lei <ming@redhat.com>
>
> Seems to work OK with 4.17-rc2 + this patch with s390 and DASD devices. So one of the other fixes seems to have taken care of the original issue. Would be good if Stefan Haberland could also verify that.

Looks OK to me, too. Did some testing with 4.17-rc2 + this patch on LPAR and z/VM with 1-65 CPUs, as well as dynamic CPU definition on z/VM, and a small DASD regression test.
Re: [PATCH 0/8] blk-mq: fix and improve queue mapping
On 08.04.2018 11:48, Ming Lei wrote:
> Hi Jens,
>
> The first two patches fix issues with queue mapping. The other six patches improve queue mapping for blk-mq.
>
> Christian, these patches should fix your issue, so please give them a test. The patches can be found in the following tree:
>
> https://github.com/ming1/linux/commits/v4.17-rc-blk-fix_mapping_v1
>
> Thanks, Ming

Hi Ming,

thanks for the patches. Unfortunately I was not able to try them from your git tree, since I ran into the same problem that Li Wang reported today. But your patches applied to our local git tree seem to work fine. A small DASD regression test showed no issues or warnings.

Regards, Stefan
Re: 4.16-RC7 WARNING: CPU: 2 PID: 0 at block/blk-mq.c:1400 __blk_mq_delay_run_hw_queue
This warning is harmless; please try the following patch:

--
From 7b2b5139bfef80f44d1b1424e09ab35b715fbfdb Mon Sep 17 00:00:00 2001
From: Ming Lei <ming@redhat.com>
Date: Tue, 27 Mar 2018 19:54:23 +0800
Subject: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

From commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with each possisble CPU") on, it is easier to see an unmapped hctx in some CPU topologies; such an hctx may not be mapped to any CPU. This patch avoids the warning in __blk_mq_delay_run_hw_queue() by checking whether the hctx is mapped in blk_mq_run_hw_queues().

Reported-by: Stefan Haberland <s...@linux.vnet.ibm.com>
Cc: Christoph Hellwig <h...@lst.de>
Fixes: 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
Signed-off-by: Ming Lei <ming@redhat.com>
---

Hi Ming,

thanks a lot for the patch. I tried it on top of our 4.16-rc7 branch and it works well. I ran a small regression test set against it and did not notice any warnings or other obvious impacts.

Regards, Stefan
4.16-RC7 WARNING: CPU: 2 PID: 0 at block/blk-mq.c:1400 __blk_mq_delay_run_hw_queue
Hi,

I get the following warning in __blk_mq_delay_run_hw_queue when the scheduler is set to mq-deadline for DASD devices on s390. What I see is that, for whatever reason, there is an hctx nr 0 which has no hctx->tags pointer set. From my observation it is always hctx nr 0 that has a NULL tags pointer, while other hctxs have hctx->tags set correctly.

[    2.169986] WARNING: CPU: 0 PID: 0 at block/blk-mq.c:1402 __blk_mq_delay_run_hw_queue+0xe8/0x118
[    2.170007] Modules linked in:
[    2.170014] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.16.0-rc7-04107-g91a05d9e1d6b-dirty #147
[    2.170019] Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
[    2.170024] Krnl PSW : 76fd6c7f c244c24d (__blk_mq_delay_run_hw_queue+0xe8/0x118)
[    2.170035]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[    2.170041] Krnl GPRS: 599ec58a 02a94000 02a94000 0001
[    2.170047]            6e761e98 02a96980 02a96800
[    2.170052]            02d87ce0 737fbda8 0001
[    2.170058]            0001 00aedd10 737fbc38 737fbc00
[    2.170069] Krnl Code: 006ea3c8: ebaff0a4  lmg %r10,%r15,160(%r15)
                          006ea3ce: c0f45e0d  brcl 15,6d5fe8
                         #006ea3d4: a7f40001  brc 15,6ea3d6
                         >006ea3d8: e340f0c4  lg %r4,192(%r15)
                          006ea3de: ebaff0a4  lmg %r10,%r15,160(%r15)
                          006ea3e4: 07f4      bcr 15,%r4
                          006ea3e6: 41b01100  la %r11,256(%r1)
                          006ea3ea: 182a      lr %r2,%r10
[    2.170158] Call Trace:
[    2.170205] ([<02a96800>] 0x2a96800)
[    2.170248]  [<006ea4c0>] blk_mq_run_hw_queue+0xa0/0x100
[    2.170262]  [<006ea59c>] blk_mq_run_hw_queues+0x7c/0x98
[    2.170295]  [<006e88f6>] __blk_mq_complete_request+0x10e/0x1e0
[    2.170300]  [<006e9e30>] blk_mq_complete_request+0x80/0xa0
[    2.170307]  [<0087fad0>] dasd_block_tasklet+0x218/0x480
[    2.170415]  [<0017c3f8>] tasklet_hi_action+0xa0/0x138
[    2.170434]  [<00a91c10>] __do_softirq+0xc8/0x540
[    2.170471]  [<0017bd4e>] irq_exit+0x136/0x140
[    2.170478]  [<0010c912>] do_IRQ+0x8a/0xb8
[    2.170518]  [<00a90ee0>] io_int_handler+0x138/0x2e0
[    2.170524]  [<00102cd0>] enabled_wait+0x58/0x128
[    2.170562] ([<00102cb8>] enabled_wait+0x40/0x128)
[    2.170577]  [<0010319a>] arch_cpu_idle+0x32/0x48
[    2.170604]  [<00a8f636>] default_idle_call+0x3e/0x58
[    2.170613]  [<001cd5d2>] do_idle+0xda/0x190
[    2.170621]  [<001cd93e>] cpu_startup_entry+0x3e/0x48
[    2.170633]  [<00e5ebf4>] start_kernel+0x47c/0x490
[    2.170641]  [<00100020>] _stext+0x20/0x80
[    2.170650] 2 locks held by swapper/0/0:
[    2.170658]  #0: (&(>lock)->rlock){..-.}, at: [] dasd_block_tasklet+0x1cc/0x480
[    2.170676]  #1: (rcu_read_lock){}, at: [] hctx_lock+0x34/0x110
[    2.170750] Last Breaking-Event-Address:
[    2.170758]  [<006ea3d4>] __blk_mq_delay_run_hw_queue+0xe4/0x118
[    2.170803] ---[ end trace 1073cf0de1fd32d0 ]---

Regards, Stefan
Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU
Ming or Christoph, would you mind sending this patch to stable #4.12? Or is the Fixes tag enough to get this fixed in all related releases?

Regards, Stefan
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 11.01.2018 12:44, Christian Borntraeger wrote:
> On 01/11/2018 10:13 AM, Ming Lei wrote:
>> On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
>>> [...]
>>> Jens, Christoph, so what do we do about this? To summarize:
>>> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
>>> - Jens' quick revert fixed the issue and did not break DASD support, but has some issues with interrupt affinity.
>>> - Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on DASDs (even without hotplug).
>>
>> Hello,
>>
>> This is a valid use case for VMs, so I think we need to fix it. There looks to be an issue in the fourth patch ("blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree, and the other three patches are the same as Christoph's:
>>
>> https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix
>> gitweb: https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
>>
>> Could you test it and provide feedback? BTW, if it doesn't help with this issue, could you boot from a normal disk first and dump the blk-mq debugfs of the DASD later?
>
> That kernel seems to boot fine on my system with DASD disks.

I did some regression testing and it works quite well. Boot works, and attaching CPUs during runtime on z/VM and enabling them in Linux works as well. I also did some DASD online/offline and CPU enable/disable loops.

Regards, Stefan
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 11.01.2018 10:13, Ming Lei wrote:
> On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
>> [...]
>> Jens, Christoph, so what do we do about this? To summarize:
>> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
>> - Jens' quick revert fixed the issue and did not break DASD support, but has some issues with interrupt affinity.
>> - Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on DASDs (even without hotplug).
>
> Hello,
>
> This is a valid use case for VMs, so I think we need to fix it. There looks to be an issue in the fourth patch ("blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree, and the other three patches are the same as Christoph's:
>
> https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix
> gitweb: https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
>
> Could you test it and provide feedback? BTW, if it doesn't help with this issue, could you boot from a normal disk first and dump the blk-mq debugfs of the DASD later?
>
> Thanks, Ming

Hi,

thanks for the patch. I suspected pretty much the same place. I will test it asap.

Regards, Stefan
Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
On 07.12.2017 00:29, Christoph Hellwig wrote:
> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>>   blk-mq: create a blk_mq_ctx for each possible CPU
>> does not boot on DASD, and
>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>>   genirq/affinity: assign vectors to all possible CPUs
>> does boot with DASD disks.
>>
>> Also adding Stefan Haberland if he has an idea why this fails on DASD, and adding Martin (for the s390 irq handling code).
>
> That is interesting, as it really isn't related to interrupts at all; it just ensures that possible CPUs are set in ->cpumask. I guess we'd really want e005655c389e3d25bf3e43f71611ec12f3012de0 ("blk-mq: only select online CPUs in blk_mq_hctx_next_cpu") before this commit, but it seems like the whole stack didn't work for you either. I wonder if there is some weird thing about nr_cpu_ids on s390?

I tried this on my system and the blk-mq-hotplug-fix branch does not boot for me either. The disks get up and running and I/O works fine; at least the partition detection and the EXT4-fs mount work. But at some point the disks do not get any requests anymore. I currently have no clue why. I took a dump and had a look at the disk states and they are fine. No errors in the logs or in our debug entries. Just empty DASD devices waiting to be called for I/O requests. Do you have anything I could have a look at?
[v2] block: check partition alignment
Partitions that are not aligned to the blocksize of a device may cause invalid I/O requests, because the block layer only cares about alignment within the partition when building requests on partitions.

device:
|4096|4096|4096|
partition with 512 byte offset:
|-512-|4096|4096|4096|

When reading/writing one 4k block of the partition, this maps to reading/writing the device at an offset of 512 bytes, leading to unaligned requests for the device, which in turn may cause unexpected behavior of the device driver.

For DASD devices we have to translate the block number into a cylinder/head/record format. The unaligned requests lead to a wrong calculation and therefore to misdirected I/O. In a "good" case this leads to I/O errors, because the underlying hardware detects the wrong addressing. In a worst-case scenario this might destroy data on the device.

To prevent partitions that are not aligned to the blocksize of a device, check for the alignment in the blkpg_ioctl.

Signed-off-by: Stefan Haberland <s...@linux.vnet.ibm.com>
---
 block/ioctl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index 755119c..36b5c21 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -45,6 +45,9 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 			    || pstart < 0 || plength < 0 || partno > 65535)
 				return -EINVAL;
 		}
+		/* check if partition is aligned to blocksize */
+		if (p.start & (bdev_logical_block_size(bdev) - 1))
+			return -EINVAL;
 		mutex_lock(&bdev->bd_mutex);
--
2.8.4
Re: [RFC] block: check partition alignment
On 14.12.2016 18:07, Christoph Hellwig wrote:
>> To prevent partitions that are not aligned to the physical blocksize of a device, check for the alignment in the blkpg_ioctl.
>
> We'd also need to reject this when reading partitions from disk, right?

I am not sure there is a problem. I was not able to recreate the error with a partition read from disk, but simply because I was not able to write a defective partition table to disk; all variants I tried were OK. So I am at least not aware of a way to break it via the partition detection code. I just noticed that the ioctl, which can be called by anyone, is able to establish defective partitions.

On 15.12.2016 09:56, Damien Le Moal wrote:
> On 12/15/16 17:45, Christoph Hellwig wrote:
>> On Thu, Dec 15, 2016 at 10:33:47AM +0900, Damien Le Moal wrote:
>>> For a regular block device, I agree. But in Stefan's case, I think that the check really needs to be against the physical block size, with the added condition that the bdev is a DASD device (similarly to the zone alignment check for zoned block devices).
>>
>> Then they need to expose a chunk_size. The physical block size is defined as not having a meaning for the kernel.
>
> Or create the block device with the logical block size set to the physical sector size. That would be even simpler and in fact correct in this case, since only physically aligned accesses make sense for DASD.

We already do it this way, so the logical blocksize is fine for DASD. I just was not aware of the fact that the physical blocksize is a best-can-do field. So I changed the check accordingly and also incorporated your other feedback regarding the modulo. Here is the second version of the patch.
Stefan Haberland (1):
  block: check partition alignment

 block/ioctl.c | 3 +++
 1 file changed, 3 insertions(+)

--
2.8.4
[RFC] block: check partition alignment
Partitions that are not aligned to the blocksize of a device may cause invalid I/O requests, because the block layer only cares about alignment within the partition when building requests on partitions.

device:
|4096|4096|4096|
partition with 512 byte offset:
|-512-|4096|4096|4096|

When reading/writing one 4k block of the partition, this maps to reading/writing the device at an offset of 512 bytes, leading to unaligned requests for the device, which in turn may cause unexpected behavior of the device driver.

For DASD devices we have to translate the block number into a cylinder/head/record format. The unaligned requests lead to a wrong calculation and therefore to misdirected I/O. In a "good" case this leads to I/O errors, because the underlying hardware detects the wrong addressing. In a worst-case scenario this might destroy data on the device.

To prevent partitions that are not aligned to the physical blocksize of a device, check for the alignment in the blkpg_ioctl.

Signed-off-by: Stefan Haberland <s...@linux.vnet.ibm.com>
---
 block/ioctl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index 755119c..8b83afee 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -45,6 +45,9 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 			    || pstart < 0 || plength < 0 || partno > 65535)
 				return -EINVAL;
 		}
+		/* check if partition is aligned to blocksize */
+		if (p.start % bdev_physical_block_size(bdev) != 0)
+			return -EINVAL;
 		mutex_lock(&bdev->bd_mutex);
--
2.8.4