Re: [PATCH] Revert "blk-mq: remove code for dealing with remapping queue"

2018-04-25 Thread Stefan Haberland

On 24.04.2018 22:01, Ming Lei wrote:

This reverts commit 37c7c6c76d431dd7ef9c29d95f6052bd425f004c.

Turns out some drivers(most are FC drivers) may not use managed
IRQ affinity, and has their customized .map_queues meantime, so
still keep this code for avoiding regression.

Reported-by: Laurence Oberman <lober...@redhat.com>
Cc: Ewan Milne <emi...@redhat.com>
Cc: Stefan Haberland <s...@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntrae...@de.ibm.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Sagi Grimberg <s...@grimberg.me>
Signed-off-by: Ming Lei <ming....@redhat.com>
---



Tested-by: Stefan Haberland <s...@linux.vnet.ibm.com>




Re: [PATCH] Revert "blk-mq: remove code for dealing with remapping queue"

2018-04-25 Thread Stefan Haberland

On 25.04.2018 09:41, Christian Borntraeger wrote:

On 04/24/2018 10:01 PM, Ming Lei wrote:

This reverts commit 37c7c6c76d431dd7ef9c29d95f6052bd425f004c.

Turns out some drivers(most are FC drivers) may not use managed
IRQ affinity, and has their customized .map_queues meantime, so
still keep this code for avoiding regression.

Reported-by: Laurence Oberman <lober...@redhat.com>
Cc: Ewan Milne <emi...@redhat.com>
Cc: Stefan Haberland <s...@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntrae...@de.ibm.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Sagi Grimberg <s...@grimberg.me>
Signed-off-by: Ming Lei <ming@redhat.com>

Seems to work ok with 4.17-rc2 + this patch with s390 and dasd devices.
So one of the other fixes seems to have taken care of the original issue.
Would be good if Stefan Haberland could also verify that.




Looks OK to me, too.
Did some testing with 4.17-rc2 + this patch on LPAR and z/VM with 1-65 
CPUs as

well as dynamic CPU definitionon z/VM and a small DASD regression test.




Re: [PATCH 0/8] blk-mq: fix and improve queue mapping

2018-04-09 Thread Stefan Haberland

On 08.04.2018 11:48, Ming Lei wrote:

Hi Jens,

The first two patches fix issues about queue mapping.

The other 6 patches improve queue mapping for blk-mq.

Christian, this patches should fix your issue, so please give
a test, and the patches can be found in the following tree:

https://github.com/ming1/linux/commits/v4.17-rc-blk-fix_mapping_v1

Thanks,
Ming


Hi Ming,

thanks for the patches.
Unfortunately I was not able to try them with your git tree since I ran 
into the same problem, that Li Wang reported today.
But your patches applied to our local git tree seem to work fine. A 
small DASD regression test showed no issues or warnings.


Regards,
Stefan




Re: 4.16-RC7 WARNING: CPU: 2 PID: 0 at block/blk-mq.c:1400 __blk_mq_delay_run_hw_queue

2018-03-27 Thread Stefan Haberland



This warning is harmless, please try the following patch:

--
 From 7b2b5139bfef80f44d1b1424e09ab35b715fbfdb Mon Sep 17 00:00:00 2001
From: Ming Lei <ming@redhat.com>
Date: Tue, 27 Mar 2018 19:54:23 +0800
Subject: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

 From commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule
with each possisble CPU") on, it should be easier to see unmapped hctx
in some CPU topo, such as, hctx may not be mapped to any CPU.

This patch avoids the warning in __blk_mq_delay_run_hw_queue() by
checking if the hctx is mapped in blk_mq_run_hw_queues().

Reported-by: Stefan Haberland <s...@linux.vnet.ibm.com>
Cc: Christoph Hellwig <h...@lst.de>
Fixes: 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with each 
possisble CPU")
Signed-off-by: Ming Lei <ming@redhat.com>
---


Hi Ming,

thanks a lot for the patch. I tried it on top of our 4.16-rc7 branch and 
it works well.
I ran a small regression test set against it and I did not notice any 
warnings or other obvious impacts.


Regards,
Stefan




4.16-RC7 WARNING: CPU: 2 PID: 0 at block/blk-mq.c:1400 __blk_mq_delay_run_hw_queue

2018-03-27 Thread Stefan Haberland

Hi,

I get the following warning in __blk_mq_delay_run_hw_queue when the 
scheduler is set to mq-deadline for DASD devices on s390.


What I see is that for whatever reason there is a hctx nr 0 which has no 
hctx->tags pointer set.
From my observation it is always hctx nr 0 which has a tags NULL 
pointer in it and I see other hctx which have the hctx->tags pointer set 
correctly.



[    2.169986] WARNING: CPU: 0 PID: 0 at block/blk-mq.c:1402 
__blk_mq_delay_run_hw_queue+0xe8/0x118

[    2.170007] Modules linked in:
[    2.170014] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.16.0-rc7-04107-g91a05d9e1d6b-dirty #147

[    2.170019] Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
[    2.170024] Krnl PSW : 76fd6c7f c244c24d 
(__blk_mq_delay_run_hw_queue+0xe8/0x118)
[    2.170035]    R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 
PM:0 RI:0 EA:3
[    2.170041] Krnl GPRS: 599ec58a 02a94000 
02a94000 0001
[    2.170047]     6e761e98 
02a96980 02a96800
[    2.170052]    02d87ce0 737fbda8 
 0001
[    2.170058]    0001 00aedd10 
737fbc38 737fbc00
[    2.170069] Krnl Code: 006ea3c8: ebaff0a4 lmg 
%r10,%r15,160(%r15)

  006ea3ce: c0f45e0d brcl    15,6d5fe8
 #006ea3d4: a7f40001 brc 15,6ea3d6
 >006ea3d8: e340f0c4    lg  
%r4,192(%r15)
  006ea3de: ebaff0a4 lmg 
%r10,%r15,160(%r15)

  006ea3e4: 07f4 bcr 15,%r4
  006ea3e6: 41b01100 la  %r11,256(%r1)
  006ea3ea: 182a lr  %r2,%r10
[    2.170158] Call Trace:
[    2.170205] ([<02a96800>] 0x2a96800)
[    2.170248]  [<006ea4c0>] blk_mq_run_hw_queue+0xa0/0x100
[    2.170262]  [<006ea59c>] blk_mq_run_hw_queues+0x7c/0x98
[    2.170295]  [<006e88f6>] __blk_mq_complete_request+0x10e/0x1e0
[    2.170300]  [<006e9e30>] blk_mq_complete_request+0x80/0xa0
[    2.170307]  [<0087fad0>] dasd_block_tasklet+0x218/0x480
[    2.170415]  [<0017c3f8>] tasklet_hi_action+0xa0/0x138
[    2.170434]  [<00a91c10>] __do_softirq+0xc8/0x540
[    2.170471]  [<0017bd4e>] irq_exit+0x136/0x140
[    2.170478]  [<0010c912>] do_IRQ+0x8a/0xb8
[    2.170518]  [<00a90ee0>] io_int_handler+0x138/0x2e0
[    2.170524]  [<00102cd0>] enabled_wait+0x58/0x128
[    2.170562] ([<00102cb8>] enabled_wait+0x40/0x128)
[    2.170577]  [<0010319a>] arch_cpu_idle+0x32/0x48
[    2.170604]  [<00a8f636>] default_idle_call+0x3e/0x58
[    2.170613]  [<001cd5d2>] do_idle+0xda/0x190
[    2.170621]  [<001cd93e>] cpu_startup_entry+0x3e/0x48
[    2.170633]  [<00e5ebf4>] start_kernel+0x47c/0x490
[    2.170641]  [<00100020>] _stext+0x20/0x80
[    2.170650] 2 locks held by swapper/0/0:
[    2.170658]  #0:  (&(>lock)->rlock){..-.}, at: 
[] dasd_block_tasklet+0x1cc/0x480
[    2.170676]  #1:  (rcu_read_lock){}, at: [] 
hctx_lock+0x34/0x110

[    2.170750] Last Breaking-Event-Address:
[    2.170758]  [<006ea3d4>] __blk_mq_delay_run_hw_queue+0xe4/0x118
[    2.170803] ---[ end trace 1073cf0de1fd32d0 ]---



Regards,
Stefan



Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU

2018-01-16 Thread Stefan Haberland

Ming or Christoph,

would you mind to send this patch to stable #4.12? Or is the fixes tag 
enough to get this fixed in all related releases?


Regards,
Stefan



Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2018-01-11 Thread Stefan Haberland

On 11.01.2018 12:44, Christian Borntraeger wrote:


On 01/11/2018 10:13 AM, Ming Lei wrote:

On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:

On 12/18/2017 02:56 PM, Stefan Haberland wrote:

On 07.12.2017 00:29, Christoph Hellwig wrote:

On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
t > commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad

  blk-mq: create a blk_mq_ctx for each possible CPU
does not boot on DASD and
commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
     genirq/affinity: assign vectors to all possible CPUs
does boot with DASD disks.

Also adding Stefan Haberland if he has an idea why this fails on DASD and 
adding Martin (for the
s390 irq handling code).

That is interesting as it really isn't related to interrupts at all,
it just ensures that possible CPUs are set in ->cpumask.

I guess we'd really want:

e005655c389e3d25bf3e43f71611ec12f3012de0
"blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"

before this commit, but it seems like the whole stack didn't work for
your either.

I wonder if there is some weird thing about nr_cpu_ids in s390?
--
To unsubscribe from this list: send the line "unsubscribe linux-s390" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


I tried this on my system and the blk-mq-hotplug-fix branch does not boot for 
me as well.
The disks get up and running and I/O works fine. At least the partition 
detection and EXT4-fs mount works.

But at some point in time the disk do not get any requests.

I currently have no clue why.
I took a dump and had a look at the disk states and they are fine. No error in 
the logs or in our debug entrys. Just empty DASD devices waiting to be called 
for I/O requests.

Do you have anything I could have a look at?

Jens, Christoph, so what do we do about this?
To summarize:
- commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU 
hotplug.
- Jens' quick revert did fix the issue and did not broke DASD support but has 
some issues
with interrupt affinity.
- Christoph patch set fixes the hotplug issue for virtio blk but causes I/O 
hangs on DASDs (even
without hotplug).

Hello,

This one is a valid use case for VM, I think we need to fix that.

Looks there is issue on the fouth patch("blk-mq: only select online
CPUs in blk_mq_hctx_next_cpu"), I fixed it in the following tree, and
the other 3 patches are same with Christoph's:

https://github.com/ming1/linux.git  v4.15-rc-block-for-next-cpuhot-fix

gitweb:

https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix

Could you test it and provide the feedback?

BTW, if it can't help this issue, could you boot from a normal disk first
and dump blk-mq debugfs of DASD later?

That kernel seems to boot fine on my system with DASD disks.

--
To unsubscribe from this list: send the line "unsubscribe linux-s390" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



I did some regression testing and it works quite well. Boot works, 
attaching CPUs during runtime on z/VM and enabling them in Linux works 
as well.

I also did some DASD online/offline CPU enable/disable loops.

Regards,
Stefan



Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2018-01-11 Thread Stefan Haberland

On 11.01.2018 10:13, Ming Lei wrote:

On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:

On 12/18/2017 02:56 PM, Stefan Haberland wrote:

On 07.12.2017 00:29, Christoph Hellwig wrote:

On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
t > commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad

  blk-mq: create a blk_mq_ctx for each possible CPU
does not boot on DASD and
commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
     genirq/affinity: assign vectors to all possible CPUs
does boot with DASD disks.

Also adding Stefan Haberland if he has an idea why this fails on DASD and 
adding Martin (for the
s390 irq handling code).

That is interesting as it really isn't related to interrupts at all,
it just ensures that possible CPUs are set in ->cpumask.

I guess we'd really want:

e005655c389e3d25bf3e43f71611ec12f3012de0
"blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"

before this commit, but it seems like the whole stack didn't work for
your either.

I wonder if there is some weird thing about nr_cpu_ids in s390?
--
To unsubscribe from this list: send the line "unsubscribe linux-s390" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


I tried this on my system and the blk-mq-hotplug-fix branch does not boot for 
me as well.
The disks get up and running and I/O works fine. At least the partition 
detection and EXT4-fs mount works.

But at some point in time the disk do not get any requests.

I currently have no clue why.
I took a dump and had a look at the disk states and they are fine. No error in 
the logs or in our debug entrys. Just empty DASD devices waiting to be called 
for I/O requests.

Do you have anything I could have a look at?

Jens, Christoph, so what do we do about this?
To summarize:
- commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU 
hotplug.
- Jens' quick revert did fix the issue and did not broke DASD support but has 
some issues
with interrupt affinity.
- Christoph patch set fixes the hotplug issue for virtio blk but causes I/O 
hangs on DASDs (even
without hotplug).

Hello,

This one is a valid use case for VM, I think we need to fix that.

Looks there is issue on the fouth patch("blk-mq: only select online
CPUs in blk_mq_hctx_next_cpu"), I fixed it in the following tree, and
the other 3 patches are same with Christoph's:

https://github.com/ming1/linux.git  v4.15-rc-block-for-next-cpuhot-fix

gitweb:

https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix

Could you test it and provide the feedback?

BTW, if it can't help this issue, could you boot from a normal disk first
and dump blk-mq debugfs of DASD later?

Thanks,
Ming



Hi,

thanks for the patch. I had pretty much the same place in suspicion.
I will test it asap.

Regards,
Stefan



Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-12-18 Thread Stefan Haberland

On 07.12.2017 00:29, Christoph Hellwig wrote:

On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
t > commit 11b2025c3326f7096ceb588c3117c7883850c068-> bad

 blk-mq: create a blk_mq_ctx for each possible CPU
does not boot on DASD and
commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc-> good
genirq/affinity: assign vectors to all possible CPUs
does boot with DASD disks.

Also adding Stefan Haberland if he has an idea why this fails on DASD and 
adding Martin (for the
s390 irq handling code).

That is interesting as it really isn't related to interrupts at all,
it just ensures that possible CPUs are set in ->cpumask.

I guess we'd really want:

e005655c389e3d25bf3e43f71611ec12f3012de0
"blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"

before this commit, but it seems like the whole stack didn't work for
your either.

I wonder if there is some weird thing about nr_cpu_ids in s390?
--
To unsubscribe from this list: send the line "unsubscribe linux-s390" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



I tried this on my system and the blk-mq-hotplug-fix branch does not 
boot for me as well.
The disks get up and running and I/O works fine. At least the partition 
detection and EXT4-fs mount works.


But at some point in time the disk do not get any requests.

I currently have no clue why.
I took a dump and had a look at the disk states and they are fine. No 
error in the logs or in our debug entrys. Just empty DASD devices 
waiting to be called for I/O requests.


Do you have anything I could have a look at?



[v2] block: check partition alignment

2016-12-19 Thread Stefan Haberland
Partitions that are not aligned to the blocksize of a device may cause
invalid I/O requests because the blocklayer cares only about alignment
within the partition when building requests on partitions.

device
|4096|4096|4096|
partition offset 512byte
|-512-|4096|4096|4096|

When reading/writing one 4k block of the partition this maps to
reading/writing with an offset of 512 byte of the device leading to
unaligned requests for the device which in turn may cause unexpected
behavior of the device driver.

For DASD devices we have to translate the block number into a cylinder,
head, record format. The unaligned requests lead to wrong calculation
and therefore to misdirected I/O. In a "good" case this leads to I/O
errors because the underlying hardware detects the wrong addressing.
In a worst case scenario this might destroy data on the device.

To prevent partitions that are not aligned to the physical blocksize
of a device check for the alignment in the blkpg_ioctl.

Signed-off-by: Stefan Haberland <s...@linux.vnet.ibm.com>
---
 block/ioctl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index 755119c..36b5c21 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -45,6 +45,9 @@ static int blkpg_ioctl(struct block_device *bdev, struct 
blkpg_ioctl_arg __user
|| pstart < 0 || plength < 0 || partno > 
65535)
return -EINVAL;
}
+   /* check if partition is aligned to blocksize */
+   if (p.start & (bdev_logical_block_size(bdev) - 1))
+   return -EINVAL;
 
mutex_lock(>bd_mutex);
 
-- 
2.8.4

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] block: check partition alignment

2016-12-19 Thread Stefan Haberland
Am 14.12.2016 um 18:07 schrieb Christoph Hellwig:
>> To prevent partitions that are not aligned to the physical blocksize
>> > of a device check for the alignment in the blkpg_ioctl.
> We'd also need to reject this when reading partitions from disk, right?
>

I am not sure if there is a problem. I was not able to recreate the error
with a partition read from disk. But simply because I was not able to
write a defective partition table to disk. All variants I tried where OK.
So I am at least not aware of a way to break it via the partition
detection code.
I just noticed that the ioctl which can be called by anyone is
able to establish defective partitions.


Am 15.12.2016 um 09:56 schrieb Damien Le Moal:
> On 12/15/16 17:45, Christoph Hellwig wrote:
>> On Thu, Dec 15, 2016 at 10:33:47AM +0900, Damien Le Moal wrote:
>>> For a regular block device, I agree. But in Stephan case, I think that
>>> the check really needs to be against the physical block size, with the
>>> added condition that the bdev is a DASD device (similarly to the zone
>>> alignment check for zoned block devices).
>>
>> Then they need to expose a chunk_size.  physical block size is defined
>> as not having a meaning for the kernel.
>
> Or create the block device with the logical block size set to the
> physical sector size. Which would be even more simple and in fact
> correct in this case since only physically aligned accesses make sense
> for DASD.
>

We already do it this way. So the logical blocksize is fine for DASD.
I just was not aware of the fact that the physical blocksize is a 
best_can_do field. So I changed it this way and also incorporated your 
other feedback regarding the modulo. Here is the second suggestion for
the patch.

Stefan Haberland (1):
  block: check partition alignment

 block/ioctl.c | 3 +++
 1 file changed, 3 insertions(+)

-- 
2.8.4

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] block: check partition alignment

2016-12-14 Thread Stefan Haberland
Partitions that are not aligned to the blocksize of a device may cause
invalid I/O requests because the blocklayer cares only about alignment
within the partition when building requests on partitions.

device
|4096|4096|4096|
partition offset 512byte
|-512-|4096|4096|4096|

When reading/writing one 4k block of the partition this maps to
reading/writing with an offset of 512 byte of the device leading to
unaligned requests for the device which in turn may cause unexpected
behavior of the device driver.

For DASD devices we have to translate the block number into a cylinder,
head, record format. The unaligned requests lead to wrong calculation
and therefore to misdirected I/O. In a "good" case this leads to I/O
errors because the underlying hardware detects the wrong addressing.
In a worst case scenario this might destroy data on the device.

To prevent partitions that are not aligned to the physical blocksize
of a device check for the alignment in the blkpg_ioctl.

Signed-off-by: Stefan Haberland <s...@linux.vnet.ibm.com>
---
 block/ioctl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index 755119c..8b83afee 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -45,6 +45,9 @@ static int blkpg_ioctl(struct block_device *bdev, struct 
blkpg_ioctl_arg __user
|| pstart < 0 || plength < 0 || partno > 
65535)
return -EINVAL;
}
+   /* check if partition is aligned to blocksize */
+   if (p.start % bdev_physical_block_size(bdev) != 0)
+   return -EINVAL;
 
mutex_lock(>bd_mutex);
 
-- 
2.8.4

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html