Re: blk-mq 5-8 times slower for bmap-tools

2018-08-22 Thread Ming Lei
On Thu, Aug 23, 2018 at 01:51:05AM +0100, Ben Hutchings wrote:
> On Thu, 2018-08-23 at 06:02 +0800, Ming Lei wrote:
> > On Wed, Aug 22, 2018 at 08:22:00PM +0100, Ben Hutchings wrote:
> > > On Mon, 2018-08-20 at 11:04 +0200, Ricardo Ribalda Delgado wrote:
> > > > Hello Ming
> > > > On Mon, Aug 20, 2018 at 10:30 AM Ming Lei  wrote:
> > > 
> > > [...]
> > > > > One problem found from your iostat log is that there looks to be a
> > > > > ~30sec idle period between IO activities when blk-mq is enabled.
> > > > 
> > > > During the whole test the LED on the device was blinking.
> > > > 
> > > > But a closer look at dmesg reveals a lot of these messages:
> > > > 
> > > > [  196.929811] sd 6:0:0:0: [sdb] tag#3 data cmplt err -71 uas-tag 1
> > > > inflight: CMD
> > > > [  196.929822] sd 6:0:0:0: [sdb] tag#3 CDB: Write(10) 2a 00 00 04 00
> > > > 68 00 04 00 00
> > > > [  227.764379] sd 6:0:0:0: [sdb] tag#4 uas_eh_abort_handler 0 uas-tag
> > > > 2 inflight: CMD OUT
> > > > [  227.764389] sd 6:0:0:0: [sdb] tag#4 CDB: Write(10) 2a 00 00 04 04
> > > > 68 00 04 00 00
> > > > [  227.766555] sd 6:0:0:0: [sdb] tag#3 uas_eh_abort_handler 0 uas-tag
> > > > 1 inflight: CMD
> > > > [  227.766562] sd 6:0:0:0: [sdb] tag#3 CDB: Write(10) 2a 00 00 04 00
> > > > 68 00 04 00 00
> > > > [  227.784312] scsi host6: uas_eh_device_reset_handler start
> > > > [  227.913672] usb 2-2: reset SuperSpeed USB device number 3 using 
> > > > xhci_hcd
> > > > [  227.944842] scsi host6: uas_eh_device_reset_handler success
> > > > [  231.416133] sd 6:0:0:0: [sdb] tag#1 data cmplt err -71 uas-tag 10
> > > > inflight: CMD
> > > > [  231.416147] sd 6:0:0:0: [sdb] tag#1 CDB: Write(10) 2a 00 00 06 d5
> > > > e8 00 04 00 00
> > > > 
> > > > And they take around 30 seconds (227-196).
> > > > 
> > > > 
> > > > Maybe this is a hw issue? I will bring the reader home tonight and see
> > > > if I can replicate the bug with my notebook.
> > > > 
> > > > > 
> > > > > Maybe it is related to the timeout handling, given we had a big change
> > > > > in the v4.17 timeout code; we also fixed one scsi_mq timeout-related
> > > > > issue recently, and that patch [1] has already been merged into the
> > > > > v4.18 release.
> > > > 
> > > > I tried with v4.18-rc4 (latest one packaged in Debian experimental)
> > > > and after 3 runs, 2 were fine (27 sec), but the last one was over a
> > > > minute.
> > > 
> > > Is it possible that this is fixed by "block: really disable runtime-pm
> > > for blk-mq"?
> > > 
> > 
> > As I mentioned, a similar issue can be triggered in both blk-mq and
> > non-blk-mq mode, so it shouldn't be related to runtime PM.
> > 
> > We need our UAS guys to take a look at this issue.
> 
> I saw that you found a problem with UAS in both modes, but Ricardo's
> problem seemed to be specific to blk-mq.
> 

Actually both are triggered when doing WRITEs to UAS, and the error
message is the same too.

So I think the two problems may be the same.

Thanks,
Ming
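
A quick aside for readers of the dmesg log quoted above: "err -71" is
-EPROTO, which the USB core uses for low-level protocol/transfer errors on
the bus - consistent with the suspicion of a hardware problem. A minimal,
hedged check of the errno value (assuming Linux errno numbering):

/* Print what error -71 from the UAS log means (Linux errno values assumed). */
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	printf("EPROTO == 71: %d\n", EPROTO == 71);       /* 1 on Linux */
	printf("err -71 means: %s\n", strerror(EPROTO));  /* "Protocol error" */
	return 0;
}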


Re: [PATCH blktests 0/3] Add NVMeOF multipath tests

2018-08-22 Thread Bart Van Assche
On Tue, 2018-08-21 at 08:46 +0200, Johannes Thumshirn wrote:
> On Mon, Aug 20, 2018 at 03:46:45PM +, Bart Van Assche wrote:
> > Moving these tests into the nvme directory is possible but will make it
> > harder to run the NVMeOF multipath tests separately. Are you fine with this?
> 
> Both ways have their ups and downsides, I agree.
> 
> Having two distinct groups requires running './check nvme nvmeof-mp' to
> get full coverage with nvme.
> 
> Having it all in one group would require running './check nvme 18 19 20
> 21 22 23 24 ...' to get only the dm-mpath ones.
> 
> Honestly I hate both, but yours (the two distinct groups) is probably
> easier to handle in the end, I have to admit.

Omar, do you have a preference for one of the two aforementioned approaches?

Thanks,

Bart.



Re: blk-mq 5-8 times slower for bmap-tools

2018-08-22 Thread Ben Hutchings
On Thu, 2018-08-23 at 06:02 +0800, Ming Lei wrote:
> On Wed, Aug 22, 2018 at 08:22:00PM +0100, Ben Hutchings wrote:
> > On Mon, 2018-08-20 at 11:04 +0200, Ricardo Ribalda Delgado wrote:
> > > Hello Ming
> > > On Mon, Aug 20, 2018 at 10:30 AM Ming Lei  wrote:
> > 
> > [...]
> > > > One problem found from your iostat log is that there looks to be a
> > > > ~30sec idle period between IO activities when blk-mq is enabled.
> > > 
> > > During the whole test the LED on the device was blinking.
> > > 
> > > But a closer look at dmesg reveals a lot of these messages:
> > > 
> > > [  196.929811] sd 6:0:0:0: [sdb] tag#3 data cmplt err -71 uas-tag 1
> > > inflight: CMD
> > > [  196.929822] sd 6:0:0:0: [sdb] tag#3 CDB: Write(10) 2a 00 00 04 00
> > > 68 00 04 00 00
> > > [  227.764379] sd 6:0:0:0: [sdb] tag#4 uas_eh_abort_handler 0 uas-tag
> > > 2 inflight: CMD OUT
> > > [  227.764389] sd 6:0:0:0: [sdb] tag#4 CDB: Write(10) 2a 00 00 04 04
> > > 68 00 04 00 00
> > > [  227.766555] sd 6:0:0:0: [sdb] tag#3 uas_eh_abort_handler 0 uas-tag
> > > 1 inflight: CMD
> > > [  227.766562] sd 6:0:0:0: [sdb] tag#3 CDB: Write(10) 2a 00 00 04 00
> > > 68 00 04 00 00
> > > [  227.784312] scsi host6: uas_eh_device_reset_handler start
> > > [  227.913672] usb 2-2: reset SuperSpeed USB device number 3 using 
> > > xhci_hcd
> > > [  227.944842] scsi host6: uas_eh_device_reset_handler success
> > > [  231.416133] sd 6:0:0:0: [sdb] tag#1 data cmplt err -71 uas-tag 10
> > > inflight: CMD
> > > [  231.416147] sd 6:0:0:0: [sdb] tag#1 CDB: Write(10) 2a 00 00 06 d5
> > > e8 00 04 00 00
> > > 
> > > And they take around 30 seconds (227-196).
> > > 
> > > 
> > > Maybe this is a hw issue? I will bring the reader home tonight and see
> > > if I can replicate the bug with my notebook.
> > > 
> > > > 
> > > > Maybe it is related to the timeout handling, given we had a big change
> > > > in the v4.17 timeout code; we also fixed one scsi_mq timeout-related
> > > > issue recently, and that patch [1] has already been merged into the
> > > > v4.18 release.
> > > 
> > > I tried with v4.18-rc4 (latest one packaged in Debian experimental)
> > > and after 3 runs, 2 were fine (27 sec), but the last one was over a
> > > minute.
> > 
> > Is it possible that this is fixed by "block: really disable runtime-pm
> > for blk-mq"?
> > 
> 
> As I mentioned, a similar issue can be triggered in both blk-mq and
> non-blk-mq mode, so it shouldn't be related to runtime PM.
> 
> We need our UAS guys to take a look at this issue.

I saw that you found a problem with UAS in both modes, but Ricardo's
problem seemed to be specific to blk-mq.

Ben.

-- 
Ben Hutchings
Quantity is no substitute for quality, but it's the only one we've got.






Re: blk-mq 5-8 times slower for bmap-tools

2018-08-22 Thread Ming Lei
On Wed, Aug 22, 2018 at 08:22:00PM +0100, Ben Hutchings wrote:
> On Mon, 2018-08-20 at 11:04 +0200, Ricardo Ribalda Delgado wrote:
> > Hello Ming
> > On Mon, Aug 20, 2018 at 10:30 AM Ming Lei  wrote:
> [...]
> > > One problem found from your iostat log is that there looks to be a
> > > ~30sec idle period between IO activities when blk-mq is enabled.
> > 
> > During the whole test the LED on the device was blinking.
> > 
> > But a closer look at dmesg reveals a lot of these messages:
> > 
> > [  196.929811] sd 6:0:0:0: [sdb] tag#3 data cmplt err -71 uas-tag 1
> > inflight: CMD
> > [  196.929822] sd 6:0:0:0: [sdb] tag#3 CDB: Write(10) 2a 00 00 04 00
> > 68 00 04 00 00
> > [  227.764379] sd 6:0:0:0: [sdb] tag#4 uas_eh_abort_handler 0 uas-tag
> > 2 inflight: CMD OUT
> > [  227.764389] sd 6:0:0:0: [sdb] tag#4 CDB: Write(10) 2a 00 00 04 04
> > 68 00 04 00 00
> > [  227.766555] sd 6:0:0:0: [sdb] tag#3 uas_eh_abort_handler 0 uas-tag
> > 1 inflight: CMD
> > [  227.766562] sd 6:0:0:0: [sdb] tag#3 CDB: Write(10) 2a 00 00 04 00
> > 68 00 04 00 00
> > [  227.784312] scsi host6: uas_eh_device_reset_handler start
> > [  227.913672] usb 2-2: reset SuperSpeed USB device number 3 using xhci_hcd
> > [  227.944842] scsi host6: uas_eh_device_reset_handler success
> > [  231.416133] sd 6:0:0:0: [sdb] tag#1 data cmplt err -71 uas-tag 10
> > inflight: CMD
> > [  231.416147] sd 6:0:0:0: [sdb] tag#1 CDB: Write(10) 2a 00 00 06 d5
> > e8 00 04 00 00
> > 
> > And they take around 30 seconds (227-196).
> > 
> > 
> > Maybe this is a hw issue? I will bring the reader home tonight and see
> > if I can replicate the bug with my notebook.
> > 
> > > 
> > > Maybe it is related to the timeout handling, given we had a big change
> > > in the v4.17 timeout code; we also fixed one scsi_mq timeout-related
> > > issue recently, and that patch [1] has already been merged into the
> > > v4.18 release.
> > 
> > I tried with v4.18-rc4 (latest one packaged in Debian experimental)
> > and after 3 runs, 2 were fine (27 sec), but the last one was over a
> > minute.
> 
> Is it possible that this is fixed by "block: really disable runtime-pm
> for blk-mq"?
> 

As I mentioned, a similar issue can be triggered in both blk-mq and
non-blk-mq mode, so it shouldn't be related to runtime PM.

We need our UAS guys to take a look at this issue.

Thanks,
Ming


Re: [GIT PULL] Follow up block changes for this merge window

2018-08-22 Thread Jens Axboe
On 8/22/18 2:39 PM, Linus Torvalds wrote:
> On Wed, Aug 22, 2018 at 10:54 AM Jens Axboe  wrote:
>>
>> - Set of bcache fixes and changes (Coly)
> 
> Some of those bcache style fixes look questionable.
> 
> Maybe we should push back on some of the checkpatch rules instead?
> 
> Like having argument names in declarations - sometimes descriptive
> names can be good documentation. And sometimes they are just noise,
> because the type is what describes it.
> 
> Oh well. Taken.

Thanks - FWIW, I don't disagree, and I tend to push back harder on
this in core code than in drivers. Especially for newer maintainers,
as it can sometimes be a way to dabble around in the code and
find "real" issues.

-- 
Jens Axboe



Re: [GIT PULL] Follow up block changes for this merge window

2018-08-22 Thread Linus Torvalds
On Wed, Aug 22, 2018 at 10:54 AM Jens Axboe  wrote:
>
> - Set of bcache fixes and changes (Coly)

Some of those bcache style fixes look questionable.

Maybe we should push back on some of the checkpatch rules instead?

Like having argument names in declarations - sometimes descriptive
names can be good documentation. And sometimes they are just noise,
because the type is what describes it.

Oh well. Taken.

   Linus
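
A tiny illustration of the tradeoff Linus describes, as a hedged sketch with
hypothetical declarations (not taken from the bcache patches):

#include <stddef.h>

struct cache;

/* Here the name adds nothing: the type already says everything. */
void free_cache(struct cache *cache);

/* Here the names carry information the types alone don't: which buffer
 * is the destination, which is the source, and what each size means. */
int copy_region(void *dst, const void *src, size_t dst_len, size_t src_len);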


Re: [PATCH] blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait

2018-08-22 Thread Holger Hoffstätte

On 08/22/18 21:46, Jens Axboe wrote:
> On 8/22/18 1:37 PM, Holger Hoffstätte wrote:
>> On 08/22/18 21:17, Jens Axboe wrote:
>>>> So the obvious suspect is the new return of UINT_MAX from get_limit() to
>>>> __wbt_wait(). I first suspected that I mispatched something, but it's all
>>>> like in mainline or your tree. Even the recently moved-around atomic loop
>>>> inside rq_wait_inc_below() is 1:1 the same and looks like it should.
>>>> Now building mainline and see where that leads me.
>>
>> So mainline + your tree's last 4 patches works fine, as suspected.
>> It's all me, as usual.
> 
> That's a relief!
> 
>>> I wonder if it's a signedness thing? Can you try and see if using INT_MAX
>>> instead changes anything?
>>
>> Beat me to it while I was rebooting ;-)
>> Exactly what I also found a minute ago:
>>
>> $diff -rup linux-4.18.4/block/blk-rq-qos.c linux/block/blk-rq-qos.c
>> ..
>> -bool rq_wait_inc_below(struct rq_wait *rq_wait, int limit)
>> +bool rq_wait_inc_below(struct rq_wait *rq_wait, unsigned int limit)
>> ..
>>
>> Moo! Patching now.
> 
> At least we have an explanation for why it didn't work.

Luckily I can still read and 22f17952c7 is conveniently called "blk-rq-qos:
make depth comparisons unsigned"! Needless to say with that thrown into the
mix all is good again. Sorry for the confusion & thanks for your patience.

cheers!
Holger


Re: [PATCH] blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait

2018-08-22 Thread Jens Axboe
On 8/22/18 1:37 PM, Holger Hoffstätte wrote:
> On 08/22/18 21:17, Jens Axboe wrote:
>>> So the obvious suspect is the new return of UINT_MAX from get_limit() to
>>> __wbt_wait(). I first suspected that I mispatched something, but it's all
>>> like in mainline or your tree. Even the recently moved-around atomic loop
>>> inside rq_wait_inc_below() is 1:1 the same and looks like it should.
>>> Now building mainline and see where that leads me.
> 
> So mainline + your tree's last 4 patches works fine, as suspected.
> It's all me, as usual.

That's a relief!

>> I wonder if it's a signedness thing? Can you try and see if using INT_MAX
>> instead changes anything?
> 
> Beat me to it while I was rebooting ;-)
> Exactly what I also found a minute ago:
> 
> $diff -rup linux-4.18.4/block/blk-rq-qos.c linux/block/blk-rq-qos.c
> ..
> -bool rq_wait_inc_below(struct rq_wait *rq_wait, int limit)
> +bool rq_wait_inc_below(struct rq_wait *rq_wait, unsigned int limit)
> ..
> 
> Moo! Patching now.

At least we have an explanation for why it didn't work.

-- 
Jens Axboe
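
For readers following the thread: a minimal userspace sketch of the
signedness trap (an illustration assuming two's-complement conversion, not
the kernel code). With a signed limit parameter, a caller passing UINT_MAX
to mean "no limit" ends up comparing against -1, so the check can never
pass and waiters hang; making the limit unsigned restores the intent.

#include <limits.h>
#include <stdio.h>

/* Old shape of the check: a signed limit. (int)UINT_MAX is -1 on the
 * usual two's-complement targets, so "val < limit" never holds. */
static int inc_below_signed(int *val, int limit)
{
	if (*val < limit) {
		(*val)++;
		return 1;
	}
	return 0;
}

/* Fixed shape: an unsigned limit, so UINT_MAX really means "no limit". */
static int inc_below_unsigned(unsigned int *val, unsigned int limit)
{
	if (*val < limit) {
		(*val)++;
		return 1;
	}
	return 0;
}

int main(void)
{
	int v = 0;
	unsigned int u = 0;

	printf("signed:   %d\n", inc_below_signed(&v, (int)UINT_MAX));  /* 0: stuck */
	printf("unsigned: %d\n", inc_below_unsigned(&u, UINT_MAX));     /* 1: proceeds */
	return 0;
}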



Re: [PATCH] blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait

2018-08-22 Thread Holger Hoffstätte

On 08/22/18 21:17, Jens Axboe wrote:
>> So the obvious suspect is the new return of UINT_MAX from get_limit() to
>> __wbt_wait(). I first suspected that I mispatched something, but it's all
>> like in mainline or your tree. Even the recently moved-around atomic loop
>> inside rq_wait_inc_below() is 1:1 the same and looks like it should.
>> Now building mainline and see where that leads me.

So mainline + your tree's last 4 patches works fine, as suspected.
It's all me, as usual.

> I wonder if it's a signedness thing? Can you try and see if using INT_MAX
> instead changes anything?

Beat me to it while I was rebooting ;-)
Exactly what I also found a minute ago:

$diff -rup linux-4.18.4/block/blk-rq-qos.c linux/block/blk-rq-qos.c
..
-bool rq_wait_inc_below(struct rq_wait *rq_wait, int limit)
+bool rq_wait_inc_below(struct rq_wait *rq_wait, unsigned int limit)
..

Moo! Patching now.

cheers
Holger


Re: blk-mq 5-8 times slower for bmap-tools

2018-08-22 Thread Ben Hutchings
On Mon, 2018-08-20 at 11:04 +0200, Ricardo Ribalda Delgado wrote:
> Hello Ming
> On Mon, Aug 20, 2018 at 10:30 AM Ming Lei  wrote:
[...]
> > One problem found from your iostat log is that there looks to be a
> > ~30sec idle period between IO activities when blk-mq is enabled.
> 
> During the whole test the LED on the device was blinking.
> 
> But a closer look at dmesg reveals a lot of these messages:
> 
> [  196.929811] sd 6:0:0:0: [sdb] tag#3 data cmplt err -71 uas-tag 1
> inflight: CMD
> [  196.929822] sd 6:0:0:0: [sdb] tag#3 CDB: Write(10) 2a 00 00 04 00
> 68 00 04 00 00
> [  227.764379] sd 6:0:0:0: [sdb] tag#4 uas_eh_abort_handler 0 uas-tag
> 2 inflight: CMD OUT
> [  227.764389] sd 6:0:0:0: [sdb] tag#4 CDB: Write(10) 2a 00 00 04 04
> 68 00 04 00 00
> [  227.766555] sd 6:0:0:0: [sdb] tag#3 uas_eh_abort_handler 0 uas-tag
> 1 inflight: CMD
> [  227.766562] sd 6:0:0:0: [sdb] tag#3 CDB: Write(10) 2a 00 00 04 00
> 68 00 04 00 00
> [  227.784312] scsi host6: uas_eh_device_reset_handler start
> [  227.913672] usb 2-2: reset SuperSpeed USB device number 3 using xhci_hcd
> [  227.944842] scsi host6: uas_eh_device_reset_handler success
> [  231.416133] sd 6:0:0:0: [sdb] tag#1 data cmplt err -71 uas-tag 10
> inflight: CMD
> [  231.416147] sd 6:0:0:0: [sdb] tag#1 CDB: Write(10) 2a 00 00 06 d5
> e8 00 04 00 00
> 
> And they take around 30 seconds (227-196).
> 
> 
> Maybe this is a hw issue? I will bring the reader home tonight and see
> if I can replicate the bug with my notebook.
> 
> > 
> > Maybe it is related to the timeout handling, given we had a big change
> > in the v4.17 timeout code; we also fixed one scsi_mq timeout-related
> > issue recently, and that patch [1] has already been merged into the
> > v4.18 release.
> 
> I tried with v4.18-rc4 (latest one packaged in Debian experimental)
> and after 3 runs, 2 were fine (27 sec), but the last one was over a
> minute.

Is it possible that this is fixed by "block: really disable runtime-pm
for blk-mq"?

That fix went into Debian's version 4.17.14-1, but is not yet in any
4.18-based package.

Ben.

-- 
Ben Hutchings
You can't have everything.  Where would you put it?





Re: [PATCH] blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait

2018-08-22 Thread Jens Axboe
On 8/22/18 1:12 PM, Holger Hoffstätte wrote:
> On 08/22/18 19:28, Jens Axboe wrote:
>> On 8/22/18 8:27 AM, Jens Axboe wrote:
>>> On 8/22/18 6:54 AM, Holger Hoffstätte wrote:
>>>> On 08/22/18 06:10, Jens Axboe wrote:
>>>>> [...]
>>>>> If you have time, please look at the 3 patches I posted earlier today.
>>>>> Those are for mainline, so should be OK :-)
>>>>
>>>> I'm just playing along at home but with those 3 I get repeatable
>>>> hangs & writeback not starting at all, but curiously *only* on my btrfs
>>>> device; for inexplicable reasons some other devices with ext4/xfs flush
>>>> properly. Yes, that surprised me too, but it's repeatable.
>>>> Now this may or may not have something to do with some of my in-testing
>>>> patches for btrfs itself, but if I remove those 3 wbt fixes, everything
>>>> is golden again. Not eager to repeat since it hangs sync & requires a
>>>> hard reboot.. :(
>>>> Just thought you'd like to know.
>>>
>>> Thanks, that's very useful info! I'll see if I can reproduce that.
>>
>> Any chance you can try and see which patch is causing the issue?
>> I can't reproduce it here, seems solid.
>>
>> Either that, or a reproducer would be great...
> 
> It's a hacked up custom tree but the following things have emerged so far:
> 
> - it's not btrfs.
> 
> - it also happens with ext4.
> 
> - I first suspected bfq on a nonrotational device disabling WBT by default,
> but using deadline didn't help either. Can't even mkfs.ext4.
> 
> - I suspect - but do not know - that using xfs everywhere else is the
> reason I got lucky, because xfs. :D
> 
> - it immediately happens with only the first patch
> ("move disable check into get_limit()")
> 
> So the obvious suspect is the new return of UINT_MAX from get_limit() to
> __wbt_wait(). I first suspected that I mispatched something, but it's all
> like in mainline or your tree. Even the recently moved-around atomic loop
> inside rq_wait_inc_below() is 1:1 the same and looks like it should.
> Now building mainline and see where that leads me.

I wonder if it's a signedness thing? Can you try and see if using INT_MAX
instead changes anything?

-- 
Jens Axboe



Re: [PATCH] blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait

2018-08-22 Thread Holger Hoffstätte

On 08/22/18 19:28, Jens Axboe wrote:
> On 8/22/18 8:27 AM, Jens Axboe wrote:
>> On 8/22/18 6:54 AM, Holger Hoffstätte wrote:
>>> On 08/22/18 06:10, Jens Axboe wrote:
>>>> [...]
>>>> If you have time, please look at the 3 patches I posted earlier today.
>>>> Those are for mainline, so should be OK :-)
>>>
>>> I'm just playing along at home but with those 3 I get repeatable
>>> hangs & writeback not starting at all, but curiously *only* on my btrfs
>>> device; for inexplicable reasons some other devices with ext4/xfs flush
>>> properly. Yes, that surprised me too, but it's repeatable.
>>> Now this may or may not have something to do with some of my in-testing
>>> patches for btrfs itself, but if I remove those 3 wbt fixes, everything
>>> is golden again. Not eager to repeat since it hangs sync & requires a
>>> hard reboot.. :(
>>> Just thought you'd like to know.
>>
>> Thanks, that's very useful info! I'll see if I can reproduce that.
>
> Any chance you can try and see which patch is causing the issue?
> I can't reproduce it here, seems solid.
>
> Either that, or a reproducer would be great...

It's a hacked up custom tree but the following things have emerged so far:

- it's not btrfs.

- it also happens with ext4.

- I first suspected bfq on a nonrotational device disabling WBT by default,
but using deadline didn't help either. Can't even mkfs.ext4.

- I suspect - but do not know - that using xfs everywhere else is the
reason I got lucky, because xfs. :D

- it immediately happens with only the first patch
("move disable check into get_limit()")

So the obvious suspect is the new return of UINT_MAX from get_limit() to
__wbt_wait(). I first suspected that I mispatched something, but it's all
like in mainline or your tree. Even the recently moved-around atomic loop
inside rq_wait_inc_below() is 1:1 the same and looks like it should.
Now building mainline and see where that leads me.

cheers,
Holger


Re: [PATCH v2] bcache: release dc->writeback_lock properly in bch_writeback_thread()

2018-08-22 Thread Jens Axboe
On 8/22/18 12:07 PM, Coly Li wrote:
> On 2018/8/23 2:02 AM, Coly Li wrote:
>> From: Shan Hai 
>>
>> The writeback thread would exit with a lock held when the cache device is
>> detached via the sysfs interface; fix it by releasing the held lock before
>> exiting the while-loop.
>>
>> Fixes: fadd94e05c02 (bcache: quit dc->writeback_thread when 
>> BCACHE_DEV_DETACHING is set)
>> Signed-off-by: Shan Hai 
>> Signed-off-by: Coly Li 
>> Tested-by: Shenghui Wang 
>> Cc: sta...@vger.kernel.org #4.17+
>> ---
>> Changelog:
>> v2: add Fixes tag by Coly Li.
>> v1: initial patch from Shan Hai.
> 
> Hi Jens,
> 
> Please pick this patch for your next pull request. The change from Shan
> Hai fixes a real issue which was reported and later verified by Shenghui
> Wang.

Applied - but I fixed up the commit message. Please keep lines at
72 characters for git commits.

-- 
Jens Axboe



Re: [PATCH v2] bcache: release dc->writeback_lock properly in bch_writeback_thread()

2018-08-22 Thread Coly Li
On 2018/8/23 2:02 AM, Coly Li wrote:
> From: Shan Hai 
> 
> The writeback thread would exit with a lock held when the cache device is
> detached via the sysfs interface; fix it by releasing the held lock before
> exiting the while-loop.
> 
> Fixes: fadd94e05c02 (bcache: quit dc->writeback_thread when 
> BCACHE_DEV_DETACHING is set)
> Signed-off-by: Shan Hai 
> Signed-off-by: Coly Li 
> Tested-by: Shenghui Wang 
> Cc: sta...@vger.kernel.org #4.17+
> ---
> Changelog:
> v2: add Fixes tag by Coly Li.
> v1: initial patch from Shan Hai.

Hi Jens,

Please pick this patch for your next pull request. The change from Shan
Hai fixes a real issue which was reported and later verified by Shenghui
Wang.

Thanks in advance.

Coly Li

> 
>  drivers/md/bcache/writeback.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 6be05bd7ca67..08c3a9f9676c 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -685,8 +685,10 @@ static int bch_writeback_thread(void *arg)
>                * data on cache. BCACHE_DEV_DETACHING flag is set in
>                * bch_cached_dev_detach().
>                */
> -             if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags))
> +             if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)) {
> +                     up_write(&dc->writeback_lock);
>                       break;
> +             }
>       }
>  
>       up_write(&dc->writeback_lock);
> 



[PATCH v2] bcache: release dc->writeback_lock properly in bch_writeback_thread()

2018-08-22 Thread Coly Li
From: Shan Hai 

The writeback thread would exit with a lock held when the cache device is
detached via the sysfs interface; fix it by releasing the held lock before
exiting the while-loop.

Fixes: fadd94e05c02 (bcache: quit dc->writeback_thread when 
BCACHE_DEV_DETACHING is set)
Signed-off-by: Shan Hai 
Signed-off-by: Coly Li 
Tested-by: Shenghui Wang 
Cc: sta...@vger.kernel.org #4.17+
---
Changelog:
v2: add Fixes tag by Coly Li.
v1: initial patch from Shan Hai.

 drivers/md/bcache/writeback.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 6be05bd7ca67..08c3a9f9676c 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -685,8 +685,10 @@ static int bch_writeback_thread(void *arg)
                 * data on cache. BCACHE_DEV_DETACHING flag is set in
                 * bch_cached_dev_detach().
                 */
-               if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags))
+               if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)) {
+                       up_write(&dc->writeback_lock);
                        break;
+               }
        }
 
        up_write(&dc->writeback_lock);
-- 
2.18.0
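
To make the shape of the bug concrete, here is a self-contained userspace
analog using pthreads (an editorial sketch, not the bcache code): the loop
takes the lock at the top of each iteration, so an early break must drop it
first or the lock leaks.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_rwlock_t writeback_lock = PTHREAD_RWLOCK_INITIALIZER;
static bool detaching = true;	/* plays the role of BCACHE_DEV_DETACHING */

static void writeback_loop(void)
{
	for (;;) {
		pthread_rwlock_wrlock(&writeback_lock);
		if (detaching) {
			/* the fix: drop the lock before leaving the loop */
			pthread_rwlock_unlock(&writeback_lock);
			break;	/* without the unlock above, the lock leaks */
		}
		/* ... writeback work would happen here ... */
		pthread_rwlock_unlock(&writeback_lock);
	}
}

int main(void)
{
	writeback_loop();
	/* if the loop had leaked the lock, this would deadlock */
	pthread_rwlock_wrlock(&writeback_lock);
	puts("lock released properly on the detach path");
	pthread_rwlock_unlock(&writeback_lock);
	return 0;
}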



[GIT PULL] Follow up block changes for this merge window

2018-08-22 Thread Jens Axboe
Hi Linus,

As mentioned in my initial pull, there would be a followup with
some later incoming fixes and changes. This pull request contains:

- Set of bcache fixes and changes (Coly)

- The flush warn fix (me)

- Small series of BFQ fixes (Paolo)

- wbt hang fix (Ming)

- blktrace fix (Steven)

- blk-mq hardware queue count update fix (Jianchao)

- Various little fixes

Please pull!


  git://git.kernel.dk/linux-block.git tags/for-4.19/post-20180822



Chaitanya Kulkarni (1):
  block: remove duplicate initialization

Chengguang Xu (1):
  block: change return type to bool

Colin Ian King (1):
  block/DAC960.c: make some arrays static const, shrinks object size

Coly Li (17):
  bcache: style fix to replace 'unsigned' by 'unsigned int'
  bcache: style fix to add a blank line after declarations
  bcache: add identifier names to arguments of function definitions
  bcache: style fixes for lines over 80 characters
  bcache: replace Symbolic permissions by octal permission numbers
  bcache: replace printk() by pr_*() routines
  bcache: fix indent by replacing blank by tabs
  bcache: replace '%pF' by '%pS' in seq_printf()
  bcache: fix typo 'succesfully' to 'successfully'
  bcache: prefer 'help' in Kconfig
  bcache: do not check NULL pointer before calling kmem_cache_destroy
  bcache: fix code comments style
  bcache: add static const prefix to char * array declarations
  bcache: move open brace at end of function definitions to next line
  bcache: add missing SPDX header
  bcache: remove unnecessary space before ioctl function pointer arguments
  bcache: add the missing comments for smp_mb()/smp_wmb()

Jens Axboe (2):
  block: don't warn for flush on read-only device
  pktcdvd: fix setting of 'ret' error return for a few cases

Jianchao Wang (2):
  blk-mq: init hctx sched after update ctx and hctx mapping
  blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter

Maciej S. Szmigiero (1):
  block, bfq: return nbytes and not zero from struct cftype .write() method

Ming Lei (1):
  blk-wbt: fix IO hang in wbt_wait()

Paolo Valente (4):
  block, bfq: readd missing reset of parent-entity service
  block, bfq: always update the budget of an entity when needed
  block, bfq: reduce write overcharge
  block, bfq: improve code of bfq_bfqq_charge_time

Steven Rostedt (VMware) (1):
  tracing/blktrace: Fix to allow setting same value

 block/bfq-cgroup.c|   3 +-
 block/bfq-iosched.c   |  54 ---
 block/bfq-wf2q.c  |  22 ++---
 block/blk-core.c  |   5 +-
 block/blk-mq-sched.c  |  44 -
 block/blk-mq-sched.h  |   5 -
 block/blk-mq-tag.c|  14 ++-
 block/blk-mq.c|  96 +--
 block/blk-wbt.c   |   6 +-
 block/blk.h   |   4 +-
 block/elevator.c  |  20 ++--
 drivers/block/DAC960.c|  42 +
 drivers/block/pktcdvd.c   |   1 +
 drivers/md/bcache/Kconfig |   6 +-
 drivers/md/bcache/alloc.c |  39 
 drivers/md/bcache/bcache.h| 210 ++
 drivers/md/bcache/bset.c  | 142 +++-
 drivers/md/bcache/bset.h  | 146 -
 drivers/md/bcache/btree.c |  72 +--
 drivers/md/bcache/btree.h |  86 -
 drivers/md/bcache/closure.c   |   6 +-
 drivers/md/bcache/closure.h   |   6 +-
 drivers/md/bcache/debug.c |  23 ++---
 drivers/md/bcache/debug.h |   6 +-
 drivers/md/bcache/extents.c   |  37 
 drivers/md/bcache/extents.h   |   6 +-
 drivers/md/bcache/io.c|  24 ++---
 drivers/md/bcache/journal.c   |  27 +++---
 drivers/md/bcache/journal.h   |  28 +++---
 drivers/md/bcache/movinggc.c  |  14 +--
 drivers/md/bcache/request.c   |  61 ++--
 drivers/md/bcache/request.h   |  18 ++--
 drivers/md/bcache/stats.c |  15 +--
 drivers/md/bcache/stats.h |  15 +--
 drivers/md/bcache/super.c | 107 -
 drivers/md/bcache/sysfs.c |  36 +---
 drivers/md/bcache/sysfs.h |   6 +-
 drivers/md/bcache/util.c  |   2 +
 drivers/md/bcache/util.h  |  24 ++---
 drivers/md/bcache/writeback.c |  30 +++---
 drivers/md/bcache/writeback.h |  19 ++--
 include/uapi/linux/bcache.h   |   8 +-
 kernel/trace/blktrace.c   |   4 +
 43 files changed, 881 insertions(+), 658 deletions(-)

-- 
Jens Axboe



[PATCH] bcache: release dc->writeback_lock properly in bch_writeback_thread()

2018-08-22 Thread Coly Li
From: Shan Hai 

The writeback thread would exit with a lock held when the cache device is
detached via the sysfs interface; fix it by releasing the held lock before
exiting the while-loop.

Signed-off-by: Shan Hai 
Signed-off-by: Coly Li 
Tested-by: Shenghui Wang 
Cc: sta...@vger.kernel.org #4.17+
---
 drivers/md/bcache/writeback.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 6be05bd7ca67..08c3a9f9676c 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -685,8 +685,10 @@ static int bch_writeback_thread(void *arg)
                 * data on cache. BCACHE_DEV_DETACHING flag is set in
                 * bch_cached_dev_detach().
                 */
-               if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags))
+               if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)) {
+                       up_write(&dc->writeback_lock);
                        break;
+               }
        }
 
        up_write(&dc->writeback_lock);
-- 
2.18.0



Re: [PATCH] block: don't warn when calling fsync on read-only block devices

2018-08-22 Thread Jens Axboe
On 8/22/18 10:23 AM, Mikulas Patocka wrote:
> It is possible to call fsync on a read-only handle (for example, fsck.ext2
> does it when doing a read-only check), and this call results in a kernel
> warning. This bug was introduced by commit 721c7fc701c7 "block: fail
> op_is_write() requests to read-only partitions".

A similar patch is already queued up; it'll go into Linus's tree before
-rc1.

-- 
Jens Axboe



[PATCH] block: don't warn when calling fsync on read-only block devices

2018-08-22 Thread Mikulas Patocka
It is possible to call fsync on a read-only handle (for example, fsck.ext2
does it when doing a read-only check), and this call results in a kernel
warning. This bug was introduced by commit 721c7fc701c7 "block: fail
op_is_write() requests to read-only partitions".

Signed-off-by: Mikulas Patocka 
Fixes: 721c7fc701c7 ("block: fail op_is_write() requests to read-only 
partitions")
Cc: sta...@vger.kernel.org  # 4.18

---
 block/blk-core.c  |3 +++
 drivers/md/dm-verity-target.c |   33 +
 2 files changed, 20 insertions(+), 16 deletions(-)

Index: linux-2.6/block/blk-core.c
===================================================================
--- linux-2.6.orig/block/blk-core.c 2018-08-15 16:47:18.93000 +0200
+++ linux-2.6/block/blk-core.c  2018-08-22 16:37:19.68000 +0200
@@ -2155,6 +2155,9 @@ static inline bool bio_check_ro(struct b
        if (part->policy && op_is_write(bio_op(bio))) {
                char b[BDEVNAME_SIZE];
 
+               if (op_is_flush(bio->bi_opf) && !bio_sectors(bio))
+                       return false;
+
                WARN_ONCE(1,
                          "generic_make_request: Trying to write "
                          "to read-only block-device %s (partno %d)\n",


Re: [PATCH v2] block: fix rdma queue mapping

2018-08-22 Thread Christoph Hellwig
On Mon, Aug 20, 2018 at 01:54:20PM -0700, Sagi Grimberg wrote:
> nvme-rdma attempts to map queues based on irq vector affinity.
> However, for some devices, completion vector irq affinity is
> configurable by the user which can break the existing assumption
> that irq vectors are optimally arranged over the host cpu cores.

IFF affinity is configurable we should never use this code,
as it breaks the model entirely.  ib_get_vector_affinity should
never return a valid mask if affinity is configurable.


Re: [PATCH v5 3/3] block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs

2018-08-22 Thread Jan Kara
On Wed 22-08-18 18:50:53, Ming Lei wrote:
> On Wed, Aug 22, 2018 at 12:33:05PM +0200, Jan Kara wrote:
> > On Wed 22-08-18 10:02:49, Martin Wilck wrote:
> > > On Mon, 2018-07-30 at 20:37 +0800, Ming Lei wrote:
> > > > On Wed, Jul 25, 2018 at 11:15:09PM +0200, Martin Wilck wrote:
> > > > > 
> > > > > +/**
> > > > > + * bio_iov_iter_get_pages - pin user or kernel pages and add them
> > > > > to a bio
> > > > > + * @bio: bio to add pages to
> > > > > + * @iter: iov iterator describing the region to be mapped
> > > > > + *
> > > > > + * Pins pages from *iter and appends them to @bio's bvec array.
> > > > > The
> > > > > + * pages will have to be released using put_page() when done.
> > > > > + * The function tries, but does not guarantee, to pin as many
> > > > > pages as
> > > > > + * fit into the bio, or are requested in *iter, whatever is
> > > > > smaller.
> > > > > + * If MM encounters an error pinning the requested pages, it
> > > > > stops.
> > > > > + * Error is returned only if 0 pages could be pinned.
> > > > > + */
> > > > > +int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> > > > > +{
> > > > > + unsigned short orig_vcnt = bio->bi_vcnt;
> > > > > +
> > > > > + do {
> > > > > + int ret = __bio_iov_iter_get_pages(bio, iter);
> > > > > +
> > > > > + if (unlikely(ret))
> > > > > + return bio->bi_vcnt > orig_vcnt ? 0 : ret;
> > > > > +
> > > > > + } while (iov_iter_count(iter) && !bio_full(bio));
> > > > 
> > > > When 'ret' isn't zero, and some partial progress has been made, seems
> > > > less pages
> > > > might be obtained than requested too. Is that something we need to
> > > > worry about?
> > > 
> > > This would be the case when the VM isn't willing or able to fulfill the
> > > page-pinning request. Previously, we came to the conclusion that the VM
> > > has the right to do so. This is the reason why callers have to check the
> > > number of pages allocated, and either loop over
> > > bio_iov_iter_get_pages(), or fall back to buffered I/O, until all pages
> > > have been obtained. All callers except the blockdev fast path do the
> > > former. 
> > > 
> > > We could add looping in __blkdev_direct_IO_simple() on top of the
> > > current patch set, to avoid fallback to buffered IO in this corner
> > > case. Should we? If yes, only for WRITEs, or for READs as well?
> > > 
> > > I haven't encountered this situation in my tests, and I'm unsure how to
> > > provoke it - run a direct IO test under high memory pressure?
> > 
> > Currently, iov_iter_get_pages() is always guaranteed to get at least one
> > page, as that is the current guarantee of get_user_pages() (unless we hit
> > EFAULT obviously). So bio_iov_iter_get_pages() as it is now is guaranteed to
> 
> Is it possible for this EFAULT to happen on the user-space VM?

Certainly, if the user passes a bogus address...

Honza

-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v5 3/3] block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs

2018-08-22 Thread Ming Lei
On Wed, Aug 22, 2018 at 12:33:05PM +0200, Jan Kara wrote:
> On Wed 22-08-18 10:02:49, Martin Wilck wrote:
> > On Mon, 2018-07-30 at 20:37 +0800, Ming Lei wrote:
> > > On Wed, Jul 25, 2018 at 11:15:09PM +0200, Martin Wilck wrote:
> > > > 
> > > > +/**
> > > > + * bio_iov_iter_get_pages - pin user or kernel pages and add them
> > > > to a bio
> > > > + * @bio: bio to add pages to
> > > > + * @iter: iov iterator describing the region to be mapped
> > > > + *
> > > > + * Pins pages from *iter and appends them to @bio's bvec array.
> > > > The
> > > > + * pages will have to be released using put_page() when done.
> > > > + * The function tries, but does not guarantee, to pin as many
> > > > pages as
> > > > + * fit into the bio, or are requested in *iter, whatever is
> > > > smaller.
> > > > + * If MM encounters an error pinning the requested pages, it
> > > > stops.
> > > > + * Error is returned only if 0 pages could be pinned.
> > > > + */
> > > > +int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> > > > +{
> > > > +   unsigned short orig_vcnt = bio->bi_vcnt;
> > > > +
> > > > +   do {
> > > > +   int ret = __bio_iov_iter_get_pages(bio, iter);
> > > > +
> > > > +   if (unlikely(ret))
> > > > +   return bio->bi_vcnt > orig_vcnt ? 0 : ret;
> > > > +
> > > > +   } while (iov_iter_count(iter) && !bio_full(bio));
> > > 
> > > When 'ret' isn't zero, and some partial progress has been made, seems
> > > less pages
> > > might be obtained than requested too. Is that something we need to
> > > worry about?
> > 
> > This would be the case when the VM isn't willing or able to fulfill the
> > page-pinning request. Previously, we came to the conclusion that the VM
> > has the right to do so. This is the reason why callers have to check the
> > number of pages allocated, and either loop over
> > bio_iov_iter_get_pages(), or fall back to buffered I/O, until all pages
> > have been obtained. All callers except the blockdev fast path do the
> > former. 
> > 
> > We could add looping in __blkdev_direct_IO_simple() on top of the
> > current patch set, to avoid fallback to buffered IO in this corner
> > case. Should we? If yes, only for WRITEs, or for READs as well?
> > 
> > I haven't encountered this situation in my tests, and I'm unsure how to
> > provoke it - run a direct IO test under high memory pressure?
> 
> Currently, iov_iter_get_pages() is always guaranteed to get at least one
> page, as that is the current guarantee of get_user_pages() (unless we hit
> EFAULT obviously). So bio_iov_iter_get_pages() as it is now is guaranteed to

Is it possible for this EFAULT to happen on the user-space VM?


Thanks,
Ming


Re: [PATCH v5 3/3] block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs

2018-08-22 Thread Jan Kara
On Wed 22-08-18 10:02:49, Martin Wilck wrote:
> On Mon, 2018-07-30 at 20:37 +0800, Ming Lei wrote:
> > On Wed, Jul 25, 2018 at 11:15:09PM +0200, Martin Wilck wrote:
> > > 
> > > +/**
> > > + * bio_iov_iter_get_pages - pin user or kernel pages and add them
> > > to a bio
> > > + * @bio: bio to add pages to
> > > + * @iter: iov iterator describing the region to be mapped
> > > + *
> > > + * Pins pages from *iter and appends them to @bio's bvec array.
> > > The
> > > + * pages will have to be released using put_page() when done.
> > > + * The function tries, but does not guarantee, to pin as many
> > > pages as
> > > + * fit into the bio, or are requested in *iter, whatever is
> > > smaller.
> > > + * If MM encounters an error pinning the requested pages, it
> > > stops.
> > > + * Error is returned only if 0 pages could be pinned.
> > > + */
> > > +int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> > > +{
> > > + unsigned short orig_vcnt = bio->bi_vcnt;
> > > +
> > > + do {
> > > + int ret = __bio_iov_iter_get_pages(bio, iter);
> > > +
> > > + if (unlikely(ret))
> > > + return bio->bi_vcnt > orig_vcnt ? 0 : ret;
> > > +
> > > + } while (iov_iter_count(iter) && !bio_full(bio));
> > 
> > When 'ret' isn't zero, and some partial progress has been made, seems
> > less pages
> > might be obtained than requested too. Is that something we need to
> > worry about?
> 
> This would be the case when the VM isn't willing or able to fulfill the
> page-pinning request. Previously, we came to the conclusion that the VM
> has the right to do so. This is the reason why callers have to check the
> number of pages allocated, and either loop over
> bio_iov_iter_get_pages(), or fall back to buffered I/O, until all pages
> have been obtained. All callers except the blockdev fast path do the
> former. 
> 
> We could add looping in __blkdev_direct_IO_simple() on top of the
> current patch set, to avoid fallback to buffered IO in this corner
> case. Should we? If yes, only for WRITEs, or for READs as well?
> 
> I haven't encountered this situation in my tests, and I'm unsure how to
> provoke it - run a direct IO test under high memory pressure?

Currently, iov_iter_get_pages() is always guaranteed to get at least one
page, as that is the current guarantee of get_user_pages() (unless we hit
EFAULT obviously). So bio_iov_iter_get_pages() as it is now is guaranteed to
exhaust 'iter' or fill 'bio'. But in the future, the guarantee that
get_user_pages() will always pin at least one page may go away. But we'd
have to audit all users at that time anyway.

Honza
-- 
Jan Kara 
SUSE Labs, CR
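
A self-contained sketch of the contract discussed above (pin_some() is a
hypothetical stand-in for __bio_iov_iter_get_pages(); the constants are
illustrative): partial progress is reported as success, and an error
surfaces only when nothing at all could be pinned.

#include <stdio.h>

#define BIO_MAX_PAGES 8

static int remaining = 10;	/* pages described by the "iov_iter" */
static int vcnt;		/* pages already added to the "bio" */

/* May pin fewer pages than remain; fails outright (EFAULT-style)
 * only when no page can be pinned at all. */
static int pin_some(void)
{
	int got = remaining < 3 ? remaining : 3;

	if (got == 0)
		return -14;	/* mimic -EFAULT: nothing pinned */
	remaining -= got;
	vcnt += got;
	return 0;
}

int main(void)
{
	int orig_vcnt = vcnt, ret = 0;

	/* same shape as the loop in the posted patch */
	do {
		ret = pin_some();
		if (ret) {
			ret = vcnt > orig_vcnt ? 0 : ret;
			break;
		}
	} while (remaining && vcnt < BIO_MAX_PAGES);

	printf("pinned %d pages, ret=%d\n", vcnt - orig_vcnt, ret);
	return 0;
}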


Re: SDHCI Regression with 6ce3dd6eec11 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")

2018-08-22 Thread Ming Lei
On Wed, Aug 22, 2018 at 09:06:40AM +0300, Jarkko Nikula wrote:
> On 08/21/2018 04:57 PM, Ming Lei wrote:
> > On Tue, Aug 21, 2018 at 04:45:41PM +0300, Jarkko Nikula wrote:
> > > On 08/21/2018 04:03 PM, Adrian Hunter wrote:
> > > > On 21/08/18 15:37, Jarkko Nikula wrote:
> > > > > Hi
> > > > > 
> > > > > I bisected some kind of SDHCI regression to commit 6ce3dd6eec11 
> > > > > ("blk-mq:
> > > > > issue directly if hw queue isn't busy in case of 'none'") causing 
> > > > > dumps
> > > > > below and one or more systemd-udevd processes being in 
> > > > > uninterruptible sleep
> > > > > state preventing safe reboot/shutdown.
> > > > > 
> > > > > This is from an Intel Baytrail based tablet with integrated eMMC but
> > > > > my up-to-date Debian/testing rootfs (with systemd) is on a USB stick.
> > > > > 
> > > > > It doesn't revert cleanly on today's head 778a33959a8a but the issue
> > > > > is gone if I go to a commit before 6ce3dd6eec11 and occurs at
> > > > > 6ce3dd6eec11.
> > > > 
> > > > This was discussed here:
> > > > 
> > > > https://marc.info/?l=linux-block=153334717506073=2
> > > > 
> > > > Coincidentally, I just sent the fix patch:
> > > > 
> > > > https://marc.info/?l=linux-mmc=153485326025301=2
> > > > 
> > > Cool, it fixed my regression. I tested both on top of 6ce3dd6eec11 and
> > > head 778a33959a8a. Maybe you would like to add another Fixes tag and my
> > > Tested-by to your patch:
> > > 
> > > Fixes: 6ce3dd6eec11 ("blk-mq: issue directly if hw queue isn't busy in 
> > > case
> > > of 'none'")
> > 
> > If you read the above links carefully, you'd see it is wrong to add the tag 
> > of
> > 'Fixes: 6ce3dd6eec11'.
> > 
> Hmm... why? That commit 6ce3dd6eec11 was clearly regressing on my setup,
> while Adrian's fix for 81196976ed94, which has been present since v4.16,
> fixes my finding too.
> 
> I don't know MMC and the block layer well enough, but if commit 6ce3dd6eec11
> revealed an issue in MMC under my configuration I'd still call it a
> regression.

6ce3dd6eec11 just makes the original issue happen more easily; the root
cause is that the MMC stack doesn't support concurrent dispatch for
SDHCI.

Thanks, 
Ming


Re: [PATCH v5 3/3] block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs

2018-08-22 Thread Martin Wilck
On Mon, 2018-07-30 at 20:37 +0800, Ming Lei wrote:
> On Wed, Jul 25, 2018 at 11:15:09PM +0200, Martin Wilck wrote:
> > 
> > +/**
> > + * bio_iov_iter_get_pages - pin user or kernel pages and add them
> > to a bio
> > + * @bio: bio to add pages to
> > + * @iter: iov iterator describing the region to be mapped
> > + *
> > + * Pins pages from *iter and appends them to @bio's bvec array.
> > The
> > + * pages will have to be released using put_page() when done.
> > + * The function tries, but does not guarantee, to pin as many
> > pages as
> > + * fit into the bio, or are requested in *iter, whatever is
> > smaller.
> > + * If MM encounters an error pinning the requested pages, it
> > stops.
> > + * Error is returned only if 0 pages could be pinned.
> > + */
> > +int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> > +{
> > +   unsigned short orig_vcnt = bio->bi_vcnt;
> > +
> > +   do {
> > +   int ret = __bio_iov_iter_get_pages(bio, iter);
> > +
> > +   if (unlikely(ret))
> > +   return bio->bi_vcnt > orig_vcnt ? 0 : ret;
> > +
> > +   } while (iov_iter_count(iter) && !bio_full(bio));
> 
> When 'ret' isn't zero, and some partial progress has been made, seems
> less pages
> might be obtained than requested too. Is that something we need to
> worry about?

This would be the case when the VM isn't willing or able to fulfill the
page-pinning request. Previously, we came to the conclusion that the VM
has the right to do so. This is the reason why callers have to check the
number of pages allocated, and either loop over
bio_iov_iter_get_pages(), or fall back to buffered I/O, until all pages
have been obtained. All callers except the blockdev fast path do the
former. 

We could add looping in __blkdev_direct_IO_simple() on top of the
current patch set, to avoid fallback to buffered IO in this corner
case. Should we? If yes, only for WRITEs, or for READs as well?

I haven't encountered this situation in my tests, and I'm unsure how to
provoke it - run a direct IO test under high memory pressure?

Regards,
Martin

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)




Re: [PATCH 1/1] bcache: release the lock before stopping the writeback thread

2018-08-22 Thread Shan Hai

Hi Coly,

On 2018-08-22 15:29, Coly Li wrote:
> Hi Shan,
> 
> On 2018/8/22 3:00 PM, Shan Hai wrote:
>> The writeback thread would exit with a lock held when the cache device is
>> detached via the sysfs interface; fix it by releasing the held lock before
>> exiting the thread.
> 
> I will change "the thread" to "the while-loop" when I apply this patch.

That's fine, your change is better.

>> Signed-off-by: Shan Hai 
> 
> Nice catch!
> 
> This one should go to stable trees since 4.17 as well. I will CC stable
> maintainers when I submit this one for 4.19.

OK, thanks for the quick response.

Regards
Shan Hai

> Thank you for the fix.
> 
> Coly Li
> 
>> ---
>>  drivers/md/bcache/writeback.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>> index 481d4cf..adc583f 100644
>> --- a/drivers/md/bcache/writeback.c
>> +++ b/drivers/md/bcache/writeback.c
>> @@ -679,8 +679,10 @@ static int bch_writeback_thread(void *arg)
>>                * data on cache. BCACHE_DEV_DETACHING flag is set in
>>                * bch_cached_dev_detach().
>>                */
>> -             if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags))
>> +             if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)) {
>> +                     up_write(&dc->writeback_lock);
>>                       break;
>> +             }
>>       }
>>  
>>       up_write(&dc->writeback_lock);






Re: [PATCH 1/1] bcache: release the lock before stopping the writeback thread

2018-08-22 Thread Coly Li
Hi Shan,

On 2018/8/22 3:00 PM, Shan Hai wrote:
> The writeback thread would exit with a lock held when the cache device is
> detached via the sysfs interface; fix it by releasing the held lock before
> exiting the thread.

I will change "the thread" to "the while-loop" when I apply this patch.

> 
> Signed-off-by: Shan Hai 

Nice catch!

This one should go to stable trees since 4.17 as well. I will CC stable
maintainers when I submit this one for 4.19.

Thank you for the fix.

Coly Li

> ---
>  drivers/md/bcache/writeback.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 481d4cf..adc583f 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -679,8 +679,10 @@ static int bch_writeback_thread(void *arg)
>                * data on cache. BCACHE_DEV_DETACHING flag is set in
>                * bch_cached_dev_detach().
>                */
> -             if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags))
> +             if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)) {
> +                     up_write(&dc->writeback_lock);
>                       break;
> +             }
>       }
>  
>       up_write(&dc->writeback_lock);
> 



[PATCH 1/1] bcache: release the lock before stopping the writeback thread

2018-08-22 Thread Shan Hai
The writeback thread would exit with a lock held when the cache device is
detached via the sysfs interface; fix it by releasing the held lock before
exiting the thread.

Signed-off-by: Shan Hai 
---
 drivers/md/bcache/writeback.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 481d4cf..adc583f 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -679,8 +679,10 @@ static int bch_writeback_thread(void *arg)
                 * data on cache. BCACHE_DEV_DETACHING flag is set in
                 * bch_cached_dev_detach().
                 */
-               if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags))
+               if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)) {
+                       up_write(&dc->writeback_lock);
                        break;
+               }
        }
 
        up_write(&dc->writeback_lock);
-- 
2.7.4



Re: SDHCI Regression with 6ce3dd6eec11 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")

2018-08-22 Thread Jarkko Nikula

On 08/21/2018 04:57 PM, Ming Lei wrote:
> On Tue, Aug 21, 2018 at 04:45:41PM +0300, Jarkko Nikula wrote:
>> On 08/21/2018 04:03 PM, Adrian Hunter wrote:
>>> On 21/08/18 15:37, Jarkko Nikula wrote:
>>>> Hi
>>>>
>>>> I bisected some kind of SDHCI regression to commit 6ce3dd6eec11 ("blk-mq:
>>>> issue directly if hw queue isn't busy in case of 'none'") causing dumps
>>>> below and one or more systemd-udevd processes being in uninterruptible
>>>> sleep state preventing safe reboot/shutdown.
>>>>
>>>> This is from an Intel Baytrail based tablet with integrated eMMC but my
>>>> up-to-date Debian/testing rootfs (with systemd) is on a USB stick.
>>>>
>>>> It doesn't revert cleanly on today's head 778a33959a8a but the issue is
>>>> gone if I go to a commit before 6ce3dd6eec11 and occurs at 6ce3dd6eec11.
>>>
>>> This was discussed here:
>>>
>>> https://marc.info/?l=linux-block=153334717506073=2
>>>
>>> Coincidentally, I just sent the fix patch:
>>>
>>> https://marc.info/?l=linux-mmc=153485326025301=2
>>
>> Cool, it fixed my regression. I tested both on top of 6ce3dd6eec11 and head
>> 778a33959a8a. Maybe you would like to add another Fixes tag and my
>> Tested-by to your patch:
>>
>> Fixes: 6ce3dd6eec11 ("blk-mq: issue directly if hw queue isn't busy in case
>> of 'none'")
> 
> If you read the above links carefully, you'd see it is wrong to add the tag
> of 'Fixes: 6ce3dd6eec11'.

Hmm... why? That commit 6ce3dd6eec11 was clearly regressing on my setup,
while Adrian's fix for 81196976ed94, which has been present since v4.16,
fixes my finding too.

I don't know MMC and the block layer well enough, but if commit 6ce3dd6eec11
revealed an issue in MMC under my configuration I'd still call it a
regression.

--
Jarkko