Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-03-08 Thread Jan Kara
On Tue 07-03-17 08:28:23, Omar Sandoval wrote:
> On Tue, Mar 07, 2017 at 02:57:30PM +0100, Jan Kara wrote:
> > On Mon 06-03-17 12:38:18, Omar Sandoval wrote:
> > > Unfortunately, this patch actually seems to have introduced a
> > > regression. Reverting the patch fixes it.
> > > 
> > > I'm running a QEMU kvm vm with a virtio-scsi device and I get oopses
> > > like this:
> > 
> > What workload do you run? Or is it enough to just boot the kvm with
> > virtio-scsi enabled? Can you send me the kvm setup so that I can reproduce
> > this?
> 
> This is just while booting. In fact, you don't even need a virtio-scsi
> disk attached, it's enough to have the virtio-scsi-pci controller. This
> is my exact command line:
> 
> qemu-system-x86_64 -nodefaults -nographic -serial mon:stdio -cpu kvm64 \
>   -enable-kvm -smp 1 -m 2G -watchdog i6300esb \
>   -netdev user,id=vlan0,hostfwd=tcp:127.0.0.1:-:22 \
>   -device virtio-net,netdev=vlan0 \
>   -drive file=Silver/Silver.qcow2,index=0,media=disk,if=virtio,cache=none \
>   -device virtio-scsi-pci \
>   -kernel /home/osandov/linux/builds/virtio-scsi-crash/arch/x86/boot/bzImage \
>   -virtfs local,path=/home/osandov/linux/builds/virtio-scsi-crash,security_model=none,readonly,mount_tag=modules \
>   -append 'root=/dev/vda1 console=ttyS0,115200 scsi_mod.use_blk_mq=y'
> 
> My kernel config is here:
> https://github.com/osandov/osandov-linux/blob/master/configs/qemu.config

Thanks. I was able to reproduce. Do the two attached patches help?

Honza
-- 
Jan Kara 
SUSE Labs, CR
From 60623ea9de6309f7717fc3536e3594b079735af0 Mon Sep 17 00:00:00 2001
From: Jan Kara 
Date: Tue, 7 Mar 2017 18:01:51 +0100
Subject: [PATCH 1/2] block: Allow bdi re-registration

SCSI can call device_add_disk() several times for one request queue when
a device is unbound and bound again, creating a new gendisk each time. This
leads to the bdi being repeatedly registered and unregistered. That was not
a big problem until commit 165a5e22fafb "block: Move bdi_unregister() to
del_gendisk()": the bdi was only registered repeatedly (bdi_register()
handles repeated calls fine, we merely ended up leaking a reference to the
gendisk due to overwriting bdi->owner), while it was unregistered only in
blk_cleanup_queue(), which does not get called repeatedly. After
165a5e22fafb we do proper bdi_register() - bdi_unregister() cycles, however
bdi_unregister() is not prepared for them. So make sure bdi_unregister()
cleans up the bdi in such a way that it is ready for a possible subsequent
bdi_register() call.

An easy way to provoke this behavior is to enable
CONFIG_DEBUG_TEST_DRIVER_REMOVE and use the scsi_debug driver to create a
SCSI disk, which immediately hangs without this fix.

Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820
Signed-off-by: Jan Kara 
---
 mm/backing-dev.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6d861d090e9f..6ac932210f56 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -710,6 +710,11 @@ static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
 	 */
 	atomic_dec(&bdi->usage_cnt);
 	wait_event(cgwb_release_wait, !atomic_read(&bdi->usage_cnt));
+	/*
+	 * Grab back our reference so that we hold it when @bdi gets
+	 * re-registered.
+	 */
+	atomic_inc(&bdi->usage_cnt);
 }
 
 /**
@@ -857,6 +862,8 @@ int bdi_register_owner(struct backing_dev_info *bdi, struct device *owner)
 			MINOR(owner->devt));
 	if (rc)
 		return rc;
+	/* Leaking owner reference... */
+	WARN_ON(bdi->owner);
 	bdi->owner = owner;
 	get_device(owner);
 	return 0;
-- 
2.10.2
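
To illustrate the drain-and-regrab refcount pattern that cgwb_bdi_destroy()
uses in the hunk above, here is a minimal userspace C sketch. The names
(struct obj, obj_destroy_children()) are made up for the illustration; this
is not the kernel implementation:

/* Userspace sketch of the drain-and-regrab refcount pattern used by
 * cgwb_bdi_destroy() above. All names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

struct obj {
	/* One reference is owned by the object itself for its whole life,
	 * extra references are taken by (imaginary) children. */
	atomic_int usage_cnt;
};

static void obj_init(struct obj *o)
{
	atomic_init(&o->usage_cnt, 1);	/* the object's own reference */
}

/* Tear down children: drop our own reference, wait until all child
 * references are gone (here a spin, the kernel uses wait_event()), then
 * take our reference back so a later re-registration sees the same
 * refcount state as right after obj_init(). */
static void obj_destroy_children(struct obj *o)
{
	atomic_fetch_sub(&o->usage_cnt, 1);
	while (atomic_load(&o->usage_cnt) != 0)
		;	/* children would drop their references here */
	/* Grab back our reference so @o can be registered again. */
	atomic_fetch_add(&o->usage_cnt, 1);
}

int main(void)
{
	struct obj o;

	obj_init(&o);
	obj_destroy_children(&o);	/* first unregister cycle */
	/* Without the final atomic_fetch_add() above, a second cycle would
	 * push the count below zero and spin forever; with it, the object
	 * is reusable. */
	obj_destroy_children(&o);	/* second cycle works as well */
	assert(atomic_load(&o.usage_cnt) == 1);
	printf("usage_cnt=%d\n", atomic_load(&o.usage_cnt));
	return 0;
}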

From b9a6424cabd0eb4a663a1faeb11afef268faebc6 Mon Sep 17 00:00:00 2001
From: Jan Kara 
Date: Wed, 8 Mar 2017 13:24:47 +0100
Subject: [PATCH 2/2] bdi: Fix use-after-free in wb_congested_put()

bdi_writeback_congested structures get created for each blkcg and bdi
regardless of whether the bdi is registered or not. When they are created
for an unregistered bdi and the request queue (and thus the bdi) is then
destroyed while a blkg still holds a reference to the
bdi_writeback_congested structure, that structure ends up referencing the
freed bdi and the last wb_congested_put() will try to remove the structure
from the already freed bdi.

With commit 165a5e22fafb "block: Move bdi_unregister() to
del_gendisk()", SCSI started to destroy bdis without calling
bdi_unregister() first (previously it was calling bdi_unregister() even
for unregistered bdis) and thus the code detaching
bdi_writeback_congested structures in cgwb_bdi_destroy() was no longer
triggered and we started hitting this use-after-free bug. It is enough to
boot a KVM instance with a virtio-scsi device to trigger this behavior.

Fix the problem by detaching bdi_writeback_congested structures in
bdi_exit() instead of bdi_unregister(). This is also more logical as
they can get attached to a bdi regardless of whether it ever got
registered or not.

Fixes: 
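
The bug described above is an instance of a generic dangling back-pointer
problem: a long-lived child object keeps a pointer to its parent, the parent
is freed first, and the child's final put then touches freed memory. Purely as
an illustration of the fix direction (detach the back-pointers in the parent's
exit path, which runs whether or not the parent was ever registered), here is
a minimal userspace C sketch with hypothetical names; it is not the kernel
code:

/* Userspace sketch of "detach back-pointers in the parent's exit path".
 * All names are hypothetical; this is not the kernel implementation. */
#include <stdio.h>
#include <stdlib.h>

struct parent;

struct child {
	struct parent *parent;	/* back-pointer, may outlive the parent */
	int refcnt;
	struct child *next;
};

struct parent {
	struct child *children;	/* children that still point back at us */
};

static struct child *child_create(struct parent *p)
{
	struct child *c = calloc(1, sizeof(*c));

	c->parent = p;
	c->refcnt = 1;
	c->next = p->children;
	p->children = c;
	return c;
}

/* Runs in the parent's exit path, registered or not: sever every
 * back-pointer so a child released later never touches freed memory. */
static void parent_exit(struct parent *p)
{
	struct child *c;

	for (c = p->children; c; c = c->next)
		c->parent = NULL;
	p->children = NULL;
}

static void child_put(struct child *c)
{
	if (--c->refcnt)
		return;
	/* Only touch the parent if it is still there. */
	if (c->parent)
		printf("unlinking from live parent\n");
	free(c);
}

int main(void)
{
	struct parent *p = calloc(1, sizeof(*p));
	struct child *c = child_create(p);

	parent_exit(p);		/* always called, even if never registered */
	free(p);		/* parent goes away first... */
	child_put(c);		/* ...and the late put is still safe */
	return 0;
}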

Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-03-07 Thread Omar Sandoval
On Tue, Mar 07, 2017 at 02:57:30PM +0100, Jan Kara wrote:
> On Mon 06-03-17 12:38:18, Omar Sandoval wrote:
> > Unfortunately, this patch actually seems to have introduced a
> > regression. Reverting the patch fixes it.
> > 
> > I'm running a QEMU kvm vm with a virtio-scsi device and I get oopses
> > like this:
> 
> What workload do you run? Or is it enough to just boot the kvm with
> virtio-scsi enabled? Can you send me the kvm setup so that I can reproduce
> this?

This is just while booting. In fact, you don't even need a virtio-scsi
disk attached, it's enough to have the virtio-scsi-pci controller. This
is my exact command line:

qemu-system-x86_64 -nodefaults -nographic -serial mon:stdio -cpu kvm64 \
  -enable-kvm -smp 1 -m 2G -watchdog i6300esb \
  -netdev user,id=vlan0,hostfwd=tcp:127.0.0.1:-:22 \
  -device virtio-net,netdev=vlan0 \
  -drive file=Silver/Silver.qcow2,index=0,media=disk,if=virtio,cache=none \
  -device virtio-scsi-pci \
  -kernel /home/osandov/linux/builds/virtio-scsi-crash/arch/x86/boot/bzImage \
  -virtfs local,path=/home/osandov/linux/builds/virtio-scsi-crash,security_model=none,readonly,mount_tag=modules \
  -append 'root=/dev/vda1 console=ttyS0,115200 scsi_mod.use_blk_mq=y'

My kernel config is here:
https://github.com/osandov/osandov-linux/blob/master/configs/qemu.config

> At least the KASAN error could be a result of the situation where someone
> called bdi_register() but didn't call bdi_unregister() before dropping the
> last bdi reference (which would make sense given I've moved
> bdi_unregister() call). However I'd expect a warning from bdi_exit() in that
> case and I don't see that in your kernel log. So I'll try to reproduce the
> issue and debug more.
> 
>   Honza

Thanks for taking a look!


Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-03-07 Thread Jan Kara
On Mon 06-03-17 12:38:18, Omar Sandoval wrote:
> On Wed, Mar 01, 2017 at 04:11:09PM +0100, Jan Kara wrote:
> > On Tue 28-02-17 23:26:53, Omar Sandoval wrote:
> > > On Tue, Feb 28, 2017 at 11:25:28PM -0800, Omar Sandoval wrote:
> > > > On Wed, Feb 22, 2017 at 11:24:25AM +0100, Jan Kara wrote:
> > > > > On Tue 21-02-17 10:19:28, Jens Axboe wrote:
> > > > > > On 02/21/2017 10:09 AM, Jan Kara wrote:
> > > > > > > Hello,
> > > > > > > 
> > > > > > > this is a second revision of the patch set to fix several 
> > > > > > > different races and
> > > > > > > issues I've found when testing device shutdown and reuse. The 
> > > > > > > first three
> > > > > > > patches are fixes to problems in my previous series fixing BDI 
> > > > > > > lifetime issues.
> > > > > > > Patch 4 fixes issues with reuse of BDI name with scsi devices. 
> > > > > > > With it I cannot
> > > > > > > reproduce the BDI name reuse issues using Omar's stress test 
> > > > > > > using scsi_debug
> > > > > > > so it can be used as a replacement of Dan's patches. Patches 5-11 
> > > > > > > fix oops that
> > > > > > > is triggered by __blkdev_put() calling inode_detach_wb() too 
> > > > > > > early (the problem
> > > > > > > reported by Thiago). Patches 12 and 13 fix oops due to a bug in 
> > > > > > > gendisk code
> > > > > > > where get_gendisk() can return already freed gendisk structure 
> > > > > > > (again triggered
> > > > > > > by Omar's stress test).
> > > > > > > 
> > > > > > > People, please have a look at patches. They are mostly simple 
> > > > > > > however the
> > > > > > > interactions are rather complex so I may have missed something. 
> > > > > > > Also I'm
> > > > > > > happy for any additional testing these patches can get - I've 
> > > > > > > stressed them
> > > > > > > with Omar's script, tested memcg writeback, tested static (not 
> > > > > > > udev managed)
> > > > > > > device inodes.
> > > > > > > 
> > > > > > > Jens, I think at least patches 1-3 should go in together with my 
> > > > > > > fixes you
> > > > > > > already have in your tree (or shortly after them). It is up to 
> > > > > > > you whether
> > > > > > > you decide to delay my first fixes or pick these up quickly. 
> > > > > > > Patch 4 is
> > > > > > > (IMHO a cleaner) replacement of Dan's patches so consider whether 
> > > > > > > you want
> > > > > > > to use it instead of those patches.
> > > > > > 
> > > > > > I have applied 1-3 to my for-linus branch, which will go in after
> > > > > > the initial pull request has been pulled by Linus. Consider fixing 
> > > > > > up
> > > > > > #4 so it applies, I like it.
> > > > > 
> > > > > OK, attached is patch 4 rebased on top of Linus' tree from today which
> > > > > already has linux-block changes pulled in. I've left put_disk_devt() 
> > > > > in
> > > > > blk_cleanup_queue() to maintain the logic in the original patch (now 
> > > > > commit
> > > > > 0dba1314d4f8) that request_queue and gendisk each hold one devt 
> > > > > reference.
> > > > > The bdi_unregister() call that is moved to del_gendisk() by this 
> > > > > patch is
> > > > > now protected by the gendisk reference instead of the request_queue 
> > > > > one
> > > > > so it still maintains the property that devt reference protects bdi
> > > > > registration-unregistration lifetime (as much as that is not needed 
> > > > > anymore
> > > > > after this patch).
> > > > > 
> > > > > I have also updated the comment in the code and the changelog - they 
> > > > > were
> > > > > somewhat stale after changes to the whole series Tejun suggested.
> > > > > 
> > > > >   Honza
> > > > 
> > > > Hey, Jan, I just tested this out when I was seeing similar crashes with
> > > > sr instead of sd, and this fixed it.
> > > > 
> > > > Tested-by: Omar Sandoval 
> > > 
> > > Just realized it wasn't clear, I'm talking about patch 4 specifically.
> > 
> > Thanks for confirmation!
> > 
> > Honza
> 
> Unfortunately, this patch actually seems to have introduced a
> regression. Reverting the patch fixes it.
> 
> I'm running a QEMU kvm vm with a virtio-scsi device and I get oopses
> like this:

What workload do you run? Or is it enough to just boot the kvm with
virtio-scsi enabled? Can you send me the kvm setup so that I can reproduce
this?

At least the KASAN error could be a result of the situation where someone
called bdi_register() but didn't call bdi_unregister() before dropping the
last bdi reference (which would make sense given I've moved
bdi_unregister() call). However I'd expect a warning from bdi_exit() in that
case and I don't see that in your kernel log. So I'll try to reproduce the
issue and debug more.

Honza

> 
> [6.741773] BUG: unable to handle kernel NULL pointer dereference at 0004
> [6.744401] IP: virtscsi_queuecommand_single+0x21/0x40 

Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-03-01 Thread Tejun Heo
Hello, Jan.

On Wed, Mar 01, 2017 at 04:37:00PM +0100, Jan Kara wrote:
> > The other thing which came to mind is that the congested->__bdi sever
> > semantics.  IIRC, that one was also to support the "bdi must go away now"
> > behavior.  As bdi is refcnted now, I think we can probably just let cong
> > hold onto the bdi rather than try to sever the ref there.
> 
> So currently I get away with __bdi not being a proper refcounted reference.
> If we were to remove the clearing of __bdi, we'd have to make it into
> refcounted reference which is sligthly ugly as we need to special-case
> embedded bdi_writeback_congested structures. Maybe it will be a worthwhile
> cleanup but for now I left it alone...

Yeah, absolutely, it's an additional step that we can take later.
Nothing urgent.

Thanks.

-- 
tejun


Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-03-01 Thread Jan Kara
On Tue 28-02-17 23:26:53, Omar Sandoval wrote:
> On Tue, Feb 28, 2017 at 11:25:28PM -0800, Omar Sandoval wrote:
> > On Wed, Feb 22, 2017 at 11:24:25AM +0100, Jan Kara wrote:
> > > On Tue 21-02-17 10:19:28, Jens Axboe wrote:
> > > > On 02/21/2017 10:09 AM, Jan Kara wrote:
> > > > > Hello,
> > > > > 
> > > > > this is a second revision of the patch set to fix several different 
> > > > > races and
> > > > > issues I've found when testing device shutdown and reuse. The first 
> > > > > three
> > > > > patches are fixes to problems in my previous series fixing BDI 
> > > > > lifetime issues.
> > > > > Patch 4 fixes issues with reuse of BDI name with scsi devices. With 
> > > > > it I cannot
> > > > > reproduce the BDI name reuse issues using Omar's stress test using 
> > > > > scsi_debug
> > > > > so it can be used as a replacement of Dan's patches. Patches 5-11 fix 
> > > > > oops that
> > > > > is triggered by __blkdev_put() calling inode_detach_wb() too early 
> > > > > (the problem
> > > > > reported by Thiago). Patches 12 and 13 fix oops due to a bug in 
> > > > > gendisk code
> > > > > where get_gendisk() can return already freed gendisk structure (again 
> > > > > triggered
> > > > > by Omar's stress test).
> > > > > 
> > > > > People, please have a look at patches. They are mostly simple however 
> > > > > the
> > > > > interactions are rather complex so I may have missed something. Also 
> > > > > I'm
> > > > > happy for any additional testing these patches can get - I've 
> > > > > stressed them
> > > > > with Omar's script, tested memcg writeback, tested static (not udev 
> > > > > managed)
> > > > > device inodes.
> > > > > 
> > > > > Jens, I think at least patches 1-3 should go in together with my 
> > > > > fixes you
> > > > > already have in your tree (or shortly after them). It is up to you 
> > > > > whether
> > > > > you decide to delay my first fixes or pick these up quickly. Patch 4 
> > > > > is
> > > > > (IMHO a cleaner) replacement of Dan's patches so consider whether you 
> > > > > want
> > > > > to use it instead of those patches.
> > > > 
> > > > I have applied 1-3 to my for-linus branch, which will go in after
> > > > the initial pull request has been pulled by Linus. Consider fixing up
> > > > #4 so it applies, I like it.
> > > 
> > > OK, attached is patch 4 rebased on top of Linus' tree from today which
> > > already has linux-block changes pulled in. I've left put_disk_devt() in
> > > blk_cleanup_queue() to maintain the logic in the original patch (now 
> > > commit
> > > 0dba1314d4f8) that request_queue and gendisk each hold one devt reference.
> > > The bdi_unregister() call that is moved to del_gendisk() by this patch is
> > > now protected by the gendisk reference instead of the request_queue one
> > > so it still maintains the property that devt reference protects bdi
> > > registration-unregistration lifetime (as much as that is not needed 
> > > anymore
> > > after this patch).
> > > 
> > > I have also updated the comment in the code and the changelog - they were
> > > somewhat stale after changes to the whole series Tejun suggested.
> > > 
> > >   Honza
> > 
> > Hey, Jan, I just tested this out when I was seeing similar crashes with
> > sr instead of sd, and this fixed it.
> > 
> > Tested-by: Omar Sandoval 
> 
> Just realized it wasn't clear, I'm talking about patch 4 specifically.

Thanks for confirmation!

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-03-01 Thread Jan Kara
Hello,

On Tue 28-02-17 11:54:41, Tejun Heo wrote:
> It generally looks good to me.

Thanks for review!

> The only worry I have is around wb_shutdown() synchronization and if that
> is actually an issue it shouldn't be too difficult to fix.

Yeah, I'll have a look at that.

> The other thing which came to mind is that the congested->__bdi sever
> semantics.  IIRC, that one was also to support the "bdi must go away now"
> behavior.  As bdi is refcnted now, I think we can probably just let cong
> hold onto the bdi rather than try to sever the ref there.

So currently I get away with __bdi not being a proper refcounted reference.
If we were to remove the clearing of __bdi, we'd have to make it into
refcounted reference which is slightly ugly as we need to special-case
embedded bdi_writeback_congested structures. Maybe it will be a worthwhile
cleanup but for now I left it alone...

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-03-01 Thread Omar Sandoval
On Wed, Feb 22, 2017 at 11:24:25AM +0100, Jan Kara wrote:
> On Tue 21-02-17 10:19:28, Jens Axboe wrote:
> > On 02/21/2017 10:09 AM, Jan Kara wrote:
> > > Hello,
> > > 
> > > this is a second revision of the patch set to fix several different races 
> > > and
> > > issues I've found when testing device shutdown and reuse. The first three
> > > patches are fixes to problems in my previous series fixing BDI lifetime 
> > > issues.
> > > Patch 4 fixes issues with reuse of BDI name with scsi devices. With it I 
> > > cannot
> > > reproduce the BDI name reuse issues using Omar's stress test using 
> > > scsi_debug
> > > so it can be used as a replacement of Dan's patches. Patches 5-11 fix 
> > > oops that
> > > is triggered by __blkdev_put() calling inode_detach_wb() too early (the 
> > > problem
> > > reported by Thiago). Patches 12 and 13 fix oops due to a bug in gendisk 
> > > code
> > > where get_gendisk() can return already freed gendisk structure (again 
> > > triggered
> > > by Omar's stress test).
> > > 
> > > People, please have a look at patches. They are mostly simple however the
> > > interactions are rather complex so I may have missed something. Also I'm
> > > happy for any additional testing these patches can get - I've stressed 
> > > them
> > > with Omar's script, tested memcg writeback, tested static (not udev 
> > > managed)
> > > device inodes.
> > > 
> > > Jens, I think at least patches 1-3 should go in together with my fixes you
> > > already have in your tree (or shortly after them). It is up to you whether
> > > you decide to delay my first fixes or pick these up quickly. Patch 4 is
> > > (IMHO a cleaner) replacement of Dan's patches so consider whether you want
> > > to use it instead of those patches.
> > 
> > I have applied 1-3 to my for-linus branch, which will go in after
> > the initial pull request has been pulled by Linus. Consider fixing up
> > #4 so it applies, I like it.
> 
> OK, attached is patch 4 rebased on top of Linus' tree from today which
> already has linux-block changes pulled in. I've left put_disk_devt() in
> blk_cleanup_queue() to maintain the logic in the original patch (now commit
> 0dba1314d4f8) that request_queue and gendisk each hold one devt reference.
> The bdi_unregister() call that is moved to del_gendisk() by this patch is
> now protected by the gendisk reference instead of the request_queue one
> so it still maintains the property that devt reference protects bdi
> registration-unregistration lifetime (as much as that is not needed anymore
> after this patch).
> 
> I have also updated the comment in the code and the changelog - they were
> somewhat stale after changes to the whole series Tejun suggested.
> 
>   Honza

Hey, Jan, I just tested this out when I was seeing similar crashes with
sr instead of sd, and this fixed it.

Tested-by: Omar Sandoval 


Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-02-28 Thread Omar Sandoval
On Tue, Feb 28, 2017 at 11:25:28PM -0800, Omar Sandoval wrote:
> On Wed, Feb 22, 2017 at 11:24:25AM +0100, Jan Kara wrote:
> > On Tue 21-02-17 10:19:28, Jens Axboe wrote:
> > > On 02/21/2017 10:09 AM, Jan Kara wrote:
> > > > Hello,
> > > > 
> > > > this is a second revision of the patch set to fix several different 
> > > > races and
> > > > issues I've found when testing device shutdown and reuse. The first 
> > > > three
> > > > patches are fixes to problems in my previous series fixing BDI lifetime 
> > > > issues.
> > > > Patch 4 fixes issues with reuse of BDI name with scsi devices. With it 
> > > > I cannot
> > > > reproduce the BDI name reuse issues using Omar's stress test using 
> > > > scsi_debug
> > > > so it can be used as a replacement of Dan's patches. Patches 5-11 fix 
> > > > oops that
> > > > is triggered by __blkdev_put() calling inode_detach_wb() too early (the 
> > > > problem
> > > > reported by Thiago). Patches 12 and 13 fix oops due to a bug in gendisk 
> > > > code
> > > > where get_gendisk() can return already freed gendisk structure (again 
> > > > triggered
> > > > by Omar's stress test).
> > > > 
> > > > People, please have a look at patches. They are mostly simple however 
> > > > the
> > > > interactions are rather complex so I may have missed something. Also I'm
> > > > happy for any additional testing these patches can get - I've stressed 
> > > > them
> > > > with Omar's script, tested memcg writeback, tested static (not udev 
> > > > managed)
> > > > device inodes.
> > > > 
> > > > Jens, I think at least patches 1-3 should go in together with my fixes 
> > > > you
> > > > already have in your tree (or shortly after them). It is up to you 
> > > > whether
> > > > you decide to delay my first fixes or pick these up quickly. Patch 4 is
> > > > (IMHO a cleaner) replacement of Dan's patches so consider whether you 
> > > > want
> > > > to use it instead of those patches.
> > > 
> > > I have applied 1-3 to my for-linus branch, which will go in after
> > > the initial pull request has been pulled by Linus. Consider fixing up
> > > #4 so it applies, I like it.
> > 
> > OK, attached is patch 4 rebased on top of Linus' tree from today which
> > already has linux-block changes pulled in. I've left put_disk_devt() in
> > blk_cleanup_queue() to maintain the logic in the original patch (now commit
> > 0dba1314d4f8) that request_queue and gendisk each hold one devt reference.
> > The bdi_unregister() call that is moved to del_gendisk() by this patch is
> > now protected by the gendisk reference instead of the request_queue one
> > so it still maintains the property that devt reference protects bdi
> > registration-unregistration lifetime (as much as that is not needed anymore
> > after this patch).
> > 
> > I have also updated the comment in the code and the changelog - they were
> > somewhat stale after changes to the whole series Tejun suggested.
> > 
> > Honza
> 
> Hey, Jan, I just tested this out when I was seeing similar crashes with
> sr instead of sd, and this fixed it.
> 
> Tested-by: Omar Sandoval 

Just realized it wasn't clear, I'm talking about patch 4 specifically.


Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-02-28 Thread Tejun Heo
Hello,

It generally looks good to me.  The only worry I have is around
wb_shutdown() synchronization and if that is actually an issue it
shouldn't be too difficult to fix.  The other thing which came to mind
is the congested->__bdi sever semantics.  IIRC, that one was also
to support the "bdi must go away now" behavior.  As bdi is refcnted
now, I think we can probably just let cong hold onto the bdi rather
than try to sever the ref there.

Thanks a lot for working on this!

-- 
tejun


Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-02-22 Thread Jan Kara
On Tue 21-02-17 10:19:28, Jens Axboe wrote:
> On 02/21/2017 10:09 AM, Jan Kara wrote:
> > Hello,
> > 
> > this is a second revision of the patch set to fix several different races 
> > and
> > issues I've found when testing device shutdown and reuse. The first three
> > patches are fixes to problems in my previous series fixing BDI lifetime 
> > issues.
> > Patch 4 fixes issues with reuse of BDI name with scsi devices. With it I 
> > cannot
> > reproduce the BDI name reuse issues using Omar's stress test using 
> > scsi_debug
> > so it can be used as a replacement of Dan's patches. Patches 5-11 fix oops 
> > that
> > is triggered by __blkdev_put() calling inode_detach_wb() too early (the 
> > problem
> > reported by Thiago). Patches 12 and 13 fix oops due to a bug in gendisk code
> > where get_gendisk() can return already freed gendisk structure (again 
> > triggered
> > by Omar's stress test).
> > 
> > People, please have a look at patches. They are mostly simple however the
> > interactions are rather complex so I may have missed something. Also I'm
> > happy for any additional testing these patches can get - I've stressed them
> > with Omar's script, tested memcg writeback, tested static (not udev managed)
> > device inodes.
> > 
> > Jens, I think at least patches 1-3 should go in together with my fixes you
> > already have in your tree (or shortly after them). It is up to you whether
> > you decide to delay my first fixes or pick these up quickly. Patch 4 is
> > (IMHO a cleaner) replacement of Dan's patches so consider whether you want
> > to use it instead of those patches.
> 
> I have applied 1-3 to my for-linus branch, which will go in after
> the initial pull request has been pulled by Linus. Consider fixing up
> #4 so it applies, I like it.

OK, attached is patch 4 rebased on top of Linus' tree from today which
already has linux-block changes pulled in. I've left put_disk_devt() in
blk_cleanup_queue() to maintain the logic in the original patch (now commit
0dba1314d4f8) that request_queue and gendisk each hold one devt reference.
The bdi_unregister() call that is moved to del_gendisk() by this patch is
now protected by the gendisk reference instead of the request_queue one
so it still maintains the property that devt reference protects bdi
registration-unregistration lifetime (as much as that is not needed anymore
after this patch).
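
As a rough illustration of this two-owner scheme - the gendisk and the
request_queue each holding one reference to the shared devt, with the device
number released only when the second reference is dropped - a minimal
userspace C sketch follows; the structure layout and devt encoding here are
assumptions made for the example, not the kernel implementation:

/* Userspace sketch of the devt reference scheme described above. */
#include <stdio.h>
#include <stdlib.h>

struct disk_devt {
	int refcnt;
	unsigned int devt;	/* the device number being protected */
};

static struct disk_devt *get_disk_devt(struct disk_devt *dd)
{
	dd->refcnt++;
	return dd;
}

static void put_disk_devt(struct disk_devt *dd)
{
	if (--dd->refcnt)
		return;
	printf("releasing device number %u:%u\n",
	       dd->devt >> 20, dd->devt & 0xfffff);
	free(dd);
}

int main(void)
{
	struct disk_devt *dd = calloc(1, sizeof(*dd));

	dd->devt = 8u << 20;		/* say, 8:0 */
	dd->refcnt = 1;			/* reference held by the gendisk */
	get_disk_devt(dd);		/* second reference for the queue */

	/*
	 * del_gendisk() path: the gendisk side drops its reference; the
	 * devt (and anything keyed by it, such as the bdi name) stays
	 * alive because the queue still holds its reference.
	 */
	put_disk_devt(dd);
	put_disk_devt(dd);	/* blk_cleanup_queue(): last ref, number freed */
	return 0;
}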

I have also updated the comment in the code and the changelog - they were
somewhat stale after changes to the whole series Tejun suggested.

Honza
-- 
Jan Kara 
SUSE Labs, CR
From 9abe9565c83af6b653b159a7bf5b895aff638c65 Mon Sep 17 00:00:00 2001
From: Jan Kara 
Date: Wed, 8 Feb 2017 08:05:56 +0100
Subject: [PATCH] block: Move bdi_unregister() to del_gendisk()

Commit 6cd18e711dd8 "block: destroy bdi before blockdev is
unregistered." moved bdi unregistration (at that time through
bdi_destroy()) from blk_release_queue() to blk_cleanup_queue() because
it needs to happen before the blk_unregister_region() call in
del_gendisk() for MD. SCSI, however, frees up the device number from
sd_remove(), called through a maze of callbacks from device_del() in
__scsi_remove_device(), before blk_cleanup_queue(), and thus races
similar to those described in 6cd18e711dd8 can happen for SCSI as well,
as reported by Omar [1].

Moving bdi_unregister() to del_gendisk() works for MD and fixes the
problem for SCSI since del_gendisk() gets called from sd_remove() before
freeing the device number.

This also makes device_add_disk() (calling bdi_register_owner()) more
symmetric with del_gendisk().

[1] http://marc.info/?l=linux-block&m=148554717109098&w=2

Tested-by: Lekshmi Pillai 
Acked-by: Tejun Heo 
Signed-off-by: Jan Kara 
---
 block/blk-core.c | 1 -
 block/genhd.c    | 5 +++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b9e857f4afe8..1086dac8724c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -578,7 +578,6 @@ void blk_cleanup_queue(struct request_queue *q)
 		q->queue_lock = &q->__queue_lock;
 	spin_unlock_irq(lock);
 
-	bdi_unregister(q->backing_dev_info);
 	put_disk_devt(q->disk_devt);
 
 	/* @q is and will stay empty, shutdown and put */
diff --git a/block/genhd.c b/block/genhd.c
index 2f444b87a5f2..b26a5ea115d0 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -681,6 +681,11 @@ void del_gendisk(struct gendisk *disk)
 	disk->flags &= ~GENHD_FL_UP;
 
 	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
+	/*
+	 * Unregister bdi before releasing device numbers (as they can get
+	 * reused and we'd get clashes in sysfs).
+	 */
+	bdi_unregister(disk->queue->backing_dev_info);
 	blk_unregister_queue(disk);
 	blk_unregister_region(disk_devt(disk), disk->minors);
 
-- 
2.10.2
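
The ordering problem this patch addresses can also be pictured with a tiny
registry model: the bdi's sysfs name is derived from the device number, so if
the number is released (and reused by a new device) before the old name is
unregistered, the new registration clashes with the stale entry. Below is a
minimal userspace C sketch of that idea; the registry and its functions are
hypothetical, not kernel code:

/* Userspace sketch of why the bdi must be unregistered before its device
 * number is released. All names are hypothetical. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 8

static char registered[MAX_NAMES][16];	/* stand-in for the sysfs namespace */

static bool name_register(const char *name)
{
	for (int i = 0; i < MAX_NAMES; i++)
		if (!strcmp(registered[i], name))
			return false;	/* clash, like a sysfs duplicate name */
	for (int i = 0; i < MAX_NAMES; i++)
		if (!registered[i][0]) {
			strcpy(registered[i], name);
			return true;
		}
	return false;
}

static void name_unregister(const char *name)
{
	for (int i = 0; i < MAX_NAMES; i++)
		if (!strcmp(registered[i], name))
			registered[i][0] = '\0';
}

int main(void)
{
	char name[16];

	/* The old device owns devt 8:0 and has its bdi registered as "8:0". */
	snprintf(name, sizeof(name), "%u:%u", 8, 0);
	name_register(name);

	/*
	 * Wrong order: the devt is handed to a new device before the old
	 * bdi name is unregistered, so the new registration clashes. This
	 * is what del_gendisk() now avoids by calling bdi_unregister()
	 * before blk_unregister_region().
	 */
	printf("re-register before unregister: %s\n",
	       name_register(name) ? "ok" : "clash");

	/* Right order: unregister first, then the number can be reused. */
	name_unregister(name);
	printf("re-register after unregister:  %s\n",
	       name_register(name) ? "ok" : "clash");
	return 0;
}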



Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-02-21 Thread Dan Williams
On Tue, Feb 21, 2017 at 9:19 AM, Jan Kara  wrote:
> On Tue 21-02-17 18:09:45, Jan Kara wrote:
>> Hello,
>>
>> this is a second revision of the patch set to fix several different races and
>> issues I've found when testing device shutdown and reuse. The first three
>> patches are fixes to problems in my previous series fixing BDI lifetime 
>> issues.
>> Patch 4 fixes issues with reuse of BDI name with scsi devices. With it I 
>> cannot
>> reproduce the BDI name reuse issues using Omar's stress test using scsi_debug
>> so it can be used as a replacement of Dan's patches. Patches 5-11 fix oops 
>> that
>> is triggered by __blkdev_put() calling inode_detach_wb() too early (the 
>> problem
>> reported by Thiago). Patches 12 and 13 fix oops due to a bug in gendisk code
>> where get_gendisk() can return already freed gendisk structure (again 
>> triggered
>> by Omar's stress test).
>>
>> People, please have a look at patches. They are mostly simple however the
>> interactions are rather complex so I may have missed something. Also I'm
>> happy for any additional testing these patches can get - I've stressed them
>> with Omar's script, tested memcg writeback, tested static (not udev managed)
>> device inodes.
>>
>> Jens, I think at least patches 1-3 should go in together with my fixes you
>> already have in your tree (or shortly after them). It is up to you whether
>> you decide to delay my first fixes or pick these up quickly. Patch 4 is
>> (IMHO a cleaner) replacement of Dan's patches so consider whether you want
>> to use it instead of those patches.

FWIW, I wholeheartedly agree with replacing my band-aid with this deeper fix.


Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-02-21 Thread Jens Axboe
On 02/21/2017 10:09 AM, Jan Kara wrote:
> Hello,
> 
> this is a second revision of the patch set to fix several different races and
> issues I've found when testing device shutdown and reuse. The first three
> patches are fixes to problems in my previous series fixing BDI lifetime 
> issues.
> Patch 4 fixes issues with reuse of BDI name with scsi devices. With it I 
> cannot
> reproduce the BDI name reuse issues using Omar's stress test using scsi_debug
> so it can be used as a replacement of Dan's patches. Patches 5-11 fix oops 
> that
> is triggered by __blkdev_put() calling inode_detach_wb() too early (the 
> problem
> reported by Thiago). Patches 12 and 13 fix oops due to a bug in gendisk code
> where get_gendisk() can return already freed gendisk structure (again 
> triggered
> by Omar's stress test).
> 
> People, please have a look at patches. They are mostly simple however the
> interactions are rather complex so I may have missed something. Also I'm
> happy for any additional testing these patches can get - I've stressed them
> with Omar's script, tested memcg writeback, tested static (not udev managed)
> device inodes.
> 
> Jens, I think at least patches 1-3 should go in together with my fixes you
> already have in your tree (or shortly after them). It is up to you whether
> you decide to delay my first fixes or pick these up quickly. Patch 4 is
> (IMHO a cleaner) replacement of Dan's patches so consider whether you want
> to use it instead of those patches.

I have applied 1-3 to my for-linus branch, which will go in after
the initial pull request has been pulled by Linus. Consider fixing up
#4 so it applies, I like it.

I'll review the rest later this week.

-- 
Jens Axboe



[PATCH 0/13 v2] block: Fix block device shutdown related races

2017-02-21 Thread Jan Kara
Hello,

this is a second revision of the patch set to fix several different races and
issues I've found when testing device shutdown and reuse. The first three
patches are fixes to problems in my previous series fixing BDI lifetime issues.
Patch 4 fixes issues with reuse of BDI name with scsi devices. With it I cannot
reproduce the BDI name reuse issues using Omar's stress test using scsi_debug
so it can be used as a replacement of Dan's patches. Patches 5-11 fix oops that
is triggered by __blkdev_put() calling inode_detach_wb() too early (the problem
reported by Thiago). Patches 12 and 13 fix oops due to a bug in gendisk code
where get_gendisk() can return already freed gendisk structure (again triggered
by Omar's stress test).

People, please have a look at patches. They are mostly simple however the
interactions are rather complex so I may have missed something. Also I'm
happy for any additional testing these patches can get - I've stressed them
with Omar's script, tested memcg writeback, tested static (not udev managed)
device inodes.

Jens, I think at least patches 1-3 should go in together with my fixes you
already have in your tree (or shortly after them). It is up to you whether
you decide to delay my first fixes or pick these up quickly. Patch 4 is
(IMHO a cleaner) replacement of Dan's patches so consider whether you want
to use it instead of those patches.

Changes since v1:
* Added Acks and Tested-by tags for patches in areas that did not change
* Reworked inode_detach_wb() related fixes based on Tejun's feedback

Honza