Re: [PATCH block/for-linus] blkcg: fix use-after-free in __blkg_release_rcu() by making blkcg_gq refcnt an atomic_t
On Thu, 19 Jun 2014 17:42:57 -0400 Tejun Heo wrote:
> Hello,
>
> So, this patch should do.  Joe, Vivek, can one of you guys please
> verify that the oops goes away with this patch?

Tejun -- thanks for fixing!  Looks good here, no issues running w/slub
debug enabled.

-- Joe

> Jens, the original thread can be read at
>
>   http://thread.gmane.org/gmane.linux.kernel/1720729
>
> The fix converts blkg->refcnt from int to atomic_t.  It does add some
> overhead but it should be minute compared to everything else which is
> going on and the involved cacheline bouncing, so I think it's highly
> unlikely to cause any noticeable difference.  Also, the refcnt in
> question should be converted to a percpu_ref for blk-mq anyway, so the
> atomic_t is likely to go away pretty soon anyway.
>
> Thanks.
>
> --- 8< ---
> __blkg_release_rcu() may be invoked after the associated request_queue
> is released with an RCU grace period in between.  As such, the function
> and callbacks invoked from it must not dereference the associated
> request_queue.  This is clearly indicated in the comment above the
> function.
>
> Unfortunately, while trying to fix a different issue, 2a4fd070ee85
> ("blkcg: move bulk of blkcg_gq release operations to the RCU
> callback") ignored this and added [un]locking of @blkg->q->queue_lock
> to __blkg_release_rcu().  This of course can cause oops as the
> request_queue may be long gone by the time this code gets executed.
>
>  general protection fault: [#1] SMP
>  CPU: 21 PID: 30 Comm: rcuos/21 Not tainted 3.15.0 #1
>  Hardware name: Stratus ftServer 6400/G7LAZ, BIOS BIOS Version 6.3:57 12/25/2013
>  task: 880854021de0 ti: 88085403c000 task.ti: 88085403c000
>  RIP: 0010:[<8162e9e5>]  [<8162e9e5>] _raw_spin_lock_irq+0x15/0x60
>  RSP: 0018:88085403fdf0  EFLAGS: 00010086
>  RAX: 0002 RBX: 0010 RCX:
>  RDX: 60ef80008248 RSI: 0286 RDI: 6b6b6b6b6b6b6b6b
>  RBP: 88085403fdf0 R08: 0286 R09: 9f39
>  R10: 00020001 R11: 00020001 R12: 88103c17a130
>  R13: 88103c17a080 R14: R15:
>  FS:  () GS:88107fca() knlGS:
>  CS:  0010 DS: ES: CR0: 80050033
>  CR2: 006e5ab8 CR3: 0193d000 CR4: 000407e0
>  Stack:
>   88085403fe18 812cbfc2 88103c17a130
>   88103c17a130 88085403fec0 810d1d28 880854021de0
>   880854021de0 88107fcaec58 88085403fe80 88107fcaec30
>  Call Trace:
>   [<812cbfc2>] __blkg_release_rcu+0x72/0x150
>   [<810d1d28>] rcu_nocb_kthread+0x1e8/0x300
>   [<81091d81>] kthread+0xe1/0x100
>   [<8163813c>] ret_from_fork+0x7c/0xb0
>  Code: ff 47 04 48 8b 7d 08 be 00 02 00 00 e8 55 48 a4 ff 5d c3 0f 1f 00 66 66 66 66 90 55 48 89 e5
>  +fa 66 66 90 66 66 90 b8 00 00 02 00 <f0> 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 02 5d c3 83 e2 fe 0f
>  +b7
>  RIP  [<8162e9e5>] _raw_spin_lock_irq+0x15/0x60
>   RSP <88085403fdf0>
>
> The request_queue locking was added because blkcg_gq->refcnt is an int
> protected with the queue lock and __blkg_release_rcu() needs to put
> the parent.  Let's fix it by making blkcg_gq->refcnt an atomic_t and
> dropping queue locking in the function.
>
> Given the general heavy weight of the current request_queue and blkcg
> operations, this is unlikely to cause any noticeable overhead.
> Moreover, blkcg_gq->refcnt is likely to be converted to percpu_ref in
> the near future, so whatever (most likely negligible) overhead it may
> add is temporary.
>
> Signed-off-by: Tejun Heo <t...@kernel.org>
> Reported-by: Joe Lawrence <joe.lawre...@stratus.com>
> Cc: Vivek Goyal <vgo...@redhat.com>
> Link: http://lkml.kernel.org/g/alpine.deb.2.02.1406081816540.17...@jlaw-desktop.mno.stratus.com
> Cc: sta...@vger.kernel.org
> ---
>  block/blk-cgroup.c |    7 ++-
>  block/blk-cgroup.h |   17 +++--
>  2 files changed, 9 insertions(+), 15 deletions(-)
>
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -80,7 +80,7 @@ static struct blkcg_gq *blkg_alloc(struc
>  	blkg->q = q;
>  	INIT_LIST_HEAD(&blkg->q_node);
>  	blkg->blkcg = blkcg;
> -	blkg->refcnt = 1;
> +	atomic_set(&blkg->refcnt, 1);
>
>  	/* root blkg uses @q->root_rl, init rl only for !root blkgs */
>  	if (blkcg != &blkcg_root) {
> @@ -399,11 +399,8 @@ void __blkg_release_rcu(struct rcu_head
>
>  	/* release the blkcg and parent blkg refs this blkg has been holding */
>  	css_put(&blkg->blkcg->css);
> -	if (blkg->parent) {
> -		spin_lock_irq(blkg->q->queue_lock);
> +	if (blkg->parent)
>  		blkg_put(blkg->parent);
> -		spin_unlock_irq(blkg->q->queue_lock);
> -	}
>
>  	blkg_free(blkg);
>  }
> --- a/block/blk-cgroup.h
> +++ b/block/blk-cgroup.h
> @@ -18,6 +18,7 @@
>  #include <linux/seq_file.h>
>  #include <linux/radix-tree.h>
>  #include <linux/blkdev.h>
> +#include
>
>  /* Max limits for throttle policy */
>  #define THROTL_IOPS_MAX		UINT_MAX
> @@ -104,7 +105,7 @@ struct blkcg_gq {
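For readers who want to see the shape of the change outside the kernel, here is a minimal userspace sketch of the same idea, assuming C11 <stdatomic.h> in place of the kernel's atomic_t. The names struct node, node_alloc(), node_put() and node_release() are hypothetical and do not appear in the patch; nodes_freed exists only so the behaviour can be observed. The point it illustrates is that once the count is atomic, the release path can drop the parent's reference without taking any lock that lives in another object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct blkcg_gq; not the kernel definition. */
struct node {
	atomic_int refcnt;	/* plays the role of blkg->refcnt after the patch */
	struct node *parent;	/* a child holds one reference on its parent */
};

static int nodes_freed;	/* demo counter, bumped on every release */

static void node_put(struct node *n);

/*
 * Allocate a node with one reference for the caller and take one
 * reference on @parent. The caller must already hold a reference
 * on @parent.
 */
static struct node *node_alloc(struct node *parent)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	atomic_init(&n->refcnt, 1);	/* analogous to atomic_set(&blkg->refcnt, 1) */
	n->parent = parent;
	if (parent)
		atomic_fetch_add(&parent->refcnt, 1);
	return n;
}

/* Release path: touches only the node and its parent, no external lock. */
static void node_release(struct node *n)
{
	struct node *parent = n->parent;

	free(n);
	nodes_freed++;
	if (parent)
		node_put(parent);	/* safe without locking: the count is atomic */
}

/* Drop one reference; the caller that drops the last one frees the node. */
static void node_put(struct node *n)
{
	int old = atomic_fetch_sub(&n->refcnt, 1);

	assert(old > 0);	/* catch a put without a matching get */
	if (old == 1)
		node_release(n);
}
```

In this sketch, dropping the last reference on a child frees the child and then puts the parent, which mirrors the order of operations in the patched __blkg_release_rcu(). The RCU deferral itself is not modelled here.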
Re: [PATCH block/for-linus] blkcg: fix use-after-free in __blkg_release_rcu() by making blkcg_gq refcnt an atomic_t
On 06/20/2014 08:39 AM, Vivek Goyal wrote:
> On Thu, Jun 19, 2014 at 05:42:57PM -0400, Tejun Heo wrote:
>> Hello,
>>
>> So, this patch should do.  Joe, Vivek, can one of you guys please
>> verify that the oops goes away with this patch?
>
> Hi Tejun,
>
> This patch seems to fix the issue for me. Tried 10 times and no crash.
>
> So now one needs to hold the queue lock for getting a reference on the
> group only if the caller does not already have a reference and the group
> has been looked up from some tree/queue etc. I guess the only such usage
> is in blkg_create(), where we take a reference on the parent after
> looking it up.
>
> This patch looks good to me.
>
> Acked-by: Vivek Goyal <vgo...@redhat.com>

Thanks. Tejun, I'll queue this up for this cycle.

--
Jens Axboe
Re: [PATCH block/for-linus] blkcg: fix use-after-free in __blkg_release_rcu() by making blkcg_gq refcnt an atomic_t
On Thu, Jun 19, 2014 at 05:42:57PM -0400, Tejun Heo wrote:
> Hello,
>
> So, this patch should do.  Joe, Vivek, can one of you guys please
> verify that the oops goes away with this patch?

Hi Tejun,

This patch seems to fix the issue for me. Tried 10 times and no crash.

So now one needs to hold the queue lock for getting a reference on the
group only if the caller does not already have a reference and the group
has been looked up from some tree/queue etc. I guess the only such usage
is in blkg_create(), where we take a reference on the parent after
looking it up.

This patch looks good to me.

Acked-by: Vivek Goyal <vgo...@redhat.com>

Thanks
Vivek

> Jens, the original thread can be read at
>
>   http://thread.gmane.org/gmane.linux.kernel/1720729
>
> The fix converts blkg->refcnt from int to atomic_t.  It does add some
> overhead but it should be minute compared to everything else which is
> going on and the involved cacheline bouncing, so I think it's highly
> unlikely to cause any noticeable difference.  Also, the refcnt in
> question should be converted to a percpu_ref for blk-mq anyway, so the
> atomic_t is likely to go away pretty soon anyway.
>
> Thanks.
>
> --- 8< ---
> __blkg_release_rcu() may be invoked after the associated request_queue
> is released with an RCU grace period in between.  As such, the function
> and callbacks invoked from it must not dereference the associated
> request_queue.  This is clearly indicated in the comment above the
> function.
>
> Unfortunately, while trying to fix a different issue, 2a4fd070ee85
> ("blkcg: move bulk of blkcg_gq release operations to the RCU
> callback") ignored this and added [un]locking of @blkg->q->queue_lock
> to __blkg_release_rcu().  This of course can cause oops as the
> request_queue may be long gone by the time this code gets executed.
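Vivek's remark about when the lock is still needed can be illustrated with another small userspace sketch. Everything here is hypothetical: struct group, the slots table, and the atomic_flag spinlock standing in for queue_lock are invented for illustration and are not kernel code. It also assumes that entries are unlinked from the table under the same lock before the table's own reference is dropped, which is what makes the lookup-then-get sequence safe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object reachable through a lookup table. */
struct group {
	atomic_int refcnt;
	int id;
};

#define NR_SLOTS 4

static struct group *slots[NR_SLOTS];	/* each entry counts as one reference */
static atomic_flag slots_lock = ATOMIC_FLAG_INIT;	/* stands in for queue_lock */

static void slots_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&slots_lock, memory_order_acquire))
		;	/* spin until the flag is clear */
}

static void slots_lock_release(void)
{
	atomic_flag_clear_explicit(&slots_lock, memory_order_release);
}

/*
 * Case 1: the caller already holds a reference, so the object cannot
 * be freed underneath it and no lock is required.
 */
static void group_get(struct group *g)
{
	int old = atomic_fetch_add(&g->refcnt, 1);

	assert(old > 0);	/* a get on a dead object is a bug */
}

/*
 * Case 2: the caller holds no reference and finds the object by lookup.
 * The lock keeps the entry from being unlinked and released between the
 * lookup and the increment (assumption: unlinking takes the same lock).
 */
static struct group *group_lookup_get(int id)
{
	struct group *g = NULL;

	slots_lock_acquire();
	for (size_t i = 0; i < NR_SLOTS; i++) {
		if (slots[i] && slots[i]->id == id) {
			g = slots[i];
			atomic_fetch_add(&g->refcnt, 1);
			break;
		}
	}
	slots_lock_release();
	return g;
}
```

The second case corresponds to the blkg_create() usage mentioned above, where the parent is looked up first and a reference is taken afterwards.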