[PATCH 4.2 012/134] blk-mq: fix race between timeout and freeing request

2015-09-26 Thread Greg Kroah-Hartman
4.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Ming Lei 

commit 0048b4837affd153897ed183492070027aa9 upstream.

Inside the timeout handler, blk_mq_tag_to_rq() is called
to retrieve the request for a tag. This is wrong because
the request can be freed at any time, so some fields of
the request can't be trusted and a kernel oops might be
triggered[1].

Currently, the only special case for blk_mq_tag_to_rq() is
that the flush request can share a tag with the request it was
cloned from. The two requests can't be active at the same
time, so this patch fixes the above issue by updating tags->rqs[tag]
with the active request (either the flush rq or the request it was
cloned from) for the tag.
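
The slot update described above can be modelled in userspace roughly as
follows. This is only a sketch under stated assumptions, not the code from
the patch: struct tag_map, flush_borrow_tag(), flush_return_tag(),
tag_to_rq() and flush_demo() are hypothetical names, and locking, hardware
queues and request state are all left out.

```c
#include <assert.h>
#include <stddef.h>

#define NR_TAGS 4

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct request {
	int tag;
	int is_flush;
};

struct tag_map {
	struct request *rqs[NR_TAGS];	/* active request for each tag */
};

/* The flush request borrows the tag of the request it was cloned from,
 * and the slot is repointed so it names the request that is now active. */
static void flush_borrow_tag(struct tag_map *tags, struct request *flush_rq,
			     const struct request *first_rq)
{
	flush_rq->tag = first_rq->tag;
	tags->rqs[flush_rq->tag] = flush_rq;
}

/* When the flush completes, the original request owns the slot again. */
static void flush_return_tag(struct tag_map *tags, struct request *orig_rq)
{
	tags->rqs[orig_rq->tag] = orig_rq;
}

/* With the slot kept current, lookup is a plain array read. */
static struct request *tag_to_rq(const struct tag_map *tags, int tag)
{
	return tags->rqs[tag];
}

/* Walk through one flush sequence on tag 2; returns 0 on success. */
static int flush_demo(void)
{
	struct tag_map tags = { { NULL } };
	struct request data_rq = { .tag = 2, .is_flush = 0 };
	struct request flush_rq = { .tag = -1, .is_flush = 1 };

	tags.rqs[data_rq.tag] = &data_rq;

	flush_borrow_tag(&tags, &flush_rq, &data_rq);
	assert(tag_to_rq(&tags, 2) == &flush_rq);

	flush_return_tag(&tags, &data_rq);
	assert(tag_to_rq(&tags, 2) == &data_rq);
	return 0;
}
```

In this model the lookup never has to inspect fields of a request that
may already have been freed; it only reads the per-tag slot.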

Also blk_mq_tag_to_rq() gets much simplified with this patch.

Since blk_mq_tag_to_rq() is mainly for drivers, and its caller must
make sure the request can't be freed, bt_for_each() now reads
tags->rqs[tag] directly instead of calling this helper.
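
As a rough illustration of iterating busy tags by reading the slot
directly, here is a small userspace sketch. Everything in it is assumed
for the example (struct sim_request, the busy bitmask, queue_id standing
in for the request's queue pointer, for_each_busy()); the real
bt_for_each() walks the tag bitmaps and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_TAGS 8

/* Hypothetical stand-in for a request; queue_id models the owning queue. */
struct sim_request {
	int tag;
	int queue_id;
};

typedef void (*busy_fn)(struct sim_request *rq, void *data);

/* Visit each busy tag by reading rqs[tag] directly, without going
 * through a lookup helper that inspects other request fields first. */
static void for_each_busy(struct sim_request *const rqs[SIM_NR_TAGS],
			  unsigned int busy_mask, int queue_id,
			  busy_fn fn, void *data)
{
	for (int tag = 0; tag < SIM_NR_TAGS; tag++) {
		if (!(busy_mask & (1u << tag)))
			continue;

		struct sim_request *rq = rqs[tag];

		/* Only hand over requests that belong to this queue. */
		if (rq && rq->queue_id == queue_id)
			fn(rq, data);
	}
}

/* Callback that just counts how many requests were visited. */
static void count_rq(struct sim_request *rq, void *data)
{
	(void)rq;
	++*(int *)data;
}

/* Three busy tags, two of them on queue 7; returns the visit count. */
static int count_busy_demo(void)
{
	struct sim_request a = { .tag = 1, .queue_id = 7 };
	struct sim_request b = { .tag = 5, .queue_id = 7 };
	struct sim_request c = { .tag = 6, .queue_id = 9 };
	struct sim_request *rqs[SIM_NR_TAGS] = { NULL };
	int seen = 0;

	rqs[a.tag] = &a;
	rqs[b.tag] = &b;
	rqs[c.tag] = &c;

	for_each_busy(rqs, (1u << 1) | (1u << 5) | (1u << 6), 7,
		      count_rq, &seen);
	return seen;
}
```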

[1] kernel oops log
[  439.696220] BUG: unable to handle kernel NULL pointer dereference at 0158
[  439.697162] IP: [] blk_mq_tag_to_rq+0x21/0x6e
[  439.700653] PGD 7ef765067 PUD 7ef764067 PMD 0
[  439.700653] Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  439.700653] Dumping ftrace buffer:
[  439.700653]    (ftrace buffer empty)
[  439.700653] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw
[  439.700653] CPU: 6 PID: 2779 Comm: stress-ng-sigfd Not tainted 4.2.0-rc5-next-20150805+ #265
[  439.730500] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[  439.730500] task: 880605308000 ti: 88060530c000 task.ti: 88060530c000
[  439.730500] RIP: 0010:[]  [] blk_mq_tag_to_rq+0x21/0x6e
[  439.730500] RSP: 0018:880819203da0  EFLAGS: 00010283
[  439.730500] RAX: 880811b0e000 RBX: 8800bb465f00 RCX: 0002
[  439.730500] RDX:  RSI: 0202 RDI:
[  439.730500] RBP: 880819203db0 R08: 0002 R09:
[  439.730500] R10:  R11:  R12: 0202
[  439.730500] R13: 880814104800 R14: 0002 R15: 880811a2ea00
[  439.730500] FS:  7f165b3f5740() GS:88081920() knlGS:
[  439.730500] CS:  0010 DS:  ES:  CR0: 8005003b
[  439.730500] CR2: 0158 CR3: 0007ef766000 CR4: 06e0
[  439.730500] Stack:
[  439.730500]  0008 8808114eed90 880819203e00 812dc104
[  439.755663]  880819203e40 812d9f5e 0200 8808114eed80
[  439.755663] Call Trace:
[  439.755663]
[  439.755663]  [] bt_for_each+0x6e/0xc8
[  439.755663]  [] ? blk_mq_rq_timed_out+0x6a/0x6a
[  439.755663]  [] ? blk_mq_rq_timed_out+0x6a/0x6a
[  439.755663]  [] blk_mq_tag_busy_iter+0x55/0x5e
[  439.755663]  [] ? blk_mq_bio_to_request+0x38/0x38
[  439.755663]  [] blk_mq_rq_timer+0x5d/0xd4
[  439.755663]  [] call_timer_fn+0xf7/0x284
[  439.755663]  [] ? call_timer_fn+0x5/0x284
[  439.755663]  [] ? blk_mq_bio_to_request+0x38/0x38
[  439.755663]  [] run_timer_softirq+0x1ce/0x1f8
[  439.755663]  [] __do_softirq+0x181/0x3a4
[  439.755663]  [] irq_exit+0x40/0x94
[  439.755663]  [] smp_apic_timer_interrupt+0x33/0x3e
[  439.755663]  [] apic_timer_interrupt+0x84/0x90
[  439.755663]
[  439.755663]  [] ? _raw_spin_unlock_irq+0x32/0x4a
[  439.755663]  [] finish_task_switch+0xe0/0x163
[  439.755663]  [] ? finish_task_switch+0xa2/0x163
[  439.755663]  [] __schedule+0x469/0x6cd
[  439.755663]  [] schedule+0x82/0x9a
[  439.789267]  [] signalfd_read+0x186/0x49a
[  439.790911]  [] ? wake_up_q+0x47/0x47
[  439.790911]  [] __vfs_read+0x28/0x9f
[  439.790911]  [] ? __fget_light+0x4d/0x74
[  439.790911]  [] vfs_read+0x7a/0xc6
[  439.790911]  [] SyS_read+0x49/0x7f
[  439.790911]  [] entry_SYSCALL_64_fastpath+0x12/0x6f
[  439.790911] Code: 48 89 e5 e8 a9 b8 e7 ff 5d c3 0f 1f 44 00 00 55 89 f2 48 89 e5 41 54 41 89 f4 53 48 8b 47 60 48 8b 1c d0 48 8b 7b 30 48 8b 53 38 <48> 8b 87 58 01 00 00 48 85 c0 75 09 48 8b 97 88 0c 00 00 eb 10
[  439.790911] RIP  [] blk_mq_tag_to_rq+0x21/0x6e
[  439.790911]  RSP
[  439.790911] CR2: 0158
[  439.790911] ---[ end trace d40af58949325661 ]---

Signed-off-by: Ming Lei 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman 

---
 block/blk-flush.c  |   15 ++-
 block/blk-mq-tag.c |    4 ++--
 block/blk-mq-tag.h |   12 
 block/blk-mq.c     |   16 +---
 block/blk.h        |    6 ++
 5 files changed, 35 insertions(+), 18 deletions(-)

--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -73,6 +73,7 @@
 
 #include "blk.h"
 #include "blk-mq.h"
+#include "blk-mq-tag.h"
 
 /* FLUSH/FUA sequences */
 enum {
@@ -226,7 +227,12 @@ static void flush_end_io(struct request
struct blk_flush_queue *fq 
