Re: [PATCH v2] nvme-pci: cancel nvme device request before disabling
Added to nvme-5.9-rc
Re: [PATCH v2] nvme-pci: cancel nvme device request before disabling
On Fri, Aug 28, 2020 at 10:17:08AM -0400, Tong Zhang wrote:
> This patch addresses an irq free warning and null pointer dereference
> error problem when nvme devices got timeout error during initialization.
> This problem happens when nvme_timeout() function is called while
> nvme_reset_work() is still in execution. This patch fixed the problem by
> setting flag of the problematic request to NVME_REQ_CANCELLED before
> calling nvme_dev_disable() to make sure __nvme_submit_sync_cmd() returns
> an error code and let nvme_submit_sync_cmd() fail gracefully.
> The following is console output.

Thanks, this looks good to me.

Reviewed-by: Keith Busch
[PATCH v2] nvme-pci: cancel nvme device request before disabling
This patch addresses an IRQ free warning and a NULL pointer dereference that occur when an nvme device hits a timeout error during initialization. The problem happens when nvme_timeout() is called while nvme_reset_work() is still executing. The patch fixes it by setting the NVME_REQ_CANCELLED flag on the problematic request before calling nvme_dev_disable(), so that __nvme_submit_sync_cmd() returns an error code and nvme_submit_sync_cmd() fails gracefully.

The following is console output.

[ 62.472097] nvme nvme0: I/O 13 QID 0 timeout, disable controller
[ 62.488796] nvme nvme0: could not set timestamp (881)
[ 62.494888] ------------[ cut here ]------------
[ 62.495142] Trying to free already-free IRQ 11
[ 62.495366] WARNING: CPU: 0 PID: 7 at kernel/irq/manage.c:1751 free_irq+0x1f7/0x370
[ 62.495742] Modules linked in:
[ 62.495902] CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #8
[ 62.496206] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
[ 62.496772] Workqueue: nvme-reset-wq nvme_reset_work
[ 62.497019] RIP: 0010:free_irq+0x1f7/0x370
[ 62.497223] Code: e8 ce 49 11 00 48 83 c4 08 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f6 48 c70
[ 62.498133] RSP: :a96800043d40 EFLAGS: 00010086
[ 62.498391] RAX: RBX: 9b87fc458400 RCX:
[ 62.498741] RDX: 0001 RSI: 0096 RDI: 9693d72c
[ 62.499091] RBP: 9b87fd4c8f60 R08: a96800043bfd R09: 0163
[ 62.499440] R10: a96800043bf8 R11: a96800043bfd R12: 9b87fd4c8e00
[ 62.499790] R13: 9b87fd4c8ea4 R14: 000b R15: 9b87fd76b000
[ 62.500140] FS: () GS:9b87fdc0() knlGS:
[ 62.500534] CS: 0010 DS: ES: CR0: 80050033
[ 62.500816] CR2: CR3: 3aa0a000 CR4: 06f0
[ 62.501165] DR0: DR1: DR2:
[ 62.501515] DR3: DR6: fffe0ff0 DR7: 0400
[ 62.501864] Call Trace:
[ 62.501993] pci_free_irq+0x13/0x20
[ 62.502167] nvme_reset_work+0x5d0/0x12a0
[ 62.502369] ? update_load_avg+0x59/0x580
[ 62.502569] ? ttwu_queue_wakelist+0xa8/0xc0
[ 62.502780] ? try_to_wake_up+0x1a2/0x450
[ 62.502979] process_one_work+0x1d2/0x390
[ 62.503179] worker_thread+0x45/0x3b0
[ 62.503361] ? process_one_work+0x390/0x390
[ 62.503568] kthread+0xf9/0x130
[ 62.503726] ? kthread_park+0x80/0x80
[ 62.503911] ret_from_fork+0x22/0x30
[ 62.504090] ---[ end trace de9ed4a70f8d71e2 ]---
[ 123.912275] nvme nvme0: I/O 12 QID 0 timeout, disable controller
[ 123.914670] nvme nvme0: 1/0/0 default/read/poll queues
[ 123.916310] BUG: kernel NULL pointer dereference, address:
[ 123.917469] #PF: supervisor write access in kernel mode
[ 123.917725] #PF: error_code(0x0002) - not-present page
[ 123.917976] PGD 0 P4D 0
[ 123.918109] Oops: 0002 [#1] SMP PTI
[ 123.918283] CPU: 0 PID: 7 Comm: kworker/u4:0 Tainted: GW 5.8.0+ #8
[ 123.918650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
[ 123.919219] Workqueue: nvme-reset-wq nvme_reset_work
[ 123.919469] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
[ 123.919757] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
[ 123.920657] RSP: :a96800043d40 EFLAGS: 00010286
[ 123.920912] RAX: 9b87fc4fee40 RBX: 9b87fc8cb008 RCX:
[ 123.921258] RDX: RSI: RDI: 9b87fc618000
[ 123.921602] RBP: R08: 9b87fdc2c4a0 R09: 9b87fc616000
[ 123.921949] R10: R11: 9b87fffd1500 R12:
[ 123.922295] R13: R14: 9b87fc8cb200 R15: 9b87fc8cb000
[ 123.922641] FS: () GS:9b87fdc0() knlGS:
[ 123.923032] CS: 0010 DS: ES: CR0: 80050033
[ 123.923312] CR2: CR3: 3aa0a000 CR4: 06f0
[ 123.923660] DR0: DR1: DR2:
[ 123.924007] DR3: DR6: fffe0ff0 DR7: 0400
[ 123.924353] Call Trace:
[ 123.924479] blk_mq_alloc_tag_set+0x137/0x2a0
[ 123.924694] nvme_reset_work+0xed6/0x12a0
[ 123.924898] process_one_work+0x1d2/0x390
[ 123.925099] worker_thread+0x45/0x3b0
[ 123.925280] ? process_one_work+0x390/0x390
[ 123.925486] kthread+0xf9/0x130
[ 123.925642] ? kthread_park+0x80/0x80
[ 123.925825] ret_from_fork+0x22/0x30
[ 123.926004] Modules linked in:
[ 123.926158] CR2:
[ 123.926322] ---[ end trace de9ed4a70f8d71e3 ]---
[ 123.926549] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
[ 123.926832] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee
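
For readers following the thread, the ordering described in the patch message can be illustrated with a small userspace model. This is only a sketch, not the driver code: every toy_* name is hypothetical, the positive status 881 is simply the value printed in the "could not set timestamp (881)" line above, and returning -EINTR for a cancelled request is an assumption about what __nvme_submit_sync_cmd() does. The point it tries to show is that marking the request cancelled before the disable step makes the synchronous submitter see a negative error instead of a device status it might carry on with.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for NVME_REQ_CANCELLED; not the kernel definition. */
#define TOY_REQ_CANCELLED (1u << 0)

/* Hypothetical stand-in for the per-request state the driver keeps. */
struct toy_request {
	unsigned int flags; /* plays the role of the request flags */
	int status;         /* device status recorded at completion */
	bool completed;
};

/*
 * Models the disable step: the outstanding request is completed with a
 * positive, device-style status (881, as seen in the log above).
 */
static void toy_dev_disable(struct toy_request *req)
{
	req->status = 881;
	req->completed = true;
}

/*
 * Models the timeout handler. With mark_cancelled set, the flag is set
 * BEFORE the disable step, which is the ordering the patch describes.
 */
static void toy_timeout(struct toy_request *req, bool mark_cancelled)
{
	if (mark_cancelled)
		req->flags |= TOY_REQ_CANCELLED;
	toy_dev_disable(req);
}

/*
 * Models the result check in the synchronous submit path. Assumption:
 * a cancelled request yields a negative errno (-EINTR here); otherwise
 * the recorded device status is returned.
 */
static int toy_submit_sync_result(const struct toy_request *req)
{
	if (req->flags & TOY_REQ_CANCELLED)
		return -EINTR;
	return req->status;
}

/* Runs both orderings; returns 0 if the model behaves as described. */
int toy_demo(void)
{
	struct toy_request with_flag = {0};
	struct toy_request without_flag = {0};

	toy_timeout(&with_flag, true);
	toy_timeout(&without_flag, false);

	assert(with_flag.completed && without_flag.completed);
	/* Flag set first: the submitter sees a hard error. */
	assert(toy_submit_sync_result(&with_flag) == -EINTR);
	/* Flag not set: the submitter only sees a positive status. */
	assert(toy_submit_sync_result(&without_flag) == 881);
	return 0;
}
```

Calling toy_demo() from a test harness exercises both paths. In this model the only difference between them is whether the flag is set ahead of the disable step, which matches the change the patch message describes.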