Re: [PATCH v2] nvme-pci: cancel nvme device request before disabling

2020-08-28 Thread Sagi Grimberg

Added to nvme-5.9-rc


Re: [PATCH v2] nvme-pci: cancel nvme device request before disabling

2020-08-28 Thread Keith Busch
On Fri, Aug 28, 2020 at 10:17:08AM -0400, Tong Zhang wrote:
> This patch fixes an IRQ free warning and a NULL pointer dereference that
> occur when an nvme device hits a timeout during initialization. The
> problem happens when nvme_timeout() is called while nvme_reset_work() is
> still executing. The patch fixes it by setting the NVME_REQ_CANCELLED
> flag on the timed-out request before calling nvme_dev_disable(), so that
> __nvme_submit_sync_cmd() returns an error code and nvme_submit_sync_cmd()
> fails gracefully.
> The following is the console output.

Thanks, this looks good to me.

Reviewed-by: Keith Busch 


[PATCH v2] nvme-pci: cancel nvme device request before disabling

2020-08-28 Thread Tong Zhang
This patch fixes an IRQ free warning and a NULL pointer dereference that
occur when an nvme device hits a timeout during initialization. The
problem happens when nvme_timeout() is called while nvme_reset_work() is
still executing. The patch fixes it by setting the NVME_REQ_CANCELLED
flag on the timed-out request before calling nvme_dev_disable(), so that
__nvme_submit_sync_cmd() returns an error code and nvme_submit_sync_cmd()
fails gracefully.
The following is the console output.

[   62.472097] nvme nvme0: I/O 13 QID 0 timeout, disable controller
[   62.488796] nvme nvme0: could not set timestamp (881)
[   62.494888] ------------[ cut here ]------------
[   62.495142] Trying to free already-free IRQ 11
[   62.495366] WARNING: CPU: 0 PID: 7 at kernel/irq/manage.c:1751 free_irq+0x1f7/0x370
[   62.495742] Modules linked in:
[   62.495902] CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #8
[   62.496206] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
[   62.496772] Workqueue: nvme-reset-wq nvme_reset_work
[   62.497019] RIP: 0010:free_irq+0x1f7/0x370
[   62.497223] Code: e8 ce 49 11 00 48 83 c4 08 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f6 48 c70
[   62.498133] RSP: :a96800043d40 EFLAGS: 00010086
[   62.498391] RAX:  RBX: 9b87fc458400 RCX: 
[   62.498741] RDX: 0001 RSI: 0096 RDI: 9693d72c
[   62.499091] RBP: 9b87fd4c8f60 R08: a96800043bfd R09: 0163
[   62.499440] R10: a96800043bf8 R11: a96800043bfd R12: 9b87fd4c8e00
[   62.499790] R13: 9b87fd4c8ea4 R14: 000b R15: 9b87fd76b000
[   62.500140] FS:  () GS:9b87fdc0() knlGS:
[   62.500534] CS:  0010 DS:  ES:  CR0: 80050033
[   62.500816] CR2:  CR3: 3aa0a000 CR4: 06f0
[   62.501165] DR0:  DR1:  DR2: 
[   62.501515] DR3:  DR6: fffe0ff0 DR7: 0400
[   62.501864] Call Trace:
[   62.501993]  pci_free_irq+0x13/0x20
[   62.502167]  nvme_reset_work+0x5d0/0x12a0
[   62.502369]  ? update_load_avg+0x59/0x580
[   62.502569]  ? ttwu_queue_wakelist+0xa8/0xc0
[   62.502780]  ? try_to_wake_up+0x1a2/0x450
[   62.502979]  process_one_work+0x1d2/0x390
[   62.503179]  worker_thread+0x45/0x3b0
[   62.503361]  ? process_one_work+0x390/0x390
[   62.503568]  kthread+0xf9/0x130
[   62.503726]  ? kthread_park+0x80/0x80
[   62.503911]  ret_from_fork+0x22/0x30
[   62.504090] ---[ end trace de9ed4a70f8d71e2 ]---
[  123.912275] nvme nvme0: I/O 12 QID 0 timeout, disable controller
[  123.914670] nvme nvme0: 1/0/0 default/read/poll queues
[  123.916310] BUG: kernel NULL pointer dereference, address: 
[  123.917469] #PF: supervisor write access in kernel mode
[  123.917725] #PF: error_code(0x0002) - not-present page
[  123.917976] PGD 0 P4D 0
[  123.918109] Oops: 0002 [#1] SMP PTI
[  123.918283] CPU: 0 PID: 7 Comm: kworker/u4:0 Tainted: G        W         5.8.0+ #8
[  123.918650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
[  123.919219] Workqueue: nvme-reset-wq nvme_reset_work
[  123.919469] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
[  123.919757] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
[  123.920657] RSP: :a96800043d40 EFLAGS: 00010286
[  123.920912] RAX: 9b87fc4fee40 RBX: 9b87fc8cb008 RCX: 
[  123.921258] RDX:  RSI:  RDI: 9b87fc618000
[  123.921602] RBP:  R08: 9b87fdc2c4a0 R09: 9b87fc616000
[  123.921949] R10:  R11: 9b87fffd1500 R12: 
[  123.922295] R13:  R14: 9b87fc8cb200 R15: 9b87fc8cb000
[  123.922641] FS:  () GS:9b87fdc0() knlGS:
[  123.923032] CS:  0010 DS:  ES:  CR0: 80050033
[  123.923312] CR2:  CR3: 3aa0a000 CR4: 06f0
[  123.923660] DR0:  DR1:  DR2: 
[  123.924007] DR3:  DR6: fffe0ff0 DR7: 0400
[  123.924353] Call Trace:
[  123.924479]  blk_mq_alloc_tag_set+0x137/0x2a0
[  123.924694]  nvme_reset_work+0xed6/0x12a0
[  123.924898]  process_one_work+0x1d2/0x390
[  123.925099]  worker_thread+0x45/0x3b0
[  123.925280]  ? process_one_work+0x390/0x390
[  123.925486]  kthread+0xf9/0x130
[  123.925642]  ? kthread_park+0x80/0x80
[  123.925825]  ret_from_fork+0x22/0x30
[  123.926004] Modules linked in:
[  123.926158] CR2: 
[  123.926322] ---[ end trace de9ed4a70f8d71e3 ]---
[  123.926549] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
[  123.926832] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee
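
For readers following the thread, the ordering issue described in the patch
text can be sketched as a small userspace C model. This is not the kernel
code. Every model_ name is hypothetical, the waiter is assumed to read the
request state at the moment the disable step completes the request, the old
code is assumed to have set the flag only after the disable call, and 881 is
simply the status value printed in the "could not set timestamp (881)" log
line.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for NVME_REQ_CANCELLED. */
#define MODEL_REQ_CANCELLED 0x1u
/* Assumption: the status the waiter sees, taken from the "(881)" log line. */
#define MODEL_STATUS_ABORTED 881

/* Hypothetical, heavily reduced request state. */
struct model_req {
	unsigned int flags;	/* MODEL_REQ_CANCELLED or 0 */
	int status;		/* completion status of the command */
	int waiter_result;	/* what the synchronous submitter returned */
};

/* Models the result check done by the synchronous submitter after wakeup. */
int model_sync_result(const struct model_req *req)
{
	assert(req != NULL);
	if (req->flags & MODEL_REQ_CANCELLED)
		return -EINTR;	/* cancelled: report a hard error */
	return req->status;	/* otherwise pass the raw status through */
}

/*
 * Models the disable step: the outstanding request is completed with an
 * aborted status and, in this worst-case interleaving, the waiter wakes up
 * and computes its result immediately.
 */
void model_dev_disable(struct model_req *req)
{
	assert(req != NULL);
	req->status = MODEL_STATUS_ABORTED;
	req->waiter_result = model_sync_result(req);
}

/* Assumed old ordering: disable first, mark cancelled afterwards. */
int model_timeout_flag_after(struct model_req *req)
{
	model_dev_disable(req);
	req->flags |= MODEL_REQ_CANCELLED;	/* too late for the waiter */
	return req->waiter_result;
}

/* Ordering described in the patch: mark cancelled, then disable. */
int model_timeout_flag_before(struct model_req *req)
{
	assert(req != NULL);
	req->flags |= MODEL_REQ_CANCELLED;
	model_dev_disable(req);
	return req->waiter_result;
}
```

With a zero-initialized struct model_req, model_timeout_flag_after() hands the
waiter the positive status 881, while model_timeout_flag_before() hands it
-EINTR, which is the graceful failure the patch text aims for.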

[PATCH v2] nvme-pci: cancel nvme device request before disabling

2020-08-14 Thread Tong Zhang
This patch fixes an IRQ free warning and a NULL pointer dereference that
occur when an nvme device hits a timeout during initialization. The
problem happens when nvme_timeout() is called while nvme_reset_work() is
still executing. The patch fixes it by setting the NVME_REQ_CANCELLED
flag on the timed-out request before calling nvme_dev_disable(), so that
__nvme_submit_sync_cmd() returns an error code and nvme_submit_sync_cmd()
fails gracefully.
The following is the console output.

[   62.472097] nvme nvme0: I/O 13 QID 0 timeout, disable controller
[   62.488796] nvme nvme0: could not set timestamp (881)
[   62.494888] ------------[ cut here ]------------
[   62.495142] Trying to free already-free IRQ 11
[   62.495366] WARNING: CPU: 0 PID: 7 at kernel/irq/manage.c:1751 free_irq+0x1f7/0x370
[   62.495742] Modules linked in:
[   62.495902] CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #8
[   62.496206] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
[   62.496772] Workqueue: nvme-reset-wq nvme_reset_work
[   62.497019] RIP: 0010:free_irq+0x1f7/0x370
[   62.497223] Code: e8 ce 49 11 00 48 83 c4 08 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f6 48 c70
[   62.498133] RSP: :a96800043d40 EFLAGS: 00010086
[   62.498391] RAX:  RBX: 9b87fc458400 RCX: 
[   62.498741] RDX: 0001 RSI: 0096 RDI: 9693d72c
[   62.499091] RBP: 9b87fd4c8f60 R08: a96800043bfd R09: 0163
[   62.499440] R10: a96800043bf8 R11: a96800043bfd R12: 9b87fd4c8e00
[   62.499790] R13: 9b87fd4c8ea4 R14: 000b R15: 9b87fd76b000
[   62.500140] FS:  () GS:9b87fdc0() knlGS:
[   62.500534] CS:  0010 DS:  ES:  CR0: 80050033
[   62.500816] CR2:  CR3: 3aa0a000 CR4: 06f0
[   62.501165] DR0:  DR1:  DR2: 
[   62.501515] DR3:  DR6: fffe0ff0 DR7: 0400
[   62.501864] Call Trace:
[   62.501993]  pci_free_irq+0x13/0x20
[   62.502167]  nvme_reset_work+0x5d0/0x12a0
[   62.502369]  ? update_load_avg+0x59/0x580
[   62.502569]  ? ttwu_queue_wakelist+0xa8/0xc0
[   62.502780]  ? try_to_wake_up+0x1a2/0x450
[   62.502979]  process_one_work+0x1d2/0x390
[   62.503179]  worker_thread+0x45/0x3b0
[   62.503361]  ? process_one_work+0x390/0x390
[   62.503568]  kthread+0xf9/0x130
[   62.503726]  ? kthread_park+0x80/0x80
[   62.503911]  ret_from_fork+0x22/0x30
[   62.504090] ---[ end trace de9ed4a70f8d71e2 ]---
[  123.912275] nvme nvme0: I/O 12 QID 0 timeout, disable controller
[  123.914670] nvme nvme0: 1/0/0 default/read/poll queues
[  123.916310] BUG: kernel NULL pointer dereference, address: 
[  123.917469] #PF: supervisor write access in kernel mode
[  123.917725] #PF: error_code(0x0002) - not-present page
[  123.917976] PGD 0 P4D 0
[  123.918109] Oops: 0002 [#1] SMP PTI
[  123.918283] CPU: 0 PID: 7 Comm: kworker/u4:0 Tainted: G        W         5.8.0+ #8
[  123.918650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
[  123.919219] Workqueue: nvme-reset-wq nvme_reset_work
[  123.919469] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
[  123.919757] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
[  123.920657] RSP: :a96800043d40 EFLAGS: 00010286
[  123.920912] RAX: 9b87fc4fee40 RBX: 9b87fc8cb008 RCX: 
[  123.921258] RDX:  RSI:  RDI: 9b87fc618000
[  123.921602] RBP:  R08: 9b87fdc2c4a0 R09: 9b87fc616000
[  123.921949] R10:  R11: 9b87fffd1500 R12: 
[  123.922295] R13:  R14: 9b87fc8cb200 R15: 9b87fc8cb000
[  123.922641] FS:  () GS:9b87fdc0() knlGS:
[  123.923032] CS:  0010 DS:  ES:  CR0: 80050033
[  123.923312] CR2:  CR3: 3aa0a000 CR4: 06f0
[  123.923660] DR0:  DR1:  DR2: 
[  123.924007] DR3:  DR6: fffe0ff0 DR7: 0400
[  123.924353] Call Trace:
[  123.924479]  blk_mq_alloc_tag_set+0x137/0x2a0
[  123.924694]  nvme_reset_work+0xed6/0x12a0
[  123.924898]  process_one_work+0x1d2/0x390
[  123.925099]  worker_thread+0x45/0x3b0
[  123.925280]  ? process_one_work+0x390/0x390
[  123.925486]  kthread+0xf9/0x130
[  123.925642]  ? kthread_park+0x80/0x80
[  123.925825]  ret_from_fork+0x22/0x30
[  123.926004] Modules linked in:
[  123.926158] CR2: 
[  123.926322] ---[ end trace de9ed4a70f8d71e3 ]---
[  123.926549] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
[  123.926832] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee