Re: [PATCH] nvme: fix double irq free
On Thu, Sep 17, 2020 at 4:30 AM Christoph Hellwig wrote:
>
> On Wed, Sep 16, 2020 at 11:37:00AM -0400, Tong Zhang wrote:
> > the irq might have already been released before the reset work can run
>
> If it is we have a problem with the state machine, as we shouldn't
> have two resets running at the same time. Can you share your reproducer?
>

Hi Christoph,
Thanks for replying.

It is possible for pci_free_irq() to be called twice for the same IRQ.
The first call can be

[ 124.413113] nvme nvme0: I/O 6 QID 0 timeout, disable controller
[ 124.414317] Workqueue: kblockd blk_mq_timeout_work
[ 124.414317] Call Trace:
[ 124.414317]  dump_stack+0x7d/0xb0
[ 124.414317]  nvme_suspend_queue.cold+0x11/0x58
[ 124.414317]  nvme_dev_disable+0x116/0x7b0
[ 124.414317]  nvme_timeout+0x309/0x320
[ 124.414317]  ? nvme_shutdown+0x40/0x40
[ 124.414317]  ? load_balance+0xe76/0x1450
[ 124.414317]  ? deref_stack_reg+0x40/0x40
[ 124.414317]  ? sched_clock_local+0x99/0xc0
[ 124.414317]  blk_mq_check_expired+0x313/0x340
[ 124.414317]  ? blk_mq_freeze_queue_wait+0x180/0x180
[ 124.414317]  ? _find_next_bit.constprop.0+0x3e/0xf0
[ 124.414317]  blk_mq_queue_tag_busy_iter+0x2e9/0x4a0
[ 124.414317]  ? blk_mq_freeze_queue_wait+0x180/0x180
[ 124.414317]  ? blk_mq_all_tag_iter+0x10/0x10
[ 124.414317]  ? _raw_write_lock_irqsave+0xd0/0xd0
[ 124.414317]  ? run_rebalance_domains+0x80/0x80
[ 124.414317]  ? blk_mq_freeze_queue_wait+0x180/0x180
[ 124.414317]  ? debug_object_deactivate.part.0+0x1c4/0x210
[ 124.414317]  blk_mq_timeout_work+0xaa/0x1f0
[ 124.414317]  ? __blk_mq_end_request+0x1f0/0x1f0
[ 124.414317]  ? __schedule+0x581/0xc40
[ 124.414317]  ? read_word_at_a_time+0xe/0x20
[ 124.414317]  ? strscpy+0xbf/0x190
[ 124.414317]  process_one_work+0x4ad/0x7e0
[ 124.414317]  worker_thread+0x73/0x690
[ 124.414317]  ? process_one_work+0x7e0/0x7e0
[ 124.414317]  kthread+0x199/0x1f0
[ 124.414317]  ? kthread_create_on_node+0xd0/0xd0
[ 124.414317]  ret_from_fork+0x22/0x30

And the second call can be

[ 149.763974] nvme nvme0: Failed to read smart log (error -4)
[ 149.774307] Trying to free already-free IRQ 10
[ 149.774385] Call Trace:
[ 149.774385]  pci_free_irq+0x13/0x20
[ 149.774385]  nvme_reset_work.cold+0x1be/0xa78
[ 149.774385]  ? nvme_probe+0x8c0/0x8c0
[ 149.774385]  ? check_preempt_curr+0x9d/0xf0
[ 149.774385]  ? ttwu_do_wakeup.isra.0+0x176/0x190
[ 149.774385]  ? try_to_wake_up+0x37c/0x900
[ 149.774385]  ? migration_cpu_stop+0x1e0/0x1e0
[ 149.774385]  ? __schedule+0x581/0xc40
[ 149.774385]  ? read_word_at_a_time+0xe/0x20
[ 149.774385]  ? strscpy+0xbf/0x190
[ 149.774385]  process_one_work+0x4ad/0x7e0
[ 149.774385]  worker_thread+0x73/0x690
[ 149.774385]  ? process_one_work+0x7e0/0x7e0
[ 149.774385]  kthread+0x199/0x1f0
[ 149.774385]  ? kthread_create_on_node+0xd0/0xd0
[ 149.774385]  ret_from_fork+0x22/0x30

and clearly something is not working as expected.

- Tong
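P.S. To make the two call paths easier to see side by side, here is a
toy userspace model of what I think is happening. This is illustration
only, not kernel code: fake_free_irq() stands in for pci_free_irq()
plus the check that free_irq() warns about, and the flag stands in for
the admin queue's NVMEQ_ENABLED bit.

#include <stdbool.h>
#include <stdio.h>

/* Stands in for the admin queue's NVMEQ_ENABLED bit. */
static bool irq_registered = true;

/* Stands in for pci_free_irq(), including the "already free" warning. */
static void fake_free_irq(const char *caller)
{
	if (!irq_registered) {
		printf("%s: Trying to free already-free IRQ\n", caller);
		return;
	}
	irq_registered = false;
	printf("%s: IRQ freed\n", caller);
}

int main(void)
{
	/* 1) An admin command times out and nvme_dev_disable() frees the IRQ. */
	fake_free_irq("nvme_dev_disable");
	/* 2) nvme_reset_work() later frees the same IRQ unconditionally. */
	fake_free_irq("nvme_reset_work");
	return 0;
}

Compiling and running this prints the "already-free" line for the second
call, which is the userspace analogue of the WARNING in the kernel log
above.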
Re: [PATCH] nvme: fix double irq free
On Thu, Sep 17, 2020 at 11:22:54AM -0400, Tong Zhang wrote:
> On Thu, Sep 17, 2020 at 4:30 AM Christoph Hellwig wrote:
> >
> > On Wed, Sep 16, 2020 at 11:37:00AM -0400, Tong Zhang wrote:
> > > the irq might have already been released before the reset work can run
> >
> > If it is we have a problem with the state machine, as we shouldn't
> > have two resets running at the same time. Can you share your reproducer?
> >
>
> Hi Christoph,
> Thanks for replying.
>
> It is possible for pci_free_irq() to be called twice for the same IRQ.
> The first call can be
>
> [ 124.413113] nvme nvme0: I/O 6 QID 0 timeout, disable controller
> [ 124.414317] Workqueue: kblockd blk_mq_timeout_work
> [ 124.414317] Call Trace:
> [ 124.414317]  dump_stack+0x7d/0xb0
> [ 124.414317]  nvme_suspend_queue.cold+0x11/0x58
> [ 124.414317]  nvme_dev_disable+0x116/0x7b0
> [ 124.414317]  nvme_timeout+0x309/0x320
> [...]
>
> And the second call can be
>
> [ 149.763974] nvme nvme0: Failed to read smart log (error -4)
> [ 149.774307] Trying to free already-free IRQ 10
> [ 149.774385] Call Trace:
> [ 149.774385]  pci_free_irq+0x13/0x20
> [ 149.774385]  nvme_reset_work.cold+0x1be/0xa78
> [...]
>
> and clearly something is not working as expected.

And the problem is that nvme_hwmon_init() doesn't return an error, so we
continue initialization on a controller that has already been disabled.
That's the bug that needs to be fixed.
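Something like this completely untested sketch, that is. The exact shape
of the error path in nvme_hwmon_init() is from memory, so treat the
context lines as approximate:

-void nvme_hwmon_init(struct nvme_ctrl *ctrl)
+int nvme_hwmon_init(struct nvme_ctrl *ctrl)
 {
 	...
 	err = nvme_hwmon_get_smart_log(data);
 	if (err) {
 		dev_warn(dev, "Failed to read smart log (error %d)\n", err);
-		return;
+		return err;
 	}
 	...
+	return 0;
 }

plus a caller that actually checks the return value and aborts
initialization instead of carrying on with a disabled controller:

	ret = nvme_hwmon_init(ctrl);
	if (ret < 0)
		return ret;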
Re: [PATCH] nvme: fix double irq free
On Wed, Sep 16, 2020 at 11:37:00AM -0400, Tong Zhang wrote:
> the irq might have already been released before the reset work can run

If it is we have a problem with the state machine, as we shouldn't
have two resets running at the same time. Can you share your reproducer?
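(For reference, the serialization in question is the controller state
machine: a reset is only queued after the transition to RESETTING
succeeds. Quoting nvme_reset_ctrl() from memory, so details may be
slightly off:

int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
{
	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
		return -EBUSY;
	if (!queue_work(nvme_reset_wq, &ctrl->reset_work))
		return -EBUSY;
	return 0;
}

so a second reset should not be able to start while one is in flight.)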
[PATCH] nvme: fix double irq free
the irq might have already been released before the reset work can run

[ 81.137630] ------------[ cut here ]------------
[ 81.137913] Trying to free already-free IRQ 11
[ 81.138145] WARNING: CPU: 1 PID: 7 at kernel/irq/manage.c:1751 free_irq+0x389/0x590
[ 81.138525] Modules linked in:
[ 81.138681] CPU: 1 PID: 7 Comm: kworker/u4:0 Not tainted 5.9.0-rc4+ #83
[ 81.139579] Workqueue: nvme-reset-wq nvme_reset_work
[ 81.139828] RIP: 0010:free_irq+0x389/0x590
[ 81.140035] Code: 8b 65 50 e8 69 6d 1d 00 48 83 c4 38 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 8b 74 24 24 48 c7 c7 20 1c 86 8a e8 d7 8e f6 ff <0f0e
[ 81.140951] RSP: :88806c06fb18 EFLAGS: 00010082
[ 81.141216] RAX: RBX: RCX:
[ 81.141569] RDX: 0001 RSI: 0004 RDI: ed100d80df55
[ 81.141924] RBP: 88806cac88c0 R08: 0001 R09: ed100d80df29
[ 81.142278] R10: 0003 R11: ed100d80df28 R12: 88806cac8800
[ 81.142632] R13: 88806cac8870 R14: 88806cac8840 R15: 88806cac8960
[ 81.142987] FS: () GS:88806d10() knlGS:
[ 81.143388] CS: 0010 DS: ES: CR0: 80050033
[ 81.143673] CR2: CR3: 26c0e000 CR4: 06e0
[ 81.144028] DR0: DR1: DR2:
[ 81.144382] DR3: DR6: fffe0ff0 DR7: 0400
[ 81.144734] Call Trace:
[ 81.144862]  ? nvme_start_ctrl+0x180/0x180
[ 81.145072]  pci_free_irq+0x13/0x20
[ 81.145248]  nvme_reset_work+0xcf2/0x1ec0
[ 81.145451]  ? __update_load_avg_cfs_rq+0x1a3/0x530
[ 81.145695]  ? __update_load_avg_cfs_rq+0x4b/0x530
[ 81.145935]  ? nvme_remove+0x1e0/0x1e0
[ 81.146124]  ? ttwu_queue_wakelist+0x12e/0x150
[ 81.146348]  ? try_to_wake_up+0x37c/0x900
[ 81.146550]  ? migration_cpu_stop+0x1e0/0x1e0
[ 81.146769]  ? __schedule+0x581/0xc40
[ 81.146954]  ? read_word_at_a_time+0xe/0x20
[ 81.147164]  ? strscpy+0xbf/0x190
[ 81.147333]  process_one_work+0x4ad/0x7e0
[ 81.147535]  worker_thread+0x73/0x690
[ 81.147720]  ? process_one_work+0x7e0/0x7e0
[ 81.147930]  kthread+0x199/0x1f0
[ 81.148094]  ? kthread_create_on_node+0xd0/0xd0
[ 81.148321]  ret_from_fork+0x22/0x30
[ 81.148502] ---[ end trace 374ff42ced5b661f ]---

Signed-off-by: Tong Zhang
---
 drivers/nvme/host/pci.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 899d2f4d7ab6..c2b083dcfd17 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2086,6 +2086,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	unsigned int nr_io_queues;
 	unsigned long size;
 	int result;
+	int adminq_irq_enabled = test_bit(NVMEQ_ENABLED, &adminq->flags);
 
 	/*
 	 * Sample the module parameters once at reset time so that we have
@@ -2133,14 +2134,16 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	adminq->q_db = dev->dbs;
 
  retry:
-	/* Deregister the admin queue's interrupt */
-	pci_free_irq(pdev, 0, adminq);
+	if (adminq_irq_enabled) {
+		/* Deregister the admin queue's interrupt */
+		pci_free_irq(pdev, 0, adminq);
 
-	/*
-	 * If we enable msix early due to not intx, disable it again before
-	 * setting up the full range we need.
-	 */
-	pci_free_irq_vectors(pdev);
+		/*
+		 * If we enable msix early due to not intx, disable it again before
+		 * setting up the full range we need.
+		 */
+		pci_free_irq_vectors(pdev);
+	}
 
 	result = nvme_setup_irqs(dev, nr_io_queues);
 	if (result <= 0)
@@ -2160,6 +2163,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	if (result)
 		return result;
 	set_bit(NVMEQ_ENABLED, &adminq->flags);
+	adminq_irq_enabled = 1;
 
 	result = nvme_create_io_queues(dev);
 	if (result || dev->online_queues < 2)
--
2.25.1