Re: [PATCH] nvme: fix doulbe irq free

2020-09-17 Thread Tong Zhang
On Thu, Sep 17, 2020 at 4:30 AM Christoph Hellwig  wrote:
>
> On Wed, Sep 16, 2020 at 11:37:00AM -0400, Tong Zhang wrote:
> > the irq might already been released before reset work can run
>
> If it is we have a problem with the state machine, as we shouldn't
> have two resets running at the same time.  Can you share your reproducer?
>

Hi Christoph,
Thanks for replying.

It is possible that pci_free_irq() can be called twice for the same IRQ,
the first call can be

[  124.413113] nvme nvme0: I/O 6 QID 0 timeout, disable controller
[  124.414317] Workqueue: kblockd blk_mq_timeout_work
[  124.414317] Call Trace:
[  124.414317]  dump_stack+0x7d/0xb0
[  124.414317]  nvme_suspend_queue.cold+0x11/0x58
[  124.414317]  nvme_dev_disable+0x116/0x7b0
[  124.414317]  nvme_timeout+0x309/0x320
[  124.414317]  ? nvme_shutdown+0x40/0x40
[  124.414317]  ? load_balance+0xe76/0x1450
[  124.414317]  ? deref_stack_reg+0x40/0x40
[  124.414317]  ? sched_clock_local+0x99/0xc0
[  124.414317]  blk_mq_check_expired+0x313/0x340
[  124.414317]  ? blk_mq_freeze_queue_wait+0x180/0x180
[  124.414317]  ? _find_next_bit.constprop.0+0x3e/0xf0
[  124.414317]  blk_mq_queue_tag_busy_iter+0x2e9/0x4a0
[  124.414317]  ? blk_mq_freeze_queue_wait+0x180/0x180
[  124.414317]  ? blk_mq_all_tag_iter+0x10/0x10
[  124.414317]  ? _raw_write_lock_irqsave+0xd0/0xd0
[  124.414317]  ? run_rebalance_domains+0x80/0x80
[  124.414317]  ? blk_mq_freeze_queue_wait+0x180/0x180
[  124.414317]  ? debug_object_deactivate.part.0+0x1c4/0x210
[  124.414317]  blk_mq_timeout_work+0xaa/0x1f0
[  124.414317]  ? __blk_mq_end_request+0x1f0/0x1f0
[  124.414317]  ? __schedule+0x581/0xc40
[  124.414317]  ? read_word_at_a_time+0xe/0x20
[  124.414317]  ? strscpy+0xbf/0x190
[  124.414317]  process_one_work+0x4ad/0x7e0
[  124.414317]  worker_thread+0x73/0x690
[  124.414317]  ? process_one_work+0x7e0/0x7e0
[  124.414317]  kthread+0x199/0x1f0
[  124.414317]  ? kthread_create_on_node+0xd0/0xd0
[  124.414317]  ret_from_fork+0x22/0x30

And the second call can be
[  149.763974] nvme nvme0: Failed to read smart log (error -4)
[  149.774307] Trying to free already-free IRQ 10
[  149.774385] Call Trace:
[  149.774385]  pci_free_irq+0x13/0x20
[  149.774385]  nvme_reset_work.cold+0x1be/0xa78
[  149.774385]  ? nvme_probe+0x8c0/0x8c0
[  149.774385]  ? check_preempt_curr+0x9d/0xf0
[  149.774385]  ? ttwu_do_wakeup.isra.0+0x176/0x190
[  149.774385]  ? try_to_wake_up+0x37c/0x900
[  149.774385]  ? migration_cpu_stop+0x1e0/0x1e0
[  149.774385]  ? __schedule+0x581/0xc40
[  149.774385]  ? read_word_at_a_time+0xe/0x20
[  149.774385]  ? strscpy+0xbf/0x190
[  149.774385]  process_one_work+0x4ad/0x7e0
[  149.774385]  worker_thread+0x73/0x690
[  149.774385]  ? process_one_work+0x7e0/0x7e0
[  149.774385]  kthread+0x199/0x1f0
[  149.774385]  ? kthread_create_on_node+0xd0/0xd0
[  149.774385]  ret_from_fork+0x22/0x30

and clearly something is not working as expected.


- Tong


Re: [PATCH] nvme: fix doulbe irq free

2020-09-17 Thread Keith Busch
On Thu, Sep 17, 2020 at 11:22:54AM -0400, Tong Zhang wrote:
> On Thu, Sep 17, 2020 at 4:30 AM Christoph Hellwig  wrote:
> >
> > On Wed, Sep 16, 2020 at 11:37:00AM -0400, Tong Zhang wrote:
> > > the irq might already been released before reset work can run
> >
> > If it is we have a problem with the state machine, as we shouldn't
> > have two resets running at the same time.  Can you share your reproducer?
> >
> 
> Hi Christoph,
> Thanks for replying.
> 
> It is possible that pci_free_irq() can be called twice for the same IRQ,
> the first call can be
> 
> [  124.413113] nvme nvme0: I/O 6 QID 0 timeout, disable controller
> [  124.414317] Workqueue: kblockd blk_mq_timeout_work
> [  124.414317] Call Trace:
> [  124.414317]  dump_stack+0x7d/0xb0
> [  124.414317]  nvme_suspend_queue.cold+0x11/0x58
> [  124.414317]  nvme_dev_disable+0x116/0x7b0
> [  124.414317]  nvme_timeout+0x309/0x320
> [  124.414317]  ? nvme_shutdown+0x40/0x40
> [  124.414317]  ? load_balance+0xe76/0x1450
> [  124.414317]  ? deref_stack_reg+0x40/0x40
> [  124.414317]  ? sched_clock_local+0x99/0xc0
> [  124.414317]  blk_mq_check_expired+0x313/0x340
> [  124.414317]  ? blk_mq_freeze_queue_wait+0x180/0x180
> [  124.414317]  ? _find_next_bit.constprop.0+0x3e/0xf0
> [  124.414317]  blk_mq_queue_tag_busy_iter+0x2e9/0x4a0
> [  124.414317]  ? blk_mq_freeze_queue_wait+0x180/0x180
> [  124.414317]  ? blk_mq_all_tag_iter+0x10/0x10
> [  124.414317]  ? _raw_write_lock_irqsave+0xd0/0xd0
> [  124.414317]  ? run_rebalance_domains+0x80/0x80
> [  124.414317]  ? blk_mq_freeze_queue_wait+0x180/0x180
> [  124.414317]  ? debug_object_deactivate.part.0+0x1c4/0x210
> [  124.414317]  blk_mq_timeout_work+0xaa/0x1f0
> [  124.414317]  ? __blk_mq_end_request+0x1f0/0x1f0
> [  124.414317]  ? __schedule+0x581/0xc40
> [  124.414317]  ? read_word_at_a_time+0xe/0x20
> [  124.414317]  ? strscpy+0xbf/0x190
> [  124.414317]  process_one_work+0x4ad/0x7e0
> [  124.414317]  worker_thread+0x73/0x690
> [  124.414317]  ? process_one_work+0x7e0/0x7e0
> [  124.414317]  kthread+0x199/0x1f0
> [  124.414317]  ? kthread_create_on_node+0xd0/0xd0
> [  124.414317]  ret_from_fork+0x22/0x30
> 
> And the second call can be
> [  149.763974] nvme nvme0: Failed to read smart log (error -4)
> [  149.774307] Trying to free already-free IRQ 10
> [  149.774385] Call Trace:
> [  149.774385]  pci_free_irq+0x13/0x20
> [  149.774385]  nvme_reset_work.cold+0x1be/0xa78
> [  149.774385]  ? nvme_probe+0x8c0/0x8c0
> [  149.774385]  ? check_preempt_curr+0x9d/0xf0
> [  149.774385]  ? ttwu_do_wakeup.isra.0+0x176/0x190
> [  149.774385]  ? try_to_wake_up+0x37c/0x900
> [  149.774385]  ? migration_cpu_stop+0x1e0/0x1e0
> [  149.774385]  ? __schedule+0x581/0xc40
> [  149.774385]  ? read_word_at_a_time+0xe/0x20
> [  149.774385]  ? strscpy+0xbf/0x190
> [  149.774385]  process_one_work+0x4ad/0x7e0
> [  149.774385]  worker_thread+0x73/0x690
> [  149.774385]  ? process_one_work+0x7e0/0x7e0
> [  149.774385]  kthread+0x199/0x1f0
> [  149.774385]  ? kthread_create_on_node+0xd0/0xd0
> [  149.774385]  ret_from_fork+0x22/0x30
> 
> and clearly something is not working as expected.

And the problem is that nvme_hwmon_init() doesn't return an error, so it
returns to continue initialization on a disabled controller. That's the
bug that needs to be fixed.


Re: [PATCH] nvme: fix doulbe irq free

2020-09-17 Thread Christoph Hellwig
On Wed, Sep 16, 2020 at 11:37:00AM -0400, Tong Zhang wrote:
> the irq might already been released before reset work can run

If it is we have a problem with the state machine, as we shouldn't
have two resets running at the same time.  Can you share your reproducer?



[PATCH] nvme: fix doulbe irq free

2020-09-16 Thread Tong Zhang
the irq might already been released before reset work can run

[   81.137630] [ cut here ]
[   81.137913] Trying to free already-free IRQ 11
[   81.138145] WARNING: CPU: 1 PID: 7 at kernel/irq/manage.c:1751 
free_irq+0x389/0x590
[   81.138525] Modules linked in:
[   81.138681] CPU: 1 PID: 7 Comm: kworker/u4:0 Not tainted 5.9.0-rc4+ #83
[   81.139579] Workqueue: nvme-reset-wq nvme_reset_work
[   81.139828] RIP: 0010:free_irq+0x389/0x590
[   81.140035] Code: 8b 65 50 e8 69 6d 1d 00 48 83 c4 38 4c 89 e0 5b 5d 41 5c 
41 5d 41 5e 41 5f c3 8b 74 24 24 48 c7 c7 20 1c 86 8a e8 d7 8e f6 ff <0f0e
[   81.140951] RSP: :88806c06fb18 EFLAGS: 00010082
[   81.141216] RAX:  RBX:  RCX: 
[   81.141569] RDX: 0001 RSI: 0004 RDI: ed100d80df55
[   81.141924] RBP: 88806cac88c0 R08: 0001 R09: ed100d80df29
[   81.142278] R10: 0003 R11: ed100d80df28 R12: 88806cac8800
[   81.142632] R13: 88806cac8870 R14: 88806cac8840 R15: 88806cac8960
[   81.142987] FS:  () GS:88806d10() 
knlGS:
[   81.143388] CS:  0010 DS:  ES:  CR0: 80050033
[   81.143673] CR2:  CR3: 26c0e000 CR4: 06e0
[   81.144028] DR0:  DR1:  DR2: 
[   81.144382] DR3:  DR6: fffe0ff0 DR7: 0400
[   81.144734] Call Trace:
[   81.144862]  ? nvme_start_ctrl+0x180/0x180
[   81.145072]  pci_free_irq+0x13/0x20
[   81.145248]  nvme_reset_work+0xcf2/0x1ec0
[   81.145451]  ? __update_load_avg_cfs_rq+0x1a3/0x530
[   81.145695]  ? __update_load_avg_cfs_rq+0x4b/0x530
[   81.145935]  ? nvme_remove+0x1e0/0x1e0
[   81.146124]  ? ttwu_queue_wakelist+0x12e/0x150
[   81.146348]  ? try_to_wake_up+0x37c/0x900
[   81.146550]  ? migration_cpu_stop+0x1e0/0x1e0
[   81.146769]  ? __schedule+0x581/0xc40
[   81.146954]  ? read_word_at_a_time+0xe/0x20
[   81.147164]  ? strscpy+0xbf/0x190
[   81.147333]  process_one_work+0x4ad/0x7e0
[   81.147535]  worker_thread+0x73/0x690
[   81.147720]  ? process_one_work+0x7e0/0x7e0
[   81.147930]  kthread+0x199/0x1f0
[   81.148094]  ? kthread_create_on_node+0xd0/0xd0
[   81.148321]  ret_from_fork+0x22/0x30
[   81.148502] ---[ end trace 374ff42ced5b661f ]---

Signed-off-by: Tong Zhang 
---
 drivers/nvme/host/pci.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 899d2f4d7ab6..c2b083dcfd17 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2086,6 +2086,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
unsigned int nr_io_queues;
unsigned long size;
int result;
+   int adminq_irq_enabled = test_bit(NVMEQ_ENABLED, >flags);
 
/*
 * Sample the module parameters once at reset time so that we have
@@ -2133,14 +2134,16 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
adminq->q_db = dev->dbs;
 
  retry:
-   /* Deregister the admin queue's interrupt */
-   pci_free_irq(pdev, 0, adminq);
+   if (adminq_irq_enabled) {
+   /* Deregister the admin queue's interrupt */
+   pci_free_irq(pdev, 0, adminq);
 
-   /*
-* If we enable msix early due to not intx, disable it again before
-* setting up the full range we need.
-*/
-   pci_free_irq_vectors(pdev);
+   /*
+* If we enable msix early due to not intx, disable it again 
before
+* setting up the full range we need.
+*/
+   pci_free_irq_vectors(pdev);
+   }
 
result = nvme_setup_irqs(dev, nr_io_queues);
if (result <= 0)
@@ -2160,6 +2163,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
if (result)
return result;
set_bit(NVMEQ_ENABLED, >flags);
+   adminq_irq_enabled = 1;
 
result = nvme_create_io_queues(dev);
if (result || dev->online_queues < 2)
-- 
2.25.1