Public bug reported:

[NOTE]

 * Patches will be sent to the kernel-team mailing list
   once the test kernel has been verified by the reporter.

[Impact]

 * Users may experience CPU hard lockups when performing
   rigorous writes to NVMe drives.

 * The fix addresses a scheduling issue in the original
   implementation of wbt (writeback throttling); a quick check
   for whether WBT is active on a device is included at the end
   of this section.

 * The fix is commit 2887e41b910b ("blk-wbt: Avoid lock
   contention and thundering herd issue in wbt_wait"),
   plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve
   waking of tasks").

 * There are additional commits to help with a cleaner
   backport and future maintenance:
   - Cosmic: 8 clean cherry picks.
   - Bionic: of 13 commits, 9 are clean cherry picks and 4 are
     backports, which only adjust context lines (i.e. a refresh)
     and contain no functional changes.
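
 * For reference only (not from the original report), a quick way
   to check whether WBT is active on the device under test; the
   device name nvme0n1 is an assumption, adjust as needed:

   # a non-zero latency target (in usec) means WBT is enabled,
   # 0 means it is disabled
   $ cat /sys/block/nvme0n1/queue/wbt_lat_usec

   # untested mitigation idea only: disable WBT at runtime
   $ echo 0 | sudo tee /sys/block/nvme0n1/queue/wbt_lat_usec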

[Test Case]

 * This command has been reported to reproduce the problem:

   $ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128

 * It produces the stack traces included below (see also the
   log-watching helper in the following bullet).
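
 * To watch for the problem while the test runs (a suggested
   helper, not part of the original report), follow the kernel
   log in another terminal:

   # flag the watchdog / RCU stall messages seen in the traces below
   $ sudo dmesg -w | grep -E 'hard LOCKUP|rcu_sched detected stalls'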

[Regression Potential]

 * The commits have been checked against linux-next as of
   2019-01-08, and all known follow-up fix commits are included.

 * The regression potential is mostly contained in writeback
   throttling (block/blk-wbt.*), as almost all of the 13 patches
   change exclusively that, except for 4 of them (2 of which are
   sysfs changes):

   - blk-rq-qos: refactor out common elements of blk-wbt (block/)
   - block: Protect less code with sysfs_lock in
     blk_{un,}register_queue() (blk-sysfs.c)
   - block: Protect less code with sysfs_lock in
     blk_{un,}register_queue() (blk-{mq-}sysfs.c)
   - block: pass struct request instead of struct blk_issue_stat
     to wbt (block/, still mostly blk-wbt.*)

[Other Info]

 * Alternatively, it is probably possible to introduce just the
   two commits that fix this, with some changes to their code in
   the backport; but since the 'blk-rq-qos: refactor ..' commit
   may become a dependency for additional/future fixes, it seemed
   worthwhile to pull it into the 18.04 branch earlier.

 * The problem was introduced with the blk-wbt mechanism in
   v4.10-rc1, and the fix commits landed in v4.19-rc1 and -rc2,
   so only Bionic and Cosmic need this (a quick check for
   affected systems follows below).
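
 * A quick way to check whether a given installation is in the
   affected range (illustrative only; assumes the stock Ubuntu
   config layout under /boot):

   # Bionic (4.15) and Cosmic (4.18) are >= v4.10-rc1 and < v4.19-rc1
   $ uname -r

   # confirm writeback throttling is built into the running kernel
   $ grep BLK_WBT /boot/config-$(uname -r)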

[Stack Traces]

[ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30
...
[ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu
...
[ 393.628720] Call Trace:
[ 393.628721] <IRQ>
[ 393.628724] enqueue_task_fair+0x6c/0x7f0
[ 393.628726] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150
[ 393.628728] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150
[ 393.628731] activate_task+0x57/0xc0
[ 393.628735] ? sched_clock+0x9/0x10
[ 393.628736] ? sched_clock+0x9/0x10
[ 393.628738] ttwu_do_activate+0x49/0x90
[ 393.628739] try_to_wake_up+0x1df/0x490
[ 393.628741] default_wake_function+0x12/0x20
[ 393.628743] autoremove_wake_function+0x12/0x40
[ 393.628744] __wake_up_common+0x73/0x130
[ 393.628745] __wake_up_common_lock+0x80/0xc0
[ 393.628746] __wake_up+0x13/0x20
[ 393.628749] __wbt_done.part.21+0xa4/0xb0
[ 393.628749] wbt_done+0x72/0xa0
[ 393.628753] blk_mq_free_request+0xca/0x1a0
[ 393.628755] blk_mq_end_request+0x48/0x90
[ 393.628760] nvme_complete_rq+0x23/0x120 [nvme_core]
[ 393.628763] nvme_pci_complete_rq+0x7a/0x130 [nvme]
[ 393.628764] __blk_mq_complete_request+0xd2/0x140
[ 393.628766] blk_mq_complete_request+0x18/0x20
[ 393.628767] nvme_process_cq+0xe1/0x1b0 [nvme]
[ 393.628768] nvme_irq+0x23/0x50 [nvme]
[ 393.628772] __handle_irq_event_percpu+0x44/0x1a0
[ 393.628773] handle_irq_event_percpu+0x32/0x80
[ 393.628774] handle_irq_event+0x3b/0x60
[ 393.628778] handle_edge_irq+0x7c/0x190
[ 393.628779] handle_irq+0x20/0x30
[ 393.628783] do_IRQ+0x46/0xd0
[ 393.628784] common_interrupt+0x84/0x84
[ 393.628785] </IRQ>
...
[ 393.628794] ? cpuidle_enter_state+0x97/0x2f0
[ 393.628796] cpuidle_enter+0x17/0x20
[ 393.628797] call_cpuidle+0x23/0x40
[ 393.628798] do_idle+0x18c/0x1f0
[ 393.628799] cpu_startup_entry+0x73/0x80
[ 393.628802] start_secondary+0x1a6/0x200
[ 393.628804] secondary_startup_64+0xa5/0xb0
[ 393.628805] Code: ...

[ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled

[ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485
[ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818)
[ 435.617025] Sending NMI from CPU 8 to CPUs 30:
[ 435.617029] NMI backtrace for cpu 30
[ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu
...
[ 435.617047] Call Trace:
[ 435.617048] <IRQ>
[ 435.617051] enqueue_entity+0x9f/0x6b0
[ 435.617053] enqueue_task_fair+0x6c/0x7f0
[ 435.617056] activate_task+0x57/0xc0
[ 435.617059] ? sched_clock+0x9/0x10
[ 435.617060] ? sched_clock+0x9/0x10
[ 435.617061] ttwu_do_activate+0x49/0x90
[ 435.617063] try_to_wake_up+0x1df/0x490
[ 435.617065] default_wake_function+0x12/0x20
[ 435.617067] autoremove_wake_function+0x12/0x40
[ 435.617068] __wake_up_common+0x73/0x130
[ 435.617069] __wake_up_common_lock+0x80/0xc0
[ 435.617070] __wake_up+0x13/0x20
[ 435.617073] __wbt_done.part.21+0xa4/0xb0
[ 435.617074] wbt_done+0x72/0xa0
[ 435.617077] blk_mq_free_request+0xca/0x1a0
[ 435.617079] blk_mq_end_request+0x48/0x90
[ 435.617084] nvme_complete_rq+0x23/0x120 [nvme_core]
[ 435.617087] nvme_pci_complete_rq+0x7a/0x130 [nvme]
[ 435.617088] __blk_mq_complete_request+0xd2/0x140
[ 435.617090] blk_mq_complete_request+0x18/0x20
[ 435.617091] nvme_process_cq+0xe1/0x1b0 [nvme]
[ 435.617093] nvme_irq+0x23/0x50 [nvme]
[ 435.617096] __handle_irq_event_percpu+0x44/0x1a0
[ 435.617097] handle_irq_event_percpu+0x32/0x80
[ 435.617098] handle_irq_event+0x3b/0x60
[ 435.617101] handle_edge_irq+0x7c/0x190
[ 435.617102] handle_irq+0x20/0x30
[ 435.617106] do_IRQ+0x46/0xd0
[ 435.617107] common_interrupt+0x84/0x84
[ 435.617108] </IRQ>
...
[ 435.617117] ? cpuidle_enter_state+0x97/0x2f0
[ 435.617118] cpuidle_enter+0x17/0x20
[ 435.617119] call_cpuidle+0x23/0x40
[ 435.617121] do_idle+0x18c/0x1f0
[ 435.617122] cpu_startup_entry+0x73/0x80
[ 435.617125] start_secondary+0x1a6/0x200
[ 435.617127] secondary_startup_64+0xa5/0xb0
[ 435.617128] Code: ...

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Mauricio Faria de Oliveira (mfo)
         Status: Incomplete


** Tags: bionic

** Changed in: linux (Ubuntu)
     Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

https://bugs.launchpad.net/bugs/1810998

Title:
  CPU hard lockup with rigorous writes to NVMe drive

Status in linux package in Ubuntu:
  Incomplete
