SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2018-September/095569.html

** Description changed:

- A bug was introduced when backporting the fix for
- http://bugs.launchpad.net/bugs/1597908. This bug exists in all Ubuntu
- 16.04 LTS 4.4 kernels >= 4.4.0-36, and many other non-LTS kernels.
+ == SRU Justification ==
+ The following commit was applied to Xenial and introduced this
+ regression:
+ 287922eb0b18 ("block: defer timeouts to a workqueue")
+ 
+ This regression was introduced in mainline as of v4.5-rc1.  Bionic was
+ also affected by this regression, but it already got the fix when commit
+ 4e9b6f20828a was applied to mainline in v4.15-rc1.
+ 
+ The regression caused a kernel hang because the HPSA driver has a tendency
+ to aggressively remove missing devices.
+ 
+ == Fix ==
+ 4e9b6f20828a ("block: Fix a race between blk_cleanup_queue() and timeout handling")
+ 
+ == Regression Potential ==
+ Low.  This commit fixes a regression and has been cc'd to stable, so it
+ has had additional upstream review.  This commit is already applied to
+ Bionic and Cosmic.
+ 
+ == Test Case ==
+ A test kernel was built with this patch and tested by the original bug
+ reporter. The bug reporter states the test kernel resolved the bug.
+ 
+ A bug was introduced when backporting the fix for
+ http://bugs.launchpad.net/bugs/1597908. This bug exists in all Ubuntu
+ 16.04 LTS 4.4 kernels >= 4.4.0-36, and many other non-LTS kernels.
  
  This patch changes the context in which timeout work is scheduled for
  block devices in the kernel. Previously, timeout work was executed
  directly from the timer callback that fired when a deadline was met.
  After the patch, timeout work is scheduled using a background work
  queue. This means that by the time the work executes, the device queue
  which originally scheduled the work could be torn down. In order to
  prevent this, the patch takes a reference on the device queue when
  executing the timeout work.
  
  The problem is that the last reference to this queue can be removed
  before the timeout work can be executed. During teardown, the block
  system executes a freeze followed by a drain. The freeze drops the last
  reference on the queue. The drain tries to clean up any outstanding
  work, including timeout work. After a freeze, the timeout work in the
  background queue is unable to obtain a reference, and exits early
  without completing work. The work is now permanently stuck in the queue
  and it will never be completed. The drain in the device teardown path
  spins indefinitely.
  
  The bug manifests as a hang that looks like this:
  [<ffffffff81829f15>] schedule+0x35/0x80
  [<ffffffffc014aea9>] hpsa_scan_start+0x109/0x140 [hpsa]
  [<ffffffff810c3cb0>] ? wake_atomic_t_function+0x60/0x60
  [<ffffffffc014b602>] hpsa_rescan_ctlr_worker+0x1d2/0x652 [hpsa]
  [<ffffffff8109a2c5>] process_one_work+0x165/0x480
  [<ffffffff8109a62b>] worker_thread+0x4b/0x4c0
  [<ffffffff8109a5e0>] ? process_one_work+0x480/0x480
  [<ffffffff810a0808>] kthread+0xd8/0xf0
  [<ffffffff810a0730>] ? kthread_create_on_node+0x1e0/0x1e0
  [<ffffffff8182e38f>] ret_from_fork+0x3f/0x70
  [<ffffffff810a0730>] ? kthread_create_on_node+0x1e0/0x1e0
  
  The fix exists upstream. It applies, builds, and runs cleanly on Ubuntu's
  most recent 4.4 kernel.
  
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=4e9b6f20828ac880dbc1fa2fdbafae779473d1af
  
  We hit this bug nearly 100% of the time on some of our HP hardware. The
  HPSA driver has a tendency to aggressively remove missing devices, so it
  widens the race. As a result, we've been building our own kernel with
  this patch applied. It would be really nice if we could get it into
  mainline Ubuntu.
  
  Let me know what additional information is needed. Thanks!

** Tags added: xenial

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1791790

Title:
  Kernel hang on drive pull caused by regression introduced by commit
  287922eb0b18

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1791790/+subscriptions
