** Description changed:

+ [Impact]
+ 
+ At boot-time, the kernel will panic somewhere in 'blk_mq_register_disk',
+ a snippet of the trace is below, and the full panic dump is attached. The
+ panic dump was collected via serial console, as the kernel panics so
+ early that we cannot kdump it.
+ 
+ [ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
+ [ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
+ [ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
+ [ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
+ 
+ [Test Case]
+ 
+ At boot-time, the kernel will panic somewhere in 'blk_mq_register_disk',
+ a snippet of the trace is below, and the full panic dump is attached.
+ 
+ [Regression Potential]
+ 
+  * Fix implemented upstream starting with v4.6-rc1
+ 
+  * The fix is fairly straightforward given the stack trace and the
+ upstream fix.
+ 
+  * The fix is hard to verify, but user "Proton" was able to confirm
+ that the test kernel including the fix solves this particular problem.
+ 
+ 
+ [Other Info]
+ 
+  * https://lkml.org/lkml/2016/3/16/40
+  * http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e0e827b9
+  * http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=897bb0c7
+ 
+ [Original Description]
  We discovered a pretty serious regression introduced in 4.4.0-18.
  
  At boot-time, the kernel will panic somewhere in 'blk_mq_register_disk',
  a snippet of the trace is below, and the full panic dump is attached. The
  panic dump was collected via serial console, as the kernel panics so
  early that we cannot kdump it.
  
  [    2.650512]  [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
  [    2.656675]  [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
  [    2.662661]  [<ffffffff813af53e>] add_disk+0x1ce/0x490
  [    2.667869]  [<ffffffff815477e0>] loop_add+0x1f0/0x270
  
  This seems somewhat similar to https://lkml.org/lkml/2016/3/16/40, but
  the trace is not identical.
  
  We discovered this issue when we were experimenting with
  linux-generic-lts-xenial from trusty-updates on a 14.04 installation.
  When we installed it, 4.4.0-15 was the current package, and it worked
  fine and provided a large number of improvements for us. Background
  security updates installed 4.4.0-18, which updated grub and became the
  default kernel. On reboot, the node panics about 2 seconds in, leaving
  the machine in a dead state. We were able to boot a rescue image and
  roll back to 4.4.0-15, which works nicely. We currently have 4.4.0-15
  pinned to prevent this problem from coming back, but would prefer to
  see the problem fixed.
  
  I'll attach lspci, lshw, and dmidecode for our hardware as well, but
  this is happening on pretty vanilla supermicro nodes. We are able to
  consistently reproduce it on our hardware. It is not reproducible in
  EC2, only on metal.

** Description changed:

  [Impact]
  
  At boot-time, the kernel will panic somewhere in 'blk_mq_register_disk',
  a snippet of the trace is below, and the full panic dump is attached. The
  panic dump was collected via serial console, as the kernel panics so
  early that we cannot kdump it.
  
  [ 2.650512] [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
  [ 2.656675] [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
  [ 2.662661] [<ffffffff813af53e>] add_disk+0x1ce/0x490
  [ 2.667869] [<ffffffff815477e0>] loop_add+0x1f0/0x270
  
  [Test Case]
  
  At boot-time, the kernel will panic somewhere in 'blk_mq_register_disk',
  a snippet of the trace is below, and the full panic dump is attached.
  
  [Regression Potential]
  
-  * Fix implemented upstream starting with v4.6-rc1
+  * Fix implemented upstream starting with v4.6-rc1
  
-  * The fix is fairly straightforward given the stack trace and the
+  * The fix is fairly straightforward given the stack trace and the
  upstream fix.
  
-  * The fix is hard to verify, but user "Proton" was able to confirm
- that the test kernel including the fix solves this particular problem.
- 
+  * The fix is hard to verify, but user "Proton" was able to confirm that
+ the test kernel including the fix solves this particular problem:
+ https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1572630/comments/23
  
  [Other Info]
  
-  * https://lkml.org/lkml/2016/3/16/40
-  * http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e0e827b9
-  * http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=897bb0c7
+  * https://lkml.org/lkml/2016/3/16/40
+  * http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e0e827b9
+  * http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=897bb0c7
  
  [Original Description]
  We discovered a pretty serious regression introduced in 4.4.0-18.
  
  At boot-time, the kernel will panic somewhere in 'blk_mq_register_disk',
  a snippet of the trace is below, and the full panic dump is attached. The
  panic dump was collected via serial console, as the kernel panics so
  early that we cannot kdump it.
  
  [    2.650512]  [<ffffffff813ac8a6>] blk_mq_register_disk+0xa6/0x160
  [    2.656675]  [<ffffffff813a1b44>] blk_register_queue+0xb4/0x160
  [    2.662661]  [<ffffffff813af53e>] add_disk+0x1ce/0x490
  [    2.667869]  [<ffffffff815477e0>] loop_add+0x1f0/0x270
  
  This seems somewhat similar to https://lkml.org/lkml/2016/3/16/40, but
  the trace is not identical.
  
  We discovered this issue when we were experimenting with
  linux-generic-lts-xenial from trusty-updates on a 14.04 installation.
  When we installed it, 4.4.0-15 was the current package, and it worked
  fine and provided a large number of improvements for us. Background
  security updates installed 4.4.0-18, which updated grub and became the
  default kernel. On reboot, the node panics about 2 seconds in, leaving
  the machine in a dead state. We were able to boot a rescue image and
  roll back to 4.4.0-15, which works nicely. We currently have 4.4.0-15
  pinned to prevent this problem from coming back, but would prefer to
  see the problem fixed.
  
  I'll attach lspci, lshw, and dmidecode for our hardware as well, but
  this is happening on pretty vanilla supermicro nodes. We are able to
  consistently reproduce it on our hardware. It is not reproducible in
  EC2, only on metal.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1572630

Title:
  boot-time kernel panic introduced in 4.4.0-18, not present in 4.4.0-15

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1572630/+subscriptions
