https://lists.ubuntu.com/archives/kernel-team/2018-August/094654.html
** Description changed:
+
+ == SRU Justification ==
+ A mainline commit introduced a regression in v4.15-rc1. The regression
+ causes a kernel panic during system shutdown. This commit fixes
+ that regression. It was also cc'd to upstream stable, but it
+ has not yet landed in Bionic.
+
+ == Fix ==
+ 0d98ba8d70b0 ("scsi: hpsa: disable device during shutdown")
+
+ == Regression Potential ==
+ Low. This patch fixes a current regression. It has been cc'd to
+ upstream stable, so it has had additional upstream review.
+
+ == Test Case ==
+ A test kernel was built with this patch and tested by the original bug
+ reporter, who states that the test kernel resolved the bug.
+
+
Verified on multiple DL360 Gen9 servers with up to date firmware. Just
before reboot or shutdown, there is the following panic:
[ 289.093083] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 289.093085] {1}[Hardware Error]: event severity: fatal
[ 289.093087] {1}[Hardware Error]: Error 0, type: fatal
[ 289.093088] {1}[Hardware Error]: section_type: PCIe error
[ 289.093090] {1}[Hardware Error]: port_type: 4, root port
[ 289.093091] {1}[Hardware Error]: version: 1.16
[ 289.093093] {1}[Hardware Error]: command: 0x6010, status: 0x0143
[ 289.093094] {1}[Hardware Error]: device_id: 0000:00:01.0
[ 289.093095] {1}[Hardware Error]: slot: 0
[ 289.093096] {1}[Hardware Error]: secondary_bus: 0x03
[ 289.093097] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2f02
[ 289.093098] {1}[Hardware Error]: class_code: 040600
[ 289.093378] {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
[ 289.093380] {1}[Hardware Error]: Error 1, type: fatal
[ 289.093381] {1}[Hardware Error]: section_type: PCIe error
[ 289.093382] {1}[Hardware Error]: port_type: 4, root port
[ 289.093383] {1}[Hardware Error]: version: 1.16
[ 289.093384] {1}[Hardware Error]: command: 0x6010, status: 0x0143
[ 289.093386] {1}[Hardware Error]: device_id: 0000:00:01.0
[ 289.093386] {1}[Hardware Error]: slot: 0
[ 289.093387] {1}[Hardware Error]: secondary_bus: 0x03
[ 289.093388] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2f02
[ 289.093674] {1}[Hardware Error]: class_code: 040600
[ 289.093676] {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
[ 289.093678] Kernel panic - not syncing: Fatal hardware error!
[ 289.093745] Kernel Offset: 0x1cc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 289.105835] ERST: [Firmware Warn]: Firmware does not respond in time.
It does eventually restart after this. Then during the subsequent POST,
the following warning appears:
Embedded RAID 1 : Smart Array P440ar Controller - (2048 MB, V6.30) 7 Logical
Drive(s) - Operation Failed
- - 1719-Slot 0 Drive Array - A controller failure event occurred prior
- to this power-up. (Previous lock up code = 0x13) Action: Install the
- latest controller firmware. If the problem persists, replace the
- controller.
+ - 1719-Slot 0 Drive Array - A controller failure event occurred prior
+ to this power-up. (Previous lock up code = 0x13) Action: Install the
+ latest controller firmware. If the problem persists, replace the
+ controller.
The symptoms of the latter are described in
https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04805565
but the running storage controller firmware is much newer than the
version given in that document's resolution.
Neither of these problems occurs during shutdown/reboot on the xenial
kernel.
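For anyone comparing this panic across several affected machines, the APEI records in the output above can be pulled apart with a short script. This is only a sketch, not part of the fix: the function names (parse_sections, decode_secondary_status) are made up for illustration, and the bit names are my reading of the PCI-to-PCI bridge Secondary Status register layout, so please check them against the spec before relying on them.

```python
import re

# Start of one "Error N, type: ..." section in the APEI output.
SECTION_RE = re.compile(r"\[Hardware Error\]: Error (\d+), type: (\w+)")
# PCI address of the reporting port, e.g. 0000:00:01.0.
ADDRESS_RE = re.compile(r"device_id: ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])")
# Vendor/device ID pair, e.g. vendor_id: 0x8086, device_id: 0x2f02.
IDS_RE = re.compile(r"vendor_id: (0x[0-9a-f]+), device_id: (0x[0-9a-f]+)")
SECONDARY_RE = re.compile(r"secondary_status: (0x[0-9a-f]+)")

# Assumed error bits of the bridge Secondary Status register.
SECONDARY_STATUS_BITS = {
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Received System Error",
    15: "Detected Parity Error",
}


def decode_secondary_status(value):
    """Return the names of the error bits set in a secondary_status value."""
    return [name for bit, name in sorted(SECONDARY_STATUS_BITS.items())
            if value & (1 << bit)]


def parse_sections(log_text):
    """Collect one dict per "Error N" section found in dmesg-style text."""
    sections = []
    for line in log_text.splitlines():
        start = SECTION_RE.search(line)
        if start:
            sections.append({"index": int(start.group(1)),
                             "severity": start.group(2)})
            continue
        # Ignore anything before the first section or outside APEI output.
        if not sections or "[Hardware Error]" not in line:
            continue
        current = sections[-1]
        address = ADDRESS_RE.search(line)
        ids = IDS_RE.search(line)
        secondary = SECONDARY_RE.search(line)
        if address:
            current["address"] = address.group(1)
        if ids:
            current["vendor_id"], current["device_id"] = ids.groups()
        if secondary:
            current["secondary_status"] = int(secondary.group(1), 16)
    return sections
```

Fed the panic output above, this should yield two sections for the root port at 0000:00:01.0, and decode_secondary_status(0x2000) reports bit 13, which under the assumed layout is "Received Master Abort". The log alone does not say which device behind secondary bus 0x03 failed to respond.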
FWIW, when running the older P89 BIOS (1.50 (07/20/2015) vs. 2.56 (01/22/2018)),
the shutdown failure mode was a loop like so:
[529151.035267] NMI: IOCK error (debug interrupt?) for reason 75 on CPU 0.
[529153.222883] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.222884] Do you have a strange power saving mode enabled?
[529153.222884] Dazed and confused, but trying to continue
[529153.554447] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554448] Do you have a strange power saving mode enabled?
[529153.554449] Dazed and confused, but trying to continue
[529153.554450] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554451] Do you have a strange power saving mode enabled?
[529153.554452] Dazed and confused, but trying to continue
[529153.554452] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554453] Do you have a strange power saving mode enabled?
[529153.554454] Dazed and confused, but trying to continue
[529153.554454] Uhhuh. NMI received for unknown reason 35 on CPU 0.
[529153.554455] Do you have a strange power saving mode enabled?
[529153.554456] Dazed and confused, but trying to continue
[529153.554457] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554458] Do you have a strange power saving mode enabled?
[529153.554458] Dazed and confused, but trying to continue
[529153.554459] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554460] Do you have a strange power saving mode enabled?
[529153.554460] Dazed and confused, but trying to continue
[529154.953916] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529154.953917] Do you have a strange power saving mode enabled?
[529154.953918] Dazed and confused, but trying to continue
But upgrading to 2.56 changes that to a kernel panic.
ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-signed-image-generic 4.15.0.21.22
ProcVersionSignature: Ubuntu 4.15.0-21.22-generic 4.15.17
Uname: Linux 4.15.0-21-generic x86_64
AlsaDevices:
- total 0
- crw-rw---- 1 root audio 116, 1 May 15 23:11 seq
- crw-rw---- 1 root audio 116, 33 May 15 23:11 timer
+ total 0
+ crw-rw---- 1 root audio 116, 1 May 15 23:11 seq
+ crw-rw---- 1 root audio 116, 33 May 15 23:11 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord':
'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq',
'/dev/snd/timer'] failed with exit code 1:
Date: Wed May 16 00:17:53 2018
HibernationDevice: RESUME=UUID=696e8063-c668-4c89-a478-bfc23a450369
InstallationDate: Installed on 2016-06-01 (713 days ago)
InstallationMedia: Ubuntu-Server 14.04.5 LTS "Trusty Tahr" - Beta amd64
(20160527)
MachineType: HP ProLiant DL360 Gen9
PciMultimedia:
-
+
ProcEnviron:
- TERM=xterm-256color
- PATH=(custom, no user)
- LANG=en_US.UTF-8
- SHELL=/bin/bash
+ TERM=xterm-256color
+ PATH=(custom, no user)
+ LANG=en_US.UTF-8
+ SHELL=/bin/bash
ProcFB: 0 mgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-21-generic
root=UUID=6e6d422d-8ffb-4db3-b8c7-6c81e320b1b2 ro console=tty0
console=ttyS1,38400 nosplash console=ttyS1,38400 console=tty0 nosplash
RelatedPackageVersions:
- linux-restricted-modules-4.15.0-21-generic N/A
- linux-backports-modules-4.15.0-21-generic N/A
- linux-firmware 1.173
+ linux-restricted-modules-4.15.0-21-generic N/A
+ linux-backports-modules-4.15.0-21-generic N/A
+ linux-firmware 1.173
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to bionic on 2018-05-09 (6 days ago)
dmi.bios.date: 01/22/2018
dmi.bios.vendor: HP
dmi.bios.version: P89
dmi.board.name: ProLiant DL360 Gen9
dmi.board.vendor: HP
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias:
dmi:bvnHP:bvrP89:bd01/22/2018:svnHP:pnProLiantDL360Gen9:pvr:rvnHP:rnProLiantDL360Gen9:rvr:cvnHP:ct23:cvr:
dmi.product.family: ProLiant
dmi.product.name: ProLiant DL360 Gen9
dmi.sys.vendor: HP
--
https://bugs.launchpad.net/bugs/1771467
Title:
Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0