[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0
We downloaded the fix from http://kernel.ubuntu.com/~jsalisbury/lp1771467 and tested it.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1771467

Title:
  Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress

Bug description:
  Verified on multiple DL360 Gen9 servers with up to date firmware.
  Just before reboot or shutdown, there is the following panic:

  [ 289.093083] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
  [ 289.093085] {1}[Hardware Error]: event severity: fatal
  [ 289.093087] {1}[Hardware Error]: Error 0, type: fatal
  [ 289.093088] {1}[Hardware Error]: section_type: PCIe error
  [ 289.093090] {1}[Hardware Error]: port_type: 4, root port
  [ 289.093091] {1}[Hardware Error]: version: 1.16
  [ 289.093093] {1}[Hardware Error]: command: 0x6010, status: 0x0143
  [ 289.093094] {1}[Hardware Error]: device_id: :00:01.0
  [ 289.093095] {1}[Hardware Error]: slot: 0
  [ 289.093096] {1}[Hardware Error]: secondary_bus: 0x03
  [ 289.093097] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2f02
  [ 289.093098] {1}[Hardware Error]: class_code: 040600
  [ 289.093378] {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
  [ 289.093380] {1}[Hardware Error]: Error 1, type: fatal
  [ 289.093381] {1}[Hardware Error]: section_type: PCIe error
  [ 289.093382] {1}[Hardware Error]: port_type: 4, root port
  [ 289.093383] {1}[Hardware Error]: version: 1.16
  [ 289.093384] {1}[Hardware Error]: command: 0x6010, status: 0x0143
  [ 289.093386] {1}[Hardware Error]: device_id: :00:01.0
  [ 289.093386] {1}[Hardware Error]: slot: 0
  [ 289.093387] {1}[Hardware Error]: secondary_bus: 0x03
  [ 289.093388] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2f02
  [ 289.093674] {1}[Hardware Error]: class_code: 040600
  [ 289.093676] {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
  [ 289.093678] Kernel panic - not syncing: Fatal hardware error!
  [ 289.093745] Kernel Offset: 0x1cc0 from 0x8100 (relocation range: 0x8000-0xbfff)
  [ 289.105835] ERST: [Firmware Warn]: Firmware does not respond in time.

  It does eventually restart after this. Then during the subsequent
  POST, the following warning appears:

  Embedded RAID 1 : Smart Array P440ar Controller - (2048 MB, V6.30) 7 Logical Drive(s) - Operation Failed
  - 1719-Slot 0 Drive Array - A controller failure event occurred prior to this power-up. (Previous lock up code = 0x13)
  Action: Install the latest controller firmware. If the problem persists, replace the controller.

  The latter's symptoms are described in
  https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04805565
  but the running storage controller firmware is much newer than the
  doc's resolution.

  Neither of these problems occurs during shutdown/reboot on the xenial
  kernel.

  FWIW, when running on old P89 (1.50 (07/20/2015) vs 2.56
  (01/22/2018)), the shutdown failure mode was a loop like so:

  [529151.035267] NMI: IOCK error (debug interrupt?) for reason 75 on CPU 0.
  [529153.222883] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.222884] Do you have a strange power saving mode enabled?
  [529153.222884] Dazed and confused, but trying to continue
  [529153.554447] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554448] Do you have a strange power saving mode enabled?
  [529153.554449] Dazed and confused, but trying to continue
  [529153.554450] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554451] Do you have a strange power saving mode enabled?
  [529153.554452] Dazed and confused, but trying to continue
  [529153.554452] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554453] Do you have a strange power saving mode enabled?
  [529153.554454] Dazed and confused, but trying to continue
  [529153.554454] Uhhuh. NMI received for unknown reason 35 on CPU 0.
  [529153.554455] Do you have a strange power saving mode enabled?
  [529153.554456] Dazed and confused, but trying to continue
  [529153.554457] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554458] Do you have a strange power saving mode enabled?
  [529153.554458] Dazed and confused, but trying to continue
  [529153.554459] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554460] Do you have a strange power saving mode enabled?
  [529153.554460] Dazed and confused, but trying to continue
  [529154.953916] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529154.953917] Do you have a strange power saving mode enabled?
  [529154.953918] Dazed and confused, but trying to continue
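
For anyone comparing panic output across several affected servers, the APEI lines in the bug description above can be split into fields with a few lines of Python. This is only a minimal sketch, assuming the lines look exactly like the ones quoted in this report; the helper name parse_hw_error_line is hypothetical and not part of any kernel or Ubuntu tooling.

```python
import re

# Matches lines such as:
#   [ 289.093093] {1}[Hardware Error]: command: 0x6010, status: 0x0143
# Assumption: the format is the one quoted in this bug report.
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\{\d+\}\[Hardware Error\]:\s*(?P<body>.*)"
)


def parse_hw_error_line(line):
    """Return (timestamp, {field: value}) for one APEI line, or None if it is not one."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    fields = {}
    for part in m.group("body").split(", "):
        # Split on the last ": " so that "bridge: secondary_status: 0x2000"
        # yields the value "0x2000".
        key, sep, value = part.rpartition(": ")
        if not sep:
            continue  # free text such as "root port" carries no key
        # Drop a leading group label such as "bridge: " from the key.
        key = key.rpartition(": ")[2]
        fields[key.strip()] = value.strip()
    return float(m.group("ts")), fields
```

Calling parse_hw_error_line on the "command: 0x6010, status: 0x0143" line gives a dict with the keys command and status; lines that are not tagged [Hardware Error], such as the "Kernel panic" line, return None.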
[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0
We run a DL360pg8.
Same problem at reboot with kernel 4.15.0-30 (hang).
We can confirm that the fix in 4.15.0-23 fixes the bug.
Thanks!