[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2019-07-24 Thread Brad Figg
** Tags added: cscc

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1771467

Title:
  Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released

Bug description:
  
  == SRU Justification ==
  A mainline commit introduced a regression in v4.15-rc1.  The regression
  causes a kernel panic during system shutdown.  This commit fixes
  that regression.  This commit was also cc'd to upstream stable, but it
  has not yet landed in Bionic.

  == Fix ==
  0d98ba8d70b0 ("scsi: hpsa: disable device during shutdown")

  == Regression Potential ==
  Low.  This patch fixes a current regression.  It has been cc'd to
  upstream stable, so it has had additional upstream review.

  == Test Case ==
  A test kernel was built with this patch and tested by the original bug
  reporter.  The bug reporter states that the test kernel resolved the bug.


  Verified on multiple DL360 Gen9 servers with up-to-date firmware.
  Just before reboot or shutdown, the following panic occurs:

  [  289.093083] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
  [  289.093085] {1}[Hardware Error]: event severity: fatal
  [  289.093087] {1}[Hardware Error]:  Error 0, type: fatal
  [  289.093088] {1}[Hardware Error]:   section_type: PCIe error
  [  289.093090] {1}[Hardware Error]:   port_type: 4, root port
  [  289.093091] {1}[Hardware Error]:   version: 1.16
  [  289.093093] {1}[Hardware Error]:   command: 0x6010, status: 0x0143
  [  289.093094] {1}[Hardware Error]:   device_id: :00:01.0
  [  289.093095] {1}[Hardware Error]:   slot: 0
  [  289.093096] {1}[Hardware Error]:   secondary_bus: 0x03
  [  289.093097] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f02
  [  289.093098] {1}[Hardware Error]:   class_code: 040600
  [  289.093378] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
  [  289.093380] {1}[Hardware Error]:  Error 1, type: fatal
  [  289.093381] {1}[Hardware Error]:   section_type: PCIe error
  [  289.093382] {1}[Hardware Error]:   port_type: 4, root port
  [  289.093383] {1}[Hardware Error]:   version: 1.16
  [  289.093384] {1}[Hardware Error]:   command: 0x6010, status: 0x0143
  [  289.093386] {1}[Hardware Error]:   device_id: :00:01.0
  [  289.093386] {1}[Hardware Error]:   slot: 0
  [  289.093387] {1}[Hardware Error]:   secondary_bus: 0x03
  [  289.093388] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f02
  [  289.093674] {1}[Hardware Error]:   class_code: 040600
  [  289.093676] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
  [  289.093678] Kernel panic - not syncing: Fatal hardware error!
  [  289.093745] Kernel Offset: 0x1cc0 from 0x8100 (relocation range: 0x8000-0xbfff)
  [  289.105835] ERST: [Firmware Warn]: Firmware does not respond in time.

  The system does eventually restart after this.  Then, during the
  subsequent POST, the following warning appears:

  Embedded RAID 1 : Smart Array P440ar Controller - (2048 MB, V6.30) 7 Logical
  Drive(s) - Operation Failed
   - 1719-Slot 0 Drive Array - A controller failure event occurred prior
     to this power-up.  (Previous lock up code = 0x13) Action: Install the
     latest controller firmware. If the problem persists, replace the
     controller.

  The latter's symptoms match those described in
  https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04805565
  but the storage controller is already running firmware much newer than
  the version that document's resolution calls for.

  Neither of these problems occurs during shutdown/reboot on the xenial
  kernel.

  FWIW, when running the old P89 firmware (1.50, dated 07/20/2015, rather
  than 2.56, dated 01/22/2018), the shutdown failure mode was a loop like this:

  [529151.035267] NMI: IOCK error (debug interrupt?) for reason 75 on CPU 0.
  [529153.222883] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.222884] Do you have a strange power saving mode enabled?
  [529153.222884] Dazed and confused, but trying to continue
  [529153.554447] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554448] Do you have a strange power saving mode enabled?
  [529153.554449] Dazed and confused, but trying to continue
  [529153.554450] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554451] Do you have a strange power saving mode enabled?
  [529153.554452] Dazed and confused, but trying to continue
  [529153.554452] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554453] Do you have a strange power saving mode enabled?
  [529153.554454] Dazed and confused, but trying to continue
  [529153.554454] Uhhuh. NMI received for unknown reason 35 on CPU 0.
  [529153.554455] Do you have a strange power saving mode enabled?
  [529153.554456] Dazed and confused, but 

[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2019-03-04 Thread Kai-Heng Feng
It has been included since 4.15.0-34.37.
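For anyone wanting to confirm whether a Bionic host already carries the fix, here is a minimal sketch in Python. It assumes the running kernel exposes an Ubuntu /proc/version_signature line of the form "Ubuntu 4.15.0-34.37-generic 4.15.18" and simply compares that version with 4.15.0-34.37, the release named in this comment. The function names are hypothetical, and kernels outside the 4.15.0 series (for example HWE kernels) are deliberately not judged.

```python
import re

# Fixed-in version taken from this bug thread ("included since 4.15.0-34.37"),
# expressed as (major, minor, patch, abi, upload).
FIXED_IN = (4, 15, 0, 34, 37)


def parse_ubuntu_kernel_version(signature: str) -> tuple:
    """Return (major, minor, patch, abi, upload) from a signature string.

    Assumes the Ubuntu /proc/version_signature layout, e.g.
    "Ubuntu 4.15.0-34.37-generic 4.15.18".
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", signature)
    if match is None:
        raise ValueError(f"unrecognised version signature: {signature!r}")
    return tuple(int(part) for part in match.groups())


def has_hpsa_shutdown_fix(signature: str) -> bool:
    """True if a Bionic 4.15.0 kernel is at or after the fixed-in version."""
    version = parse_ubuntu_kernel_version(signature)
    if version[:3] != FIXED_IN[:3]:
        # Other kernel series (e.g. HWE kernels) are outside this simple check.
        raise ValueError(f"not a 4.15.0 kernel: {signature.strip()!r}")
    # Tuples compare element by element, so the ABI and then the upload
    # number decide the result.
    return version >= FIXED_IN


def main() -> None:
    try:
        with open("/proc/version_signature") as handle:
            signature = handle.read()
        print("fix present" if has_hpsa_shutdown_fix(signature) else "fix missing")
    except (OSError, ValueError) as err:
        print(f"cannot decide: {err}")


if __name__ == "__main__":
    main()
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "4.15.0-9" sorting after "4.15.0-34".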

** Changed in: linux (Ubuntu)
   Status: Confirmed => Fix Released

** Changed in: linux (Ubuntu Bionic)
   Status: Confirmed => Fix Released

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released

[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2019-02-28 Thread Quiksmage
Was this one forgotten, haha? If I can help in any way, please let me know.
Thanks!

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Confirmed

[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2019-01-15 Thread Joseph Salisbury
** Changed in: linux (Ubuntu)
 Assignee: Joseph Salisbury (jsalisbury) => (unassigned)

** Changed in: linux (Ubuntu)
   Status: In Progress => Confirmed

** Changed in: linux (Ubuntu Bionic)
   Status: In Progress => Confirmed

** Changed in: linux (Ubuntu Bionic)
 Assignee: Joseph Salisbury (jsalisbury) => (unassigned)

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Confirmed

[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2019-01-15 Thread Junien Fridrick
What's the status of the SRU for this bug? Thanks!

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress

[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2018-08-22 Thread Frank Brendel
Happily, this also fixes NMIs during reboot on DL380p Gen8 servers.

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress

[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2018-08-15 Thread Steffen Neumann
Hi, we've been hit by this on a ProLiant BL460c Gen9 with an H244br:
07:00.0 Serial Attached SCSI controller: Hewlett-Packard Company Smart Array Gen9 Controllers
and we are happy to test if a kernel *.deb is available. Yours, Steffen

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress

Bug description:
  
  == SRU Justification ==
  Mainline commit introduced a regression in v4.15-rc1.  The regression
  causes a kernel panic during system shutdown.  This commit fixes
  that regression.  This commit was also cc'd to upstream stable, but it
  has not landed in Bionic as of yet.

  == Fix ==
  0d98ba8d70b0 ("scsi: hpsa: disable device during shutdown")

  == Regression Potential ==
  Low.  This patch fixes a current regression.  It has been cc'd to
  upstream stable, so it has had additon upstream review.

  == Test Case ==
  A test kernel was built with this patch and tested by the original bug 
reporter.
  The bug reporter states the test kernel resolved the bug.


  Verified on multiple DL360 Gen9 servers with up to date firmware.
  Just before reboot or shutdown, there is the following panic:

  [  289.093083] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
  [  289.093085] {1}[Hardware Error]: event severity: fatal
  [  289.093087] {1}[Hardware Error]:  Error 0, type: fatal
  [  289.093088] {1}[Hardware Error]:   section_type: PCIe error
  [  289.093090] {1}[Hardware Error]:   port_type: 4, root port
  [  289.093091] {1}[Hardware Error]:   version: 1.16
  [  289.093093] {1}[Hardware Error]:   command: 0x6010, status: 0x0143
  [  289.093094] {1}[Hardware Error]:   device_id: :00:01.0
  [  289.093095] {1}[Hardware Error]:   slot: 0
  [  289.093096] {1}[Hardware Error]:   secondary_bus: 0x03
  [  289.093097] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f02
  [  289.093098] {1}[Hardware Error]:   class_code: 040600
  [  289.093378] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
  [  289.093380] {1}[Hardware Error]:  Error 1, type: fatal
  [  289.093381] {1}[Hardware Error]:   section_type: PCIe error
  [  289.093382] {1}[Hardware Error]:   port_type: 4, root port
  [  289.093383] {1}[Hardware Error]:   version: 1.16
  [  289.093384] {1}[Hardware Error]:   command: 0x6010, status: 0x0143
  [  289.093386] {1}[Hardware Error]:   device_id: :00:01.0
  [  289.093386] {1}[Hardware Error]:   slot: 0
  [  289.093387] {1}[Hardware Error]:   secondary_bus: 0x03
  [  289.093388] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f02
  [  289.093674] {1}[Hardware Error]:   class_code: 040600
  [  289.093676] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
  [  289.093678] Kernel panic - not syncing: Fatal hardware error!
  [  289.093745] Kernel Offset: 0x1cc0 from 0x8100 (relocation range: 0x8000-0xbfff)
  [  289.105835] ERST: [Firmware Warn]: Firmware does not respond in time.

  It does eventually restart after this.  Then during the subsequent
  POST, the following warning appears:

  Embedded RAID 1 : Smart Array P440ar Controller - (2048 MB, V6.30) 7 Logical
  Drive(s) - Operation Failed
   - 1719-Slot 0 Drive Array - A controller failure event occurred prior
     to this power-up.  (Previous lock up code = 0x13) Action: Install the
     latest controller firmware. If the problem persists, replace the
     controller.

  The latter's symptoms are described in
  https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04805565
  but the running storage controller firmware is much newer than the
  version the doc gives as the resolution.

  Neither of these problems occurs during shutdown/reboot on the xenial
  kernel.

  FWIW, when running the old P89 firmware (1.50 (07/20/2015) rather than
  2.56 (01/22/2018)), the shutdown failure mode was a loop like so:

  [529151.035267] NMI: IOCK error (debug interrupt?) for reason 75 on CPU 0.
  [529153.222883] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.222884] Do you have a strange power saving mode enabled?
  [529153.222884] Dazed and confused, but trying to continue
  [529153.554447] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554448] Do you have a strange power saving mode enabled?
  [529153.554449] Dazed and confused, but trying to continue
  [529153.554450] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554451] Do you have a strange power saving mode enabled?
  [529153.554452] Dazed and confused, but trying to continue
  [529153.554452] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554453] Do you have a strange power saving mode enabled?
  [529153.554454] Dazed and confused, but trying 
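
The NMI loop above can be read with a small helper that classifies each reason value. This is a minimal sketch, not part of the original report: it assumes the two-digit reason is the hex byte the x86 kernel reads from the legacy NMI status port (0x61), and that, per our reading of arch/x86 (NMI_REASON_SERR = 0x80, NMI_REASON_IOCHK = 0x40), only those two bits route the NMI to a known handler, with everything else falling through to the "unknown reason" message. The function names are hypothetical; verify the bit values against your kernel tree.

```python
import re

# Bit masks as we understand them from the x86 NMI code
# (arch/x86/include/asm/mach_traps.h); treat these values as an assumption.
NMI_REASON_SERR = 0x80   # PCI SERR# asserted
NMI_REASON_IOCHK = 0x40  # I/O channel check (IOCHK#)

# Matches both message shapes in the log; the reason is printed as two hex digits.
NMI_LINE = re.compile(r"NMI.*?for (?:unknown )?reason ([0-9a-fA-F]{2}) on CPU (\d+)")


def classify_nmi_reason(reason: int) -> str:
    """Return which branch of the NMI handler a reason byte would take."""
    if reason & NMI_REASON_SERR:
        return "SERR"
    if reason & NMI_REASON_IOCHK:
        return "IOCHK"
    return "unknown"


def classify_log_line(line: str):
    """Return (reason, cpu, classification) for an NMI log line, or None."""
    m = NMI_LINE.search(line)
    if m is None:
        return None
    reason = int(m.group(1), 16)  # the kernel prints the byte in hex
    return reason, int(m.group(2)), classify_nmi_reason(reason)


if __name__ == "__main__":
    sample = [
        "[529151.035267] NMI: IOCK error (debug interrupt?) for reason 75 on CPU 0.",
        "[529153.222883] Uhhuh. NMI received for unknown reason 25 on CPU 0.",
        "[529153.554454] Uhhuh. NMI received for unknown reason 35 on CPU 0.",
    ]
    for line in sample:
        print(classify_log_line(line))
```

Applied to the three distinct messages in the loop, it classifies reason 75 (0x75, IOCHK bit set) as an IOCHK NMI and reasons 25 and 35 as unknown, which agrees with the kernel's own wording on each line.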

[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2018-08-10 Thread Joseph Salisbury
https://lists.ubuntu.com/archives/kernel-team/2018-August/094654.html

** Description changed:

+ 
+ == SRU Justification ==
+ Mainline commit introduced a regression in v4.15-rc1.  The regression
+ causes a kernel panic during system shutdown.  This commit fixes
+ that regression.  This commit was also cc'd to upstream stable, but it
+ has not landed in Bionic as of yet.
+ 
+ == Fix ==
+ 0d98ba8d70b0 ("scsi: hpsa: disable device during shutdown")
+ 
+ == Regression Potential ==
+ Low.  This patch fixes a current regression.  It has been cc'd to
+ upstream stable, so it has had additional upstream review.
+ 
+ == Test Case ==
+ A test kernel was built with this patch and tested by the original bug reporter.
+ The bug reporter states the test kernel resolved the bug.
+ 
+ 
  Verified on multiple DL360 Gen9 servers with up to date firmware.  Just
  before reboot or shutdown, there is the following panic:
  
  [  289.093083] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
  [  289.093085] {1}[Hardware Error]: event severity: fatal
  [  289.093087] {1}[Hardware Error]:  Error 0, type: fatal
  [  289.093088] {1}[Hardware Error]:   section_type: PCIe error
  [  289.093090] {1}[Hardware Error]:   port_type: 4, root port
  [  289.093091] {1}[Hardware Error]:   version: 1.16
  [  289.093093] {1}[Hardware Error]:   command: 0x6010, status: 0x0143
  [  289.093094] {1}[Hardware Error]:   device_id: :00:01.0
  [  289.093095] {1}[Hardware Error]:   slot: 0
  [  289.093096] {1}[Hardware Error]:   secondary_bus: 0x03
  [  289.093097] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f02
  [  289.093098] {1}[Hardware Error]:   class_code: 040600
  [  289.093378] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
  [  289.093380] {1}[Hardware Error]:  Error 1, type: fatal
  [  289.093381] {1}[Hardware Error]:   section_type: PCIe error
  [  289.093382] {1}[Hardware Error]:   port_type: 4, root port
  [  289.093383] {1}[Hardware Error]:   version: 1.16
  [  289.093384] {1}[Hardware Error]:   command: 0x6010, status: 0x0143
  [  289.093386] {1}[Hardware Error]:   device_id: :00:01.0
  [  289.093386] {1}[Hardware Error]:   slot: 0
  [  289.093387] {1}[Hardware Error]:   secondary_bus: 0x03
  [  289.093388] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f02
  [  289.093674] {1}[Hardware Error]:   class_code: 040600
  [  289.093676] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
  [  289.093678] Kernel panic - not syncing: Fatal hardware error!
  [  289.093745] Kernel Offset: 0x1cc0 from 0x8100 (relocation range: 0x8000-0xbfff)
  [  289.105835] ERST: [Firmware Warn]: Firmware does not respond in time.
  
  It does eventually restart after this.  Then during the subsequent POST,
  the following warning appears:
  
  Embedded RAID 1 : Smart Array P440ar Controller - (2048 MB, V6.30) 7 Logical
  Drive(s) - Operation Failed
-  - 1719-Slot 0 Drive Array - A controller failure event occurred prior
-to this power-up.  (Previous lock up code = 0x13) Action: Install the
-latest controller firmware. If the problem persists, replace the
-controller.
+  - 1719-Slot 0 Drive Array - A controller failure event occurred prior
+    to this power-up.  (Previous lock up code = 0x13) Action: Install the
+    latest controller firmware. If the problem persists, replace the
+    controller.
  
  The latter's symptoms are described in
  https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04805565
  but the running storage controller firmware is much newer than the doc's
  resolution.
  
  Neither of these problems occur during shutdown/reboot on the xenial
  kernel.
  
  FWIW, when running on old P89 (1.50 (07/20/2015) vs 2.56 (01/22/2018)),
  the shutdown failure mode was a loop like so:
  
  [529151.035267] NMI: IOCK error (debug interrupt?) for reason 75 on CPU 0.
  [529153.222883] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.222884] Do you have a strange power saving mode enabled?
  [529153.222884] Dazed and confused, but trying to continue
  [529153.554447] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554448] Do you have a strange power saving mode enabled?
  [529153.554449] Dazed and confused, but trying to continue
  [529153.554450] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554451] Do you have a strange power saving mode enabled?
  [529153.554452] Dazed and confused, but trying to continue
  [529153.554452] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554453] Do you have a strange power saving mode enabled?
  [529153.554454] Dazed and confused, but trying to continue
  [529153.554454] Uhhuh. NMI received for unknown reason 35 on CPU 0.
  [529153.554455] Do you have a strange power saving mode enabled?
  [529153.554456] Dazed and confused, but trying to continue
  [529153.554457] Uhhuh. NMI received for unknown 

[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2018-08-10 Thread Joseph Salisbury
Thanks for testing!  I'll submit an SRU request for that commit.

[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2018-08-10 Thread Andreas Bininda
We downloaded the fix from

http://kernel.ubuntu.com/~jsalisbury/lp1771467

and tested it.

[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2018-08-10 Thread Andreas Bininda
We run a DL360p Gen8 and see the same problem at reboot (a hang) with
kernel 4.15.0-30.

We can confirm that the fix in 4.15.0-23 fixes the bug.

Thanks!

[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2018-06-23 Thread Quiksmage
I wouldn't mind testing, but I'm not sure how :)

I'm on 18.04 LTS with 4.15.0-23-generic.

If you can give me some commands to try (as well as a command to
revert), I have no problem trying.

Thanks!

[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2018-06-21 Thread Joseph Salisbury
I built a test kernel with commit 0d98ba8d70b0070ac117452ea0b663e26bbf46bf.  
The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1771467

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is prior to 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.
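
The package rule in the note above can be expressed as a small helper that picks the right files out of a download directory. This is a minimal sketch, assuming the files follow Ubuntu's usual "<package>-<version>_<version>_<arch>.deb" naming; the function names and sample file names are hypothetical, and the output is only a dpkg -i command line to review before running it.

```python
import re
from typing import List


def packages_for_test_kernel(version: str) -> List[str]:
    """Package name prefixes to install for a test kernel, following the
    note: before 4.15 (Bionic) vs. 4.15 or newer."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")
    if (int(m.group(1)), int(m.group(2))) < (4, 15):
        return ["linux-image", "linux-image-extra"]
    return ["linux-modules", "linux-modules-extra", "linux-image-unsigned"]


def select_debs(filenames: List[str], version: str) -> List[str]:
    """Pick the .deb files whose names start with a required package name.
    Assumes each package name is followed by '-' and the version."""
    prefixes = packages_for_test_kernel(version)
    return sorted(
        f for f in filenames
        if f.endswith(".deb") and any(f.startswith(p + "-") for p in prefixes)
    )


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    downloaded = [
        "linux-headers-4.15.0-23_4.15.0-23.25_all.deb",
        "linux-image-unsigned-4.15.0-23-generic_4.15.0-23.25_amd64.deb",
        "linux-modules-4.15.0-23-generic_4.15.0-23.25_amd64.deb",
        "linux-modules-extra-4.15.0-23-generic_4.15.0-23.25_amd64.deb",
    ]
    debs = select_debs(downloaded, "4.15.0-23-generic")
    print("sudo dpkg -i " + " ".join(debs))
```

Matching on the package name plus a trailing "-" keeps unrelated packages such as linux-headers out of the install set.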

Thanks in advance!

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1771467

Title:
  Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress

Bug description:
  Verified on multiple DL360 Gen9 servers with up to date firmware.
  Just before reboot or shutdown, there is the following panic:

  [  289.093083] {1}[Hardware Error]: Hardware error from APEI Generic Hardware 
Error Source: 1
  [  289.093085] {1}[Hardware Error]: event severity: fatal
  [  289.093087] {1}[Hardware Error]:  Error 0, type: fatal
  [  289.093088] {1}[Hardware Error]:   section_type: PCIe error
  [  289.093090] {1}[Hardware Error]:   port_type: 4, root port
  [  289.093091] {1}[Hardware Error]:   version: 1.16
  [  289.093093] {1}[Hardware Error]:   command: 0x6010, status: 0x0143
  [  289.093094] {1}[Hardware Error]:   device_id: :00:01.0
  [  289.093095] {1}[Hardware Error]:   slot: 0
  [  289.093096] {1}[Hardware Error]:   secondary_bus: 0x03
  [  289.093097] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f02
  [  289.093098] {1}[Hardware Error]:   class_code: 040600
  [  289.093378] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, 
control: 0x0003
  [  289.093380] {1}[Hardware Error]:  Error 1, type: fatal
  [  289.093381] {1}[Hardware Error]:   section_type: PCIe error
  [  289.093382] {1}[Hardware Error]:   port_type: 4, root port
  [  289.093383] {1}[Hardware Error]:   version: 1.16
  [  289.093384] {1}[Hardware Error]:   command: 0x6010, status: 0x0143
  [  289.093386] {1}[Hardware Error]:   device_id: :00:01.0
  [  289.093386] {1}[Hardware Error]:   slot: 0
  [  289.093387] {1}[Hardware Error]:   secondary_bus: 0x03
  [  289.093388] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f02
  [  289.093674] {1}[Hardware Error]:   class_code: 040600
  [  289.093676] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, 
control: 0x0003
  [  289.093678] Kernel panic - not syncing: Fatal hardware error!
  [  289.093745] Kernel Offset: 0x1cc0 from 0x8100 (relocation 
range: 0x8000-0xbfff)
  [  289.105835] ERST: [Firmware Warn]: Firmware does not respond in time.

  It does eventually restart after this.  Then during the subsequent
  POST, the following warning appears:

  Embedded RAID 1 : Smart Array P440ar Controller - (2048 MB, V6.30) 7 Logical Drive(s) - Operation Failed
   - 1719-Slot 0 Drive Array - A controller failure event occurred prior
     to this power-up.  (Previous lock up code = 0x13) Action: Install the
     latest controller firmware. If the problem persists, replace the
     controller.

  The latter's symptoms are described in
  https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04805565,
  but the storage controller firmware in use is much newer than the
  version that document gives as the resolution.

  Neither of these problems occurs during shutdown/reboot on the xenial
  kernel.

  FWIW, when running old P89 firmware (1.50, 07/20/2015, rather than
  2.56, 01/22/2018), the shutdown failure mode was a loop like this:

  [529151.035267] NMI: IOCK error (debug interrupt?) for reason 75 on CPU 0.
  [529153.222883] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.222884] Do you have a strange power saving mode enabled?
  [529153.222884] Dazed and confused, but trying to continue
  [529153.554447] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554448] Do you have a strange power saving mode enabled?
  [529153.554449] Dazed and confused, but trying to continue
  [529153.554450] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554451] Do you have a strange power saving mode enabled?
  [529153.554452] Dazed and confused, but trying to continue
  [529153.554452] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554453] Do you have a strange power saving mode enabled?
  [529153.554454] Dazed and confused, but trying to continue
  [529153.554454] Uhhuh. NMI received for unknown reason 35 on CPU 0.
  [529153.554455] Do you have a strange power saving mode enabled?
  [529153.554456] Dazed and confused, but trying to continue
  [529153.554457] Uhhuh. NMI received for unknown reason 25 on CPU 0.
  [529153.554458] Do you have a strange power saving mode 

[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2018-06-21 Thread Bug Watch Updater
Launchpad has imported 27 comments from the remote bug at
https://bugzilla.kernel.org/show_bug.cgi?id=199779.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.


On 2018-05-21T07:03:07+00:00 ryan wrote:

Created attachment 276079
lspci -vv

On HPE DL360 Gen9 (and possibly other generations and/or products; I haven't
been able to test other HP hardware yet, but I have confirmed this on several
DL360 Gen9s), the system crashes on shutdown/reboot with:

[  122.447111] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[  122.447112] {1}[Hardware Error]: event severity: fatal
[  122.447113] {1}[Hardware Error]:  Error 0, type: fatal
[  122.447114] {1}[Hardware Error]:   section_type: PCIe error
[  122.447115] {1}[Hardware Error]:   port_type: 4, root port
[  122.447116] {1}[Hardware Error]:   version: 1.16
[  122.447118] {1}[Hardware Error]:   command: 0x6010, status: 0x0143
[  122.447119] {1}[Hardware Error]:   device_id: :00:01.0
[  122.447119] {1}[Hardware Error]:   slot: 0
[  122.447120] {1}[Hardware Error]:   secondary_bus: 0x03
[  122.447120] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f02
[  122.447121] {1}[Hardware Error]:   class_code: 040600
[  122.447122] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
[  122.447123] {1}[Hardware Error]:  Error 1, type: fatal
[  122.447123] {1}[Hardware Error]:   section_type: PCIe error
[  122.447124] {1}[Hardware Error]:   port_type: 4, root port
[  122.447125] {1}[Hardware Error]:   version: 1.16
[  122.447125] {1}[Hardware Error]:   command: 0x6010, status: 0x0143
[  122.447126] {1}[Hardware Error]:   device_id: :00:01.0
[  122.447127] {1}[Hardware Error]:   slot: 0
[  122.447127] {1}[Hardware Error]:   secondary_bus: 0x03
[  122.447128] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f02
[  122.447129] {1}[Hardware Error]:   class_code: 040600
[  122.447130] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
[  122.447131] Kernel panic - not syncing: Fatal hardware error!
[  122.447166] Kernel Offset: 0x1c00 from 0x8100 (relocation range: 0x8000-0xbfff)
[  122.459295] ERST: [Firmware Warn]: Firmware does not respond in time.

And after that, upon POST, the storage controller is not happy but does
eventually work:

Embedded RAID 1 : Smart Array P440ar Controller - (2048 MB, V6.30) 7 Logical Drive(s) - Operation Failed
 - 1719-Slot 0 Drive Array - A controller failure event occurred prior
   to this power-up.  (Previous lock up code = 0x13) Action: Install the
   latest controller firmware. If the problem persists, replace the
   controller.

Up-to-date firmware (P89 01/22/2018, controller 6.30).  Interestingly,
on older firmware (circa 2016, but I don't have an exact version), this
manifested as a crash loop:

[529151.035267] NMI: IOCK error (debug interrupt?) for reason 75 on CPU 0.
[529153.222883] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.222884] Do you have a strange power saving mode enabled?
[529153.222884] Dazed and confused, but trying to continue
[529153.554447] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554448] Do you have a strange power saving mode enabled?
[529153.554449] Dazed and confused, but trying to continue

I've narrowed it down to https://patchwork.kernel.org/patch/10027157/ as
part of commit 1b6115fbe3b3db746d7baa11399dd617fc75e1c4; removing that
line prevents the panic.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1771467/comments/4


On 2018-05-21T11:11:51+00:00 okaya wrote:

Can you test this patch?

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/pci/hotplug?id=d22b362184553899f7d6b6760899a77d3b2d7c1b

There is a known Intel erratum that we missed.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1771467/comments/5


On 2018-05-21T11:24:20+00:00 okaya wrote:

Can you also share your dmesg?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1771467/comments/6


On 2018-05-21T19:31:17+00:00 ryan wrote:

Created attachment 276103
4.17.0-rc5-next-20180517 dmesg

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1771467/comments/7


On 2018-05-21T19:32:26+00:00 ryan wrote:

Thanks, but the problem persists with that patch applied to 4.15.  I even
tried next-20180517 to be sure, with no luck.  The dmesg from next-20180517
has been attached.

Reply at:

[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2018-06-20 Thread Joseph Salisbury
** Changed in: linux (Ubuntu)
   Status: Triaged => In Progress

** Changed in: linux (Ubuntu Bionic)
   Status: Triaged => In Progress

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Joseph Salisbury (jsalisbury)

** Changed in: linux (Ubuntu Bionic)
 Assignee: (unassigned) => Joseph Salisbury (jsalisbury)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1771467

Title:
  Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress


[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2018-06-19 Thread Ryan Finnie
The fix was bikeshedded a tiny bit on LKML, but is now accepted upstream
and AIUI will be in linux-next soon:
https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git/commit/?id=0d98ba8d70b0070ac117452ea0b663e26bbf46bf

This change has been tested as backward-compatible with the Ubuntu 4.15
kernel, and an SRU would be appreciated.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1771467

Title:
  Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged


[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2018-06-12 Thread Quiksmage
Hi, I just updated to 18.04 today and have started seeing this message on
reboot.
I am also on an HP DL380 Gen9.

It looks like everything has already been found :).

Pardon me for asking, but how long does a fix like this (ballpark
estimate) usually take to get into an OS update?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1771467

Title:
  Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged


[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2018-05-22 Thread Ryan Finnie
A patch has been submitted to linux-pci, and I've confirmed this fix
works: https://lkml.org/lkml/2018/5/22/817

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1771467

Title:
  Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged


[Kernel-packages] [Bug 1771467] Re: Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

2018-05-22 Thread Joseph Salisbury
** Changed in: linux (Ubuntu Bionic)
   Status: Confirmed => Triaged

** Changed in: linux (Ubuntu)
   Status: Confirmed => Triaged

** Also affects: linux via
   https://bugzilla.kernel.org/show_bug.cgi?id=199779
   Importance: Unknown
   Status: Unknown

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1771467

Title:
  Reboot/shutdown kernel panic on HP DL360/DL380 Gen9 w/ bionic 4.15.0

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged
