[Kernel-packages] [Bug 1482343] Re: Trigger a checkstop on unrecoverable MCE/HMI errors to inform BMC/OCC about the error.
** Tags removed: targetmilestone-inin---
** Tags added: targetmilestone-inin14043

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1482343

Title:
  Trigger a checkstop on unrecoverable MCE/HMI errors to inform BMC/OCC
  about the error.

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Vivid:
  Fix Released
Status in linux source package in Wily:
  Fix Released

Bug description:
  The current implementation of the Machine Check handler and the HMI
  handler in Linux goes down the kernel panic path for unrecoverable
  errors. On an FSP-based system, the FSP is also notified about these
  errors and forwards them to PRD (which runs on the FSP) for error
  analysis and gard record creation. On OpenPower (a BMC-based system,
  e.g. Habanero from TYAN), where PRD runs in the Linux host, PRD never
  gets a chance to do error analysis at the time of the Linux crash, and
  no gard record is created for such errors. Since the faulty component
  never gets de-configured, the system is vulnerable to being hit by the
  same HW error again.

  To fix this issue, a new OPAL call 'opal_cec_reboot2()' has been
  introduced to trigger a checkstop on BMC-based systems to inform the
  BMC/OCC about the error, so that the BMC can collect relevant data for
  error analysis and decide which component to de-configure before
  rebooting. The Linux kernel should invoke this OPAL call for
  unrecoverable MCE and HMI errors before calling kernel panic, so that
  the OCC is informed about the error.
  The kernel changes have already been posted upstream and are listed
  below:

  https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-May/128341.html
  https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-May/128342.html
  https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-August/132045.html
  https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-August/132114.html

  The above patches need to be included in Ubuntu 14.04.3+.

  We will update this bug with commit ids once the above patches are
  accepted upstream.

  Contact Information = mahesh.salgaon...@in.ibm.com

  ---uname output---
  Linux rcx2d403 3.19.0-26-generic #27 SMP Tue Aug 4 01:38:15 CDT 2015 ppc64le ppc64le ppc64le GNU/Linux

  ---Additional Hardware Info---
  Habanero pass2 system

  Machine Type = OpenPower, Habanero

  ---System Hang---
  If the system is hung, it can be recovered by sending the IPMI power
  off/on commands:

  $ ipmitool -H <BMC> -I lanplus -U <user> -P <passwd> power off
  $ ipmitool -H <BMC> -I lanplus -U <user> -P <passwd> power on

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1482343/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
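The ordering proposed in the bug description (ask firmware for a checkstop first, and only then fall back to the existing panic) can be illustrated with a small toy model. This is a sketch in Python, not kernel code: apart from the name opal_cec_reboot2, every identifier below is hypothetical, the real OPAL call's arguments and return codes are not modelled, and the "not supported by firmware" case is an assumption added for illustration.

```python
# Toy model of the error path described in the bug report. NOT kernel code:
# every name except opal_cec_reboot2 is hypothetical, and the real OPAL
# call's arguments and return values are not modelled here.


class KernelPanic(Exception):
    """Stands in for the kernel panic that ends the unrecoverable path."""


def handle_unrecoverable_error(kind, opal_cec_reboot2, log):
    """Ask firmware for a checkstop first; panic only if that call returns."""
    log.append(f"unrecoverable {kind}: requesting checkstop")
    try:
        # On a BMC-based system this is expected to checkstop the machine so
        # that the BMC/OCC can collect data; in that case it never returns.
        opal_cec_reboot2(f"Unrecoverable {kind}")
    except NotImplementedError:
        # Assumption: firmware without the new call reports it as unsupported.
        log.append("opal_cec_reboot2 not supported by firmware")
    # Fallback: the original behaviour, a plain kernel panic.
    log.append("falling back to panic")
    raise KernelPanic(f"Unrecoverable {kind}")


def firmware_without_reboot2(message):
    """Hypothetical firmware stub that lacks the new call."""
    raise NotImplementedError(message)


log = []
try:
    handle_unrecoverable_error("HMI", firmware_without_reboot2, log)
except KernelPanic as exc:
    log.append(f"panic: {exc}")

print("\n".join(log))
```

The point of the ordering is that the panic path stays in place as a fallback, so systems whose firmware cannot trigger the checkstop behave as they did before.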
[Kernel-packages] [Bug 1482343] Re: Trigger a checkstop on unrecoverable MCE/HMI errors to inform BMC/OCC about the error.
This bug was fixed in the package linux - 3.19.0-32.37

---
linux (3.19.0-32.37) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1508381

  [ Joseph Salisbury ]

  * SAUCE: storvsc: use small sg_tablesize on x86
    - LP: #1495983

  [ Phidias Chiang ]

  * SAUCE: dma: dw_dmac: Workaround for stop probing on HP X360 laptop v2
    - LP: #1501580

  [ Tim Gardner ]

  * [Config] Add MMC modules sufficient for net booting
    - LP: #1502772

  [ Upstream Kernel Changes ]

  * USB: whiteheat: fix potential null-deref at probe
    - LP: #1478826
    - CVE-2015-5257
  * dcache: Handle escaped paths in prepend_path
    - LP: #1441108
    - CVE-2015-2925
  * vfs: Test for and handle paths that are unreachable from their mnt_root
    - LP: #1441108
    - CVE-2015-2925
  * hv_netvsc: Add support to set MTU reservation from guest side
    - LP: #1494431
  * hv_netvsc: Add close of RNDIS filter into change mtu call
    - LP: #1494431
  * powerpc/eeh: Fix missed PE#0 on P7IOC
    - LP: #1502982
  * powerpc/powernv: display reason for Malfunction Alert HMI.
    - LP: #1482343
  * powerpc/powernv: Pull all HMI events before panic.
    - LP: #1482343
  * powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable machine
    check errors.
    - LP: #1482343
  * powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable HMI.
    - LP: #1482343
  * powerpc/eeh: Fix PE#0 check in eeh_add_to_parent_pe()
    - LP: #1502982
  * HID: i2c-hid: The interrupt should be level sensitive v2
    - LP: #1501187
  * HID: i2c-hid: Add support for ACPI GPIO interrupts v2
    - LP: #1501187

 -- Luis Henriques  Wed, 21 Oct 2015 10:30:13 +0100

** Changed in: linux (Ubuntu Vivid)
       Status: Fix Committed => Fix Released

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-2925

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-5257
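Since the changelog above names 3.19.0-32.37 as the Vivid package carrying the fix, and the reporter's uname output shows 3.19.0-26-generic, a quick way to tell whether a running Vivid kernel predates the fix is to compare the ABI number in the release string. The following is a minimal sketch; the function name is hypothetical, and it assumes the release string has the form 3.19.0-<abi>-<flavour>. The ABI number alone is a coarse check; the full package version (3.19.0-32.37) remains the authoritative reference.

```python
import re

# From "linux (3.19.0-32.37) vivid" in the changelog above.
FIXED_VIVID_ABI = 32


def vivid_kernel_has_fix(release):
    """Return True if a 3.19.0-based release string is at ABI 32 or later."""
    match = re.fullmatch(r"3\.19\.0-(\d+)-[\w-]+", release)
    if match is None:
        raise ValueError(f"not a Vivid 3.19.0 kernel release: {release!r}")
    return int(match.group(1)) >= FIXED_VIVID_ABI


print(vivid_kernel_has_fix("3.19.0-26-generic"))  # kernel from the bug report: False
print(vivid_kernel_has_fix("3.19.0-32-generic"))  # True
```

On a live system the release string would come from `uname -r`.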
[Kernel-packages] [Bug 1482343] Re: Trigger a checkstop on unrecoverable MCE/HMI errors to inform BMC/OCC about the error.
** Tags removed: verification-needed-vivid
** Tags added: verification-done-vivid

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Vivid:
  Fix Committed
Status in linux source package in Wily:
  Fix Released
[Kernel-packages] [Bug 1482343] Re: Trigger a checkstop on unrecoverable MCE/HMI errors to inform BMC/OCC about the error.
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!

** Tags added: verification-needed-vivid

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Vivid:
  Fix Committed
Status in linux source package in Wily:
  Fix Released
[Kernel-packages] [Bug 1482343] Re: Trigger a checkstop on unrecoverable MCE/HMI errors to inform BMC/OCC about the error.
Applied and in the pipeline for UBUNTU: Ubuntu-3.19.0-32.37

** Changed in: linux (Ubuntu Vivid)
       Status: In Progress => Fix Committed

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Vivid:
  Fix Committed
Status in linux source package in Wily:
  Fix Released
[Kernel-packages] [Bug 1482343] Re: Trigger a checkstop on unrecoverable MCE/HMI errors to inform BMC/OCC about the error.
Tim,

I saw that the patches were acked by "Seth Forshee". Were they
committed? Is there an expected version in which they will be released?

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Vivid:
  In Progress
Status in linux source package in Wily:
  Fix Released
[Kernel-packages] [Bug 1482343] Re: Trigger a checkstop on unrecoverable MCE/HMI errors to inform BMC/OCC about the error.
Submitted for Vivid:
https://lists.ubuntu.com/archives/kernel-team/2015-October/063607.html

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Vivid:
  In Progress
Status in linux source package in Wily:
  Fix Released
[Kernel-packages] [Bug 1482343] Re: Trigger a checkstop on unrecoverable MCE/HMI errors to inform BMC/OCC about the error.
Hi Tim,

We would like to have this targeted for the 14.04 SRU as well. Is that
possible?

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Wily:
  Fix Released
[Kernel-packages] [Bug 1482343] Re: Trigger a checkstop on unrecoverable MCE/HMI errors to inform BMC/OCC about the error.
** Also affects: linux (Ubuntu Vivid)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Vivid)
       Status: New => In Progress

** Changed in: linux (Ubuntu Vivid)
     Assignee: (unassigned) => Tim Gardner (timg-tpi)

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Vivid:
  In Progress
Status in linux source package in Wily:
  Fix Released
[Kernel-packages] [Bug 1482343] Re: Trigger a checkstop on unrecoverable MCE/HMI errors to inform BMC/OCC about the error.
This bug was fixed in the package linux - 4.2.0-7.7

---
linux (4.2.0-7.7) wily; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1490564
  * rebase to v4.2

  [ Wen Xiong ]

  * SAUCE: ipr: Byte swapping for device_id attribute in sysfs
    - LP: #1453892

  [ Upstream Kernel Changes ]

  * rebase to v4.2
    - LP: #1487345

 -- Tim Gardner  Wed, 26 Aug 2015 07:06:10 -0600

** Changed in: linux (Ubuntu Wily)
       Status: Fix Committed => Fix Released

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Wily:
  Fix Released
[Kernel-packages] [Bug 1482343] Re: Trigger a checkstop on unrecoverable MCE/HMI errors to inform BMC/OCC about the error.
** Also affects: linux (Ubuntu Wily)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Wily)
       Status: New => In Progress

** Changed in: linux (Ubuntu Wily)
     Assignee: (unassigned) => Tim Gardner (timg-tpi)

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Wily:
  In Progress
The kernel changes has already been posted to upstream and are listed below: https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-May/128341.html https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-May/128342.html https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-August/132045.html https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-August/132114.html Above patches needs to be included in ubuntu 14.04.3+ We will update this bug with commit ids, once the above patches are accepted upstream. Contact Information = mahesh.salgaon...@in.ibm.com ---uname output--- Linux rcx2d403 3.19.0-26-generic #27 SMP Tue Aug 4 01:38:15 CDT 2015 ppc64le ppc64le ppc64le GNU/Linux ---Additional Hardware Info--- Habanero pass2 system Machine Type = OpenPower, Habanero ---System Hang--- If system is hung, it can be recovered by sending ipmi power off/on command. $ ipmitool -H BMC -I lanplus -U user -P passwd power off $ ipmitool -H BMC -I lanplus -U user -P passwd power on To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1482343/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1482343] Re: Trigger a checkstop on unrecoverable MCE/HMI errors to inform BMC/OCC about the error.
Patches applied for Wily. Let's wait on an SRU for Vivid/Trusty until they've been merged in 4.3.

** Changed in: linux (Ubuntu Wily) Status: In Progress => Fix Committed

Status in linux package in Ubuntu: Fix Committed
Status in linux source package in Wily: Fix Committed
[Kernel-packages] [Bug 1482343] Re: Trigger a checkstop on unrecoverable MCE/HMI errors to inform BMC/OCC about the error.
** Package changed: ubuntu => linux (Ubuntu)

Status in linux package in Ubuntu: New
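The IPMI recovery commands quoted in this bug report (power off, then power on, through the BMC) could be wrapped in a small shell helper along the following lines. This is only a sketch: the function name bmc_power and the variables BMC_HOST, BMC_USER, BMC_PASS and DRY_RUN are made up for illustration and do not come from the report; only the ipmitool options (-H, -I lanplus, -U, -P) and the "power off" / "power on" subcommands are taken from it. By default the helper only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper: send a power action to a hung host through its BMC.
# BMC_HOST, BMC_USER and BMC_PASS are placeholder names (assumptions).
# With DRY_RUN=1 (the default) the ipmitool command is printed, not executed.

bmc_power() {
    action=$1    # "off" or "on"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo ipmitool -H "$BMC_HOST" -I lanplus -U "$BMC_USER" -P "$BMC_PASS" power "$action"
    else
        ipmitool -H "$BMC_HOST" -I lanplus -U "$BMC_USER" -P "$BMC_PASS" power "$action"
    fi
}

# Example (dry run):
#   BMC_HOST=BMC BMC_USER=user BMC_PASS=passwd
#   bmc_power off
#   bmc_power on
```

Setting DRY_RUN=0 would make the helper actually invoke ipmitool, so the printed commands are worth checking first.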