** Description changed:

  [Impact]
  The APEI (ACPI Platform Error Interface) is supposed to report PCIe errors to the AER (Advanced Error Reporting) driver, which surfaces them to userspace. However, we're currently only reporting "recoverable" errors and not errors of other types (e.g. correctable), thus hiding signs of faulty hardware from the user.

  [Test Case]
  $ sudo apt install rasdaemon
  # On a system that supports ACPI EINJ (dmesg | grep "ACPI: EINJ"), use the attached script to inject a correctable PCIe error.
  $ sudo ras-mc-ctl --errors
  # There should be an entry for the injected error, as shown below:
  No Memory errors.

  PCIe AER events:
  1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error

  No Extlog errors.

  No MCE errors.

  [Regression Risk]
+ The above test was run on x86 & ARM platforms to mitigate regression risk.
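For readers without access to the attachment, the injection step in the test case might look roughly like the sketch below. This is NOT the attached einj-aer.sh; it is a minimal illustration that assumes the kernel's EINJ debugfs interface (/sys/kernel/debug/apei/einj, with debugfs mounted and EINJ support enabled) and its documented error_type, flags, param4 and error_inject files. The function names and the example device address are hypothetical.

```shell
#!/bin/sh
# Sketch only, not the attached einj-aer.sh.
# Assumes debugfs is mounted and the EINJ debugfs files are present.
EINJ_DIR=/sys/kernel/debug/apei/einj

# Encode segment/bus/device/function into the EINJ param4 layout
# (bits 31-24 segment, 23-16 bus, 15-11 device, 10-8 function).
sbdf_to_param4() {
    printf '0x%08x\n' $(( ($1 << 24) | ($2 << 16) | ($3 << 11) | ($4 << 8) ))
}

# Inject a correctable PCIe error at the given segment/bus/device/function.
# Must be run as root on a platform whose firmware supports this error type.
inject_pcie_correctable() {
    param4=$(sbdf_to_param4 "$1" "$2" "$3" "$4")
    echo 0x40      > "$EINJ_DIR/error_type"    # 0x40 = PCI Express Correctable
    echo 0x4       > "$EINJ_DIR/flags"         # bit 2: PCIe SBDF in param4 is valid
    echo "$param4" > "$EINJ_DIR/param4"
    echo 1         > "$EINJ_DIR/error_inject"  # trigger the injection
}

# Example (hypothetical device 0000:00:1c.0), left commented out on purpose:
# inject_pcie_correctable 0 0 0x1c 0
```

Whether a given error type can be injected depends on the firmware; the available_error_type file in the same directory lists what the platform advertises. After a successful injection, `sudo ras-mc-ctl --errors` should show the event as described in the test case.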
** Attachment added: "einj-aer.sh"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1769730/+attachment/5135673/+files/einj-aer.sh

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1769730

Title:
  Some PCIe errors not surfaced through rasdaemon

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1769730/+subscriptions

--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
