** Description changed:

  SRU Justification:

  [ Impact ]
  ipmitool sel does not correctly display a sensor's name if the sensor's
  owner is set to LUN 1.

  Upstream bug: https://codeberg.org/IPMITool/ipmitool/issues/8

  We were asked to track this in order to enable new hardware from a
  partner. It was reported against v1.8.19 (used in Noble), but likely
  affects previous versions as well.

  Steps to reproduce, copied verbatim from the upstream report:

  1. Using AMI/BMC to generate a sensor error event. The sensor belong to LUN1
  GPU1_MEM | 10h | ok | 11.1 | Uncorrectable ECC
  GPU2_MEM | 11h | ok | 11.2 | Uncorrectable ECC
  GPU3_MEM | 12h | ok | 11.3 | Uncorrectable ECC
  GPU4_MEM | 13h | ok | 11.4 | Uncorrectable ECC
  GPU5_MEM | 14h | ok | 11.5 | Uncorrectable ECC
  GPU6_MEM | 15h | ok | 11.6 | Uncorrectable ECC
  GPU7_MEM | 16h | ok | 11.7 | Uncorrectable ECC
  GPU8_MEM | 17h | ok | 11.8 | Uncorrectable ECC
  2. Run `ipmitool sel elist`
  3. Observe the abnormal reply:
  c5 | 2023/08/02 | 17时17分24秒 CST | Memory | Uncorrectable ECC | Asserted
  c6 | 2023/08/02 | 17时18分29秒 CST | Memory | Uncorrectable ECC | Asserted
  c7 | 2023/08/02 | 17时18分29秒 CST | Memory | Uncorrectable ECC | Asserted
  c8 | 2023/08/02 | 17时18分30秒 CST | Memory | Uncorrectable ECC | Asserted
  c9 | 2023/08/02 | 17时18分30秒 CST | Memory | Uncorrectable ECC | Asserted
  ca | 2023/08/02 | 17时19分34秒 CST | Memory | Uncorrectable ECC | Asserted
  cb | 2023/08/02 | 17时19分34秒 CST | Memory INTEGRAL_DIMM | Uncorrectable ECC | Asserted
  cc | 2023/08/02 | 17时19分34秒 CST | Memory | Uncorrectable ECC | Asserted
  SensorName is empty or wrong.(Expexct GPU1_MEM)

  A fix was proposed upstream but has not yet been merged:
  https://codeberg.org/IPMITool/ipmitool/pulls/39

  [ Test Plan ]
  I have confirmed that the patch applies cleanly to the latest Plucky
  ipmitool and prepared a test PPA:
  https://launchpad.net/~mitchellaugustin/+archive/ubuntu/ipmitool-lun-sauce/

+ I tested for regressions when running `ipmitool sel elist` on our DGX
+ A100, and did not observe any. (Results were the same as with the
+ current Plucky ipmitool.)
+
  I am asking Nvidia to confirm that this works for them, since we do not
  currently have the hardware to test the new functionality internally.
  (However, they have already confirmed that this patch, when applied to
  Jammy ipmitool, works as expected.)

  [ Fix ]
  The change adds a check of the LUN bits [1:0] in byte 2 of the SEL
  Generator ID when comparing an event against the SDR LUN field, so that
  the correct SDR string is displayed for the SEL event.

  [ Where problems could occur ]
  If upstream ever accepts a different version of this patch that
  conflicts with our sauce, we may need to revert ours and apply their
  version. However, upstream has not responded to the merge request in
  over 7 months [0][1]. Since this functionality is still required by our
  users, and since the change only adjusts a part of ipmitool to match the
  IPMI specification, it seems appropriate for a sauce patch.

  The regression risk should be low: the patch only adds a check for a
  field that should already be present in any hardware-generated event
  record according to the IPMI spec [2] (section 32.1, SEL Event Records),
  and the check is only performed in a function that is specifically for
  printing sensor-generated event records.

  [0]: Patch: https://codeberg.org/IPMITool/ipmitool/commit/719105ebb26b74f402437844ac9ea7707ff0ffb0
       PR: https://codeberg.org/IPMITool/ipmitool/pulls/39
  [1]: https://codeberg.org/IPMITool/ipmitool/issues/8
  [2]: https://www.intel.la/content/dam/www/public/us/en/documents/specification-updates/ipmi-intelligent-platform-mgt-interface-spec-2nd-gen-v2-0-spec-update.pdf
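
  For illustration only, the check described under [ Fix ] can be
  sketched as below. This is not the upstream patch (ipmitool is written
  in C); the names `SdrSensor`, `generator_lun` and `find_sensor`, and the
  `CPU_TEMP` sensor, are hypothetical. It assumes the layout given in the
  IPMI spec, where bits [1:0] of Generator ID byte 2 and of the SDR sensor
  owner LUN byte both carry the LUN.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SdrSensor:
    """Hypothetical, minimal view of an SDR sensor record."""
    name: str
    owner_lun: int  # SDR "sensor owner LUN" byte; bits [1:0] hold the LUN
    number: int     # sensor number


def generator_lun(gen_id_byte2: int) -> int:
    """Extract LUN bits [1:0] from byte 2 of the SEL Generator ID."""
    return gen_id_byte2 & 0x03


def find_sensor(sdrs: Iterable[SdrSensor], sensor_number: int,
                gen_id_byte2: int) -> Optional[SdrSensor]:
    """Match an SEL event to an SDR record by sensor number AND LUN.

    Matching on the sensor number alone can pick a sensor on the wrong
    LUN (or none at all) when several LUNs reuse the same number.
    """
    lun = generator_lun(gen_id_byte2)
    for sensor in sdrs:
        if sensor.number == sensor_number and (sensor.owner_lun & 0x03) == lun:
            return sensor
    return None


# Two sensors sharing number 10h on different LUNs.
# GPU1_MEM comes from the report above; CPU_TEMP is made up.
sdrs = [
    SdrSensor(name="CPU_TEMP", owner_lun=0x00, number=0x10),
    SdrSensor(name="GPU1_MEM", owner_lun=0x01, number=0x10),
]

match = find_sensor(sdrs, sensor_number=0x10, gen_id_byte2=0x01)
print(match.name if match else "(no matching SDR)")
```

  With the LUN included in the comparison, an event whose Generator ID
  byte 2 carries LUN 1 resolves to the LUN 1 sensor rather than to a
  same-numbered sensor on LUN 0.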
-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2076173

Title:
  cannot display sensor name when its owner is lun1
