** Description changed:

  SRU Justification:
  
  [ Impact ]
  
  ipmitool sel does not correctly display the sensor's name if its owner is set to lun1.
  Upstream bug: https://codeberg.org/IPMITool/ipmitool/issues/8
  
  We were asked to track this in order to enable new hardware from a partner.
  It was reported in v1.8.19 (used in Noble), but likely affects previous versions as well.
  
  Steps to reproduce, copied verbatim from the upstream report:
  
  1. Using AMI/BMC to generate a sensor error event. The sensor belong to LUN1
  GPU1_MEM | 10h | ok | 11.1 | Uncorrectable ECC
  GPU2_MEM | 11h | ok | 11.2 | Uncorrectable ECC
  GPU3_MEM | 12h | ok | 11.3 | Uncorrectable ECC
  GPU4_MEM | 13h | ok | 11.4 | Uncorrectable ECC
  GPU5_MEM | 14h | ok | 11.5 | Uncorrectable ECC
  GPU6_MEM | 15h | ok | 11.6 | Uncorrectable ECC
  GPU7_MEM | 16h | ok | 11.7 | Uncorrectable ECC
  GPU8_MEM | 17h | ok | 11.8 | Uncorrectable ECC
  
  2. Run `ipmitool sel elist`
  
  3. Observe the abnormal reply:
  c5 | 2023/08/02 | 17时17分24秒 CST | Memory | Uncorrectable ECC | Asserted
  c6 | 2023/08/02 | 17时18分29秒 CST | Memory | Uncorrectable ECC | Asserted
  c7 | 2023/08/02 | 17时18分29秒 CST | Memory | Uncorrectable ECC | Asserted
  c8 | 2023/08/02 | 17时18分30秒 CST | Memory | Uncorrectable ECC | Asserted
  c9 | 2023/08/02 | 17时18分30秒 CST | Memory | Uncorrectable ECC | Asserted
  ca | 2023/08/02 | 17时19分34秒 CST | Memory | Uncorrectable ECC | Asserted
  cb | 2023/08/02 | 17时19分34秒 CST | Memory INTEGRAL_DIMM | Uncorrectable ECC | Asserted
  cc | 2023/08/02 | 17时19分34秒 CST | Memory | Uncorrectable ECC | Asserted
  SensorName is empty or wrong.(Expexct GPU1_MEM)
  
  A fix was proposed upstream but is yet to be merged:
  https://codeberg.org/IPMITool/ipmitool/pulls/39
  
  [ Test Plan ]
  
  I have confirmed that the patch applies cleanly to the latest Plucky
  ipmitool and prepared a test PPA:
  https://launchpad.net/~mitchellaugustin/+archive/ubuntu/ipmitool-lun-sauce/
  
+ I tested for regressions by running `ipmitool sel elist` on our DGX
+ A100 and did not observe any; the results were the same as with the
+ current Plucky ipmitool.
+ 
  I am asking Nvidia to confirm that this works for them, since we do not
  currently have the hardware to test the new functionality internally.
  (However, they have already confirmed that this patch, when applied to
  jammy ipmitool, works as expected.)
  
  [ Fix ]
  
  The change additionally compares the LUN bits [1:0] of byte 2 of the SEL
  Generator ID with the SDR LUN field, so that the correct SDR string is
  displayed for the SEL event.
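
To illustrate the idea behind the fix, here is a minimal Python sketch of a LUN-aware lookup. It is not the upstream patch (which is C code inside ipmitool); the names `SdrEntry`, `find_sensor_name` and the sensor `HYPOTHETICAL_LUN0` are made up for this example, and the sketch ignores the other fields (such as the owner ID) that a real lookup would also consider. It only assumes that the LUN lives in bits [1:0] of both Generator ID byte 2 and the SDR owner LUN byte.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Bits [1:0] carry the LUN in Generator ID byte 2 and in the SDR owner LUN byte.
LUN_MASK = 0x03


@dataclass(frozen=True)
class SdrEntry:
    """Hypothetical, simplified view of one SDR sensor record."""
    name: str
    sensor_number: int
    owner_lun_byte: int  # raw "Sensor Owner LUN" byte from the SDR


def find_sensor_name(sdrs: Iterable[SdrEntry],
                     sensor_number: int,
                     generator_id_byte2: int) -> Optional[str]:
    """Return the name of the SDR whose number AND LUN match the SEL event."""
    event_lun = generator_id_byte2 & LUN_MASK
    for sdr in sdrs:
        same_number = sdr.sensor_number == sensor_number
        same_lun = (sdr.owner_lun_byte & LUN_MASK) == event_lun
        if same_number and same_lun:
            return sdr.name
    return None  # no matching record: caller falls back to a generic label


# Two sensors share number 0x10 but live on different LUNs.
# GPU1_MEM (10h, LUN1) is from the report; the LUN0 entry is invented.
sdrs = [
    SdrEntry("HYPOTHETICAL_LUN0", 0x10, 0x00),
    SdrEntry("GPU1_MEM", 0x10, 0x01),
]

print(find_sensor_name(sdrs, 0x10, 0x01))
```

Matching on the sensor number alone would return whichever record happens to come first; including the LUN in the comparison is what lets the event from LUN1 resolve to GPU1_MEM.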
  
  [ Where problems could occur ]
  
  If upstream ever does accept a different version of this patch which
  conflicts with our sauce, we may need to revert and apply their version.
  However, they have not responded to the upstream merge request in over 7
  months [0][1], so since this functionality is still required by our
  users, and since it only adjusts a part of ipmitool to match the IPMI
  specification, it seems appropriate for a sauce patch.
  
  The regression risk should be low since this just adds a check for a
  field that should already be present in any hardware-generated event
  records according to the IPMI spec[2] (section 32.1 SEL Event Records),
  and this check is only done/used in a function that is specifically for
  printing sensor-generated event records.
  
  [0]:
  Patch: https://codeberg.org/IPMITool/ipmitool/commit/719105ebb26b74f402437844ac9ea7707ff0ffb0
  PR: https://codeberg.org/IPMITool/ipmitool/pulls/39
  
  [1]: https://codeberg.org/IPMITool/ipmitool/issues/8
  
  [2]:
  https://www.intel.la/content/dam/www/public/us/en/documents/specification-updates/ipmi-intelligent-platform-mgt-interface-spec-2nd-gen-v2-0-spec-update.pdf

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2076173

Title:
  cannot display sensor name when its owner is lun1
