On 2026-02-02 08:16, G. Paul Ziemba wrote:
Bob, thanks for your suggestions.

The motherboard is a plain X11SCA (no -F ipmi)

I don't know of a way to read the power supply voltages in software while FreeBSD is running, but I did reboot into the BIOS setup and read voltages there, and they look normal to me:

VCPU: 1.136
VDIMM: 1.224
12V: 12.233
5VCC: 5.184
3.3V_DL: 3.327
3.3VCC: 3.424
VSB: 3.328
VBAT: 3.104
VCC1_8_DL_PCM: 1.816
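As a rough cross-check of the readings above, the three main rails can be compared against a tolerance band. This is only a sketch: the nominal values and the +/-5% band are assumptions taken from the usual ATX figures for the 12V, 5V and 3.3V rails, not from anything Supermicro documents for this board, and a static DC reading says nothing about ripple.

```python
# Assumed nominal rail voltages and tolerance (typical ATX figures,
# not taken from the board's documentation).
NOMINAL = {"12V": 12.0, "5VCC": 5.0, "3.3VCC": 3.3}
TOLERANCE = 0.05  # +/-5%

# Readings copied from the BIOS setup screen quoted above.
READINGS = {"12V": 12.233, "5VCC": 5.184, "3.3VCC": 3.424}


def check_rails(readings, nominal=NOMINAL, tolerance=TOLERANCE):
    """Return {rail: (relative_deviation, within_tolerance)}."""
    results = {}
    for name, measured in readings.items():
        deviation = (measured - nominal[name]) / nominal[name]
        results[name] = (deviation, abs(deviation) <= tolerance)
    return results


if __name__ == "__main__":
    for name, (deviation, ok) in check_rails(READINGS).items():
        status = "ok" if ok else "OUT OF RANGE"
        print(f"{name}: {deviation:+.2%} {status}")
```

Under those assumptions all three rails fall inside the band, which matches the "look normal to me" reading.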
I'd just like to mention here that while the voltages may read within the expected range, that will not inform you of AC bleed. IOW, failing diodes will leak AC, which will result in (eventual) component failure. I've tossed many a PSU for just this reason. If you happen to have a spare around, it'd make it pretty easy to test this.

--Chris
The BIOS versions are given as:

"ver 1.2 Build Date 12/5/19" near the top of the screen; and
"version 2.19.0045 (c) [AMI]" at the bottom of the screen

I didn't see a setting that (apparently to me) might control how events might be filtered, but there WAS an event log that had completely filled up with messages of the form:

<datetime> smbios 0x02 DIMMB1

with many for DIMMB1 and DIMMB2. I haven't found any documentation yet of "0x02" other than a few online posts calling it either a single-bit or a multi-bit ECC memory error.

I'm still favoring a diagnosis of two bad DIMMs; I just wish there were a way to cause these errors to show up in FreeBSD somewhere so I could detect them on a running system.

On Sun, Feb 01, 2026 at 08:30:56PM +0000, Bob Bishop wrote:

Hi,

> On 1 Feb 2026, at 16:35, G. Paul Ziemba <[email protected]> wrote:
>
> OS: 14.2-STABLE as of 250403
>
> I seem to have at least one bad ECC DIMM

Check the power supply voltages are within tolerance if you haven't already.

> and was expecting to see MCA
> messages in /var/log/messages or to the console (which I have recently
> redirected to /var/log/console.log via syslog.conf:
>
> console.info /var/log/console.log
>
> but I can't find anything in any of my logs. Why am I not seeing them?

If you have the -F variant of the board that supports IPMI, it may be that the BMC is capturing the errors so check the BMC event log. Possibly there is a setting on the BMC to control what gets passed to MCA.

Also check the BIOS event logging; I don't see settings in the BIOS to control MCA events.

And check the BIOS version is up to date.

> Background:
>
> Motherboard: Supermicro X11SCA
> CPU: Xeon E-2176G
> Chipset: C246
> Memory: 4x SK Hynix HMA82GU7CJR8N-VK (16GB ECC)
>
> Bios reports ECC on its startup screen and dmidecode reports
>
> Total Width: 72 bits
> Data Width: 64 bits
>
> for each of the dimms.
>
> Amanda started reporting checksum errors on large backup files in its
> holding disk. I discovered that a large file (200GB) on any of three
> disks on this system yields different sha512sum values every time I
> run it on the same file. SMART data looks OK on all disks.
>
> memtest86+ finds three bad spots in memory, at 42G, 47G and 53G. I have
> 4x16GB dimms installed, so I think that corresponds to two bad dimms.
>
> % sysctl hw.mca
> hw.mca.cmc_throttle: 60
> hw.mca.force_scan: 0
> hw.mca.interval: 300
> hw.mca.maxcount: -1
> hw.mca.count: 0
> hw.mca.erratum383: 0
> hw.mca.intel6h_HSD131: 0
> hw.mca.amd10h_L1TP: 1
> hw.mca.log_corrected: 1
> hw.mca.enabled: 1
>
> Thanks for any insights.
> --
> G. Paul Ziemba
> FreeBSD unix:
> 8:31AM up 2 days, 14:38, 11 users, load averages: 0.71, 0.43, 0.39
>

--
Bob Bishop t: +44 (0)118 940 1243
[email protected] m: +44 (0)783 626 4518
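The symptom described in the quoted message (the same large file giving a different sha512sum on each run) can be turned into a repeatable check. The following is a minimal sketch, not a diagnostic tool: the function names are made up for illustration, and it assumes the test file is larger than RAM (as the 200GB file was), since a smaller file may be served from cache and hide the problem.

```python
import hashlib
import sys


def sha512_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so it never has to fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_digests(path, runs=3):
    """Hash the same file several times and return every digest."""
    return [sha512_of(path) for _ in range(runs)]


def is_stable(digests):
    """True when every run produced the same digest."""
    return len(set(digests)) == 1


if __name__ == "__main__":
    digests = repeated_digests(sys.argv[1])
    for d in digests:
        print(d)
    print("stable" if is_stable(digests) else "MISMATCH between runs")
```

A mismatch on an unchanged file points at something between the disk and the CPU (memory, controller, cabling) rather than at the file itself; it does not by itself identify which DIMM is at fault.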
--Chris
