Hi,

> On 1 Feb 2026, at 16:35, G. Paul Ziemba <[email protected]> wrote:
>
> OS: 14.2-STABLE as of 250403
>
> I seem to have at least one bad ECC DIMM

Check the power supply voltages are within tolerance if you haven’t already.

> and was expecting to see MCA
> messages in /var/log/messages or to the console (which I have recently
> redirected to /var/log/console.log via syslog.conf:
>
> console.info /var/log/console.log
>
> but I can't find anything in any of my logs. Why am I not seeing them?

If you have the -F variant of the board that supports IPMI, it may be that the BMC is capturing the errors so check the BMC event log. Possibly there is a setting on the BMC to control what gets passed to MCA.

Also check the BIOS event logging; I don’t see settings in the BIOS to control MCA events. And check the BIOS version is up to date.

> Background:
>
> Motherboard: Supermicro X11SCA
> CPU: Xeon E-2176G
> Chipset: C246
> Memory: 4x SK Hynix HMA82GU7CJR8N-VK (16GB ECC)
>
> Bios reports ECC on its startup screen and dmidecode reports
>
> Total Width: 72 bits
> Data Width: 64 bits
>
> for each of the dimms.
>
> Amanda started reporting checksum errors on large backup files in its
> holding disk. I discovered that a large file (200GB) on any of three
> disks on this system yields different sha512sum values every time I
> run it on the same file. SMART data looks OK on all disks.
>
> memtest86+ finds three bad spots in memory, at 42G, 47G and 53G. I have
> 4x16GB dimms installed, so I think that corresponds to two bad dimms.
>
> % sysctl hw.mca
> hw.mca.cmc_throttle: 60
> hw.mca.force_scan: 0
> hw.mca.interval: 300
> hw.mca.maxcount: -1
> hw.mca.count: 0
> hw.mca.erratum383: 0
> hw.mca.intel6h_HSD131: 0
> hw.mca.amd10h_L1TP: 1
> hw.mca.log_corrected: 1
> hw.mca.enabled: 1
>
> Thanks for any insights.
> --
> G. Paul Ziemba
> FreeBSD unix:
> 8:31AM up 2 days, 14:38, 11 users, load averages: 0.71, 0.43, 0.39
>

--
Bob Bishop          t: +44 (0)118 940 1243
[email protected]     m: +44 (0)783 626 4518
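
For reference, the arithmetic behind the quoted "two bad dimms" estimate can be sketched as below. This is only a rough illustration: it assumes the four 16GB DIMMs are mapped as flat, consecutive 16G ranges, which may not hold if the memory controller interleaves addresses across channels. The `dimm_index` helper is hypothetical and not part of the original post.

```shell
#!/bin/sh
# Hypothetical helper: map a failing address (in GiB) to a zero-based
# DIMM index. ASSUMPTION: equally sized DIMMs laid out back to back with
# no channel interleaving, so this is an estimate, not a diagnosis.
dimm_index() {
    addr_gib=$1   # failing address reported by memtest86+, in GiB
    dimm_gib=$2   # size of each DIMM, in GiB
    echo $(( addr_gib / dimm_gib ))
}

# The three addresses from the quoted memtest86+ run, with 16GB DIMMs.
for addr in 42 47 53; do
    echo "${addr}G -> DIMM index $(dimm_index "$addr" 16)"
done
```

Under that assumption 42G and 47G fall in the same 16G range and 53G in the next one, which matches the estimate of two affected DIMMs. Swapping DIMMs between slots and rerunning memtest86+ would be a more reliable way to confirm which modules are at fault.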
