Bob,
thanks for your suggestions.
The motherboard is a plain X11SCA (not the -F variant, so no IPMI/BMC).
I don't know of a way to read the power supply voltages in software
while FreeBSD is running, but I did reboot into the BIOS setup and
read voltages there, and they look normal to me:
VCPU: 1.136
VDIMM: 1.224
12V: 12.233
5VCC: 5.184
3.3V_DL: 3.327
3.3VCC: 3.424
VSB: 3.328
VBAT: 3.104
VCC1_8_DL_PCM: 1.816
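
As a rough cross-check of "look normal", here is a minimal Python sketch
that compares the three main rails above against a +/-5% band. The 5%
figure is the tolerance the ATX specification gives for the +12V, +5V and
+3.3V rails; the board's own alarm thresholds are not something I have
looked up, so treat the band as an assumption. The helper names are
purely illustrative.

```python
# BIOS readings for the main rails: name -> (nominal volts, measured volts).
# Assumption: +/-5% tolerance, per the ATX spec for +12V, +5V and +3.3V.
# The motherboard's own sensor thresholds may be different.
READINGS = {
    "12V": (12.0, 12.233),
    "5VCC": (5.0, 5.184),
    "3.3VCC": (3.3, 3.424),
}
TOLERANCE = 0.05


def deviation(nominal, measured):
    """Return the fractional deviation of measured from nominal."""
    return (measured - nominal) / nominal


def within_tolerance(nominal, measured, tolerance=TOLERANCE):
    """True if the reading lies inside nominal +/- tolerance."""
    return abs(deviation(nominal, measured)) <= tolerance


for rail, (nominal, measured) in READINGS.items():
    pct = deviation(nominal, measured) * 100
    status = "ok" if within_tolerance(nominal, measured) else "OUT OF RANGE"
    print(f"{rail:8s} {measured:7.3f} V  {pct:+.2f}%  {status}")
```

All three rails read slightly high but stay inside that band.
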
The BIOS versions are given as:
"ver 1.2 Build Date 12/5/19" near the top of the screen; and
"version 2.19.0045 (c) [AMI]" at the bottom of the screen
I didn't see any setting that looked like it would control how events
are filtered, but there WAS an event log that had completely filled up
with messages of the form:
<datetime> smbios 0x02 DIMMB1
with many for DIMMB1 and DIMMB2. I haven't found any documentation yet
of "0x02" other than a few online posts calling it either a single-bit
or a multi-bit ECC memory error.
I'm still favoring a diagnosis of two bad DIMMs; I just wish there were
a way to cause these errors to show up in FreeBSD somewhere so I could
detect them on a running system.
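
For what it's worth, here is the arithmetic behind "two bad DIMMs" as a
small Python sketch, using the memtest86+ addresses from my original
message quoted below (42G, 47G, 53G). It assumes a flat, linear layout in
which each 16GB module covers one contiguous 16GB range. That is almost
certainly a simplification: the memory controller may interleave channels
and remap around the PCI hole, and I don't know how a module index maps
to a slot name like DIMMB1. So this is a plausibility check, not a
diagnosis.

```python
# Assumption: flat linear mapping, one contiguous 16 GiB range per module,
# with no channel interleaving and no remapping. Real hardware may differ.
DIMM_SIZE_GIB = 16
BAD_ADDRESSES_GIB = [42, 47, 53]  # failing addresses reported by memtest86+


def dimm_index(address_gib, dimm_size_gib=DIMM_SIZE_GIB):
    """Return the zero-based module index holding this address."""
    return address_gib // dimm_size_gib


# Collect the distinct modules that contain at least one bad address.
suspect = sorted({dimm_index(a) for a in BAD_ADDRESSES_GIB})
print("suspect module indexes:", suspect)
```

Under that assumption the three bad spots fall in two of the four
modules, which at least agrees with the event log naming two DIMMs.
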
On Sun, Feb 01, 2026 at 08:30:56PM +0000, Bob Bishop wrote:
> Hi,
>
> > On 1 Feb 2026, at 16:35, G. Paul Ziemba <[email protected]> wrote:
> >
> > OS: 14.2-STABLE as of 250403
> >
> > I seem to have at least one bad ECC DIMM
>
> Check the power supply voltages are within tolerance if you haven't already.
>
> > and was expecting to see MCA
> > messages in /var/log/messages or to the console (which I have recently
> > redirected to /var/log/console.log via syslog.conf:
> >
> > console.info /var/log/console.log
> >
> > but I can't find anything in any of my logs. Why am I not seeing them?
>
> If you have the -F variant of the board that supports IPMI, it may be that
> the BMC is capturing the errors so check the BMC event log. Possibly there is
> a setting on the BMC to control what gets passed to MCA.
>
> Also check the BIOS event logging; I don't see settings in the BIOS to
> control MCA events.
>
> And check the BIOS version is up to date.
>
> > Background:
> >
> > Motherboard: Supermicro X11SCA
> > CPU: Xeon E-2176G
> > Chipset: C246
> > Memory: 4x SK Hynix HMA82GU7CJR8N-VK (16GB ECC)
> >
> > Bios reports ECC on its startup screen and dmidecode reports
> >
> > Total Width: 72 bits
> > Data Width: 64 bits
> >
> > for each of the dimms.
> >
> > Amanda started reporting checksum errors on large backup files in its
> > holding disk. I discovered that a large file (200GB) on any of three
> > disks on this system yields different sha512sum values every time I
> > run it on the same file. SMART data looks OK on all disks.
> >
> > memtest86+ finds three bad spots in memory, at 42G, 47G and 53G. I have
> > 4x16GB dimms installed, so I think that corresponds to two bad dimms.
> >
> > % sysctl hw.mca
> > hw.mca.cmc_throttle: 60
> > hw.mca.force_scan: 0
> > hw.mca.interval: 300
> > hw.mca.maxcount: -1
> > hw.mca.count: 0
> > hw.mca.erratum383: 0
> > hw.mca.intel6h_HSD131: 0
> > hw.mca.amd10h_L1TP: 1
> > hw.mca.log_corrected: 1
> > hw.mca.enabled: 1
> >
> > Thanks for any insights.
> > --
> > G. Paul Ziemba
> > FreeBSD unix:
> > 8:31AM up 2 days, 14:38, 11 users, load averages: 0.71, 0.43, 0.39
> >
>
>
> --
> Bob Bishop t: +44 (0)118 940 1243
> [email protected] m: +44 (0)783 626 4518
>
--
G. Paul Ziemba
FreeBSD unix:
7:51AM up 35 mins, 2 users, load averages: 0.32, 0.56, 0.47