Bob,

thanks for your suggestions.

The motherboard is a plain X11SCA (not the -F variant, so no IPMI).

I don't know of a way to read the power supply voltages in software
while FreeBSD is running, but I did reboot into the BIOS setup and
read voltages there, and they look normal to me:

    VCPU:        1.136
    VDIMM:       1.224
    12V:        12.233
    5VCC:        5.184
    3.3V_DL:     3.327
    3.3VCC:      3.424
    VSB:         3.328
    VBAT:        3.104
    VCC1_8_DL_PCM:       1.816
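
As a rough cross-check of "look normal", the sketch below compares the main rails against their nominal values. The +/-5% band is an assumption (the figure commonly quoted for ATX supplies), not something taken from the board manual, and only rails whose nominal value is obvious from the name are included; VCPU, VDIMM, VSB, VBAT and VCC1_8_DL_PCM are left out.

```python
# Readings copied from the BIOS setup screen above (volts).
READINGS = {
    "12V": 12.233,
    "5VCC": 5.184,
    "3.3V_DL": 3.327,
    "3.3VCC": 3.424,
}

# Nominal values implied by the rail names.
NOMINAL = {
    "12V": 12.0,
    "5VCC": 5.0,
    "3.3V_DL": 3.3,
    "3.3VCC": 3.3,
}

# ASSUMPTION: +/-5% tolerance, not taken from the X11SCA documentation.
TOLERANCE = 0.05


def deviation(name):
    """Fractional deviation of a reading from its nominal value."""
    return (READINGS[name] - NOMINAL[name]) / NOMINAL[name]


def within_tolerance(name, tol=TOLERANCE):
    """True if the reading lies inside the assumed tolerance band."""
    return abs(deviation(name)) <= tol


for name in READINGS:
    status = "ok" if within_tolerance(name) else "OUT OF RANGE"
    print(f"{name:8s} {READINGS[name]:7.3f} V  {deviation(name):+.1%}  {status}")
```

Under that assumption all four rails are inside the band; 3.3VCC and 5VCC are the furthest from nominal, both a little under 4% high.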

The BIOS versions are given as:

    "ver 1.2 Build Date 12/5/19" near the top of the screen; and
    "version 2.19.0045 (c) [AMI]" at the bottom of the screen

I didn't see any setting that obviously controls how events are
filtered, but there WAS an event log that had completely filled up
with messages of the form:

    <datetime> smbios 0x02 DIMMB1

with many for DIMMB1 and DIMMB2. I haven't found any documentation yet
of "0x02" other than a few online posts calling it either a single-bit
or a multi-bit ECC memory error.

I'm still favoring a diagnosis of two bad DIMMs; I just wish there were
a way to cause these errors to show up in FreeBSD somewhere so I could
detect them on a running system.
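
For reference, the "two bad DIMMs" reading of the memtest86+ results quoted below (bad spots at 42G, 47G and 53G, with 4x16GB installed) can be sketched as follows. This assumes a naive linear layout in which each DIMM covers one contiguous 16GB range, with no channel interleaving and no remapping around the PCI hole. That is almost certainly a simplification of what the memory controller really does, and the index it prints is hypothetical: it does not correspond to a slot name such as DIMMB1.

```python
# Values from the original message quoted below.
DIMM_SIZE_GB = 16
NUM_DIMMS = 4
BAD_SPOTS_GB = [42, 47, 53]


def dimm_index(addr_gb, dimm_size_gb=DIMM_SIZE_GB):
    """Index of the DIMM holding addr_gb.

    ASSUMPTION: linear, non-interleaved address layout.
    """
    return addr_gb // dimm_size_gb


# Collect the distinct (hypothetical) DIMM indices that contain a bad spot.
bad_dimms = sorted({dimm_index(addr) for addr in BAD_SPOTS_GB})

for addr in BAD_SPOTS_GB:
    print(f"{addr}G falls in DIMM index {dimm_index(addr)}")
print(f"distinct DIMMs affected: {len(bad_dimms)} of {NUM_DIMMS}")
```

With these numbers the three bad spots land in two distinct 16GB ranges, which is at least consistent with the event log naming two slots.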


On Sun, Feb 01, 2026 at 08:30:56PM +0000, Bob Bishop wrote:
> Hi,
> 
> > On 1 Feb 2026, at 16:35, G. Paul Ziemba <[email protected]> wrote:
> > 
> > OS: 14.2-STABLE as of 250403
> > 
> > I seem to have at least one bad ECC DIMM
> 
> Check the power supply voltages are within tolerance if you haven't already.
> 
> > and was expecting to see MCA
> > messages in /var/log/messages or to the console (which I have recently
> > redirected to /var/log/console.log via syslog.conf):
> > 
> >    console.info /var/log/console.log
> > 
> > but I can't find anything in any of my logs. Why am I not seeing them?
> 
> If you have the -F variant of the board that supports IPMI, it may be that 
> the BMC is capturing the errors so check the BMC event log. Possibly there is 
> a setting on the BMC to control what gets passed to MCA.
> 
> Also check the BIOS event logging; I don't see settings in the BIOS to 
> control MCA events.
> 
> And check the BIOS version is up to date.
> 
> > Background:
> > 
> > Motherboard: Supermicro X11SCA
> > CPU: Xeon E-2176G
> > Chipset: C246
> > Memory: 4x SK Hynix HMA82GU7CJR8N-VK (16GB ECC)
> > 
> > Bios reports ECC on its startup screen and dmidecode reports
> > 
> >    Total Width: 72 bits
> >    Data Width: 64 bits
> > 
> > for each of the dimms.
> > 
> > Amanda started reporting checksum errors on large backup files in its
> > holding disk. I discovered that a large file (200GB) on any of three
> > disks on this system yields different sha512sum values every time I
> > run it on the same file. SMART data looks OK on all disks.
> > 
> > memtest86+ finds three bad spots in memory, at 42G, 47G and 53G. I have
> > 4x16GB dimms installed, so I think that corresponds to two bad dimms.
> > 
> >    % sysctl hw.mca
> >    hw.mca.cmc_throttle: 60
> >    hw.mca.force_scan: 0
> >    hw.mca.interval: 300
> >    hw.mca.maxcount: -1
> >    hw.mca.count: 0
> >    hw.mca.erratum383: 0
> >    hw.mca.intel6h_HSD131: 0
> >    hw.mca.amd10h_L1TP: 1
> >    hw.mca.log_corrected: 1
> >    hw.mca.enabled: 1
> > 
> > Thanks for any insights.
> > -- 
> > G. Paul Ziemba
> > FreeBSD unix:
> > 8:31AM  up 2 days, 14:38, 11 users, load averages: 0.71, 0.43, 0.39
> > 
> 
> 
> --
> Bob Bishop       t: +44 (0)118 940 1243
> [email protected]     m: +44 (0)783 626 4518
> 
> 
> 
> 
> 

-- 
G. Paul Ziemba
FreeBSD unix:
 7:51AM  up 35 mins, 2 users, load averages: 0.32, 0.56, 0.47
