On 2/1/2026 15:30, Bob Bishop wrote:
> Hi,
>
>> On 1 Feb 2026, at 16:35, G. Paul Ziemba <[email protected]> wrote:
>>
>> OS: 14.2-STABLE as of 250403
>>
>> I seem to have at least one bad ECC DIMM
>
> Check the power supply voltages are within tolerance if you haven’t already.
>
>> and was expecting to see MCA messages in /var/log/messages or to the console (which I have recently redirected to /var/log/console.log via syslog.conf:
>>
>>     console.info    /var/log/console.log
>>
>> but I can't find anything in any of my logs.
>>
>> Why am I not seeing them?
>
> If you have the -F variant of the board that supports IPMI, it may be that the BMC is capturing the errors so check the BMC event log. Possibly there is a setting on the BMC to control what gets passed to MCA. Also check the BIOS event logging; I don’t see settings in the BIOS to control MCA events. And check the BIOS version is up to date.
>
>> Background:
>>
>>     Motherboard: Supermicro X11SCA
>>     CPU: Xeon E-2176G
>>     Chipset: C246
>>     Memory: 4x SK Hynix HMA82GU7CJR8N-VK (16GB ECC)
>>
>> Bios reports ECC on its startup screen and dmidecode reports
>>
>>     Total Width: 72 bits
>>     Data Width: 64 bits
>>
>> for each of the dimms.
>>
>> Amanda started reporting checksum errors on large backup files in its holding disk. I discovered that a large file (200GB) on any of three disks on this system yields different sha512sum values every time I run it on the same file. SMART data looks OK on all disks.
>>
>> memtest86+ finds three bad spots in memory, at 42G, 47G and 53G. I have 4x16GB dimms installed, so I think that corresponds to two bad dimms.
>>
>>     % sysctl hw.mca
>>     hw.mca.cmc_throttle: 60
>>     hw.mca.force_scan: 0
>>     hw.mca.interval: 300
>>     hw.mca.maxcount: -1
>>     hw.mca.count: 0
>>     hw.mca.erratum383: 0
>>     hw.mca.intel6h_HSD131: 0
>>     hw.mca.amd10h_L1TP: 1
>>     hw.mca.log_corrected: 1
>>     hw.mca.enabled: 1
>>
>> Thanks for any insights.
>>
>> --
>> G. Paul Ziemba
>> FreeBSD unix:  8:31AM  up 2 days, 14:38, 11 users, load averages: 0.71, 0.43, 0.39
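Paul's symptom (an unchanged file giving a different sha512sum on every run) is a useful data-path test in its own right. Below is a minimal Python sketch of that check, assuming nothing writes to the file between runs; `sha512_of_file`, `repeated_digests` and `digests_are_stable` are hypothetical helper names, not part of any existing tool. It streams the file through `hashlib.sha512` in 1 MiB chunks and compares the digests. On healthy hardware they must all match, so a mismatch points at something between the disk and the CPU (RAM, controller, cabling) rather than at the file. Using a file larger than RAM, as Paul did, keeps each pass from being served entirely out of cache.

```python
import hashlib


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-512 so a huge file never sits in memory at once."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_digests(path: str, runs: int = 3) -> list[str]:
    """Hash the same (hypothetically untouched) file `runs` times."""
    return [sha512_of_file(path) for _ in range(runs)]


def digests_are_stable(digests: list[str]) -> bool:
    """True when every run produced the same digest, as it must on good hardware."""
    return len(set(digests)) <= 1
```

Typical use would be `digests_are_stable(repeated_digests("/path/to/bigfile", runs=3))`, where the path is a placeholder; a `False` result reproduces Paul's observation without relying on Amanda.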
I have one of these boards in a server here:

Platform Firmware Information
	Vendor: American Megatrends Inc.
	Version: 2.5
	Release Date: 06/14/2024
	Address: 0xF0000
	Runtime Size: 64 KiB
	ROM Size: 32 MiB
....
Base Board Information
	Manufacturer: Supermicro
	Product Name: X11SCA-F
	Version: 1.01A
....
Handle 0x002F, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x001F
	Error Information Handle: Not Provided
	Total Width: 72 bits
	Data Width: 64 bits
	Size: 16 GiB
	Form Factor: DIMM
	Set: None
	Locator: DIMMA1
	Bank Locator: P0_Node0_Channel0_Dimm0
	Type: DDR4
	Type Detail: Synchronous
	Speed: 2667 MT/s
	Manufacturer: SK Hynix
	Serial Number: 7474963A
	Asset Tag: 9876543210
	Part Number: HMA82GU7CJR8N-VK
	Rank: 2
	Configured Memory Speed: 2667 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V
	Memory Technology: DRAM
	Memory Operating Mode Capability: Volatile memory
	Firmware Version: Not Specified
	Module Manufacturer ID: Bank 1, Hex 0xAD
	Module Product ID: Unknown
	Memory Subsystem Controller Manufacturer ID: Unknown
	Memory Subsystem Controller Product ID: Unknown
	Non-Volatile Size: None
	Volatile Size: 16 GiB
	Cache Size: None
	Logical Size: None
....

(and likewise for the other 3, 64 GB total)

I have a Mellanox card in this one for 10G/40G networking, along with a bunch of SAS/SATA expansion; it gets fairly heavy use as a "hot standby" Postgres database box, general file server, video server, and build system for various distributions I use in other contexts.
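The Total Width / Data Width pair that shows up in both dmidecode dumps is the usual quick ECC indicator: 72 − 64 = 8 check bits per 64-bit word. The sketch below pulls those fields out of `dmidecode -t 17` text for every DIMM; `ecc_report` is a hypothetical helper, and it assumes the standard dmidecode layout where each record begins with a `Handle 0x...` line. One caveat: these fields describe the modules as the firmware reports them. They do not prove that the memory controller is correcting errors, nor that corrected errors reach the OS, which is the open question in this thread.

```python
import re


def ecc_report(dmidecode_text: str) -> list[tuple[str, int, int, bool]]:
    """Return (locator, total_bits, data_bits, has_ecc) for each Memory Device record."""
    report = []
    # Split the dump into records; each one starts with a "Handle 0x..." line.
    for record in re.split(r"\n(?=Handle 0x)", dmidecode_text):
        if "Memory Device" not in record:
            continue
        total = re.search(r"Total Width:\s*(\d+)\s*bits", record)
        data = re.search(r"Data Width:\s*(\d+)\s*bits", record)
        # Anchored at line start so "Bank Locator:" is not picked up.
        locator = re.search(r"^[ \t]*Locator:[ \t]*(.+)$", record, re.MULTILINE)
        if not (total and data):
            continue  # e.g. empty slots report "Unknown" instead of a bit count
        t, d = int(total.group(1)), int(data.group(1))
        name = locator.group(1).strip() if locator else "?"
        # Any bits beyond the data width are check bits (8 for 72/64 ECC).
        report.append((name, t, d, t > d))
    return report
```

Fed the output quoted above, each SK Hynix module would come back as `("DIMMA1", 72, 64, True)` and so on; a 64/64 module would be flagged `False`.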
Check the RAM part numbers against what Supermicro specifies as "approved"; these boards are extremely picky in that regard, but typically, if you have the wrong part numbers, they will refuse to POST outright rather than do hinky stuff.
This board has been in service here for quite a long time (many years), and I believe I am running the most recent BIOS. I've not had any trouble with ECC errors being logged, nor any sort of data corruption, crashes or other misbehavior that could be attributed to that sort of issue.
I'm extremely allergic to RAM issues because the machine supports a quite large amount of storage on ZFS and needs to be rock-solid reliable. It is.
The CPU in mine is an E-2146G.

I'm on 14.3-STABLE (compiled from source fairly recently, as I keep up with security and potential driver issues that might impact me).
-- Karl Denninger [email protected] /The Market Ticker/ /[S/MIME encrypted email preferred]/
