Re: MCE: Does this look possibly like a slot issue?

2022-06-21 Thread Chris
On 2022-06-21 12:23, Larry Rosenman wrote: On 06/21/2022 1:23 pm, Chris wrote: On 2022-06-20 17:23, Larry Rosenman wrote: I'm seeing them constantly: FWIW it looks like a sync(ing) problem between your RAM && CPU cache. Are your clocks set correctly for your CPU && RAM? Is your CPU too hot

Re: MCE: Does this look possibly like a slot issue?

2022-06-21 Thread Larry Rosenman
On 06/21/2022 1:23 pm, Chris wrote: On 2022-06-20 17:23, Larry Rosenman wrote: I'm seeing them constantly: FWIW it looks like a sync(ing) problem between your RAM && CPU cache. Are your clocks set correctly for your CPU && RAM? Is your CPU too hot? Is the CPU cache ECC? root@freenas[~]# m

Re: MCE: Does this look possibly like a slot issue?

2022-06-21 Thread Chris
On 2022-06-20 17:23, Larry Rosenman wrote: I'm seeing them constantly: FWIW it looks like a sync(ing) problem between your RAM && CPU cache. Are your clocks set correctly for your CPU && RAM? Is your CPU too hot? Is the CPU cache ECC? root@freenas[~]# mcelog --dmi Hardware event. This is n

Re: MCE: Does this look possibly like a slot issue?

2022-06-21 Thread Ultima
Completely agree with you, Rodney. The LGA on the motherboard can be bent very easily when moving, so I wanted to recommend this last. Larry, as Rodney mentioned, it's more or less your last option. This is likely the CPU and not the module itself. There is still a small chance that it is the motherboard/sl

Re: MCE: Does this look possibly like a slot issue?

2022-06-21 Thread Larry Rosenman
Looks like it might be just that, Rodney: root@freenas[~]# mcelog Hardware event. This is not a software error. MCE 0 CPU 14 BANK 8 TSC 525efc019bb6 MISC ac29890200040083 ADDR ee2f6e800 TIME 1655827944 Tue Jun 21 11:12:24 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ER

Re: MCE: Does this look possibly like a slot issue?

2022-06-21 Thread Rodney W. Grimes
> Swapped 2 DIMMS, now we wait for the ZFS ARC to fill and start using all the memory. Depending on the results of that, one thing that is often overlooked when trying to troubleshoot memory systems in modern Intel systems is the fact that the DIMM now talks directly to the CPU chip that

Re: MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Larry Rosenman
Swapped 2 DIMMS, now we wait for the ZFS ARC to fill and start using all the memory. On 06/20/2022 7:59 pm, Larry Rosenman wrote: SuperMicro X8DTN+ 2 Processors, 6-core/12-Thread. CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz (2400.20-MHz K8-class CPU) I'll bring it down and swap

Re: MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Larry Rosenman
SuperMicro X8DTN+ 2 Processors, 6-core/12-Thread. CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz (2400.20-MHz K8-class CPU) I'll bring it down and swap DIMMS around On 06/20/2022 7:57 pm, Ultima wrote: Hey Larry, One red flag I am seeing is that the error is being produced on the s

Re: MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Ultima
Hey Larry, One red flag I am seeing is that the error is being produced on the same CPU/bank with each error you have provided so far. Can you try to follow my original recommendation and swap the DIMM in the problem slot with another currently installed DIMM and see if anything changes? Can you also provide t

Re: MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Larry Rosenman
Yes and Yes. On 06/20/2022 7:37 pm, Ultima wrote: Are you sure that the module you replaced it with was good? Are you sure you replaced the correct module? Best regards, Richard Gallamore On Mon, Jun 20, 2022 at 5:23 PM Larry Rosenman wrote: I'm seeing them constantly: root@freenas[~]# m

Re: MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Ultima
Are you sure that the module you replaced it with was good? Are you sure you replaced the correct module? Best regards, Richard Gallamore On Mon, Jun 20, 2022 at 5:23 PM Larry Rosenman wrote: > I'm seeing them constantly: > > root@freenas[~]# mcelog --dmi > Hardware event. This is not a softwar

Re: MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Larry Rosenman
I'm seeing them constantly: root@freenas[~]# mcelog --dmi Hardware event. This is not a software error. MCE 0 CPU 22 BANK 8 TSC 20aab486464a MISC ac29890200046444 ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT

Re: MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Ultima
Hey Larry, It is possible it's the motherboard itself, but it's rare. The way I would determine this is to swap the DIMM module with one in another populated slot on the motherboard and see whether the error migrates to the new slot. Also, this error doesn't necessarily mean there is a problem that ne

MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Larry Rosenman
I've gotten a BUNCH of these on my TrueNAS server. I've replaced this DIMM a couple of times, and still the MCEs continue. Is it possible it's a motherboard slot issue? Hardware event. This is not a software error. MCE 8 CPU 22 BANK 8 TSC 5aa4ecdd795a MISC ac29890200046646 ADDR ee2f6e800 TIME 16
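
For anyone comparing records like the ones quoted in this thread, here is a rough Python sketch (not something posted by any of the participants) that pulls the CPU, BANK and ADDR fields out of mcelog-style header lines and counts how often each bank/address pair repeats. The regular expression and the helper names `parse_records` and `summarize` are my own assumptions based only on the three record headers visible above; real mcelog output may have more fields or a different layout.

```python
import re
from collections import Counter

# Assumed layout: "MCE n CPU n BANK n ... ADDR hex", as seen in the quoted records.
RECORD = re.compile(
    r"MCE\s+(?P<mce>\d+)\s+CPU\s+(?P<cpu>\d+)\s+BANK\s+(?P<bank>\d+)"
    r".*?ADDR\s+(?P<addr>[0-9a-fA-F]+)",
    re.DOTALL,
)

def parse_records(text):
    """Return one dict per record header found in the text (hypothetical helper)."""
    return [
        {"cpu": int(m["cpu"]), "bank": int(m["bank"]), "addr": int(m["addr"], 16)}
        for m in RECORD.finditer(text)
    ]

def summarize(records):
    """Count how many records share the same (bank, address) pair."""
    return Counter((r["bank"], hex(r["addr"])) for r in records)

# The three record headers quoted in this thread.
SAMPLE = """
MCE 8 CPU 22 BANK 8 TSC 5aa4ecdd795a MISC ac29890200046646 ADDR ee2f6e800
MCE 0 CPU 22 BANK 8 TSC 20aab486464a MISC ac29890200046444 ADDR ee2f6e800
MCE 0 CPU 14 BANK 8 TSC 525efc019bb6 MISC ac29890200040083 ADDR ee2f6e800
"""

if __name__ == "__main__":
    records = parse_records(SAMPLE)
    for (bank, addr), count in summarize(records).items():
        print(f"bank {bank} addr {addr}: {count} record(s)")
    print("reporting CPUs:", sorted({r["cpu"] for r in records}))
```

If the same bank/address pair keeps showing up after a DIMM swap, that is consistent with the suspicion raised in the thread that the module is not the cause, though the script by itself cannot tell a slot, board, or CPU fault apart.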