Re: MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Larry Rosenman



Swapped 2 DIMMs; now we wait for the ZFS ARC to fill and start using all
the memory.



Re: MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Larry Rosenman



SuperMicro X8DTN+

2 processors, 6-core/12-thread. CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
(2400.20-MHz K8-class CPU)

I'll bring it down and swap DIMMs around.


Re: MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Ultima
Hey Larry,

One red flag I am seeing is that the error is being produced on
the same CPU/bank with each error you have provided so far.

Can you try following my original recommendation: swap the DIMM
in the problem slot with one from another populated slot, and see
if anything changes?

Can you also provide the motherboard model? Also, do you
have multiple CPUs installed in this system?

Best regards,
Richard Gallamore
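
The "same CPU/bank every time" observation can be checked mechanically. The following is a minimal sketch, not a tool used in this thread: it assumes the mcelog text has been saved to a string, and the names SAMPLE, split_records, summarize and tally are made up for illustration. It groups records by CPU, bank, channel, DIMM ID and address so that a single repeating location stands out.

```python
import re
from collections import Counter

# Two records abbreviated from the mcelog output posted in this thread.
SAMPLE = """\
Hardware event. This is not a software error.
MCE 0
CPU 22 BANK 8 TSC 20aab486464a
MISC ac29890200046444 ADDR ee2f6e800
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Device Locator: P2-DIMM2C
Hardware event. This is not a software error.
MCE 2
CPU 22 BANK 8 TSC 2a5604a6a070
MISC ac29890200044281
Memory DIMM ID of error: 0
Memory channel ID of error: 1
"""


def split_records(text):
    """Return one string per record, splitting on the 'Hardware event.' banner."""
    parts = text.split("Hardware event.")
    return [p for p in parts[1:] if p.strip()]


def summarize(record):
    """Extract (cpu, bank, channel, dimm, addr); fields that are absent become None."""
    def grab(pattern):
        m = re.search(pattern, record)
        return m.group(1) if m else None

    return (
        grab(r"\bCPU (\d+) BANK"),
        grab(r"\bBANK (\d+) TSC"),
        grab(r"channel ID of error: (\d+)"),
        grab(r"DIMM ID of error: (\d+)"),
        grab(r"ADDR ([0-9a-f]+)"),
    )


def tally(text):
    """Count how many records share each (cpu, bank, channel, dimm, addr) tuple."""
    return Counter(summarize(r) for r in split_records(text))


if __name__ == "__main__":
    for key, count in tally(SAMPLE).most_common():
        print(count, key)
```

If nearly every record lands in one tuple, as in the output posted here, the fault is localized to one channel/DIMM position and one address; that alone does not say whether the module or the slot is to blame, which is what the swap test is for.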


Re: MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Larry Rosenman



Yes and Yes.


Re: MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Ultima
Are you sure that the module you replaced it with was good?
Are you sure you replaced the correct module?

Best regards,
Richard Gallamore


Re: MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Larry Rosenman



I'm seeing them constantly:

root@freenas[~]# mcelog --dmi
Hardware event. This is not a software error.
MCE 0
CPU 22 BANK 8 TSC 20aab486464a
MISC ac29890200046444 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 44
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c41009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
WARNING: SMBIOS data is often unreliable. Take with a grain of salt!
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 1
CPU 22 BANK 8 TSC 296dfcc82582
MISC ac29890200041381 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 81
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c41009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 2
CPU 22 BANK 8 TSC 2a5604a6a070
MISC ac29890200044281
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory ECC error occurred during scrub
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 81
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 884200cf MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 3
CPU 22 BANK 8 TSC 31e141418eb8
MISC ac29890200046a4a ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 4a
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c41009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 4
CPU 22 BANK 8 TSC 3a014afee106
MISC ac29890200046646 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 46
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c41009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 5
CPU 22 BANK 8 TSC 41d1dbef1a6a
MISC ac29890200046141 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 41
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c41009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 6
CPU 22 BANK 8 TSC 4a1b1ecef446
MISC ac29890200046a4a ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 4a
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c41009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9
Hardware event. This is not a software error.
MCE 7
CPU 22 BANK 8 TSC 527bc27db776
MISC ac29890200040386 ADDR ee2f6e800
TIME 1655770989 Mon Jun 20 19:23:09 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 86
Memory DIMM ID of error: 0
Memory channel ID of error: 1
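
The decoded "Tracker ID", "DIMM ID", "channel ID" and "ECC syndrome" lines in the records above appear to be bit fields of the raw MISC value. The sketch below assumes a layout inferred from the values in this thread (RTId in bits 0-7, DIMM ID in bits 16-17, channel ID in bits 18-19, syndrome in bits 32-63); it reproduces the decoded lines shown here, but treat it as an assumption and check the Intel documentation for this CPU family before relying on it. The function name decode_misc is hypothetical.

```python
def decode_misc(misc_hex):
    """Split a MISC hex string into fields, using a layout ASSUMED from this thread."""
    value = int(misc_hex, 16)
    return {
        "rtid": value & 0xFF,                    # bits 0-7: transaction tracker ID
        "dimm": (value >> 16) & 0x3,             # bits 16-17: DIMM ID within the channel
        "channel": (value >> 18) & 0x3,          # bits 18-19: memory channel ID
        "syndrome": (value >> 32) & 0xFFFFFFFF,  # bits 32-63: ECC syndrome
    }


if __name__ == "__main__":
    fields = decode_misc("ac29890200046444")  # MISC value from MCE 0 above
    print({name: hex(v) for name, v in fields.items()})
```

Decoding every MISC value in the dump this way gives the same DIMM ID, channel ID and syndrome each time; only the tracker ID changes between records.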

Re: MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Ultima
Hey Larry,

It is possible that it's the motherboard itself, but that's rare. The way I
would determine this is to swap the DIMM with one in another
populated slot on the motherboard and see whether the error migrates
to the new slot. Also, this error doesn't necessarily mean
there is a problem that needs to be addressed. If you have been
running the system for many months and have seen ECC errors only a
handful of times, they can probably be safely ignored.

Best regards,
Richard Gallamore
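
The swap test described above comes down to comparing what mcelog --dmi reports before and after the move. Here is a rough sketch of that comparison, assuming the Device Locator and Serial Number fields are trustworthy (mcelog itself warns that SMBIOS data often is not). The function name and the second locator and serial used in the example are hypothetical.

```python
def classify_swap_result(before, after):
    """Compare (device_locator, serial_number) pairs seen with errors before and after a swap.

    Pass after=None if no errors have been logged since the swap.
    """
    if after is None:
        return "inconclusive: no errors observed since the swap"
    slot_before, serial_before = before
    slot_after, serial_after = after
    if slot_after == slot_before and serial_after != serial_before:
        # A different module now fails in the same position.
        return "stayed with slot: suspect the slot, channel, or CPU memory controller"
    if slot_after != slot_before and serial_after == serial_before:
        # The same module fails in its new position.
        return "followed module: suspect the DIMM"
    if slot_after == slot_before and serial_after == serial_before:
        return "unchanged: the module does not appear to have moved, recheck the swap"
    return "inconclusive: both slot and module differ, possibly a second fault"


if __name__ == "__main__":
    before = ("P2-DIMM2C", "40F3C20F")  # values reported in this thread
    after = ("P2-DIMM2C", "0000AAAA")   # hypothetical serial of the swapped-in module
    print(classify_swap_result(before, after))
```

Since the errors in this thread are all corrected ones, giving the system time to fill memory again before reading the result matters; an empty log shortly after the swap falls under the "inconclusive" case.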



MCE: Does this look possibly like a slot issue?

2022-06-20 Thread Larry Rosenman
I've gotten a BUNCH of these on my TrueNAS server.  I've replaced this
DIMM a couple of times, and still the MCEs continue.

Is it possible it's a motherboard slot issue?

Hardware event. This is not a software error.
MCE 8
CPU 22 BANK 8 TSC 5aa4ecdd795a
MISC ac29890200046646 ADDR ee2f6e800
TIME 1655762472 Mon Jun 20 17:01:12 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 46
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c41009f MCGSTATUS 0
MCGCAP 1c09 APICID 34 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Hyundai
Serial Number: 40F3C20F
Asset Tag:
Part Number: HMT151R7BFR4C-H9



--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106