L2 cache errors???

2015-07-28 Thread Willem Jan Withagen

Hi,

Are these what I think they are?

Errors in the CPU L2 cache?

/var/log/messages:Jul 24 13:14:40 box kernel: MCA: Bank 3, Status 0x902000120120100e
/var/log/messages:Jul 24 13:14:40 box kernel: MCA: Global Cap 0x0806, Status 0x
/var/log/messages:Jul 24 13:14:40 box kernel: MCA: Vendor GenuineIntel, ID 0x10676, APIC ID 2
/var/log/messages:Jul 24 13:14:40 box kernel: MCA: CPU 2 COR L2 memory error

/var/log/messages:Jul 28 19:12:42 box kernel: MCA: Bank 3, Status 0x90270220100e
/var/log/messages:Jul 28 19:12:42 box kernel: MCA: Global Cap 0x0806, Status 0x
/var/log/messages:Jul 28 19:12:42 box kernel: MCA: Vendor GenuineIntel, ID 0x10676, APIC ID 0
/var/log/messages:Jul 28 19:12:42 box kernel: MCA: CPU 0 COR L2 memory error

Are they ECC corrected?
Or is the data really kaput?

--WjW
___
freebsd-hardware@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hardware
To unsubscribe, send any mail to freebsd-hardware-unsubscr...@freebsd.org


Re: L2 cache errors???

2015-07-28 Thread Josh Paetzel


On 07/28/2015 13:40, Willem Jan Withagen wrote:
> On 28/07/2015 19:48, Mike Tancsa wrote:
>> On 7/28/2015 1:16 PM, Willem Jan Withagen wrote:
>>> Hi,
>>>
>>> Are these what I think they are?
>>> Errors in the CPU L2 cache?
>>>
>>> Are they ECC corrected?
>>> Or is the data really kaput?
>>
>> Could be. There is also an erratum issue that triggers these errors on
>> certain CPUs when running software like virtualbox.  It was fixed in
>> RELENG_10 some time ago. What are you running ?
>>
>> https://svnweb.freebsd.org/base?view=revision&revision=269052
>>
>> has some details.
>
> 'mmm,
> Not running Haswell stuff, but rather older hardware.
>
> Looked in older logfiles, and there are a few more...
> All with the same data, except that it is detected on different CPUs
>
> And it occurs when running:
>   mbuffer -4 -m 1000M -I  | \
> zfs receive -F -d -v zfs
> to receive a full backup from my fileserver.
>
> --WjW

You can tell ECC corrected the error because on FreeBSD if ECC can't fix
the error the system will panic.  Other systems (Solaris and HP-UX being
the two I have direct experience with) can detach subsystems that have
sustained uncorrectable errors in some cases. (Yes, even CPUs!)
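
The status word in the log says the same thing, for what it's worth. Below is a minimal sketch in Python, assuming the IA32_MCi_STATUS bit layout from the Intel SDM (bit 61 is UC, the low 16 bits are the MCA error code). The function and constant names are made up for illustration and are not taken from the FreeBSD kernel. It is applied to the first logged value, 0x902000120120100e, which is a complete 64-bit word.

```python
# Flag bits of IA32_MCi_STATUS, per the Intel SDM (assumed layout).
MC_STATUS_FLAGS = {
    "valid": 1 << 63,            # VAL: register holds a valid error
    "overflow": 1 << 62,         # OVER: an earlier error was overwritten
    "uncorrected": 1 << 61,      # UC: hardware could NOT correct the error
    "enabled": 1 << 60,          # EN: reporting enabled for this error
    "misc_valid": 1 << 59,       # MISCV: MCi_MISC holds extra information
    "addr_valid": 1 << 58,       # ADDRV: MCi_ADDR holds the address
    "context_corrupt": 1 << 57,  # PCC: processor context corrupted
}

CACHE_LEVELS = ("L0", "L1", "L2", "generic")


def decode_mca_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error codes."""
    info = {name: bool(status & bit) for name, bit in MC_STATUS_FLAGS.items()}
    code = status & 0xFFFF
    info["mca_error_code"] = code
    info["model_specific_code"] = (status >> 16) & 0xFFFF
    # Generic cache hierarchy errors have the form 000F 0000 0000 11LL;
    # bit 12 (F) is masked out before comparing.
    if (code & 0xEFFC) == 0x000C:
        info["cache_level"] = CACHE_LEVELS[code & 0x3]
    return info


if __name__ == "__main__":
    decoded = decode_mca_status(0x902000120120100E)
    for key, value in decoded.items():
        print(key, value)
```

For the logged value, UC is clear and the error code matches the generic cache-hierarchy pattern with level L2, which agrees with the kernel's "COR L2 memory error" line.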

If a system is generating hundreds or thousands of MCAs a minute you are
dealing with a hardware issue.

If you are getting spurious MCAs to the tune of a few a day, there's
nothing abnormal or broken there; it's just the system doing what it's
supposed to.
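
To see which side of that line a box is on, the events can be counted per day. A rough sketch in Python follows, assuming the summary line looks like the "MCA: CPU 2 COR L2 memory error" line quoted in this thread. The "UNCOR" alternative is my assumption about what the kernel prints for uncorrected errors, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the one-per-event summary line, e.g.
#   Jul 24 13:14:40 box kernel: MCA: CPU 2 COR L2 memory error
# "UNCOR" is an assumption; only "COR" appears in the posted logs.
EVENT_RE = re.compile(
    r"(?P<day>[A-Z][a-z]{2}\s+\d{1,2}) \d{2}:\d{2}:\d{2} \S+ "
    r"kernel: MCA: CPU (?P<cpu>\d+) (?P<kind>COR|UNCOR)\b"
)


def count_mca_events(lines):
    """Count MCA summary lines per (day, kind) pair."""
    per_day = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            # syslog pads single-digit days with an extra space; normalise it.
            day = " ".join(match.group("day").split())
            per_day[(day, match.group("kind"))] += 1
    return per_day


if __name__ == "__main__":
    import sys

    # Usage: python count_mca.py < /var/log/messages
    for (day, kind), count in sorted(count_mca_events(sys.stdin).items()):
        print(day, kind, count)
```

A handful of COR entries per day fits the "doing what it's supposed to" case; hundreds per minute would point at hardware.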

Given the amount of data that flies around inside modern computers I'm
surprised there aren't more MCAs than there are in most systems.


-- 
FreeBSD - The Power To Serve.


Re: L2 cache errors???

2015-07-28 Thread Mike Tancsa
On 7/28/2015 1:16 PM, Willem Jan Withagen wrote:
> Hi,
>
> Are these what I think they are?
> Errors in the CPU L2 cache?
>
> Are they ECC corrected?
> Or is the data really kaput?
 


Could be. There is also an erratum issue that triggers these errors on
certain CPUs when running software like virtualbox.  It was fixed in
RELENG_10 some time ago. What are you running ?


https://svnweb.freebsd.org/base?view=revision&revision=269052

has some details.

---Mike


-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/


Re: L2 cache errors???

2015-07-28 Thread Willem Jan Withagen
On 28/07/2015 19:48, Mike Tancsa wrote:
> On 7/28/2015 1:16 PM, Willem Jan Withagen wrote:
>> Hi,
>>
>> Are these what I think they are?
>> Errors in the CPU L2 cache?
>>
>> Are they ECC corrected?
>> Or is the data really kaput?
>
> Could be. There is also an erratum issue that triggers these errors on
> certain CPUs when running software like virtualbox.  It was fixed in
> RELENG_10 some time ago. What are you running ?
>
> https://svnweb.freebsd.org/base?view=revision&revision=269052
>
> has some details.

'mmm,
Not running Haswell stuff, but rather older hardware.

Looked in older logfiles, and there are a few more...
All with the same data, except that it is detected on different CPUs

And it occurs when running:
mbuffer -4 -m 1000M -I  | \
zfs receive -F -d -v zfs
to receive a full backup from my fileserver.

--WjW

No tweaked settings, and the CPU is not overheating.
The system consumes about 200W and has a Supermicro 450W supply.

Running 10.2-BETA2 on a
CPU: Intel(R) Core(TM)2 Extreme CPU X9650  @ 3.00GHz (3005.62-MHz K8-class CPU)
  Origin=GenuineIntel  Id=0x10676  Family=0x6  Model=0x17  Stepping=6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x8e3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  VT-x: Basic Features=0x5a0800<SMM,INS/OUTS>
        Pin-Based Controls=0x3f<ExtINT,NMI,VNMI>
        Primary Processor Controls=0xf7f9fffe<INTWIN,TSCOff,HLT,INVLPG,MWAIT,RDPMC,RDTSC,CR3-LD,CR3-ST,CR8-LD,CR8-ST,TPR,NMIWIN,MOV-DR,IO,IOmap,MSRmap,MONITOR,PAUSE>
        Secondary Processor Controls=0x41<APIC,WBINVD>
        Exit Controls=0x5a0800<PAT-LD,EFER-SV,PTMR-SV>
        Entry Controls=0x5a0800
  TSC: P-state invariant, performance statistics
Instruction TLB: 2M pages, 4-way, 8 entries or 4M pages, 4-way, 4 entries
Instruction TLB: 4 KB Pages, 4-way set associative, 128 entries
64-Byte prefetching
Data TLB0: 4 KByte pages, 4-way associative, 16 entries
Data TLB0: 4 MByte pages, 4-way set associative, 16 entries
2nd-level cache: 6MByte, 24-way set associative, 64 byte line size
1st-level instruction cache: 32 KB, 8-way set associative, 64 byte line size
Data TLB1: 4 KByte pages, 4-way associative, 256 entries
1st-level data cache: 32 KB, 8-way set associative, 64 byte line size
L2 cache: 6144 kbytes, 16-way associative, 64 bytes/line
real memory  = 7516192768 (7168 MB)

Motherboard:
Base Board Information
Manufacturer: ASUSTeK Computer INC.
Product Name: P5Q-E
Version: Rev 1.xx
Serial Number: MS1C87B16302305




Re: L2 cache errors???

2015-07-28 Thread Willem Jan Withagen
On 28/07/2015 21:04, Josh Paetzel wrote:
 
 
> On 07/28/2015 13:40, Willem Jan Withagen wrote:
>> On 28/07/2015 19:48, Mike Tancsa wrote:
>>> On 7/28/2015 1:16 PM, Willem Jan Withagen wrote:
>>>> Hi,
>>>>
>>>> Are these what I think they are?
>>>> Errors in the CPU L2 cache?
>>>>
>>>> Are they ECC corrected?
>>>> Or is the data really kaput?
>>>
>>> Could be. There is also an erratum issue that triggers these errors on
>>> certain CPUs when running software like virtualbox.  It was fixed in
>>> RELENG_10 some time ago. What are you running ?
>>>
>>> https://svnweb.freebsd.org/base?view=revision&revision=269052
>>>
>>> has some details.
>>
>> 'mmm,
>> Not running Haswell stuff, but rather older hardware.
>>
>> Looked in older logfiles, and there are a few more...
>> All with the same data, except that it is detected on different CPUs
>>
>> And it occurs when running:
>>   mbuffer -4 -m 1000M -I  | \
>> zfs receive -F -d -v zfs
>> to receive a full backup from my fileserver.
>>
>> --WjW
>
> You can tell ECC corrected the error because on FreeBSD if ECC can't fix
> the error the system will panic.  Other systems (Solaris and HP-UX being
> the two I have direct experience with) can detach subsystems that have
> sustained uncorrectable errors in some cases. (Yes, even CPUs!)

Offlining CPUs, cool.
No, the system does not panic, but I do get reports from 'zfs receive'
that the datastream is invalid, and it then aborts.
So I'll have to do more digging to see what is up.

> If a system is generating hundreds or thousands of MCAs a minute you are
> dealing with a hardware issue.
>
> If you are getting spurious MCAs to the tune of a few a day, there's
> nothing abnormal or broken there; it's just the system doing what it's
> supposed to.

Never had them before, and now about 6 this week.
Let alone in L2 cache.
So it got me worried.

> Given the amount of data that flies around inside modern computers I'm
> surprised there aren't more MCAs than there are in most systems.

Perhaps not enough alpha particles hitting the cells. :)

Thanx,
--WjW



Disk Controllers - ciss driver mismatch of supported devices

2015-07-28 Thread Dmitriy Kulikov
Hello!

I found a mismatch between the lists of controllers supported by the ciss
driver. The developers' website states that ciss supports the new disk
controllers in HP Gen 9 servers (H240ar, P440ar, etc.), but the FreeBSD 10.1
hardware notes do not mention them. Does FreeBSD 10 support the new disk
controllers (H240ar, P440ar) in HP Gen 9 servers?
http://cciss.sourceforge.net/
https://www.freebsd.org/releases/10.1R/hardware.html

I also found that the list of supported HP network controllers is incomplete,
but that can be partly worked out by matching the chips.


Best regards,
Dmitriy