Re: MCA messages after upgrade to 8.2-BEAT1

2010-12-28 John Baldwin
On Friday, December 24, 2010 3:47:16 am Matthew D. Fuller wrote:
> On Wed, Dec 22, 2010 at 09:57:26AM -0500 I heard the voice of
> John Baldwin, and lo! it spake thus:
> > 
> > You are getting corrected ECC errors in your RAM.
> 
> Actually, don't
> 
> > CPU 0 0 data cache 
> > ADDR 236493c0 
> >   Data cache ECC error (syndrome 1c)
> 
> > CPU 0 1 instruction cache 
> > ADDR 2a1c9440 
> >   Instruction cache ECC error
> 
> > CPU 0 2 bus unit 
> >   L2 cache ECC error
> 
> > CPU 1 0 data cache 
> > ADDR 23649640 
> >   Data cache ECC error (syndrome 1c)
> 
> > CPU 1 1 instruction cache 
> > ADDR 2a1c9440 
> >   Instruction cache ECC error
> 
> > CPU 1 2 bus unit 
> >   L2 cache ECC error
> 
> suggest CPU cache, not RAM?
> 
> (that's actually a question; I don't know, but that's what a naive
> reading suggests...)

Hmm, I don't know for certain.  My interpretation is that the CPU errors were 
just secondary errors from a memory error like this one that was in the middle 
of his reported errors.  It was also only reported on CPU 0 and not CPU 1:

STATUS d0004863 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0 
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge 
MISC e00d0fff ADDR 2cac9678 
  Northbridge RAM ECC error
  ECC syndrome = 1c
   bit33 = err cpu1
   bit46 = corrected ecc error
   bit59 = misc error valid
   bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
 generic read mem transaction
 memory access, level generic'

On Intel systems (which I am much more familiar with as far as machine checks 
go), corrected ECC errors did not result in additional events in the CPU 
caches themselves, but I don't know if AMD is different in this regard.  It 
could be that both CPUs and a DIMM are failing, but replacing a DIMM is 
cheaper and simpler and you can always replace the CPUs later if CPU errors 
continue.  Of course, I can't tell you which DIMM to replace from these 
messages, but in this case since they are so easily reproducible, you could 
probably swap them out one at a time to test.
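When swapping DIMMs one at a time, it can help to check whether errors keep recurring at the same physical addresses before and after each swap. The shell function below is a minimal sketch of that check. The name `mca_addr_counts` is hypothetical, and the function assumes the FreeBSD log format shown in this thread ("kernel: MCA: Address 0x..."). The addresses are whatever the machine-check bank recorded, so on their own they do not identify a DIMM slot.

```sh
#!/bin/sh
# Hypothetical helper: count how often each physical address appears in
# FreeBSD "kernel: MCA: Address 0x..." messages, most frequent first.
# Reads syslog text on standard input.
# Pipeline: keep only the address lines, take the last field (the
# address), count duplicates, sort by count, print "COUNT ADDRESS".
mca_addr_counts() {
    grep 'kernel: MCA: Address ' |
        awk '{ print $NF }' |
        sort |
        uniq -c |
        sort -rn |
        awk '{ print $1, $2 }'
}
```

Typical use would be `mca_addr_counts < /var/log/messages`. Applied to the Dec 21 messages in this thread, it would count 0x2a1c9440 twice (it was reported by both CPU 0 and CPU 1) and every other address once.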

-- 
John Baldwin
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: MCA messages after upgrade to 8.2-BEAT1

2010-12-25 Sergey Kandaurov
On 25 December 2010 07:48, Carl Johnson  wrote:
> Alan Cox  writes:
>
>> On Fri, Dec 24, 2010 at 5:08 PM, Carl Johnson  wrote:
>>
>>> Alan Cox  writes:
>>>
>>> > 2010/12/23 Dan Langille 
>>> >
>>> >> On 12/22/2010 9:57 AM, John Baldwin wrote:
>>> >>
>>> >>> On Wednesday, December 22, 2010 7:41:25 am Miroslav Lachman wrote:
>>> >>>
>>>  Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
>>>  Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
>>>  Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
>>>  Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD Memory
>>>  Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0
>>> 
>>> >>>
>>> >>> You are getting corrected ECC errors in your RAM.  You see them once an
>>> >>> hour because we poll the machine check registers once an hour.  If this
>>> >>> happens constantly you might have a DIMM that is dying?
>>> >>>
>>> >>
>>> >> John:
>>> >>
>>> >> I take it these ECC errors *may* have been happening for some time. What
>>> >> has changed is the OS now polls for the errors and reports them.
>>> >>
>>> >>
>>> > Yes, we enabled MCA by default in 8.1-RELEASE.
>>>
>>> Is there some reason that it is only available for i386 and not for
>>> amd64?  Linux has something called mcelog, for machine check errors,
>>> which sounds similar and is available for amd64.
>>>
>>>
>> Perhaps I'm misunderstanding your question, but our MCA driver is supported
>> and enabled by default on both i386 and amd64.
>
> Thanks, it appears that I misunderstood.  I ran whereis and found
> /usr/src/sbin/mca and didn't find it on my amd64 system.  I do see it in
> my sysctl listing now that I look there.
>

I guess that's designed for ia64 only
(at least there's no hw.mca.first on other arches).

-- 
wbr,
pluknet


Re: MCA messages after upgrade to 8.2-BEAT1

2010-12-24 Carl Johnson
Jeremy Chadwick  writes:

> On Fri, Dec 24, 2010 at 03:08:48PM -0800, Carl Johnson wrote:
>> Alan Cox  writes:
>> 
>> > 2010/12/23 Dan Langille 
>> >
>> >> On 12/22/2010 9:57 AM, John Baldwin wrote:
>> >>
>> >>> On Wednesday, December 22, 2010 7:41:25 am Miroslav Lachman wrote:
>> >>>
>>  Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
>>  Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
>>  Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
>>  Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD Memory
>>  Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0
>> 
>> >>>
>> >>> You are getting corrected ECC errors in your RAM.  You see them once an
>> >>> hour because we poll the machine check registers once an hour.  If this
>> >>> happens constantly you might have a DIMM that is dying?
>> >>>
>> >>
>> >> John:
>> >>
>> >> I take it these ECC errors *may* have been happening for some time. What
>> >> has changed is the OS now polls for the errors and reports them.
>> >>
>> >>
>> > Yes, we enabled MCA by default in 8.1-RELEASE.
>> 
>> Is there some reason that it is only available for i386 and not for
>> amd64?  Linux has something called mcelog, for machine check errors,
>> which sounds similar and is available for amd64.
>
> You mean like what John used in his earlier post on this thread?  :-)
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-December/060705.html

Oops!  Yes, I missed that when I read it.

> If you're looking for it for FreeBSD, it's available below as a patch to
> the original (I believe):
>
> http://www.FreeBSD.org/~jhb/mcelog/

Thanks for the link.

-- 
Carl Johnson        ca...@peak.org



Re: MCA messages after upgrade to 8.2-BEAT1

2010-12-24 Carl Johnson
Alan Cox  writes:

> On Fri, Dec 24, 2010 at 5:08 PM, Carl Johnson  wrote:
>
>> Alan Cox  writes:
>>
>> > 2010/12/23 Dan Langille 
>> >
>> >> On 12/22/2010 9:57 AM, John Baldwin wrote:
>> >>
>> >>> On Wednesday, December 22, 2010 7:41:25 am Miroslav Lachman wrote:
>> >>>
>>  Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
>>  Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
>>  Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
>>  Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD Memory
>>  Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0
>> 
>> >>>
>> >>> You are getting corrected ECC errors in your RAM.  You see them once an
>> >>> hour because we poll the machine check registers once an hour.  If this
>> >>> happens constantly you might have a DIMM that is dying?
>> >>>
>> >>
>> >> John:
>> >>
>> >> I take it these ECC errors *may* have been happening for some time. What
>> >> has changed is the OS now polls for the errors and reports them.
>> >>
>> >>
>> > Yes, we enabled MCA by default in 8.1-RELEASE.
>>
>> Is there some reason that it is only available for i386 and not for
>> amd64?  Linux has something called mcelog, for machine check errors,
>> which sounds similar and is available for amd64.
>>
>>
> Perhaps I'm misunderstanding your question, but our MCA driver is supported
> and enabled by default on both i386 and amd64.

Thanks, it appears that I misunderstood.  I ran whereis and found
/usr/src/sbin/mca and didn't find it on my amd64 system.  I do see it in
my sysctl listing now that I look there.

-- 
Carl Johnson        ca...@peak.org



Re: MCA messages after upgrade to 8.2-BEAT1

2010-12-24 Jeremy Chadwick
On Fri, Dec 24, 2010 at 03:08:48PM -0800, Carl Johnson wrote:
> Alan Cox  writes:
> 
> > 2010/12/23 Dan Langille 
> >
> >> On 12/22/2010 9:57 AM, John Baldwin wrote:
> >>
> >>> On Wednesday, December 22, 2010 7:41:25 am Miroslav Lachman wrote:
> >>>
>  Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
>  Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
>  Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
>  Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD Memory
>  Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0
> 
> >>>
> >>> You are getting corrected ECC errors in your RAM.  You see them once an
> >>> hour because we poll the machine check registers once an hour.  If this
> >>> happens constantly you might have a DIMM that is dying?
> >>>
> >>
> >> John:
> >>
> >> I take it these ECC errors *may* have been happening for some time. What
> >> has changed is the OS now polls for the errors and reports them.
> >>
> >>
> > Yes, we enabled MCA by default in 8.1-RELEASE.
> 
> Is there some reason that it is only available for i386 and not for
> amd64?  Linux has something called mcelog, for machine check errors,
> which sounds similar and is available for amd64.

You mean like what John used in his earlier post on this thread?  :-)

http://lists.freebsd.org/pipermail/freebsd-stable/2010-December/060705.html

If you're looking for it for FreeBSD, it's available below as a patch to
the original (I believe):

http://www.FreeBSD.org/~jhb/mcelog/

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP 4BD6C0CB |



Re: MCA messages after upgrade to 8.2-BEAT1

2010-12-24 Alan Cox
On Fri, Dec 24, 2010 at 5:08 PM, Carl Johnson  wrote:

> Alan Cox  writes:
>
> > 2010/12/23 Dan Langille 
> >
> >> On 12/22/2010 9:57 AM, John Baldwin wrote:
> >>
> >>> On Wednesday, December 22, 2010 7:41:25 am Miroslav Lachman wrote:
> >>>
>  Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
>  Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
>  Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
>  Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD Memory
>  Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0
> 
> >>>
> >>> You are getting corrected ECC errors in your RAM.  You see them once an
> >>> hour because we poll the machine check registers once an hour.  If this
> >>> happens constantly you might have a DIMM that is dying?
> >>>
> >>
> >> John:
> >>
> >> I take it these ECC errors *may* have been happening for some time. What
> >> has changed is the OS now polls for the errors and reports them.
> >>
> >>
> > Yes, we enabled MCA by default in 8.1-RELEASE.
>
> Is there some reason that it is only available for i386 and not for
> amd64?  Linux has something called mcelog, for machine check errors,
> which sounds similar and is available for amd64.
>
>
Perhaps I'm misunderstanding your question, but our MCA driver is supported
and enabled by default on both i386 and amd64.

Alan


Re: MCA messages after upgrade to 8.2-BEAT1

2010-12-24 Carl Johnson
Alan Cox  writes:

> 2010/12/23 Dan Langille 
>
>> On 12/22/2010 9:57 AM, John Baldwin wrote:
>>
>>> On Wednesday, December 22, 2010 7:41:25 am Miroslav Lachman wrote:
>>>
 Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
 Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
 Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
 Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD Memory
 Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0

>>>
>>> You are getting corrected ECC errors in your RAM.  You see them once an
>>> hour because we poll the machine check registers once an hour.  If this
>>> happens constantly you might have a DIMM that is dying?
>>>
>>
>> John:
>>
>> I take it these ECC errors *may* have been happening for some time. What
>> has changed is the OS now polls for the errors and reports them.
>>
>>
> Yes, we enabled MCA by default in 8.1-RELEASE.

Is there some reason that it is only available for i386 and not for
amd64?  Linux has something called mcelog, for machine check errors,
which sounds similar and is available for amd64.

-- 
Carl Johnson        ca...@peak.org



Re: MCA messages after upgrade to 8.2-BEAT1

2010-12-24 Alan Cox
2010/12/23 Dan Langille 

> On 12/22/2010 9:57 AM, John Baldwin wrote:
>
>> On Wednesday, December 22, 2010 7:41:25 am Miroslav Lachman wrote:
>>
>>> Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
>>> Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
>>> Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
>>> Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD Memory
>>> Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0
>>>
>>
>> You are getting corrected ECC errors in your RAM.  You see them once an
>> hour because we poll the machine check registers once an hour.  If this
>> happens constantly you might have a DIMM that is dying?
>>
>
> John:
>
> I take it these ECC errors *may* have been happening for some time. What
> has changed is the OS now polls for the errors and reports them.
>
>
Yes, we enabled MCA by default in 8.1-RELEASE.

Alan


Re: MCA messages after upgrade to 8.2-BEAT1

2010-12-24 Matthew D. Fuller
On Wed, Dec 22, 2010 at 09:57:26AM -0500 I heard the voice of
John Baldwin, and lo! it spake thus:
> 
> You are getting corrected ECC errors in your RAM.

Actually, don't

> CPU 0 0 data cache 
> ADDR 236493c0 
>   Data cache ECC error (syndrome 1c)

> CPU 0 1 instruction cache 
> ADDR 2a1c9440 
>   Instruction cache ECC error

> CPU 0 2 bus unit 
>   L2 cache ECC error

> CPU 1 0 data cache 
> ADDR 23649640 
>   Data cache ECC error (syndrome 1c)

> CPU 1 1 instruction cache 
> ADDR 2a1c9440 
>   Instruction cache ECC error

> CPU 1 2 bus unit 
>   L2 cache ECC error

suggest CPU cache, not RAM?

(that's actually a question; I don't know, but that's what a naive
reading suggests...)


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.


Re: MCA messages after upgrade to 8.2-BEAT1

2010-12-23 Dan Langille

On 12/22/2010 9:57 AM, John Baldwin wrote:
> On Wednesday, December 22, 2010 7:41:25 am Miroslav Lachman wrote:
>> Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
>> Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
>> Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
>> Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD Memory
>> Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0
>
> You are getting corrected ECC errors in your RAM.  You see them once an hour
> because we poll the machine check registers once an hour.  If this happens
> constantly you might have a DIMM that is dying?


John:

I take it these ECC errors *may* have been happening for some time. 
What has changed is the OS now polls for the errors and reports them.


--
Dan Langille - http://langille.org/


Re: MCA messages after upgrade to 8.2-BEAT1

2010-12-23 Miroslav Lachman

John Baldwin wrote:
> On Wednesday, December 22, 2010 7:41:25 am Miroslav Lachman wrote:
>> Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
>> Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
>> Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
>> Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD Memory
>> Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0
>
> You are getting corrected ECC errors in your RAM.  You see them once an hour
> because we poll the machine check registers once an hour.  If this happens
> constantly you might have a DIMM that is dying?


Yes, it happens constantly. Does "Bank" in this context mean a DIMM socket
or something else? If it is a DIMM socket, then it means all modules are
dying at the same time :(
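For reference: in these messages "Bank" names a per-CPU machine-check register bank, not a memory bank or DIMM socket. John's mcelog output in this thread labels banks 0, 1, 2 and 4 of this AMD Family 15 (0Fh) CPU as data cache, instruction cache, bus unit and northbridge, and the northbridge bank is the one that reports RAM ECC errors. The sketch below only encodes that mapping. The helper name is hypothetical, and the bank 3 label (load/store unit) is an assumption taken from AMD's K8 documentation rather than from this thread.

```sh
#!/bin/sh
# Hypothetical helper: map an MCA bank number from a FreeBSD
# "MCA: Bank N" message to the CPU unit it covers on an AMD Family 0Fh CPU.
# Banks 0, 1, 2 and 4 follow the mcelog decoding quoted in this thread;
# bank 3 is an assumption based on AMD's K8 documentation.
mca_bank_name() {
    case "$1" in
        0) echo "data cache" ;;
        1) echo "instruction cache" ;;
        2) echo "bus unit" ;;
        3) echo "load/store unit" ;;
        4) echo "northbridge" ;;
        *) echo "unknown bank $1" ;;
    esac
}
```

So "Bank 4" in the log refers to the northbridge (memory controller) bank on each CPU, not to the fourth DIMM slot.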


Thank you for the mcelog output. BTW, do you have any timeline for releasing
a port of mcelog?


Miroslav Lachman


Re: MCA messages after upgrade to 8.2-BEAT1

2010-12-22 John Baldwin
On Wednesday, December 22, 2010 7:41:25 am Miroslav Lachman wrote:
> Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
> Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
> Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
> Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD Memory
> Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0

You are getting corrected ECC errors in your RAM.  You see them once an hour
because we poll the machine check registers once an hour.  If this happens
constantly you might have a DIMM that is dying?

% ~/mcelog --ascii < foo.txt 
mcelog: Cannot open /dev/mem for DMI decoding: Permission denied
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 0 data cache 
ADDR 236493c0 
  Data cache ECC error (syndrome 1c)
   bit46 = corrected ecc error
   bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
 data read mem transaction
 memory access, level generic'
STATUS d40e4833 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0 
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 1 instruction cache 
ADDR 2a1c9440 
  Instruction cache ECC error
   bit46 = corrected ecc error
   bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
 instruction fetch mem transaction
 memory access, level generic'
STATUS d4004853 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0 
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 2 bus unit 
  L2 cache ECC error
  Bus or cache array error
   bit46 = corrected ecc error
   bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
 prefetch mem transaction
 memory access, level generic'
STATUS d0004863 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0 
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge 
MISC e00d0fff ADDR 2cac9678 
  Northbridge RAM ECC error
  ECC syndrome = 1c
   bit33 = err cpu1
   bit46 = corrected ecc error
   bit59 = misc error valid
   bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
 generic read mem transaction
 memory access, level generic'
STATUS dc0e40020813 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0 
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 0 data cache 
ADDR 23649640 
  Data cache ECC error (syndrome 1c)
   bit46 = corrected ecc error
   bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
 data read mem transaction
 memory access, level generic'
STATUS d40e4833 MCGSTATUS 0
MCGCAP 105 APICID 1 SOCKETID 0 
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 1 instruction cache 
ADDR 2a1c9440 
  Instruction cache ECC error
   bit46 = corrected ecc error
   bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
 instruction fetch mem transaction
 memory access, level generic'
STATUS d4004853 MCGSTATUS 0
MCGCAP 105 APICID 1 SOCKETID 0 
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 2 bus unit 
  L2 cache ECC error
  Bus or cache array error
   bit46 = corrected ecc error
   bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
 prefetch mem transaction
 memory access, level generic'
STATUS d0004863 MCGSTATUS 0
MCGCAP 105 APICID 1 SOCKETID 0 
CPUID Vendor AMD Family 15 Model 67
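The kernel's terse tail ("BUSLG Source DRD Memory") and mcelog's longer text ("local node origin, request didn't time out, data read mem transaction, memory access, level generic") both come from the same compound error code in the low bits of the bank status. Below is a minimal sketch of that decoding, assuming the bus and interconnect format 1PPT RRRR IILL in bits 11:0 as described in the Intel and AMD architecture manuals. It examines only those 12 bits and ignores the rest of the status value. The function name is hypothetical, and the labels not seen in this thread (Responder, Observer, Generic, Reserved, I/O, Other, timeout) are this sketch's own.

```sh
#!/bin/sh
# Sketch: decode the bus/interconnect compound error code held in bits
# 11:0 of an MCA bank status value (layout 1PPT RRRR IILL). Higher status
# bits are deliberately ignored. Prints e.g. "BUSLG Source DRD Memory".
mca_bus_decode() {
    code=$(( $1 & 0xfff ))
    # Bit 11 must be set for a bus and interconnect error.
    if [ $(( (code >> 11) & 1 )) -ne 1 ]; then
        echo "not a bus/interconnect error"
        return 1
    fi
    case $(( code & 0x3 )) in              # LL: cache level
        0) ll=L0 ;; 1) ll=L1 ;; 2) ll=L2 ;; 3) ll=LG ;;
    esac
    case $(( (code >> 2) & 0x3 )) in       # II: memory, I/O or other
        0) ii=Memory ;; 1) ii=Reserved ;; 2) ii=I/O ;; 3) ii=Other ;;
    esac
    case $(( (code >> 4) & 0xf )) in       # RRRR: request type
        0) rr=ERR ;; 1) rr=RD ;; 2) rr=WR ;; 3) rr=DRD ;; 4) rr=DWR ;;
        5) rr=IRD ;; 6) rr=PREFETCH ;; 7) rr=EVICT ;; 8) rr=SNOOP ;;
        *) rr=UNKNOWN ;;
    esac
    case $(( (code >> 9) & 0x3 )) in       # PP: participation
        0) pp=Source ;; 1) pp=Responder ;; 2) pp=Observer ;; 3) pp=Generic ;;
    esac
    # T (bit 8): set when the request timed out.
    if [ $(( (code >> 8) & 1 )) -eq 1 ]; then t=" timeout"; else t=""; fi
    echo "BUS$ll $pp $rr $ii$t"
}
```

Applied to the four status values in Miroslav's log (0xd40e4833, 0xd4004853, 0xd0004863 and 0xdc0e40020813), it prints "BUSLG Source DRD Memory", "BUSLG Source IRD Memory", "BUSLG Source PREFETCH Memory" and "BUSLG Source RD Memory", which match the kernel's lines. The "COR OVER" part of those lines comes from other status bits that this sketch does not look at.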


-- 
John Baldwin


MCA messages after upgrade to 8.2-BEAT1

2010-12-22 Miroslav Lachman

Hi,
the machine in question was upgraded from FreeBSD 7.3 to 8.2-BETA1 i386
GENERIC. After this upgrade, I get the following messages in
/var/log/messages every hour. The machine is almost idle (used for testing
only).


Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD Memory
Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0
Dec 21 12:42:26 kavkaz kernel: MCA: Bank 1, Status 0xd4004853
Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source IRD Memory
Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x2a1c9440
Dec 21 12:42:26 kavkaz kernel: MCA: Bank 2, Status 0xd0004863
Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source PREFETCH Memory
Dec 21 12:42:26 kavkaz kernel: MCA: Bank 4, Status 0xdc0e40020813
Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 0
Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source RD Memory
Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x2cac9678
Dec 21 12:42:26 kavkaz kernel: MCA: Misc 0xe00d0fff
Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 1
Dec 21 12:42:26 kavkaz kernel: MCA: CPU 1 COR OVER BUSLG Source DRD Memory
Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x23649640
Dec 21 12:42:26 kavkaz kernel: MCA: Bank 1, Status 0xd4004853
Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 1
Dec 21 12:42:26 kavkaz kernel: MCA: CPU 1 COR OVER BUSLG Source IRD Memory
Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x2a1c9440
Dec 21 12:42:26 kavkaz kernel: MCA: Bank 2, Status 0xd0004863
Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33, APIC ID 1
Dec 21 12:42:26 kavkaz kernel: MCA: CPU 1 COR OVER BUSLG Source PREFETCH Memory


Can somebody tell me what these messages mean?
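To see at a glance which CPU and bank each record in a log like the one above came from, something like the following sketch can be used. The function name is hypothetical. It assumes the FreeBSD log format shown here, where each "MCA: Bank N, ..." line precedes the "MCA: CPU N ..." line of the same record.

```sh
#!/bin/sh
# Hypothetical helper: for each FreeBSD machine-check record in syslog text
# on standard input, print "CPU <n> bank <m>". Remembers the bank number
# from the most recent "MCA: Bank" line and emits it when the matching
# "MCA: CPU" line is seen.
mca_cpu_banks() {
    awk '
        /kernel: MCA: Bank / {
            for (i = 1; i < NF; i++)
                if ($i == "Bank") { bank = $(i + 1); sub(/,$/, "", bank) }
        }
        /kernel: MCA: CPU / {
            for (i = 1; i < NF; i++)
                if ($i == "CPU") { print "CPU " $(i + 1) " bank " bank; break }
        }
    '
}
```

Piping the result through `sort | uniq -c` (for example `mca_cpu_banks < /var/log/messages | sort | uniq -c`) gives a per-CPU, per-bank tally. For the log above it lists banks 0, 1, 2 and 4 on CPU 0 but only banks 0, 1 and 2 on CPU 1, so the northbridge (bank 4) record appears only once, on CPU 0.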

Miroslav Lachman
