Re: [gentoo-user] MCE error

2015-03-29 Thread Sebas Pedersen

On 28-03-2015 09:34 PM, Frank Steinmetzger wrote:

On Sat, Mar 28, 2015 at 07:48:48PM -0300, Sebas Pedersen wrote:


 BIOS update/microcode update. A Google search suggests that you have
 run into an erratum.

 Oh OK, thank you. I must have missed that in the search. So you are saying
 that the error comes from a BIOS erratum (I don't know what microcode
 is), and the fix is to update the BIOS?


An “erratum” (plural: “errata”) is what Intel calls an error or defect in
its hardware, as far as I know. Microcode is sort of a “firmware” running
in a CPU. For example, the TSX feature in Haswells (which was one of the
reasons why I chose my particular CPU in the first place, grrr) was found
to have a bug, so Intel produced a microcode update that would simply
disable the relevant set of instructions.


 No, possibly from a CPU erratum, and a BIOS update might bring in the
 microcode update that works around it.

I see, thanks for clarifying that. So it looks like there are not too many
options: try to update the BIOS and/or replace the CPU.


Not necessarily. The first search hit (“linux update microcode”) brought me
to ¹, another to ². The latter led me to finding sys-apps/microcode-ctl,
which might do what you need.

¹ https://wiki.archlinux.org/index.php/Microcode
² http://www.linux-mag.com/id/723/


Thank you very much for the explanations. Very clear indeed. I'm gonna
play around with this microcode stuff and see if that helps.

I appreciate your reply!
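
For anyone following along: before and after trying a microcode update it can help to check which revision the kernel reports. Below is a minimal sketch, assuming a Linux system whose /proc/cpuinfo exposes a "microcode" field. The function name and the sample text (including the revision value) are made up for illustration and are not taken from this thread.

```python
def microcode_revisions(cpuinfo_text: str) -> set[str]:
    """Collect the distinct values of the 'microcode' field in cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        # Lines look like "key<TAB>: value"; partition splits on the first colon.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


# Hypothetical sample; on a real machine the text would come from
# open("/proc/cpuinfo").read() instead.
sample = "processor\t: 0\nvendor_id\t: AuthenticAMD\nmicrocode\t: 0x10000c8\n"
print(microcode_revisions(sample))
```

If the set is unchanged after loading new microcode, the update probably did not apply (or there was no newer revision for that CPU).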



Re: [gentoo-user] MCE error

2015-03-29 Thread Sebas Pedersen

On 28-03-2015 10:13 PM, Fernando Rodriguez wrote:

On Saturday, March 28, 2015 7:48:48 PM Sebas Pedersen wrote:

I see, thanks for clarifying that. So it looks like there are not too many
options: try to update the BIOS and/or replace the CPU.

I really appreciate your replies and time.

Thanks!,
Sebas



There are a few things you can try.

1. Go into the BIOS menu and reset it to safe defaults or a similar setting.
If that doesn't work, play with the settings, especially memory settings (try
lowering the frequency).

2. If the motherboard is dirty, clean it really well. This may sound crazy but
I've had success spray washing it with a hose and drying it on a warm oven.

3. Flash the BIOS with the latest version.

4. If you have a soldering iron and junk parts lying around, replace any blown
capacitors on the board.


The first one I already tried, with no luck. Now that you mention it, the
motherboard is a little dusty... and looking carefully, I think some
capacitors may be blown (but I'm not sure).

Thank you very much for the tips, I'll try them all!



Re: [gentoo-user] MCE error

2015-03-29 Thread Sebas Pedersen

On 29-03-2015 12:45 PM, Mick wrote:

On Sunday 29 Mar 2015 16:42:10 Sebas Pedersen wrote:

On 28-03-2015 08:50 PM, Mick wrote:
 On Saturday 28 Mar 2015 22:48:48 Sebas Pedersen wrote:
 On 28-03-2015 07:37 PM, Volker Armin Hemmann wrote:
  Am 28.03.2015 um 23:00 schrieb Sebas Pedersen:
  On 28-03-2015 06:45 PM, Volker Armin Hemmann wrote:
  Am 28.03.2015 um 14:58 schrieb Sebas Pedersen:
  Hi guys,
 
  For the past few days I have been experiencing an MCE error.
  Sometimes when I turn on the computer, at some point while booting
  the kernel (after the GRUB menu) it just freezes and prints this:

  CPU 0: Machine Check Exception: 4 Bank 4: b2070f0f
  TSC f5acc9180
  PROCESSOR 2:20fc2 TIME 1427486735 SOCKET 0 APIC 0 microcode 0

  The number for TSC may vary, but the b2070f0f is always the
  same (at least for now). The error message suggests parsing the
  above error with mcelog. I did that and the result was:

  Hardware event. This is not a software error.
  CPU 0 4 northbridge TSC f5acc9180
  TIME 1427486735 Fri Mar 27 17:05:35 2015
    Northbridge Watchdog error
     bit57 = processor context corrupt
     bit61 = error uncorrected
    bus error 'generic participation, request timed out
   generic error mem transaction
   generic access, level generic'
  STATUS b2070f0f MCGSTATUS 4
  CPUID Vendor AMD Family 15 Model 44
  SOCKET 0 APIC 0 microcode 0

  The error suggests it's a hardware problem. I replaced the RAM with no
  luck. The same error keeps happening.

  Any suggestions for identifying the problem or how to proceed?
 
  Many thanks in advance!
 
  Sebas
 
  BIOS update/microcode update. A Google search suggests that you have
  run into an erratum.

  Oh OK, thank you. I must have missed that in the search. So you are
  saying that the error comes from a BIOS erratum (I don't know what
  microcode is), and the fix is to update the BIOS?

  No, possibly from a CPU erratum, and a BIOS update might bring in the
  microcode update that works around it.

 I see, thanks for clarifying that. So it looks like there are not too many
 options: try to update the BIOS and/or replace the CPU.

 I really appreciate your replies and time.

 Thanks!,
 Sebas

 There's 'CONFIG_MICROCODE=y' and friends in the kernel, which along with
 sys-apps/microcode-ctl will load whatever is the latest Intel/AMD CPU code
 (firmware) to patch any bugs with instructions that the CPU manufacturers
 have discovered.

That's nice. I'm gonna compile the kernel and see what happens.

Many thanks!


Don't forget to enable the relevant module for your type of CPU.


You're right. Thanks for the reminder!

Best Regards,
Sebas
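
As a rough illustration of Mick's reminder, here is a minimal sketch that scans kernel .config text for the microcode options. Only CONFIG_MICROCODE is named in this thread. The vendor-specific names CONFIG_MICROCODE_AMD and CONFIG_MICROCODE_INTEL are an assumption based on kernels of that era and may not exist in other kernel versions. The helper names and the sample config are hypothetical.

```python
def microcode_options(config_text: str) -> dict[str, str]:
    """Map each CONFIG_MICROCODE* option set in kernel .config text to its value."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments ("# CONFIG_X is not set") and are skipped.
        if line.startswith("CONFIG_MICROCODE") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value
    return options


def missing_for_vendor(config_text: str, vendor: str) -> list[str]:
    """List the microcode options that are not enabled (y or m) for a CPU vendor."""
    # Assumed option naming: CONFIG_MICROCODE plus CONFIG_MICROCODE_<VENDOR>.
    wanted = ["CONFIG_MICROCODE", f"CONFIG_MICROCODE_{vendor.upper()}"]
    enabled = microcode_options(config_text)
    return [name for name in wanted if enabled.get(name) not in ("y", "m")]


# Hypothetical sample; a real check would read e.g. /usr/src/linux/.config.
sample = """\
CONFIG_MICROCODE=y
# CONFIG_MICROCODE_INTEL is not set
CONFIG_MICROCODE_AMD=y
"""
print(missing_for_vendor(sample, "amd"))
print(missing_for_vendor(sample, "intel"))
```

An empty list means both the generic option and the vendor-specific one are enabled in the text that was checked.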



Re: [gentoo-user] MCE error

2015-03-29 Thread Sebas Pedersen

On 28-03-2015 08:50 PM, Mick wrote:

On Saturday 28 Mar 2015 22:48:48 Sebas Pedersen wrote:

On 28-03-2015 07:37 PM, Volker Armin Hemmann wrote:
 Am 28.03.2015 um 23:00 schrieb Sebas Pedersen:
 On 28-03-2015 06:45 PM, Volker Armin Hemmann wrote:
 Am 28.03.2015 um 14:58 schrieb Sebas Pedersen:
 Hi guys,

 For the past few days I have been experiencing an MCE error.
 Sometimes when I turn on the computer, at some point while booting the
 kernel (after the GRUB menu) it just freezes and prints this:

 CPU 0: Machine Check Exception: 4 Bank 4: b2070f0f
 TSC f5acc9180
 PROCESSOR 2:20fc2 TIME 1427486735 SOCKET 0 APIC 0 microcode 0

 The number for TSC may vary, but the b2070f0f is always the
 same (at least for now). The error message suggests parsing the
 above error with mcelog. I did that and the result was:

 Hardware event. This is not a software error.
 CPU 0 4 northbridge TSC f5acc9180
 TIME 1427486735 Fri Mar 27 17:05:35 2015
   Northbridge Watchdog error
    bit57 = processor context corrupt
    bit61 = error uncorrected
   bus error 'generic participation, request timed out
  generic error mem transaction
  generic access, level generic'
 STATUS b2070f0f MCGSTATUS 4
 CPUID Vendor AMD Family 15 Model 44
 SOCKET 0 APIC 0 microcode 0

 The error suggests it's a hardware problem. I replaced the RAM with no
 luck. The same error keeps happening.

 Any suggestions for identifying the problem or how to proceed?

 Many thanks in advance!

 Sebas

 BIOS update/microcode update. A Google search suggests that you have
 run into an erratum.

 Oh OK, thank you. I must have missed that in the search. So you are saying
 that the error comes from a BIOS erratum (I don't know what microcode
 is), and the fix is to update the BIOS?

 No, possibly from a CPU erratum, and a BIOS update might bring in the
 microcode update that works around it.

I see, thanks for clarifying that. So it looks like there are not too many
options: try to update the BIOS and/or replace the CPU.

I really appreciate your replies and time.

Thanks!,
Sebas


There's 'CONFIG_MICROCODE=y' and friends in the kernel, which along with
sys-apps/microcode-ctl will load whatever is the latest Intel/AMD CPU code
(firmware) to patch any bugs with instructions that the CPU manufacturers
have discovered.


That's nice. I'm gonna compile the kernel and see what happens.

Many thanks!



Re: [gentoo-user] MCE error

2015-03-28 Thread Sebas Pedersen

On 28-03-2015 06:45 PM, Volker Armin Hemmann wrote:

Am 28.03.2015 um 14:58 schrieb Sebas Pedersen:

Hi guys,

For the past few days I have been experiencing an MCE error.
Sometimes when I turn on the computer, at some point while booting the
kernel (after the GRUB menu) it just freezes and prints this:

CPU 0: Machine Check Exception: 4 Bank 4: b2070f0f
TSC f5acc9180
PROCESSOR 2:20fc2 TIME 1427486735 SOCKET 0 APIC 0 microcode 0

The number for TSC may vary, but the b2070f0f is always the
same (at least for now). The error message suggests parsing the above
error with mcelog. I did that and the result was:

Hardware event. This is not a software error.
CPU 0 4 northbridge TSC f5acc9180
TIME 1427486735 Fri Mar 27 17:05:35 2015
  Northbridge Watchdog error
   bit57 = processor context corrupt
   bit61 = error uncorrected
  bus error 'generic participation, request timed out
 generic error mem transaction
 generic access, level generic'
STATUS b2070f0f MCGSTATUS 4
CPUID Vendor AMD Family 15 Model 44
SOCKET 0 APIC 0 microcode 0

The error suggests it's a hardware problem. I replaced the RAM with no
luck. The same error keeps happening.

Any suggestions for identifying the problem or how to proceed?

Many thanks in advance!

Sebas




BIOS update/microcode update. A Google search suggests that you have run
into an erratum.


Oh OK, thank you. I must have missed that in the search. So you are saying
that the error comes from a BIOS erratum (I don't know what microcode
is), and the fix is to update the BIOS?


I'm gonna look at the ASUS site to see if I can update the BIOS, then.

Sorry for the silly questions.

Many thanks for the reply!

Cheers,
Sebas



Re: [gentoo-user] MCE error

2015-03-28 Thread Sebas Pedersen

On 28-03-2015 07:37 PM, Volker Armin Hemmann wrote:

Am 28.03.2015 um 23:00 schrieb Sebas Pedersen:

On 28-03-2015 06:45 PM, Volker Armin Hemmann wrote:

Am 28.03.2015 um 14:58 schrieb Sebas Pedersen:

Hi guys,

For the past few days I have been experiencing an MCE error.
Sometimes when I turn on the computer, at some point while booting the
kernel (after the GRUB menu) it just freezes and prints this:

CPU 0: Machine Check Exception: 4 Bank 4: b2070f0f
TSC f5acc9180
PROCESSOR 2:20fc2 TIME 1427486735 SOCKET 0 APIC 0 microcode 0

The number for TSC may vary, but the b2070f0f is always the
same (at least for now). The error message suggests parsing the above
error with mcelog. I did that and the result was:

Hardware event. This is not a software error.
CPU 0 4 northbridge TSC f5acc9180
TIME 1427486735 Fri Mar 27 17:05:35 2015
  Northbridge Watchdog error
   bit57 = processor context corrupt
   bit61 = error uncorrected
  bus error 'generic participation, request timed out
 generic error mem transaction
 generic access, level generic'
STATUS b2070f0f MCGSTATUS 4
CPUID Vendor AMD Family 15 Model 44
SOCKET 0 APIC 0 microcode 0

The error suggests it's a hardware problem. I replaced the RAM with no
luck. The same error keeps happening.

Any suggestions for identifying the problem or how to proceed?

Many thanks in advance!

Sebas




BIOS update/microcode update. A Google search suggests that you have run
into an erratum.


Oh OK, thank you. I must have missed that in the search. So you are saying
that the error comes from a BIOS erratum (I don't know what microcode
is), and the fix is to update the BIOS?


No, possibly from a CPU erratum, and a BIOS update might bring in the
microcode update that works around it.


I see, thanks for clarifying that. So it looks like there are not too many
options: try to update the BIOS and/or replace the CPU.


I really appreciate your replies and time.

Thanks!,
Sebas



[gentoo-user] MCE error

2015-03-28 Thread Sebas Pedersen

Hi guys,

For the past few days I have been experiencing an MCE error.
Sometimes when I turn on the computer, at some point while booting the
kernel (after the GRUB menu) it just freezes and prints this:


CPU 0: Machine Check Exception: 4 Bank 4: b2070f0f
TSC f5acc9180
PROCESSOR 2:20fc2 TIME 1427486735 SOCKET 0 APIC 0 microcode 0

The number for TSC may vary, but the b2070f0f is always the
same (at least for now). The error message suggests parsing the above
error with mcelog. I did that and the result was:


Hardware event. This is not a software error.
CPU 0 4 northbridge TSC f5acc9180
TIME 1427486735 Fri Mar 27 17:05:35 2015
  Northbridge Watchdog error
   bit57 = processor context corrupt
   bit61 = error uncorrected
  bus error 'generic participation, request timed out
 generic error mem transaction
 generic access, level generic'
STATUS b2070f0f MCGSTATUS 4
CPUID Vendor AMD Family 15 Model 44
SOCKET 0 APIC 0 microcode 0

The error suggests it's a hardware problem. I replaced the RAM with no
luck. The same error keeps happening.


Any suggestions for identifying the problem or how to proceed?

Many thanks in advance!

Sebas
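
For readers who want to see where mcelog's "bit57" and "bit61" lines come from, here is a minimal sketch of decoding the flag bits in the upper part of a 64-bit x86 machine-check status register, plus a conversion of the TIME field to a date. The STATUS value as printed above has only eight hex digits, so bits 57 and 61 cannot be read off it directly. The example below therefore uses a hypothetical 64-bit value chosen purely for illustration, and the function names are made up as well.

```python
from datetime import datetime, timezone

# Flag bits in the upper part of an x86 MCi_STATUS register.
STATUS_FLAGS = {
    63: "VAL (register holds a valid error)",
    62: "OVER (error overflow)",
    61: "UC (error uncorrected)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_status(status: int) -> list[str]:
    """Describe the flag bits set in a 64-bit status value, highest bit first."""
    return [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]


def mce_time(epoch_seconds: int) -> datetime:
    """Convert the TIME field (seconds since the Unix epoch) to a UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Hypothetical status value for illustration, NOT the value from the log above.
example_status = 0xB200000000000000
for flag in decode_status(example_status):
    print(flag)

# TIME value copied from the log above.
print(mce_time(1427486735).isoformat())
```

The TIME value comes out as 2015-03-27 20:05:35 UTC, which matches the "Fri Mar 27 17:05:35 2015" printed by mcelog for a machine three hours behind UTC.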