Re: [gentoo-user] MCE error
On 28-03-2015 09:34 PM, Frank Steinmetzger wrote:
> On Sat, Mar 28, 2015 at 07:48:48PM -0300, Sebas Pedersen wrote:
>>>>> bios update/microcode update. A Google search suggests that you
>>>>> have run into an erratum.
>>>> Oh OK, thank you. I must have missed that in my search. So you are
>>>> saying that the error comes from a BIOS erratum (I don't know what
>>>> microcode is), and the fix is to update the BIOS?
> An “erratum” is what Intel calls an error or defect in its hardware, as
> far as I know. Microcode is a sort of “firmware” running in the CPU. For
> example, the TSX feature in Haswells (which was one of the reasons why I
> chose my particular CPU in the first place, grrr) was found to have a
> bug, so Intel produced a microcode update that would simply disable the
> relevant set of instructions.
>>> no, possibly from a CPU erratum, and a BIOS update might bring in the
>>> microcode update that works around it.
>> I see, thanks for clarifying that. So it looks like there aren't many
>> options: either update the BIOS and/or replace the CPU.
> Not necessarily. The first search hit (“linux update microcode”) brought
> me to ¹, another to ². The latter led me to sys-apps/microcode-ctl,
> which might do what you need.
>
> ¹ https://wiki.archlinux.org/index.php/Microcode
> ² http://www.linux-mag.com/id/723/

Thank you very much for the explanations, they are very clear. I'm going
to play around with this microcode stuff and see if that helps. I
appreciate your reply!
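The MCE report in the original post ends with "microcode 0", the microcode revision the kernel had recorded for that CPU. After trying a microcode update, one way to see which revision the kernel now reports is the "microcode" field in /proc/cpuinfo on x86 Linux. The following is a minimal Python sketch under that assumption; the helper name microcode_revisions is hypothetical, and some kernels may not print the field at all, in which case the result is simply empty.

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text):
    """Map each logical CPU number to the 'microcode' revision in /proc/cpuinfo text.

    Hypothetical helper: CPUs whose block has no 'microcode' line are left out.
    """
    revisions = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU blocks
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "microcode" and cpu is not None:
            # Recent kernels print hex ("0x..."); fall back to decimal otherwise.
            base = 16 if value.lower().startswith("0x") else 10
            revisions[cpu] = int(value, base)
    return revisions


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        for cpu, rev in sorted(microcode_revisions(path.read_text()).items()):
            print(f"CPU {cpu}: microcode revision {rev:#x}")
```

A revision that stays at 0x0 after rebooting would suggest the update was not applied to that CPU.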
Re: [gentoo-user] MCE error
On 28-03-2015 10:13 PM, Fernando Rodriguez wrote:
> On Saturday, March 28, 2015 7:48:48 PM Sebas Pedersen wrote:
>> I see, thanks for clarifying that. So it looks like there aren't many
>> options: either update the BIOS and/or replace the CPU.
>>
>> I really appreciate your replies and time.
>>
>> Thanks!
>> Sebas
>
> There are a few things you can try.
>
> 1. Go into the BIOS menu and reset it to safe defaults or a similar
>    setting. If that doesn't work, play with the settings, especially the
>    memory settings (try lowering the frequency).
>
> 2. If the motherboard is dirty, clean it really well. This may sound
>    crazy, but I've had success spray-washing it with a hose and drying it
>    in a warm oven.
>
> 3. Flash the BIOS with the latest version.
>
> 4. If you have a soldering iron and junk parts lying around, replace any
>    blown capacitors on the board.

I already tried the first one, with no luck. Now that you mention it, the
motherboard is a little dusty... and looking carefully, I think some
capacitors may be blown (but I'm not sure).

Thank you very much for the tips, I'll try them all!
Re: [gentoo-user] MCE error
On 29-03-2015 12:45 PM, Mick wrote:
> On Sunday 29 Mar 2015 16:42:10 Sebas Pedersen wrote:
>> On 28-03-2015 08:50 PM, Mick wrote:
>>> There's 'CONFIG_MICROCODE=y' and friends in the kernel which, along
>>> with sys-apps/microcode-ctl, will load whatever is the latest
>>> Intel/AMD CPU code (firmware) to patch any instruction bugs that the
>>> CPU manufacturers have discovered.
>> That's nice. I'm going to compile the kernel and see what happens.
>> Many thanks!
> Don't forget to enable the relevant module for your type of CPU.

You're right. Thanks for the reminder!

Best regards,
Sebas
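Following Mick's reminder, one quick way to check that a kernel configuration enables both the generic microcode loader and the vendor-specific driver is to scan its .config. This is a minimal Python sketch; the option names CONFIG_MICROCODE_AMD and CONFIG_MICROCODE_INTEL are the ones used by kernels of this era (other kernel versions may differ), the helper names are hypothetical, and /usr/src/linux/.config is assumed only because it is the usual Gentoo location.

```python
import sys
from pathlib import Path

# Vendor-specific microcode drivers, keyed by the vendor_id in /proc/cpuinfo.
# Assumption: option names as used by kernels around 2015.
VENDOR_OPTION = {
    "AuthenticAMD": "CONFIG_MICROCODE_AMD",
    "GenuineIntel": "CONFIG_MICROCODE_INTEL",
}


def microcode_options(config_text):
    """Collect CONFIG_MICROCODE* settings from kernel .config text ('n' if unset)."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_MICROCODE") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
        elif line.startswith("# CONFIG_MICROCODE") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
    return options


def microcode_ready(options, vendor):
    """True if the generic loader (y or m) and the vendor driver (y) are enabled."""
    vendor_option = VENDOR_OPTION[vendor]
    return (options.get("CONFIG_MICROCODE") in ("y", "m")
            and options.get(vendor_option) == "y")


if __name__ == "__main__":
    config = Path(sys.argv[1] if len(sys.argv) > 1 else "/usr/src/linux/.config")
    if config.exists():
        opts = microcode_options(config.read_text())
        for name, value in sorted(opts.items()):
            print(f"{name}={value}")
        print("AMD driver ready:", microcode_ready(opts, "AuthenticAMD"))
```

For the AMD machine in this thread, the check that matters is the "AuthenticAMD" one.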
Re: [gentoo-user] MCE error
On 28-03-2015 08:50 PM, Mick wrote:
> On Saturday 28 Mar 2015 22:48:48 Sebas Pedersen wrote:
>> I see, thanks for clarifying that. So it looks like there aren't many
>> options: either update the BIOS and/or replace the CPU.
>>
>> I really appreciate your replies and time.
>>
>> Thanks!
>> Sebas
> There's 'CONFIG_MICROCODE=y' and friends in the kernel which, along with
> sys-apps/microcode-ctl, will load whatever is the latest Intel/AMD CPU
> code (firmware) to patch any instruction bugs that the CPU manufacturers
> have discovered.

That's nice. I'm going to compile the kernel and see what happens. Many
thanks!
Re: [gentoo-user] MCE error
On 28-03-2015 06:45 PM, Volker Armin Hemmann wrote:
> On 28.03.2015 at 14:58, Sebas Pedersen wrote:
>> The error suggests it's a hardware problem. I replaced the RAM with no
>> luck; the same error keeps happening. Any suggestions for identifying
>> the problem or how to proceed?
> bios update/microcode update. A Google search suggests that you have run
> into an erratum.

Oh OK, thank you. I must have missed that in my search. So you are saying
that the error comes from a BIOS erratum (I don't know what microcode
is), and the fix is to update the BIOS? I'm going to look at the ASUS
site and see if I can update the BIOS, then. Sorry for the silly
questions.

Many thanks for the reply!

Cheers,
Sebas
Re: [gentoo-user] MCE error
On 28-03-2015 07:37 PM, Volker Armin Hemmann wrote:
> On 28.03.2015 at 23:00, Sebas Pedersen wrote:
>> On 28-03-2015 06:45 PM, Volker Armin Hemmann wrote:
>>> bios update/microcode update. A Google search suggests that you have
>>> run into an erratum.
>> Oh OK, thank you. I must have missed that in my search. So you are
>> saying that the error comes from a BIOS erratum (I don't know what
>> microcode is), and the fix is to update the BIOS?
> no, possibly from a CPU erratum, and a BIOS update might bring in the
> microcode update that works around it.

I see, thanks for clarifying that. So it looks like there aren't many
options: either update the BIOS and/or replace the CPU.

I really appreciate your replies and time.

Thanks!
Sebas
[gentoo-user] MCE error
Hi guys,

For a few days now I have been experiencing an MCE error. Sometimes when I
turn on the computer, at some point while the kernel is booting (after the
GRUB menu) it just freezes and prints this:

  CPU 0: Machine Check Exception: 4 Bank 4: b2070f0f
  TSC f5acc9180
  PROCESSOR 2:20fc2 TIME 1427486735 SOCKET 0 APIC 0 microcode 0

The TSC value may vary, but b2070f0f is always the same (at least for
now). The error message suggests parsing the above error with mcelog. I
did that and the result was:

  Hardware event. This is not a software error.
  CPU 0 4 northbridge TSC f5acc9180
  TIME 1427486735 Fri Mar 27 17:05:35 2015
    Northbridge Watchdog error
         bit57 = processor context corrupt
         bit61 = error uncorrected
    bus error 'generic participation, request timed out
         generic error mem transaction
         generic access, level generic'
  STATUS b2070f0f MCGSTATUS 4
  CPUID Vendor AMD Family 15 Model 44
  SOCKET 0 APIC 0 microcode 0

The error suggests it's a hardware problem. I replaced the RAM with no
luck; the same error keeps happening. Any suggestions for identifying the
problem or how to proceed?

Many thanks in advance!
Sebas
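The mcelog report above can be cross-checked by hand against the x86 machine-check architecture. Below is a minimal Python sketch that decodes the architectural flag bits of an MCi_STATUS value, the MCGSTATUS value (4), the 16-bit bus-error code, and the TIME field. It assumes that the leading "b2" of the logged status is the top byte of the 64-bit register and that "0f0f" is its low 16-bit MCA error code; the thread does not show the full 64-bit value, so the example status is hypothetical, with only those two fields filled in. The helper names are hypothetical too, and the UTC-3 offset is taken from the -0300 timezone on Sebas's messages.

```python
from datetime import datetime, timedelta, timezone

# Architectural MCi_STATUS flag bits (x86 machine-check architecture).
STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error uncorrected
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC holds extra information
    58: "ADDRV",  # MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupt
}

# Global MCG_STATUS bits.
MCG_FLAGS = {
    0: "RIPV",  # restart IP valid: execution can safely resume
    1: "EIPV",  # error IP valid
    2: "MCIP",  # machine check in progress
}


def set_flags(value, table):
    """Names from `table` whose bit is set in `value`, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def decode_bus_error(code):
    """Split a 16-bit compound error code of the form 000F 1PPT RRRR IILL.

    Returns None if the code is not a bus/interconnect error.
    """
    if (code & 0xE800) != 0x0800:
        return None
    return {
        "PP": (code >> 9) & 0x3,    # participation (3 = generic)
        "T": (code >> 8) & 0x1,     # 1 = request timed out
        "RRRR": (code >> 4) & 0xF,  # request type (0 = generic error)
        "II": (code >> 2) & 0x3,    # memory or I/O (3 = generic)
        "LL": code & 0x3,           # cache level (3 = generic)
    }


if __name__ == "__main__":
    # Hypothetical 64-bit status: only the top byte (0xb2) and the low
    # 16 bits (0x0f0f) come from the log; the other bits are left as zero.
    status = (0xB2 << 56) | 0x0F0F
    print(set_flags(status, STATUS_FLAGS))    # ['VAL', 'UC', 'EN', 'PCC']
    print(set_flags(4, MCG_FLAGS))            # ['MCIP'] (RIPV is clear)
    print(decode_bus_error(status & 0xFFFF))  # {'PP': 3, 'T': 1, 'RRRR': 0, 'II': 3, 'LL': 3}
    # TIME is seconds since the Unix epoch; mcelog printed it in local time.
    utc = datetime.fromtimestamp(1427486735, tz=timezone.utc)
    local = utc.astimezone(timezone(timedelta(hours=-3)))
    print(utc.isoformat(), local.strftime("%a %b %d %H:%M:%S %Y"))
```

Under these assumptions the flags line up with mcelog's "bit57 = processor context corrupt" and "bit61 = error uncorrected", the error-code fields with its "generic participation, request timed out ... level generic" text, and the clear RIPV bit is consistent with the machine freezing rather than carrying on.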