Re: Reading a bad sector does not report failure as 'read error' but hangs PC with 'Machine Check Exception'
Hendrik . wrote:
> So I think there is a problem with this specific CK804 ATA controller
> causing the MCE... Any clues?

Yes, the SATA chip is broken. Probably time to check the known errata on the chip, and if it isn't known, bring nvidia in to debug their silicon.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Reading a bad sector does not report failure as 'read error' but hangs PC with 'Machine Check Exception'
Hendrik . wrote:
> Ok, I did not actually copy the correct code into mcelog, which left me with some errors about the Northbridge. If I do it again it gives me something else. I made 2 digital photos of 2 lockups when it happened and this is the result of the tool; the TSC is different in both errors, the rest is the same:
>
> CPU 0 4 northbridge TSC b7d4a144d0
>   Northbridge Watchdog error
>   bit57 = processor context corrupt
>   bit61 = error uncorrected
>   bus error 'generic participation, request timed out
>     generic error mem transaction
>     generic access, level generic'
> STATUS b2070f0f MCGSTATUS 4
> This is not a software problem!

Presumably some access that the CPU is doing to the controller has timed out and caused the MCE. It might be useful if we could get a stack trace from where the MCE was triggered. Does anyone know if it's possible to do this?

It's possible that only NVidia really could tell why this error would result from a disk problem, though.

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: Reading a bad sector does not report failure as 'read error' but hangs PC with 'Machine Check Exception'
After even more tests I found out the following:

- Running 'dd_rescue /dev/sda1 /dev/zero' on the on-board Silicon Image Inc. SiI 3114 controller handles the bad sector just fine and does not give an MCE. This is on the same motherboard that does give the MCE error on the Nvidia port. The following SATA controllers are in that machine:
  * IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
  * RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)

- Running the dd_rescue command on another PC with a different type of motherboard (M2NPV-VM), also with an Nvidia nForce 4 (although different) chipset, works fine and reports the bad sector just like the SiI 3114 controller on the other PC. This PC has the following lspci listing for its SATA controller:
  * IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)

So I think there is a problem with this specific CK804 ATA controller causing the MCE... Any clues?
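The controller comparison above could also be done one sector at a time instead of with a full dd_rescue pass. The following is only a minimal sketch: the `probe_sector` helper and the 512-byte sector size are assumptions for illustration and are not part of dd_rescue or of anything used in this thread. On a healthy error path, an unreadable sector should surface as EIO from the read call.

```python
import errno
import os

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def probe_sector(path, sector, sector_size=SECTOR_SIZE):
    """Try to read a single sector and classify the outcome.

    Returns "ok", "read error" (the kernel reported EIO) or
    "short read" (fewer bytes than one sector came back, e.g. past EOF).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, sector_size, sector * sector_size)
    except OSError as exc:
        if exc.errno == errno.EIO:
            return "read error"
        raise
    finally:
        os.close(fd)
    return "ok" if len(data) == sector_size else "short read"
```

Called as root with something like `probe_sector("/dev/sda", 205534870)` (the sector number from the syslog excerpt in the original post), a working controller path would be expected to return "read error". The kernel may read ahead beyond the requested sector, and on the affected CK804 port the attempt may well end in the same MCE, so this is a probe and not a workaround.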
Re: Reading a bad sector does not report failure as 'read error' but hangs PC with 'Machine Check Exception'
Ok, I did not actually copy the correct code into mcelog, which left me with some errors about the Northbridge. If I do it again it gives me something else. I made 2 digital photos of 2 lockups when it happened and this is the result of the tool; the TSC is different in both errors, the rest is the same:

CPU 0 4 northbridge TSC b7d4a144d0
  Northbridge Watchdog error
  bit57 = processor context corrupt
  bit61 = error uncorrected
  bus error 'generic participation, request timed out
    generic error mem transaction
    generic access, level generic'
STATUS b2070f0f MCGSTATUS 4
This is not a software problem!

CPU 0 4 northbridge TSC c4dd3a549f
  Northbridge Watchdog error
  bit57 = processor context corrupt
  bit61 = error uncorrected
  bus error 'generic participation, request timed out
    generic error mem transaction
    generic access, level generic'
STATUS b2070f0f MCGSTATUS 4
This is not a software problem!

It's a bit strange, but if I copy the results from my first post I get the Northbridge ECC error, perhaps because there is an 'enter' between the first line with the 'bank 4' and the 'b2070f0f' line. The mcelog tool handles this differently from the error on 1 line.

Regards,
Hendrik
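For readers who want to see where "bit57" and "bit61" in the decode above come from, here is a minimal sketch of unpacking a machine-check status word. It is not mcelog's code. The flag positions and the bus-error layout (0000 1PPT RRRR IILL) follow the architectural machine-check status format; the `decode_mci_status` name is made up, and the 64-bit value in the usage note is an assumed reconstruction, since the thread only ever shows the status as "b2070f0f".

```python
# Architectural flag bits in the upper part of a 64-bit MCi_STATUS value.
FLAG_BITS = {
    "valid": 63,
    "overflow": 62,
    "uncorrected": 61,                # "bit61 = error uncorrected"
    "enabled": 60,
    "misc_valid": 59,
    "addr_valid": 58,
    "processor_context_corrupt": 57,  # "bit57 = processor context corrupt"
}


def decode_mci_status(status):
    """Split a 64-bit machine-check status into flags and error-code fields."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    code = status & 0xFFFF
    result = {
        "flags": flags,
        # Bits 19:16 hold a model-specific extended error code.
        "extended_code": (status >> 16) & 0xF,
    }
    # Bus/interconnect errors have the form 0000 1PPT RRRR IILL.
    if (code & 0xF800) == 0x0800:
        result["bus_error"] = {
            "participation": (code >> 9) & 0x3,  # 3 = generic
            "timeout": bool((code >> 8) & 1),    # request timed out
            "request": (code >> 4) & 0xF,        # 0 = generic error
            "io_or_mem": (code >> 2) & 0x3,      # 3 = generic
            "level": code & 0x3,                 # 3 = generic
        }
    return result
```

Assuming the full status was 0xb200000000070f0f (bits 63, 61, 60 and 57 set, low bits 0x70f0f), `decode_mci_status` reports an uncorrected error with processor context corrupt and a bus error with the timeout bit set, which is consistent with the mcelog output quoted above.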
Re: Reading a bad sector does not report failure as 'read error' but hangs PC with 'Machine Check Exception'
>> hangs. If I try it after a reboot with 'mcelog --k8
>> --ascii' or whatever parameter, there is no output at
>
> You could type error back in from the email ?

Ok I copied it into the tool, it gives me:

CPU 0 4 northbridge TSC b7d4a144d0
  Northbridge ECC error
  ECC syndrome = 0
STATUS 0 MCGSTATUS 4

This is a bit strange because I repeatedly tested the RAM yesterday and it gives no problems. And even more interesting: the error occurs at a reproducible moment: when reading the bad sector from the Seagate harddisk. And with an older kernel I was able to just copy all stuff from the drive using dd_rescue... I do not have ECC RAM in my PC by the way.

> > Isn't it strange to say that the controller does
> > something bad if there is just a bad sector on the
> > drive that is reported and handled correctly in an
> > older kernel
>
> Not really. Its very strange it gives an MCE at all
> but this is a known
> failure path (and should be a fixed known failure
> path) for the Nvidia SATA.

So how to proceed in tackling this problem now? Is there anything I can do to (help you guys ;)) fix it? At this moment it unfortunately does not look to me as a fixed failure path...
Re: Reading a bad sector does not report failure as 'read error' but hangs PC with 'Machine Check Exception'
> How can I do this? I have installed mcelog but I
> cannot run it after the MCE error because the whole PC
> hangs. If I try it after a reboot with 'mcelog --k8
> --ascii' or whatever parameter, there is no output at

You could type error back in from the email ?

> Isn't it strange to say that the controller does
> something bad if there is just a bad sector on the
> drive that is reported and handled correctly in an
> older kernel

Not really. Its very strange it gives an MCE at all but this is a known failure path (and should be a fixed known failure path) for the Nvidia SATA.
Re: Reading a bad sector does not report failure as 'read error' but hangs PC with 'Machine Check Exception'
A similar problem was probably described on the linux-ide mailing list a while ago:

http://www.opensubscriber.com/message/[EMAIL PROTECTED]/6490911.html

>> Argh. I'm seeing a show stopper bug on sata_nv here.
>> ata_exec_internal
>> is MCE-ing on the READ_NATIVE_MAX_EXT command on
>> both i386 and amd64, with
>> top of Linus' tree + this patch. :(
>
> Oddly, the command at least executes and doesn't MCE
> (but it's not at all
> happy either) if I use ATA_PROT_PIO. I wonder if
> ATA_PROT_NODATA is buggered on this sata_nv chip
> (Asus A8N-E).

At least a similar motherboard is used there (although I specifically have the A8N-E Deluxe edition).

I will hold off on repairing my SATA disk with the Seatools for now, so if there is some testing to be done, I can run it with the bad disk.

Regards,
Hendrik
Re: Reading a bad sector does not report failure as 'read error' but hangs PC with 'Machine Check Exception'
> > HARDWARE ERROR
> > CPU 0: Machine Check Exception: 4 Bank 4: b2070f0f
> > TSC b7d4a144d0
> > This is not a software problem!
> > Run through mcelog --ascii to decode and contact your hardware vendor
> > Kernel panic - not syncing: Machine check
>
> You should run this through mcelog as it suggests
> and see what it shows.
> The kernel should be handling this properly,
> unless the drive problem
> is causing the controller to do something bad. Note
> that kernels 2.6.20
> and later use ADMA mode on the nForce4 SATA
> controller whereas previous
> versions used it essentially like a PATA controller,
> so it is not
> surprising that the behavior is different.

How can I do this? I have installed mcelog but I cannot run it after the MCE error because the whole PC hangs. If I try it after a reboot with 'mcelog --k8 --ascii' or whatever parameter, there is no output at all. If I try to redirect the output to the syslog, nothing is in there because the computer stopped working and no longer saved the log.

Isn't it strange to say that the controller does something bad if there is just a bad sector on the drive that is reported and handled correctly in an older kernel (I have confirmed a bad sector on the drive using the Seatools software from Seagate)? In my opinion a kernel should not stop responding at all because of a bad sector on the disk. I cannot change the controller's behavior and I applied every available update to make it function, but the problem was introduced with the newer kernel series.

Perhaps nobody has tried accessing a bad SATA drive before, to simulate such an error? If it helps I could try a different type of motherboard to see what happens there? (Asus M2NPV-VM)

Regards,
Hendrik
Re: Reading a bad sector does not report failure as 'read error' but hangs PC with 'Machine Check Exception'
Hendrik . wrote:
> Last night I discovered a problem in my RAID5 array and finally after a lot of tests I narrowed it down to a bad sector on one of the hard disks and some goofy kernels. I just yesterday build a new PC using an existing array of 5 disks in RAID 5. I did build the array with only 4 out of 5 disks in the system but the rebuild processes stopped over and over again apparently at the same position. At last I found out that the harddisk at the first SATA port had developed some bad sectors which made the kernel stop completely when it tried to read that sector with the following error on the screen:
>
> HARDWARE ERROR
> CPU 0: Machine Check Exception: 4 Bank 4: b2070f0f
> TSC b7d4a144d0
> This is not a software problem!
> Run through mcelog --ascii to decode and contact your hardware vendor
> Kernel panic - not syncing: Machine check

You should run this through mcelog as it suggests and see what it shows. The kernel should be handling this properly, unless the drive problem is causing the controller to do something bad. Note that kernels 2.6.20 and later use ADMA mode on the nForce4 SATA controller whereas previous versions used it essentially like a PATA controller, so it is not surprising that the behavior is different.

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Reading a bad sector does not report failure as 'read error' but hangs PC with 'Machine Check Exception'
Last night I discovered a problem in my RAID5 array and finally, after a lot of tests, I narrowed it down to a bad sector on one of the hard disks and some goofy kernels. Just yesterday I built a new PC using an existing array of 5 disks in RAID 5. I built the array with only 4 out of 5 disks in the system, but the rebuild process stopped over and over again, apparently at the same position. At last I found out that the harddisk on the first SATA port had developed some bad sectors, which made the kernel stop completely when it tried to read that sector, with the following error on the screen:

HARDWARE ERROR
CPU 0: Machine Check Exception: 4 Bank 4: b2070f0f
TSC b7d4a144d0
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Machine check

Googling around made me check memory, upgrade the BIOS and things like that, but now I DO think that this IS a software problem, which is in the linux kernel. I was running the standard 2.6.20-16 kernel series from Ubuntu Feisty Fawn (both the generic and server builds) and I built my own 2.6.22.1, but the problem still persisted. When copying manually with dd_rescue I was not able to copy past the bad sector without the MCE error reappearing.

Only with the standard Ubuntu Edgy Eft kernel (2.6.17-12-server) did the problem go away completely, and the syslog was filled with normal lines like:

Jul 28 22:58:26 mediaserver kernel: [ 6562.446868] ata2: error=0x40 { UncorrectableError }
Jul 28 22:58:26 mediaserver kernel: [ 6562.446875] sd 1:0:0:0: SCSI error: return code = 0x802
Jul 28 22:58:26 mediaserver kernel: [ 6562.446880] Additional sense: Unrecovered read error - auto reallocate failed
Jul 28 22:58:26 mediaserver kernel: [ 6562.446887] end_request: I/O error, dev sda, sector 205534870

So in the end I was able to copy my stuff off the bad harddisk to a new disk (losing some bytes because of my already dirty RAID5 array), but I do think this is a kernel bug, or at least strange behavior, as an old kernel is willing to continue operation on something as 'minor' as a bad sector. As things stand, when I start scrubbing the drive array overnight, a simple bad sector in the array will take down the complete system instead of just continuing with 1 faulty drive in the array!

Some information about the hardware:

AMD Athlon 64 3000+
Asus A8N-E Deluxe motherboard
1 GB RAM
4 Seagate 7200.9 drives on the NVIDIA SATA controller (sda ... sdd)
2 WD drives on the IDE controller (hda, hdc)
Running Feisty Fawn 64 bit Server edition

The faulty drive is /dev/sda and thus on the first SATA port. Moving it to a different port on the motherboard gives the same lockup. There is also a SIL 3114 controller on the motherboard, but I have not tried to dd_rescue with the faulty drive on that controller to see if it locks up the kernel.

Regards,
Hendrik van den Boogaard
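As a small illustration of the "normal" error path described in the post above, the following sketch pulls the failing device and sector out of syslog lines of the form shown there. The regular expression is written against that one sample line only and the `failed_sectors` helper name is made up for this example; other kernel versions may word the message differently.

```python
import re

# Matches lines such as:
#   end_request: I/O error, dev sda, sector 205534870
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failed_sectors(lines):
    """Collect the sectors reported as unreadable, grouped by device."""
    found = {}
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            device, sector = match.group(1), int(match.group(2))
            found.setdefault(device, set()).add(sector)
    # Sort for stable, readable output.
    return {device: sorted(sectors) for device, sectors in found.items()}
```

Fed the four syslog lines from the post, this returns `{"sda": [205534870]}`. A list like this is what the 2.6.17 kernel makes available and what is lost when the machine panics with an MCE before anything reaches the log.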