Re: Problem with ata layer in 2.6.24
Kasper Sandberg wrote:
> To put some timeline perspective into this: I believe it was in 2005 that I assembled the system, and when I realized it was faulty, on the old IDE driver, I stopped using it; that might have been in the beginning of 2006. Then for almost a year I wasn't using it, hoping to somehow fix it, but in January 2007 I think it was, at least in the very beginning of 2007, I hit upon the idea of trying libata, and ever since the system has been running 24/7, producing these errors around 2 times a day.
>
> I have multiple times reported my problems to lkml, but nothing has happened. I also tried to approach jgarzik directly, but he was not interested.
>
> I really hope this can be solved now, it's a huge problem.
>
> My fileserver has an Asus K8V motherboard with a VIA chipset (k8t880 I think it is, or something like it), currently using the Promise controller again (strangely enough all the timeouts seem to happen here, and when the ITE was on, there, not the onboard one), in conjunction with the onboard VIA.

Timeouts are nasty to debug. They can be caused by a whole range of different problems, including transmission errors, bad power, a faulty drive, a mishandled media error, IRQ misrouting, or a dumb hardware bug. It's basically 'uh... I told the controller to do something but it never called me back'.

If you see timeouts on multiple devices connected to different controllers, chances are that the problem is somewhere else. The most likely culprit is bad power.

Please...

* Post the result of 'lspci -nn' and the kernel log, including the full boot log and error messages.

* Try to isolate the problem, i.e. does removing several drives fix the problem? If the problem is localized to a certain device, what happens if you move it? Does the problem follow the drive or stay with the port? If the failing drives are SATA, it's a good idea to power some of the failing drives with a separate PSU and see whether anything is different.

By trying to isolate the hardware problem, more can be learned about the error condition, and even when the problem actually isn't a hardware problem, it gives us much deeper insight into the problem and clues regarding where to look.

Thanks.

--
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
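To answer "does the problem follow the drive or stay with the port?", it helps to tally timeouts per ATA device from the kernel log. The following is only a sketch: it assumes the two-part libata report format seen elsewhere in this thread (an "ataN.MM: exception Emask ..." line followed by a "res ... (timeout)" line), and the function name count_timeouts is made up for illustration.

```python
import re
from collections import Counter

# "ata1.00: exception Emask 0x0 ..." names the device that hit the error.
EXCEPTION_RE = re.compile(r"\b(ata\d+(?:\.\d+)?): exception Emask")
# The matching result line ends in "Emask 0x4 (timeout)" for a timeout.
TIMEOUT_RE = re.compile(r"Emask 0x[0-9a-f]+ \(timeout\)")


def count_timeouts(log_lines):
    """Count timeout reports per ATA device in an iterable of log lines."""
    counts = Counter()
    current = None  # device named by the most recent exception line
    for line in log_lines:
        m = EXCEPTION_RE.search(line)
        if m:
            current = m.group(1)
            continue
        if current and TIMEOUT_RE.search(line):
            counts[current] += 1
            current = None
    return counts


# Sample lines copied from the log excerpt posted in this thread.
sample = [
    "Jan 27 19:42:11 coyote kernel: [42461.915961] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen",
    "Jan 27 19:42:11 coyote kernel: [42461.915973] ata1.00: cmd ca/00:08:b1:66:46/00:00:00:00:00/e8 tag 0 dma 4096 out",
    "Jan 27 19:42:11 coyote kernel: [42461.915974] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)",
    "Jan 27 19:42:11 coyote kernel: [42461.915978] ata1.00: status: { DRDY }",
]

for device, n in count_timeouts(sample).items():
    print(device, n)
```

Running the tally before and after moving a drive to another port shows whether the count moves with the drive or stays with the port.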
Re: Problem with ata layer in 2.6.24
> not one problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be really welcome and, I'm guessing, widely used.

We don't see very many libata problems at the distro level, and for the most part they boil down to:

- Error messages looking different. Most bugs I get are things like media errors (the timeout looks different, the UNC report looks different).

- Broken hardware. I've closed a whole raft of bugs that turn out to be new PC systems where even the BIOS doesn't see the drives.

- Faulty hardware being picked up because we actually do real error checking now. We now check for and give some devices more slack while still doing error checking. Both IDE layers also added blacklists for stuff like the TSScorp DVD drives. Qemu has now had its bugs patched.

- sata_nv with >4GB of RAM. Known, being worked on; there is no old IDE driver anyway.

- pata_ali MWDMA with ATAPI. PIO works fine. It's all a bit of a mystery, and as it affects only a few chip variants it's hard to figure out. Workaround: libata.dma=1

- CS handling. On a few boxes using cable select (particularly on one drive and not the other) shows up a problem, normally a failed SRST. That's still under investigation.

- Promise timeouts. The old IDE times out, then polls the device, finds the IRQ was never sent and recovers, so the user sees a short stall but no errors. The new libata doesn't do this, and pdc202xx_old thus produces some error messages on some boxes. Backup polling is on my todo list.
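The libata.dma=1 workaround in the pata_ali item works because libata.dma is a bitmask. As documented in the kernel's parameter list, bit value 1 covers disks, 2 covers ATAPI devices and 4 covers CompactFlash. A small sketch of the decoding; the helper name dma_enabled_for is invented for illustration.

```python
# Bit values of the libata.dma= kernel parameter.
DMA_BITS = {
    1: "ATA disks",
    2: "ATAPI (CD/DVD)",
    4: "CompactFlash",
}


def dma_enabled_for(mask):
    """Return the device classes for which DMA stays enabled."""
    return [name for bit, name in DMA_BITS.items() if mask & bit]


print(dma_enabled_for(1))  # disks only, so ATAPI falls back to PIO
print(dma_enabled_for(3))  # disks and ATAPI, but not CompactFlash
```

With libata.dma=1 the ATAPI bit is clear, which matches the observation that PIO works fine on the affected pata_ali variants.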
Re: Problem with ata layer in 2.6.24
On Tuesday 29 January 2008, Alan Cox wrote:
>> not one problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be really welcome and, I'm guessing, widely used.
>
> We don't see very many libata problems at the distro level and they for the most part boil down to
>
> - error messages looking different - Most bugs I get are things like media errors (timeout looks different, UNC report looks different)
> - broken hardware - I've closed a whole raft of bugs that turn out to be new PC systems where even the BIOS doesn't see the drives
> - faulty hardware being picked up because we actually do real error checking now. We now check for and give some devices more slack while still doing error checking. Both IDE layers also added blacklists for stuff like the TSScorp DVD drives. Qemu has now had its bugs patched.
> - sata_nv with >4GB of RAM, known, being worked on, no old IDE driver anyway
> - pata_ali MWDMA with ATAPI, PIO works fine, all a bit of a mystery and as it affects only a few chip variants hard to figure out. Workaround libata.dma=1
> - CS handling. On a few boxes using cable select (particularly on one drive and not the other) shows up a problem, normally a failed SRST. That's still under investigation.
> - Promise timeouts. The old IDE times out then polls the device and finds the IRQ was never sent and then recovers so the user sees a short stall but no errors. The new libata doesn't do this and pdc202xx_old thus produces some error messages on some boxes. Backup polling is on my todo list.

I have not had a problem, no errors at all, since I rebooted to 2.6.24-rc8 with the added argument in the kernel line in grub (from dmesg):

[0.00] Kernel command line: ro root=/dev/VolGroup00/LogVol00 acpi_use_timer_override rhgb quiet

which causes dmesg to log, some time later:

[ 27.581823] ENABLING IO-APIC IRQs
[ 27.582014] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 27.592017] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[ 27.592068] ...trying to set up timer (IRQ0) through the 8259A ... failed.
[ 27.592071] ...trying to set up timer as Virtual Wire IRQ... works.
[ 27.703623] Brought up 1 CPUs

This was about noonish yesterday, and the logs have been silent regarding this 'exception Emask' error since then. The drive itself has also passed a smartctl -t long test with no errors since then.

Now, the last boot that had the problem was to 2.6.24, which did NOT have that 'acpi_use_timer_override' argument, and its dmesg logged:

[ 24.934176] ENABLING IO-APIC IRQs
[ 24.934367] ..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
[ 25.045973] Brought up 1 CPUs

Now, my question is: did the use of that argument, while it looked like it failed, cause the setup code to do something correct that the default path didn't do? Is this the clue we're all looking for?

Since libata is apparently the path taken by TPTB, I'm going to build and boot to a 2.6.24 using libata, but add that argument to grub's kernel line in only one of 2 copies of that stanza. Wish me luck.

--
Cheers, Gene
There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
The intelligence of any discussion diminishes with the square of the number of participants.
-- Adam Walinsky
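The two boots differ only in the "..TIMER:" line, so a field-by-field comparison makes the change explicit. This is a rough sketch that assumes the "key=value" layout shown in the dmesg excerpts above; parse_timer_line and diff_fields are hypothetical helper names.

```python
import re


def parse_timer_line(line):
    """Parse a '..TIMER: vector=0x31 apic1=0 ...' dmesg line into a dict."""
    m = re.search(r"\.\.TIMER: (.*)", line)
    if not m:
        return None
    fields = {}
    for pair in m.group(1).split():
        key, value = pair.split("=")
        fields[key] = int(value, 0)  # base 0 accepts both 0x31 and -1
    return fields


def diff_fields(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    return {k: (a[k], b[k]) for k in a if a[k] != b.get(k)}


with_override = parse_timer_line(
    "[ 27.582014] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1")
without_override = parse_timer_line(
    "[ 24.934367] ..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1")

print(diff_fields(with_override, without_override))
```

The only field that differs is pin1, which is 2 with acpi_use_timer_override and 0 without it.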
Re: Problem with ata layer in 2.6.24
> As slight change here, I was going to use the same .config as 2.6.24-rc8, but just discovered that neither rc8 nor final is finding the drivers for my

If it is not finding a driver, that is nothing to do with libata. It means it's not being loaded by the distribution, or the distribution kernel is too old (2.6.22) for the hardware, in which case see the Fedora respins, which are on 2.6.23.something right now.

Alan
Re: Problem with ata layer in 2.6.24
> Is this >4GB or >=4GB? I've seen contradictory reports, and I've got 4GB.

Depends how the memory is mapped. Any memory physically above the 4GB boundary

Alan
Re: Problem with ata layer in 2.6.24
Alan Cox wrote:
>> not one problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be really welcome and, I'm guessing, widely used.
>
> We don't see very many libata problems at the distro level and they for the most part boil down to
>
> - sata_nv with >4GB of RAM, known, being worked on, no old IDE driver anyway

Is this >4GB or >=4GB? I've seen contradictory reports, and I've got 4GB.

Richard
Re: Problem with ata layer in 2.6.24
Gene Heskett writes:
> On Tuesday 29 January 2008, Alan Cox wrote:
>>> As slight change here, I was going to use the same .config as 2.6.24-rc8, but just discovered that neither rc8 nor final is finding the drivers for my
>>
>> If it is not finding a driver that is nothing to do with libata. It means it's not being loaded by the distribution, or the distribution kernel is too old (2.6.22) for the hardware - in which case see the Fedora respins which are on 2.6.23.something right now.
>>
>> Alan
>
> Home built kernel Alan. But you are as good as anyone to tell me what I need to turn on in order for this dvdwriter to be enabled:
>
> [ 28.862478] ata2.00: ATAPI: LITE-ON DVDRW SHM-165H6S, HS06, max UDMA/66
> [ 28.908647] ata2.00: limited to UDMA/33 due to 40-wire cable
> [ 29.081253] ata2.00: configured for UDMA/33
>
> it has had several 80 wire cables tried, hasn't fixed this, and does not seem to affect its operation when it does work.
>
> [ 29.132405] scsi 1:0:0:0: CD-ROM LITE-ON DVDRW SHM-165H6S HS06 PQ: 0 ANSI: 5
> [ 43.450795] scsi 1:0:0:0: Attached scsi generic sg1 type 5
> ---
> No further mention of it in dmesg, and k3b cannot find the drive at any /dev/sgX address.
>
> .config attached, what else do I need to turn on?
...
> # CONFIG_BLK_DEV_SR is not set

For starters, enable CONFIG_BLK_DEV_SR.
Re: Problem with ata layer in 2.6.24
On Tuesday 29 January 2008, Mark Lord wrote:
> Gene Heskett wrote:
> ..
>> Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' number when dmesg says its found ok at ata2.00? I've turned on an option that says something about using the bios for device access this build, but I'll be surprised if that's it. :)
> ..
>
> It should show up as /dev/scd0 or something very similar.

Tisn't. Darnit.

--
Cheers, Gene
Re: Problem with ata layer in 2.6.24
Gene Heskett wrote:
..
> Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' number when dmesg says its found ok at ata2.00? I've turned on an option that says something about using the bios for device access this build, but I'll be surprised if that's it. :)
..

It should show up as /dev/scd0 or something very similar.
Re: Problem with ata layer in 2.6.24
On Tuesday 29 January 2008, Alan Cox wrote:
>> not one problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be really welcome and, I'm guessing, widely used.
>
> We don't see very many libata problems at the distro level and they for the most part boil down to
>
> - error messages looking different - Most bugs I get are things like media errors (timeout looks different, UNC report looks different)
> - broken hardware - I've closed a whole raft of bugs that turn out to be new PC systems where even the BIOS doesn't see the drives
> - faulty hardware being picked up because we actually do real error checking now. We now check for and give some devices more slack while still doing error checking. Both IDE layers also added blacklists for stuff like the TSScorp DVD drives. Qemu has now had its bugs patched.
> - sata_nv with >4GB of RAM, known, being worked on, no old IDE driver anyway
> - pata_ali MWDMA with ATAPI, PIO works fine, all a bit of a mystery and as it affects only a few chip variants hard to figure out. Workaround libata.dma=1
> - CS handling. On a few boxes using cable select (particularly on one drive and not the other) shows up a problem, normally a failed SRST. That's still under investigation.
> - Promise timeouts. The old IDE times out then polls the device and finds the IRQ was never sent and then recovers so the user sees a short stall but no errors. The new libata doesn't do this and pdc202xx_old thus produces some error messages on some boxes. Backup polling is on my todo list.

As a slight change here, I was going to use the same .config as 2.6.24-rc8, but just discovered that neither rc8 nor final is finding the drivers for my dvd writer while using libata, so it's not usable. So I've enabled a couple of things in the 2.6.24 build that aren't in the 2.6.24-rc8. When I find the magic twanger, I'll rebuild -rc8 with it too.

--
Cheers, Gene
Re: Problem with ata layer in 2.6.24
On Tuesday 29 January 2008, Jeff Garzik wrote:
> Gene Heskett wrote:
>> Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' number when dmesg says its found ok at ata2.00? I've turned on an option that says something about using the bios for device access this build, but I'll be surprised if that's it. :)
>
> I think you mean /dev/scdx not /dev/sdx. Make sure you have the 'sr' driver compiled and load (CONFIG_BLK_DEV_SR).

That menu item COULD be moved, I don't have any REAL scsi stuff, so I didn't look there. My bad, with help from hiding it like that. :-)

> The bios-for-dev-access thing definitely won't help, and may hurt (by taking over the device you wanted to test).

Ok, if BLK_DEV_SR fails, I'll take that back out. I'm heating the room making kernels here. :)

Thanks Jeff.

--
Cheers, Gene
Re: Problem with ata layer in 2.6.24
On Tuesday 29 January 2008, Mikael Pettersson wrote:
> Gene Heskett writes:
>> On Tuesday 29 January 2008, Alan Cox wrote:
>>>> As slight change here, I was going to use the same .config as 2.6.24-rc8, but just discovered that neither rc8 nor final is finding the drivers for my
>>>
>>> If it is not finding a driver that is nothing to do with libata. It means it's not being loaded by the distribution, or the distribution kernel is too old (2.6.22) for the hardware - in which case see the Fedora respins which are on 2.6.23.something right now.
>>>
>>> Alan
>>
>> Home built kernel Alan. But you are as good as anyone to tell me what I need to turn on in order for this dvdwriter to be enabled:
>>
>> [ 28.862478] ata2.00: ATAPI: LITE-ON DVDRW SHM-165H6S, HS06, max UDMA/66
>> [ 28.908647] ata2.00: limited to UDMA/33 due to 40-wire cable
>> [ 29.081253] ata2.00: configured for UDMA/33
>>
>> it has had several 80 wire cables tried, hasn't fixed this, and does not seem to affect its operation when it does work.
>>
>> [ 29.132405] scsi 1:0:0:0: CD-ROM LITE-ON DVDRW SHM-165H6S HS06 PQ: 0 ANSI: 5
>> [ 43.450795] scsi 1:0:0:0: Attached scsi generic sg1 type 5
>> ---
>> No further mention of it in dmesg, and k3b cannot find the drive at any /dev/sgX address.
>>
>> .config attached, what else do I need to turn on?
> ...
>> # CONFIG_BLK_DEV_SR is not set
>
> For starters, enable CONFIG_BLK_DEV_SR.

That could stand to be moved or renamed; it is well buried in the menu for the REAL scsi stuff, which I don't have any of. Enabled, building now.

Thanks.

--
Cheers, Gene
Re: Problem with ata layer in 2.6.24
Mark Lord wrote:
> Gene Heskett wrote:
> ..
>> Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' number when dmesg says its found ok at ata2.00? I've turned on an option that says something about using the bios for device access this build, but I'll be surprised if that's it. :)
> ..
>
> It should show up as /dev/scd0 or something very similar.

Does it appear as /dev/sr0? Try

ll /dev/s*

and see what you get. Anyway, these /dev/ entries are produced by udev, not by libata.

rh
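Since the thread mixes up /dev/sdx, /dev/scdx, /dev/sr0 and /dev/sgX, here is a rough sketch that sorts device node names by the driver that usually provides them. The patterns reflect the common udev naming of that era and are an assumption, not something udev guarantees; classify is a name made up for this example.

```python
import re

# (pattern, description) pairs; the naming is conventional, not enforced.
PATTERNS = [
    (r"sd[a-z]+\d*$", "disk or partition (sd driver)"),
    (r"(sr|scd)\d+$", "CD/DVD drive (sr driver)"),
    (r"sg\d+$", "SCSI generic pass-through (sg driver)"),
]


def classify(name):
    """Return a description for a /dev node name, or None if unrecognised."""
    for pattern, description in PATTERNS:
        if re.match(pattern, name):
            return description
    return None


for node in ["sda", "sda1", "sr0", "scd0", "sg1", "snd"]:
    print(node, "->", classify(node))
```

A DVD writer that only shows up as an sg node and never as sr0 or scd0 points at a missing 'sr' driver rather than at libata.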
Re: Problem with ata layer in 2.6.24
On Tuesday 29 January 2008, Florian Attenberger wrote:
> On Mon, 28 Jan 2008 14:13:21 -0500 Gene Heskett [EMAIL PROTECTED] wrote:
>> I had to reboot early this morning due to a freezeup, and I had a bunch of these in the messages log:
>> ==
>> Jan 27 19:42:11 coyote kernel: [42461.915961] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>> Jan 27 19:42:11 coyote kernel: [42461.915973] ata1.00: cmd ca/00:08:b1:66:46/00:00:00:00:00/e8 tag 0 dma 4096 out
>> Jan 27 19:42:11 coyote kernel: [42461.915974] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
>> Jan 27 19:42:11 coyote kernel: [42461.915978] ata1.00: status: { DRDY }
>> Jan 27 19:42:11 coyote kernel: [42461.916005] ata1: soft resetting link
>> Jan 27 19:42:12 coyote kernel: [42462.078216] ata1.00: configured for UDMA/100
>> Jan 27 19:42:12 coyote kernel: [42462.078232] ata1: EH complete
>> Jan 27 19:42:12 coyote kernel: [42462.090700] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
>> Jan 27 19:42:12 coyote kernel: [42462.114230] sd 0:0:0:0: [sda] Write Protect is off
>> Jan 27 19:42:12 coyote kernel: [42462.115079] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> ===
>
> I had this error too, or maybe only a similar one, and another. I no longer have the error output for either of them lying around, so I'm posting both fixes that I found here on lkml:
>
> 1) disabling NCQ like this:
>    echo 1 > /sys/block/sda/device/queue_depth

Interesting..

> 2) this patch:
>    libata_drain_fifo_on_stuck_drq_hsm.patch (applies to 2.6.24 too)
>
> Signed-off-by: Mark Lord [EMAIL PROTECTED]
> ---
> --- old/drivers/ata/libata-sff.c 2007-09-28 09:29:22.0 -0400
> +++ linux/drivers/ata/libata-sff.c 2007-09-28 09:39:44.0 -0400
> @@ -420,6 +420,28 @@
>      ap->ops->irq_on(ap);
>  }
>
> +static void ata_drain_fifo(struct ata_port *ap, struct ata_queued_cmd *qc)
> +{
> +    u8 stat = ata_chk_status(ap);
> +    /*
> +     * Try to clear stuck DRQ if necessary,
> +     * by reading/discarding up to two sectors worth of data.
> +     */
> +    if ((stat & ATA_DRQ) && (!qc || qc->dma_dir != DMA_TO_DEVICE)) {
> +        unsigned int i;
> +        unsigned int limit = qc ? qc->sect_size : ATA_SECT_SIZE;
> +
> +        printk(KERN_WARNING "Draining up to %u words from data FIFO.\n",
> +               limit);
> +        for (i = 0; i < limit ; ++i) {
> +            ioread16(ap->ioaddr.data_addr);
> +            if (!(ata_chk_status(ap) & ATA_DRQ))
> +                break;
> +        }
> +        printk(KERN_WARNING "Drained %u/%u words.\n", i, limit);
> +    }
> +}
> +
>  /**
>   * ata_bmdma_drive_eh - Perform EH with given methods for BMDMA controller
>   * @ap: port to handle error for
> @@ -476,7 +498,7 @@
>      }
>
>      ata_altstatus(ap);
> -    ata_chk_status(ap);
> +    ata_drain_fifo(ap, qc);
>      ap->ops->irq_clear(ap);
>
>      spin_unlock_irqrestore(ap->lock, flags);
> -

This too. Thanks Florian. I'll keep these in mind, as there may be more than one cat in need of skinning here. See a couple of posts I made to lkml this morning for the investigation I'm doing re the kernel argument 'acpi_use_timer_override'; experimental builds are under way right now.

Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' number when dmesg says it's found ok at ata2.00? I've turned on an option that says something about using the bios for device access this build, but I'll be surprised if that's it. :)

--
Cheers, Gene
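The patch above only drains the FIFO when the DRQ bit is set in the status register, so it can be useful to decode the status byte that libata logs as the first field of the "res" line. This is a sketch using the standard ATA status bit values (BSY 0x80, DRDY 0x40, DF 0x20, DRQ 0x08, ERR 0x01); the helper names are invented here.

```python
import re

# Standard ATA status register bits, highest first.
STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x08, "DRQ"),
    (0x01, "ERR"),
]


def status_from_res(line):
    """Extract the status byte from a libata 'res XX/...' log line."""
    m = re.search(r"res ([0-9a-f]{2})/", line)
    return int(m.group(1), 16) if m else None


def decode_status(status):
    """Return the names of the status bits that are set."""
    return [name for bit, name in STATUS_BITS if status & bit]


line = "res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)"
status = status_from_res(line)
print(hex(status), decode_status(status))

# A status of 0x48 would be the stuck-DRQ case the patch tries to clear.
print(decode_status(0x48))
```

For the log posted above the status is 0x40, i.e. only DRDY, which agrees with the "status: { DRDY }" line; DRQ was not stuck in that particular report.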
Re: Problem with ata layer in 2.6.24
On Tuesday 29 January 2008, Daniel Barkalow wrote:
> On Tue, 29 Jan 2008, Gene Heskett wrote:
>>> For starters, enable CONFIG_BLK_DEV_SR.
>>
>> That could stand to be moved or renamed, it is well buried in the menu for the REAL scsi stuffs, which I don't have any of. Enabled building now.
>
> The "SCSI support type (disk, tape, CD-ROM)" section of that menu actually applies to all ATA-command-set devices that don't use the old IDE code. For example, usb-storage uses "SCSI disk" out of that section, and I've only seen "Probe all LUNs on each SCSI device" be needed for a particular USB card reader with two slots. At this point, most of the things in the kernel that refer to "SCSI" probably should say "storage" (or "ATA", really, but that would make the acronyms confusing).
>
> Incidentally, you should be able to save debugging time for problems like missing sr by building it as a module, which will build really quickly and not require a reboot to test.
>
> -Daniel
> *This .sig left intentionally blank*

I did, Daniel, but while that has worked, it's not been 100% foolproof in the past, so I just waste the 9 minutes building a new kernel as cheap insurance.

--
Cheers, Gene
Re: Problem with ata layer in 2.6.24
> things in the kernel that refer to "SCSI" probably should say "storage" (or "ATA", really, but that would make the acronyms confusing).

SCSI is a command protocol. It is what your CD-ROM drive and USB storage devices talk (albeit with a bit of an accent).

Alan
Re: Problem with ata layer in 2.6.24
> Don't know. Is there an easy way to find out?

E820 map on boot shows you I think.

> By the way, and on a totally different subject. I wonder if this:
>
>     MODULE_DESCRIPTION("low-level driver for AMD PATA IDE");
>
> mightn't be changed to something like:
>
>     MODULE_DESCRIPTION("low-level driver for AMD and nVidia PATA IDE");

Fair point. I'll add that so people can find the early Nvidia stuff.

Alan
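Checking the E820 map for memory above the 4GB boundary can be scripted. The sketch below assumes the boot log prints lines of the form "BIOS-e820: <start> - <end> (usable)" with hexadecimal addresses, as 2.6-era kernels do as far as I know; the sample addresses are invented and usable_bytes_above_4g is a hypothetical helper name.

```python
import re

FOUR_GIB = 1 << 32

# Assumed line format: " BIOS-e820: 0000000000100000 - 00000000dfff0000 (usable)"
E820_RE = re.compile(
    r"BIOS-e820:\s+([0-9a-f]+)\s+-\s+([0-9a-f]+)\s+\(([^)]*)\)")


def usable_bytes_above_4g(lines):
    """Sum the usable RAM that lies above the 4GB physical boundary."""
    total = 0
    for line in lines:
        m = E820_RE.search(line)
        if not m or m.group(3) != "usable":
            continue
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        if end > FOUR_GIB:
            total += end - max(start, FOUR_GIB)
    return total


# Invented example map: 512 MiB of RAM remapped above 4GB.
sample_map = [
    " BIOS-e820: 0000000000000000 - 000000000009f800 (usable)",
    " BIOS-e820: 0000000000100000 - 00000000dfff0000 (usable)",
    " BIOS-e820: 00000000dfff0000 - 00000000e0000000 (reserved)",
    " BIOS-e820: 0000000100000000 - 0000000120000000 (usable)",
]

print(usable_bytes_above_4g(sample_map) // (1 << 20), "MiB above 4GB")
```

A non-zero result means some RAM is mapped above the boundary, which is the condition Alan describes for the sata_nv problem.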
Re: Problem with ata layer in 2.6.24
On Tue, 29 Jan 2008, Gene Heskett wrote:
>> For starters, enable CONFIG_BLK_DEV_SR.
>
> That could stand to be moved or renamed, it is well buried in the menu for the REAL scsi stuffs, which I don't have any of. Enabled building now.

The "SCSI support type (disk, tape, CD-ROM)" section of that menu actually applies to all ATA-command-set devices that don't use the old IDE code. For example, usb-storage uses "SCSI disk" out of that section, and I've only seen "Probe all LUNs on each SCSI device" be needed for a particular USB card reader with two slots. At this point, most of the things in the kernel that refer to "SCSI" probably should say "storage" (or "ATA", really, but that would make the acronyms confusing).

Incidentally, you should be able to save debugging time for problems like missing sr by building it as a module, which will build really quickly and not require a reboot to test.

-Daniel
*This .sig left intentionally blank*
Re: Problem with ata layer in 2.6.24
Alan Cox wrote:
>> Is this >4GB or >=4GB? I've seen contradictory reports, and I've got 4GB.
>
> Depends how the memory is mapped. Any memory physically above the 4GB boundary

Don't know. Is there an easy way to find out?

By the way, and on a totally different subject. I wonder if this:

    MODULE_DESCRIPTION("low-level driver for AMD PATA IDE");

mightn't be changed to something like:

    MODULE_DESCRIPTION("low-level driver for AMD and nVidia PATA IDE");

It took a fair bit of digging in /sys/ to figure out why I was loading pata_amd.

Richard
Re: Problem with ata layer in 2.6.24
Gene Heskett wrote:
> Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' number when dmesg says its found ok at ata2.00? I've turned on an option that says something about using the bios for device access this build, but I'll be surprised if that's it. :)

I think you mean /dev/scdx not /dev/sdx. Make sure you have the 'sr' driver compiled and loaded (CONFIG_BLK_DEV_SR).

The bios-for-dev-access thing definitely won't help, and may hurt (by taking over the device you wanted to test).

Jeff
Re: Problem with ata layer in 2.6.24
On Tue, 29 Jan 2008, Alan Cox wrote:
>> things in the kernel that refer to "SCSI" probably should say "storage" (or "ATA", really, but that would make the acronyms confusing).
>
> SCSI is a command protocol. It is what your CD-ROM drive and USB storage devices talk (albeit with a bit of an accent).

Among other things, yes. But SCSI standards also specify electrical interfaces that aren't at all related to the electrical interfaces used by a lot of devices, and a lot of the places the kernel uses the term suggest that it's also talking about the electrical interface (or, at least, connector shape). For example, it's misleading to talk about "SCSI CDROM support" meaning the command protocol when hardly anybody has ever seen a CDROM drive that doesn't use the SCSI command protocol, but most people know about both SCSI-connector and PATA-connector CDROM drives.

-Daniel
*This .sig left intentionally blank*
Re: Problem with ata layer in 2.6.24
rgheck wrote:
> Alan Cox wrote:
>>> not one problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be really welcome and, I'm guessing, widely used.
>>
>> We don't see very many libata problems at the distro level and they for the most part boil down to
>>
>> - sata_nv with >4GB of RAM, known, being worked on, no old IDE driver anyway
>
> Is this >4GB or >=4GB? I've seen contradictory reports, and I've got 4GB.
..

For all practical purposes, most memory over 3GB (or sometimes even 2GB) on a 32-bit x86 system is treated as >4GB by the motherboard.

Because it's not the amount of *memory* that matters so much, but rather the amount of *used address space*. Video cards, PCI devices, other motherboard resources etc. can all subtract from the available address space, leaving much less than 4GB for your RAM.

-ml
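The address-space point can be put into numbers. The sketch below assumes a chipset that remaps any RAM displaced by device address space to just above the 4GB boundary; the 1 GiB device hole is an invented figure, and ram_above_4g is a hypothetical helper.

```python
GIB = 1 << 30


def ram_above_4g(ram_bytes, device_hole_bytes):
    """RAM that ends up above 4GB when devices claim part of the low 4GB.

    Assumes the chipset remaps displaced RAM above the boundary instead
    of hiding it.
    """
    room_below = 4 * GIB - device_hole_bytes
    below = min(ram_bytes, room_below)
    return ram_bytes - below


# 4 GiB of RAM, 1 GiB of address space used by video/PCI/etc.
print(ram_above_4g(4 * GIB, 1 * GIB) // GIB, "GiB above the 4GB boundary")
# 2 GiB of RAM fits entirely below the boundary.
print(ram_above_4g(2 * GIB, 1 * GIB) // GIB, "GiB above the 4GB boundary")
```

Under these assumptions a machine with exactly 4GB installed still has memory physically above the 4GB boundary, which is why ">4GB" and ">=4GB" reports can both be right.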
Re: Problem with ata layer in 2.6.24
Gene Heskett wrote:
> On Tuesday 29 January 2008, Mark Lord wrote:
>> Gene Heskett wrote:
>> ..
>>> Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' number when dmesg says its found ok at ata2.00? I've turned on an option that says something about using the bios for device access this build, but I'll be surprised if that's it. :)
>> ..
>>
>> It should show up as /dev/scd0 or something very similar.
>
> Tisn't. Darnit.
..

It requires CONFIG_SCSI, CONFIG_BLK_DEV_SD and CONFIG_BLK_DEV_SR in the kernel .config. The _SR one (SCSI Reader) is for CD/DVD support.

Cheers
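A quick way to catch a missing option before spending nine minutes on a build is to scan the .config for the three symbols named above. A minimal sketch; missing_options is an invented name, and the sample text mirrors the "# CONFIG_BLK_DEV_SR is not set" line quoted earlier in the thread.

```python
REQUIRED = ("CONFIG_SCSI", "CONFIG_BLK_DEV_SD", "CONFIG_BLK_DEV_SR")


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are neither =y nor =m."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines are comments and count as disabled.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


sample_config = """\
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
# CONFIG_BLK_DEV_SR is not set
"""

print(missing_options(sample_config))
```

For a real tree, read the text with open(".config").read() from the top of the kernel source directory.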
Re: Problem with ata layer in 2.6.24
On Tue, 29 Jan 2008, Alan Cox wrote:
>> not one problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be really welcome and, I'm guessing, widely used.
>
> We don't see very many libata problems at the distro level and they for the most part boil down to
>
> - error messages looking different - Most bugs I get are things like media errors (timeout looks different, UNC report looks different)

The SCSI error reporting really ought to include a simple interpretation of the error for end users ("The drive doesn't support this command", "A sector's data got lost", "The drive timed out", "The drive failed", "The drive is entirely gone"). There's too much similarity between the message you get when you try a SMART test that doesn't apply to the drive and what you get when the drive is broken.

> - faulty hardware being picked up because we actually do real error checking now. We now check for and give some devices more slack while still doing error checking. Both IDE layers also added blacklists for stuff like the TSScorp DVD drives. Qemu has now had its bugs patched.

I think this is the big source of unhappy users (and, of course, they all look the same and the reports stay findable by Google, so it looks a lot worse than it is). People getting this problem in distro kernels probably really do want to have a way to report it with enough detail from logs to get it dealt with, and then switch back to old IDE until the fix propagates through.

And it's possible that the error recovery is suboptimal in some cases. It seems to like resetting drives too much; perhaps if it keeps seeing the same problem and resetting the drive, it should decide that the drive's error reporting is just bad and just ignore that error like the old IDE did (but, in this case, after saying what it's doing).

-Daniel
*This .sig left intentionally blank*
Re: Problem with ata layer in 2.6.24
> The SCSI error reporting really ought to include a simple interpretation of the error for end users ("The drive doesn't support this command", "A sector's data got lost", "The drive timed out", "The drive failed", "The drive is entirely gone"). There's too much similarity between the message you get when you try a SMART test that doesn't apply to the drive and what you get when the drive is broken.

That would be the SCSI verbose messages option. I think the Eric Youngdale consortium added it about Linux 1.2. Nowadays it's always built that way.

> And it's possible that the error recovery is suboptimal in some cases. It seems to like resetting drives too much; perhaps if it keeps seeing the same problem and resetting the drive, it should decide that the drive's error reporting is just bad and just ignore that error like the old IDE did (but, in this case, after saying what it's doing).

Nothing like casually praying the user's data hasn't gone for a walk, is there? If we don't act on them, the users don't report them until something really bad occurs, so that isn't an option.

Alan
Re: Problem with ata layer in 2.6.24
On Tuesday 29 January 2008, Jeff Garzik wrote:
> Gene Heskett wrote:
>> On Tuesday 29 January 2008, Jeff Garzik wrote:
>>> Gene Heskett wrote:
>>>> Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' number when dmesg says it's found ok at ata2.00? I've turned on an option that says something about using the bios for device access this build, but I'll be surprised if that's it. :)
>>> I think you mean /dev/scdx not /dev/sdx. Make sure you have the 'sr' driver compiled and loaded (CONFIG_BLK_DEV_SR).
>> That menu item COULD be moved, I don't have any REAL scsi stuff, so I didn't look there. My bad, with help from hiding it like that. :-)
>>> The bios-for-dev-access thing definitely won't help, and may hurt (by taking over the device you wanted to test).
>> Ok, if BLK_DEV_SR fails, I'll take that back out. I'm heating the room making kernels here. :)
> I can say with 100% certainty that 'sr' is required in order to use your dvd writer with libata. :)
>
> Jeff

And as usual, you are 100% correct, thanks. And now back to our regularly scheduled testing for 'exception Emask' errors. :)

--
Cheers, Gene
There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -Ed Howdershelt (Author)
Main's Law: For every action there is an equal and opposite government program.
Re: Problem with ata layer in 2.6.24
> That could stand to be moved or renamed, it is well buried in the menu for the REAL scsi stuffs, which I don't have any of.

Yes you do - USB storage and ATAPI are SCSI
Re: Problem with ata layer in 2.6.24
rgheck wrote:
> Mark Lord wrote:
>> rgheck wrote:
>>> Alan Cox wrote:
>>>>> not one problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be really welcome and, I'm guessing, widely used.
>>>> We don't see very many libata problems at the distro level and they for the most part boil down to
>>>> - sata_nv with >4GB of RAM, known, being worked on, no old IDE driver anyway
>>> Is this >4GB or >=4GB? I've seen contradictory reports, and I've got 4GB.
>> ..
>> For all practical purposes, most memory over 3GB (or sometimes even 2GB) on a 32-bit x86 system is treated as >4GB by the motherboard. Because it's not the amount of *memory* that matters so much, but rather the amount of *used address space*. Video cards, PCI devices, other motherboard resources etc.. can all subtract from the available address space, leaving much less than 4GB for your RAM.
> Right. So it looks like I do have this issue, though I haven't seen any actual problems on 24. Is there a known workaround?
..

For now, the workaround is to not enable the RAM above 4GB. Your kernel .config file should therefore have these two lines:

    CONFIG_HIGHMEM4G=y
    # CONFIG_HIGHMEM64G is not set

Later, once the issue is fixed at the driver level (soon), you can get your high memory back again by enabling CONFIG_HIGHMEM64G, though this will cost a few percent of performance in the extra page table overhead it creates.

Cheers
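A minimal sketch, not part of the original thread, of checking a kernel .config for the two settings listed above. The function names `parse_kconfig` and `highmem_workaround_applied` are hypothetical; only the two option lines themselves come from the message.

```python
def parse_kconfig(text):
    """Return a dict of CONFIG_* options found in .config text.

    Options written as '# CONFIG_FOO is not set' map to None.
    """
    suffix = " is not set"
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Drop the leading "# " and the trailing " is not set".
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def highmem_workaround_applied(text):
    """True when HIGHMEM4G is enabled and HIGHMEM64G is unset or absent."""
    opts = parse_kconfig(text)
    return (opts.get("CONFIG_HIGHMEM4G") == "y"
            and opts.get("CONFIG_HIGHMEM64G") is None)
```

It could be run against the text of the .config in the kernel build directory; the check says nothing about whether the sata_nv problem actually affects a given machine.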
Re: Problem with ata layer in 2.6.24
Mark Lord wrote:
> rgheck wrote:
>> Alan Cox wrote:
>>>> not one problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be really welcome and, I'm guessing, widely used.
>>> We don't see very many libata problems at the distro level and they for the most part boil down to
>>> - sata_nv with >4GB of RAM, known, being worked on, no old IDE driver anyway
>> Is this >4GB or >=4GB? I've seen contradictory reports, and I've got 4GB.
> ..
> For all practical purposes, most memory over 3GB (or sometimes even 2GB) on a 32-bit x86 system is treated as >4GB by the motherboard. Because it's not the amount of *memory* that matters so much, but rather the amount of *used address space*. Video cards, PCI devices, other motherboard resources etc.. can all subtract from the available address space, leaving much less than 4GB for your RAM.

Right. So it looks like I do have this issue, though I haven't seen any actual problems on 24. Is there a known workaround?

rh
Re: Problem with ata layer in 2.6.24
On Tue, 29 Jan 2008, Alan Cox wrote:
>> The SCSI error reporting really ought to include a simple interpretation of the error for end users ("The drive doesn't support this command", "A sector's data got lost", "The drive timed out", "The drive failed", "The drive is entirely gone"). There's too much similarity between the message you get when you try a SMART test that doesn't apply to the drive and what you get when the drive is broken.
> That would be the SCSI verbose messages option. I think the Eric Youngdale consortium added it about Linux 1.2. Nowadays it's always built that way.

I've seen a lot of verbosity out of SCSI messages, but I haven't seen a straightforward interpretation of the problem in there. It's all information useful for debugging, not information useful for system administration.

>> And it's possible that the error recovery is suboptimal in some cases. It seems to like resetting drives too much; perhaps if it keeps seeing the same problem and resetting the drive, it should decide that the drive's error reporting is just bad and just ignore that error like the old IDE did (but, in this case, after saying what it's doing).
> Nothing like casually praying the user's data hasn't gone for a walk, is there? If we don't act on them, the users don't report them until something really bad occurs, so that isn't an option.

On the other hand, bringing the system down because a device is misbehaving is a poor idea. I've personally recovered most of the data off of a dying drive because the system was willing to let me keep using the drive anyway; IIRC, the drive didn't work at all after a reboot, so I would have lost all the data instead of only a little had the system insisted on a perfectly functioning drive in order to use it at all. There ought to be some middle ground between doing nothing until the computer really breaks and breaking the computer before then, but that's an issue not specific to libata.
-Daniel *This .sig left intentionally blank*
Re: Problem with ata layer in 2.6.24
> I've seen a lot of verbosity out of SCSI messages, but I haven't seen a straightforward interpretation of the problem in there. It's all information useful for debugging, not information useful for system administration.

It tells you what is going on. Unfortunately that frequently requires some basic knowledge of how to interpret the error report. Drive interface behaviour simply doesn't boil down to a fault light on the dashboard or a "tighten the cable". For most common fault types you'll get errors most administrators should find meaningful - like "Media error".

> On the other hand, bringing the system down because a device is misbehaving is a poor idea. I've personally recovered most of the data off

Hence we have RAID and SATA hotplug.

Alan
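As an illustration of the "basic knowledge" needed to read an error report, here is a rough sketch (not from any poster) that pulls the command opcode, the result status and error bytes, and the kernel's own reason string out of a libata stanza like the ones quoted in this thread. It assumes the first byte after "cmd" is the ATA command opcode and the first two bytes after "res" are the status and error registers. The `PLAIN` wording is hypothetical, borrowed from Daniel's list, and only "timeout" is a reason string actually seen in these logs.

```python
import re

# Opcodes that appear in this thread's logs, named per the ATA command set.
ATA_OPCODES = {0x35: "WRITE DMA EXT", 0xCA: "WRITE DMA"}

# Hypothetical plain-language wording; only "timeout" occurs in the thread.
PLAIN = {"timeout": "The drive timed out"}

CMD_RE = re.compile(r"cmd ([0-9a-f]{2})/")
RES_RE = re.compile(
    r"res ([0-9a-f]{2})/([0-9a-f]{2}):.*Emask (0x[0-9a-f]+) \(([^)]+)\)"
)


def summarize(stanza):
    """Return a small dict describing one cmd/res pair, or None if absent."""
    cmd = CMD_RE.search(stanza)
    res = RES_RE.search(stanza)
    if not cmd or not res:
        return None
    opcode = int(cmd.group(1), 16)
    reason = res.group(4)
    return {
        "command": ATA_OPCODES.get(opcode, "opcode 0x%02x" % opcode),
        "status": int(res.group(1), 16),   # 0x40 is printed as { DRDY } in the logs
        "error": int(res.group(2), 16),
        "emask": int(res.group(3), 16),
        "summary": PLAIN.get(reason, reason),
    }
```

Unknown reason strings are passed through unchanged rather than guessed at, so the sketch never claims more than the kernel message itself says.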
Re: Problem with ata layer in 2.6.24
On Tuesday 29 January 2008, Alan Cox wrote: That could stand to be moved or renamed, it is well buried in the menu for the REAL scsi stuffs, which I don't have any of. Yes you do - USB storage and ATAPI are SCSI By the linux software definition maybe. But I've defined scsi as that which uses a 50 wire cable using 50 contact centronics connectors since the mid '70's, and which often needs a ready supply of nubile virgins to sacrifice to make it work, particularly with the old resistor pack terminations psu's whose 5 volt line is only 4.85 volts due to old age. That's what I call REAL scsi. Its also a REAL PITA if the terms aren't active. You can call what you are doing 'scsi' because you are using much the same command structure, and that is good, but its not the real thing with all its hardware warts and/or capabilities. For one thing, this version usually works. :) Furinstance, you can tell 2 scsi devices on the same controller to talk to each other, moving files from one to the other, and the host controller can then goto sleep the cpu isn't involved until the devices send it a wakeup to advise the controller that the transfer has been done, and the controller may or may not then interrupt and advise the cpu. You can do that with separate controllers too as long as they have a compatible DMA channel available to both. I doubt libata has that capability now, or ever will, cuz these ide/atapi devices are generally dumber than rocks about that. But any device claiming to be scsi-II is supposed to be able to do those sorts of things while the cpu is off crunching numbers for BOINC or whatever. But that puts my mild objections to classifying this as 'scsi' in a more understandable context. :-) -- Cheers, Gene There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -Ed Howdershelt (Author) When some people decide it's time for everyone to make big changes, it means that they want you to change first. 
Re: Problem with ata layer in 2.6.24
> By the linux software definition maybe. But I've defined scsi as that which uses a 50 wire cable using 50 contact centronics connectors since the mid '70's, and which often needs a ready supply of nubile virgins t

25, 50 or 68, with multiple voltage levels, plus of course it might be over fibre or copper FC loop and ..

SCSI is a protocol.
Re: Problem with ata layer in 2.6.24
Gene, If you still want to try it, I did manage to get the old IDE subsystem working. The issue with pata_amd concerns modprobe.conf. You probably have an alias to it there, as Fedora seems to insert these. (I don't know if they're actually needed or not.) If you comment out that line, then mkinitrd will run successfully, and you can try it that way. By the way, is there an easy way to use different modprobe.conf files with different kernels? Do make sure that you're building whatever drivers you need for your particular IDE chipset. (This is under IDE chipset support.) I suppose it's safe to build them all as modules. You may also want to compile ide-scsi (SCSI emulation support), as some older CD drives seem to need this, in the form of an hdx=ide-scsi command line option. Richard Gene Heskett wrote: On Tuesday 29 January 2008, Alan Cox wrote: That could stand to be moved or renamed, it is well buried in the menu for the REAL scsi stuffs, which I don't have any of. Yes you do - USB storage and ATAPI are SCSI By the linux software definition maybe. But I've defined scsi as that which uses a 50 wire cable using 50 contact centronics connectors since the mid '70's, and which often needs a ready supply of nubile virgins to sacrifice to make it work, particularly with the old resistor pack terminations psu's whose 5 volt line is only 4.85 volts due to old age. That's what I call REAL scsi. Its also a REAL PITA if the terms aren't active. You can call what you are doing 'scsi' because you are using much the same command structure, and that is good, but its not the real thing with all its hardware warts and/or capabilities. For one thing, this version usually works. 
:) Furinstance, you can tell 2 scsi devices on the same controller to talk to each other, moving files from one to the other, and the host controller can then goto sleep the cpu isn't involved until the devices send it a wakeup to advise the controller that the transfer has been done, and the controller may or may not then interrupt and advise the cpu. You can do that with separate controllers too as long as they have a compatible DMA channel available to both. I doubt libata has that capability now, or ever will, cuz these ide/atapi devices are generally dumber than rocks about that. But any device claiming to be scsi-II is supposed to be able to do those sorts of things while the cpu is off crunching numbers for BOINC or whatever. But that puts my mild objections to classifying this as 'scsi' in a more understandable context. :-) - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problem with ata layer in 2.6.24
Gene Heskett wrote:
> I doubt libata has that capability now, or ever will, cuz these ide/atapi devices are generally dumber than rocks about that. But any device claiming to be scsi-II is supposed to be able to do those sorts of things while the cpu is off crunching numbers for BOINC or whatever.
..

The CD/DVD drives are all MMC devices internally, which means they speak a SCSI command protocol, regardless of the electrical or optical interface. Linux is software, and the software protocol is exactly the same for them, no matter what the cable/bus type happens to be.

Cheers
Re: Problem with ata layer in 2.6.24
On Monday 28 January 2008, Gene Heskett wrote:
> [ 0.00] If you got timer trouble try acpi_use_timer_override

This is from the dmesg of my previous post. Can anyone tell me what it actually means?

--
Cheers, Gene
There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -Ed Howdershelt (Author)
I have a simple rule in life: If I don't understand something, it must be bad. - Linus Torvalds
Re: Problem with ata layer in 2.6.24
On Monday 28 January 2008, Peter Zijlstra wrote:
> On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote:
>> 1. Wrong mailing list; use linux-ide (@vger) instead.
>
> What, and keep all us other interested people in the dark?

As a test, I tried rebooting to the latest fedora kernel and found it kills X, so I'm back to the second to last fedora version ATM, and the third 'smartctl -t long /dev/sda' in 24 hours is running now. The first two completed with no errors.

I've added the linux-ide list to refresh those people of the problem; the logs are being spammed by this message stanza:

Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out
Jan 28 04:46:25 coyote kernel: [26550.290029] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY }
Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link
Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for UDMA/100
Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete
Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write Protect is off
Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

And it just did it again, using the fedora kernel but without logging anything at all when it froze. In other words I had to reboot between the word list and the word to above. So now I'm booted to 2.6.24-rc7.
Before it crashes again, here is the dmesg: [0.00] Linux version 2.6.24-rc7 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1 SMP Mon Jan 14 10:00:40 EST 2008 [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: - 0009f800 (usable) [0.00] BIOS-e820: 0009f800 - 000a (reserved) [0.00] BIOS-e820: 000f - 0010 (reserved) [0.00] BIOS-e820: 0010 - 3fff (usable) [0.00] BIOS-e820: 3fff - 3fff3000 (ACPI NVS) [0.00] BIOS-e820: 3fff3000 - 4000 (ACPI data) [0.00] BIOS-e820: fec0 - fec01000 (reserved) [0.00] BIOS-e820: fee0 - fee01000 (reserved) [0.00] BIOS-e820: - 0001 (reserved) [0.00] 127MB HIGHMEM available. [0.00] 896MB LOWMEM available. [0.00] Entering add_active_range(0, 0, 262128) 0 entries of 256 used [0.00] Zone PFN ranges: [0.00] DMA 0 - 4096 [0.00] Normal 4096 - 229376 [0.00] HighMem229376 - 262128 [0.00] Movable zone start PFN for each node [0.00] early_node_map[1] active PFN ranges [0.00] 0:0 - 262128 [0.00] On node 0 totalpages: 262128 [0.00] DMA zone: 32 pages used for memmap [0.00] DMA zone: 0 pages reserved [0.00] DMA zone: 4064 pages, LIFO batch:0 [0.00] Normal zone: 1760 pages used for memmap [0.00] Normal zone: 223520 pages, LIFO batch:31 [0.00] HighMem zone: 255 pages used for memmap [0.00] HighMem zone: 32497 pages, LIFO batch:7 [0.00] Movable zone: 0 pages used for memmap [0.00] DMI 2.2 present. [0.00] ACPI: RSDP 000F7220, 0014 (r0 Nvidia) [0.00] ACPI: RSDT 3FFF3000, 002C (r1 Nvidia AWRDACPI 42302E31 AWRD 0) [0.00] ACPI: FACP 3FFF3040, 0074 (r1 Nvidia AWRDACPI 42302E31 AWRD 0) [0.00] ACPI: DSDT 3FFF30C0, 4CC4 (r1 NVIDIA AWRDACPI 1000 MSFT 10E) [0.00] ACPI: FACS 3FFF, 0040 [0.00] ACPI: APIC 3FFF7DC0, 006E (r1 Nvidia AWRDACPI 42302E31 AWRD 0) [0.00] Nvidia board detected. Ignoring ACPI timer override. 
[0.00] If you got timer trouble try acpi_use_timer_override [0.00] ACPI: PM-Timer IO Port: 0x4008 [0.00] ACPI: Local APIC address 0xfee0 [0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) [0.00] Processor #0 6:10 APIC version 16 [0.00] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) [0.00] ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0]) [0.00] IOAPIC[0]: apic_id 2, version 17, address 0xfec0, GSI 0-23 [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [0.00] ACPI: BIOS IRQ0 pin2 override ignored. [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge) [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 15
Re: Problem with ata layer in 2.6.24
On Monday 28 January 2008, Gene Heskett wrote: On Monday 28 January 2008, Zan Lynx wrote: On Mon, 2008-01-28 at 11:50 -0500, Calvin Walton wrote: On Mon, 2008-01-28 at 11:35 -0500, Gene Heskett wrote: On Monday 28 January 2008, Mikael Pettersson wrote: Unfortunately we also see: [ 48.285456] nvidia: module license 'NVIDIA' taints kernel. [ 48.549725] ACPI: PCI Interrupt :02:00.0[A] - Link [APC4] - GSI 19 (level, high) - IRQ 20 [ 48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module 169.07 Thu Dec 13 18:42:56 PST 2007 We have no way of debugging that module, so please try 2.6.24 without it. Sorry, I can't do this and have a working machine. The nv driver has suffered bit rot or something since the FC2 days when it COULD run a 19 crt at 1600x1200, and will not drive this 20 wide screen lcd 1680x1050 monitor at more than 800x600, which is absolutely butt ugly fuzzy, looking like a jpg compressed to 10%. The system is not usable on a day to basis without the nvidia driver. You should probably give the nouveau[1] driver a try, if only for testing purposes; if you are running an NV4x (G6x or G7x) card in particular, it works a lot better than the nv driver for 2d support. 1. http://nouveau.freedesktop.org/wiki/InstallNouveau But nouveau is much less stable than nv. For testing purposes, go with stable. I believe at this point, its moot. I captured quite a few instances of that error message while rebooting the last time, all of which occurred long before I logged in and did a startx (I boot to runlevel 3 here), so the kernel was NOT tainted at that point. That dmesg has been posted and some questions asked. As this has gone on for a while, it seems to me that with 14,800 google hits on this problem, Linus should call a halt until this is found and fixed. But I'm not Linus. I'm also locking up for 30 at a time, probably ready for reboot #7 today. I'm not sure why it won't run his screen though. I can use nv to run a 1920x1200 laptop LCD. 
It *is* dog slow (although nouveau was not any better with a NV17 / 440-Go -- render support for AA fonts seems to be missing), but it does work. I've been trying to run a long selftest on that drive, but the constant reboots are fscking that up. I have attached the last smartctl -a output, indicating that the test was aborted probably from all the resets that are being issued, the last one froze me for around 5 minutes but I haven't rebooted yet. Its attached. Can anyone see if there is actually anything wrong with the drive? If a boot will last long enough for the -t long to complete, then it passes with no errors, but this was interrupted now for the 3rd time. -- Cheers, Gene There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -Ed Howdershelt (Author) Well begun is half done. -- Aristotle smartctl version 5.37 [i386-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Model Family: Western Digital Caviar SE family Device Model: WDC WD2000JB-00EVA0 Serial Number:WD-WMAEH2782398 Firmware Version: 15.05R15 User Capacity:200,049,647,616 bytes Device is:In smartctl database [for details use: -P show] ATA Version is: 6 ATA Standard is: Exact ATA specification draft version not indicated Local Time is:Mon Jan 28 12:39:08 2008 EST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x85) Offline data collection activity was aborted by an interrupting command from host. Auto Offline Data Collection: Enabled. Self-test execution status: ( 249) Self-test routine in progress... 90% of test remaining. Total time to complete Offline data collection: (6942) seconds. Offline data collection capabilities: (0x79) SMART execute Offline immediate. 
No Auto Offline data collection support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities:(0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability:(0x01) Error logging supported. No General Purpose Logging support. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 88) minutes. Conveyance self-test routine recommended polling time: ( 5) minutes. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
Re: Problem with ata layer in 2.6.24
Added Alan to CC: list.

[ 30.703188] scsi0 : pata_amd
[ 30.709313] scsi1 : pata_amd
[ 30.710076] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xf000 irq 14
[ 30.710079] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 15
[ 30.864753] ata1.00: ATA-6: WDC WD2000JB-00EVA0, 15.05R15, max UDMA/100
[ 30.864756] ata1.00: 390721968 sectors, multi 16: LBA48
[ 30.871629] ata1.00: configured for UDMA/100
..

Gene, please confirm with us that your primary/master hard drive (above) is connected with an 80-wire UDMA cable, as opposed to the older 40-wire cables.

[ 31.195305] ata2.00: ATAPI: LITE-ON DVDRW SHM-165H6S, HS06, max UDMA/66
[ 31.243813] ata2.01: ATA-7: MAXTOR STM3320620A, 3.AAE, max UDMA/100
[ 31.243816] ata2.01: 625142448 sectors, multi 16: LBA48
[ 31.243825] ata2.00: limited to UDMA/33 due to 40-wire cable
[ 31.417074] ata2.00: configured for UDMA/33
[ 31.451769] ata2.01: configured for UDMA/100
..

That looks like an unrelated bug to me: the driver says 40-wire cable but then goes and chooses UDMA/100 on one of the drives. Alan?
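The mismatch Mark points out can be spotted mechanically. The following is only a sketch under a stated assumption: that a "limited to UDMA/NN due to 40-wire cable" message on one device implies a limit for every device on the same port. Whether that assumption holds, and whether the behaviour is a bug at all, is exactly the question put to Alan above. The function name `cable_mismatches` is hypothetical.

```python
import re

LIMIT_RE = re.compile(r"(ata\d+)\.\d+: limited to UDMA/(\d+) due to 40-wire cable")
CONF_RE = re.compile(r"(ata\d+\.\d+): configured for UDMA/(\d+)")


def cable_mismatches(log_lines):
    """Return (device, configured_mode, port_limit) for each device configured
    faster than a 40-wire limit reported on the same port."""
    lines = list(log_lines)

    # First pass: remember the lowest 40-wire limit reported per port.
    limits = {}
    for line in lines:
        m = LIMIT_RE.search(line)
        if m:
            port, mode = m.group(1), int(m.group(2))
            limits[port] = min(mode, limits.get(port, mode))

    # Second pass: flag devices configured above their port's limit.
    found = []
    for line in lines:
        m = CONF_RE.search(line)
        if m:
            dev, mode = m.group(1), int(m.group(2))
            port = dev.split(".")[0]
            if port in limits and mode > limits[port]:
                found.append((dev, mode, limits[port]))
    return found
```

Fed the dmesg lines quoted in this message, it would flag ata2.01 (configured for UDMA/100 on a port where ata2.00 was limited to UDMA/33) and leave ata1.00 alone.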
Re: Problem with ata layer in 2.6.24
Gene Heskett wrote: Greeting; I had to reboot early this morning due to a freezeup, and I had a bunch of these in the messages log: == Jan 27 19:42:11 coyote kernel: [42461.915961] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Jan 27 19:42:11 coyote kernel: [42461.915973] ata1.00: cmd ca/00:08:b1:66:46/00:00:00:00:00/e8 tag 0 dma 4096 out Jan 27 19:42:11 coyote kernel: [42461.915974] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Jan 27 19:42:11 coyote kernel: [42461.915978] ata1.00: status: { DRDY } Jan 27 19:42:11 coyote kernel: [42461.916005] ata1: soft resetting link Jan 27 19:42:12 coyote kernel: [42462.078216] ata1.00: configured for UDMA/100 Jan 27 19:42:12 coyote kernel: [42462.078232] ata1: EH complete Jan 27 19:42:12 coyote kernel: [42462.090700] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB) Jan 27 19:42:12 coyote kernel: [42462.114230] sd 0:0:0:0: [sda] Write Protect is off Jan 27 19:42:12 coyote kernel: [42462.115079] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA === That one showed up about 2 hours ago, so I expect I'll be locked up again before I've managed a 24 hour uptime. This drive passed a 'smartctl -t long /dev/sda' with flying colors after the reboot this morning. 
Two instances were logged after I had rebooted to 2.6.24 from 2.6.24-rc8: Jan 24 20:46:33 coyote kernel: [0.00] Linux version 2.6.24 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1 SMP Thu Jan 24 20:17:55 EST 2008 Jan 27 02:28:29 coyote kernel: [193207.445158] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Jan 27 02:28:29 coyote kernel: [193207.445170] ata1.00: cmd 35/00:08:f9:24:0a/00:00:17:00:00/e0 tag 0 dma 4096 out Jan 27 02:28:29 coyote kernel: [193207.445172] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Jan 27 02:28:29 coyote kernel: [193207.445175] ata1.00: status: { DRDY } Jan 27 02:28:29 coyote kernel: [193207.445202] ata1: soft resetting link Jan 27 02:28:29 coyote kernel: [193207.607384] ata1.00: configured for UDMA/100 Jan 27 02:28:29 coyote kernel: [193207.607399] ata1: EH complete Jan 27 02:28:29 coyote kernel: [193207.609681] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB) Jan 27 02:28:29 coyote kernel: [193207.619277] sd 0:0:0:0: [sda] Write Protect is off Jan 27 02:28:29 coyote kernel: [193207.649041] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Jan 27 02:30:06 coyote kernel: [193304.336929] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Jan 27 02:30:06 coyote kernel: [193304.336940] ata1.00: cmd ca/00:20:69:22:a6/00:00:00:00:00/e7 tag 0 dma 16384 out Jan 27 02:30:06 coyote kernel: [193304.336942] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Jan 27 02:30:06 coyote kernel: [193304.336945] ata1.00: status: { DRDY } Jan 27 02:30:06 coyote kernel: [193304.336972] ata1: soft resetting link Jan 27 02:30:06 coyote kernel: [193304.499210] ata1.00: configured for UDMA/100 Jan 27 02:30:06 coyote kernel: [193304.499226] ata1: EH complete Jan 27 02:30:06 coyote kernel: [193304.499714] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB) Jan 27 02:30:06 coyote kernel: [193304.499857] sd 0:0:0:0: [sda] 
Write Protect is off
Jan 27 02:30:06 coyote kernel: [193304.502315] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

None were logged during the time I was running an -rc7 or -rc8. The previous hits on this resulted in the udma speed being downgraded till it was actually running in pio just before the freeze that required the hardware reset button. I'll reboot to -rc8 right now and resume. If it's the drive, I should see it. If not, then 2.6.24 is where I'll point the finger.
..

The only libata change I can see that could possibly affect your setup is this one here, which went in sometime between -rc7 and -final:

--- linux-2.6.24-rc7/drivers/ata/libata-eh.c 2008-01-06 16:45:38.0 -0500
+++ linux-2.6.24/drivers/ata/libata-eh.c 2008-01-24 17:58:37.0 -0500
@@ -1733,11 +1733,15 @@
 		ehc->i.action &= ~ATA_EH_PERDEV_MASK;
 	}
 
-	/* consider speeding down */
+	/* propagate timeout to host link */
+	if ((all_err_mask & AC_ERR_TIMEOUT) && !ata_is_host_link(link))
+		ap->link.eh_context.i.err_mask |= AC_ERR_TIMEOUT;
+

It looks pretty innocent to me, though. If you want to try reverting just that change (comment out the two lines and rebuild), then that might provide useful information here.

If -final is still b0rked even with those two lines changed back, then I suspect you're just getting lucky when switching between the -rc7/-rc8 kernel and the -final kernel. Lucky in a bad way, that is.

The real test would be to rebuild the kernel without libata, and *with* the old IDE
Re: Problem with ata layer in 2.6.24
On Monday 28 January 2008, Richard Heck wrote: I've recently seen this kind of error myself, under Fedora 8, using the Fedora 2.6.23 kernels: I'd see a train of the same sort of error: Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out Jan 28 04:46:25 coyote kernel: [26550.290029] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) usually associated with the optical drive, and then it seems as if the whole SATA subsystem would lock up, and the machine then becomes useless: I get journal commit errors if I'm lucky; if I'm not, it just locks up. My system is also using the pata_amd driver. I have not seen these sorts of errors with the 2.6.24 kernels. Richard Heck Unforch, this is my only bootable drive, and its raising hell with things, about 6 hardware reset initiated reboots so far today since 6:15 am. If it persists I'll go see if Circuit City still has any pata drives left as this mobo won't boot from a sata card. Gene Heskett wrote: On Monday 28 January 2008, Peter Zijlstra wrote: On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote: 1. Wrong mailing list; use linux-ide (@vger) instead. What, and keep all us other interested people in the dark? As a test, I tried rebooting to the latest fedora kernel and found it kills X, so I'm back to the second to last fedora version ATM, and the third 'smartctl -t lng /dev/sda' in 24 hours is running now. The first two completed with no errors. 
I've added the linux-ide list to refresh those people of the problem, the logs are being spammed by this message stanza: Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out Jan 28 04:46:25 coyote kernel: [26550.290029] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY } Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for UDMA/100 Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB) Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write Protect is off Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA And it just did it again, using the fedora kernel but without logging anything at all when it froze. In other words I had to reboot between the word list and the word to above. So now I'm booted to 2.6.24-rc7. Before it crashes again, here is the dmesg: [0.00] Linux version 2.6.24-rc7 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1 SMP Mon Jan 14 10:00:40 EST 2008 [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: - 0009f800 (usable) [0.00] BIOS-e820: 0009f800 - 000a (reserved) [0.00] BIOS-e820: 000f - 0010 (reserved) [0.00] BIOS-e820: 0010 - 3fff (usable) [0.00] BIOS-e820: 3fff - 3fff3000 (ACPI NVS) [0.00] BIOS-e820: 3fff3000 - 4000 (ACPI data) [0.00] BIOS-e820: fec0 - fec01000 (reserved) [0.00] BIOS-e820: fee0 - fee01000 (reserved) [0.00] BIOS-e820: - 0001 (reserved) [0.00] 127MB HIGHMEM available. [0.00] 896MB LOWMEM available. 
[0.00] Entering add_active_range(0, 0, 262128) 0 entries of 256 used [0.00] Zone PFN ranges: [0.00] DMA 0 - 4096 [0.00] Normal 4096 - 229376 [0.00] HighMem229376 - 262128 [0.00] Movable zone start PFN for each node [0.00] early_node_map[1] active PFN ranges [0.00] 0:0 - 262128 [0.00] On node 0 totalpages: 262128 [0.00] DMA zone: 32 pages used for memmap [0.00] DMA zone: 0 pages reserved [0.00] DMA zone: 4064 pages, LIFO batch:0 [0.00] Normal zone: 1760 pages used for memmap [0.00] Normal zone: 223520 pages, LIFO batch:31 [0.00] HighMem zone: 255 pages used for memmap [0.00] HighMem zone: 32497 pages, LIFO batch:7 [0.00] Movable zone: 0 pages used for memmap [0.00] DMI 2.2 present. [0.00] ACPI: RSDP 000F7220, 0014 (r0 Nvidia) [0.00] ACPI: RSDT 3FFF3000, 002C (r1 Nvidia AWRDACPI 42302E31 AWRD
Re: Problem with ata layer in 2.6.24
On Monday 28 January 2008, Mikael Pettersson wrote: Gene Heskett writes: On Monday 28 January 2008, Peter Zijlstra wrote: On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote: 1. Wrong mailing list; use linux-ide (@vger) instead. What, and keep all us other interested people in the dark? As a test, I tried rebooting to the latest fedora kernel and found it kills X, so I'm back to the second to last fedora version ATM, and the third 'smartctl -t lng /dev/sda' in 24 hours is running now. The first two completed with no errors. I've added the linux-ide list to refresh those people of the problem, the logs are being spammed by this message stanza: Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out Jan 28 04:46:25 coyote kernel: [26550.290029] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY } Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for UDMA/100 Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB) Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write Protect is off Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA It's not obvious from this incomplete dmesg log what HW or driver is behind ata1, but if the 2.6.24-rc7 kernel matches the 2.6.24 one, it should be pata_amd driving a WDC disk: [ 30.702887] pata_amd :00:09.0: version 0.3.10 [ 30.703052] PCI: Setting latency timer of device :00:09.0 to 64 [ 30.703188] scsi0 : pata_amd [ 30.709313] scsi1 : pata_amd [ 30.710076] ata1: PATA max UDMA/133 cmd 
0x1f0 ctl 0x3f6 bmdma 0xf000 irq 14 [ 30.710079] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 15 [ 30.864753] ata1.00: ATA-6: WDC WD2000JB-00EVA0, 15.05R15, max UDMA/100 [ 30.864756] ata1.00: 390721968 sectors, multi 16: LBA48 [ 30.871629] ata1.00: configured for UDMA/100 Unfortunately we also see: [ 48.285456] nvidia: module license 'NVIDIA' taints kernel. [ 48.549725] ACPI: PCI Interrupt :02:00.0[A] - Link [APC4] - GSI 19 (level, high) - IRQ 20 [ 48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module 169.07 Thu Dec 13 18:42:56 PST 2007 We have no way of debugging that module, so please try 2.6.24 without it. Sorry, I can't do this and have a working machine. The nv driver has suffered bit rot or something since the FC2 days when it COULD run a 19 crt at 1600x1200, and will not drive this 20 wide screen lcd 1680x1050 monitor at more than 800x600, which is absolutely butt ugly fuzzy, looking like a jpg compressed to 10%. The system is not usable on a day to basis without the nvidia driver. Fix the nv driver so it will run this screen at its native resolution and I'll be glad to run it even if it won't run google earth, which I do use from time to time. Now, if in all the hits you can get from google on this, currently 14,800 just for 'exception Emask', apparently caused by a timeout, if 100% of the complainers are running nvidia drivers also, then I see a legit complaint. Again, fix the nv driver so it will run my screen I'll be glad to switch. I can see the reason, sure, but the machine must be capable of doing its common day to day stuff, while using that driver, like running kde for kmail, and browsers that work. If the problems persist, please try to capture a complete log from the failing kernel -- the interesting bits are everything from initial boot up to and including the first few errors. You may need to increase the kernel's log buffer size if the log gets truncated (CONFIG_LOG_BUF_SHIFT). 
If by log you mean /var/log/messages, I have several megabytes of those. If you mean a live dmesg capture taken right now, it's attached. It contains several of these at the bottom. I long ago made the kernel log buffer bigger, cuz it couldn't even show the start immediately after the boot, and even the dump to syslog was truncated.

There are no pata_amd changes from 2.6.24-rc7 to 2.6.24 final.

That is what I was afraid of. I've done some limited grepping in that branch of the kernel tree, and cannot seem to locate where this EH handler is being invoked from.

There are 2 lines of interest in the dmesg:

[0.00] Nvidia board detected. Ignoring ACPI timer override.
[0.00] If you got timer trouble try acpi_use_timer_override

But I have NDI what it means, kernel argument/xconfig option?

I've also done some googling, and it appears this problem is fairly
Re: Problem with ata layer in 2.6.24
On Mon, 2008-01-28 at 11:50 -0500, Calvin Walton wrote:
On Mon, 2008-01-28 at 11:35 -0500, Gene Heskett wrote:
On Monday 28 January 2008, Mikael Pettersson wrote:

Unfortunately we also see:

[ 48.285456] nvidia: module license 'NVIDIA' taints kernel.
[ 48.549725] ACPI: PCI Interrupt 0000:02:00.0[A] -> Link [APC4] -> GSI 19 (level, high) -> IRQ 20
[ 48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module 169.07 Thu Dec 13 18:42:56 PST 2007

We have no way of debugging that module, so please try 2.6.24 without it.

Sorry, I can't do this and have a working machine. The nv driver has suffered bit rot or something since the FC2 days when it COULD run a 19" crt at 1600x1200, and will not drive this 20" wide screen lcd 1680x1050 monitor at more than 800x600, which is absolutely butt ugly fuzzy, looking like a jpg compressed to 10%. The system is not usable on a day to day basis without the nvidia driver.

You should probably give the nouveau[1] driver a try, if only for testing purposes; if you are running an NV4x (G6x or G7x) card in particular, it works a lot better than the nv driver for 2d support.

1. http://nouveau.freedesktop.org/wiki/InstallNouveau

But nouveau is much less stable than nv. For testing purposes, go with stable.

I'm not sure why it won't run his screen though. I can use nv to run a 1920x1200 laptop LCD. It *is* dog slow (although nouveau was not any better with a NV17 / 440-Go -- render support for AA fonts seems to be missing), but it does work.

--
Zan Lynx
Re: Problem with ata layer in 2.6.24
Gene Heskett wrote:
On Monday 28 January 2008, Mark Lord wrote:
..

Another way is to use the make_bad_sector utility that is included in the source tarball for hdparm-7.7, as follows:

make_bad_sector --readback /dev/sda 474507

Apparently not in the rpm, darnit.
..

That's okay. It should still be in the SRPM source file. And it's a tiny download from sourceforge.net:

http://sourceforge.net/search/?type_of_search=soft&type_of_search=soft&words=hdparm

Cheers
Re: Problem with ata layer in 2.6.24
Daniel Barkalow wrote:

Can you switch back to old IDE to get your work done (and to make sure it's not a hardware issue that's developed recently)?

I think it'd be really, REALLY helpful to a lot of people if you, or someone, could explain in moderate detail how this might be done. I tried doing it myself, but I'm not sufficiently expert at configuring kernels that I was ever able to figure out how to do it. Obviously, the short version is: switch back to Fedora 6. But this kind of problem with libata---and yes, you're almost surely right that it's not one problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be really welcome and, I'm guessing, widely used.

Richard
Re: Problem with ata layer in 2.6.24
Mark Lord wrote:
Gene Heskett wrote:
..

And so far no one has tried to comment on those 2 dmesg lines I've quoted a couple of times now, here's another:

[0.00] Nvidia board detected. Ignoring ACPI timer override.
[0.00] If you got timer trouble try acpi_use_timer_override

what the heck is that trying to tell me to do, in some sort of broken english?
..

I think it says this: If your system is misbehaving, then try adding the acpi_use_timer_override keyword to your kernel command line (/boot/grub/menu.lst) and see if it helps.

So, you can either hardcode it in /boot/grub/menu.lst (just add it to the end of the first line you see there that begins with the word "kernel"). Or you can just try it temporarily at boot time (safer, but trickier), by catching GRUB (the bootloader) before it actually loads Linux. Usually there's some key or something it says you have 3 seconds to hit for a menu, so do that, and then use the cursor keys to find the first "kernel" line in that menu and hit "e" (edit) to go and add the acpi_use_timer_override keyword to the end of that line (same as above).
..

Minor correction (having just tried it here): once you see the GRUB (boot) menu, hit the letter "e" to edit the first entry, then scroll to the "kernel" line, and hit the letter "e" again to edit that line. It should put you at the end of the line, where you can just type a space and then acpi_use_timer_override and then hit enter to finish the (temporary) edit. Then hit "b" for boot.

-ml
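For the permanent variant (editing /boot/grub/menu.lst), here is a rough Python sketch of the edit described above: find the first line that begins with the word "kernel" and append the keyword to it. This is only an illustration; the function name and the sample menu.lst contents are made up, and a real menu.lst on your box will look different. Back the file up before touching it.

```python
# Sketch only: append a boot parameter to the first "kernel" line of a
# GRUB (legacy) menu.lst. SAMPLE_MENU_LST is a hypothetical file, not
# anything posted in this thread.

SAMPLE_MENU_LST = (
    "default=0\n"
    "timeout=3\n"
    "title Fedora (2.6.24)\n"
    "\troot (hd0,0)\n"
    "\tkernel /vmlinuz-2.6.24 ro root=LABEL=/\n"
    "\tinitrd /initrd-2.6.24.img\n"
    "title Fedora (2.6.24-rc7)\n"
    "\troot (hd0,0)\n"
    "\tkernel /vmlinuz-2.6.24-rc7 ro root=LABEL=/\n"
)


def add_kernel_param(menu_lst_text, param):
    """Return menu_lst_text with param appended to the first kernel line."""
    lines = menu_lst_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        # Match the word "kernel" followed by whitespace, not e.g. "kernelfoo".
        if stripped.startswith("kernel") and stripped[6:7] in (" ", "\t"):
            if param not in stripped.split():
                lines[i] = line.rstrip() + " " + param
            break  # only the first kernel line is touched
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(add_kernel_param(SAMPLE_MENU_LST, "acpi_use_timer_override"), end="")
```

Only the first entry is changed, so the older kernels further down the menu stay bootable with their original command lines if the new parameter makes things worse.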
Re: Problem with ata layer in 2.6.24
Alan Cox wrote:
On Mon, Jan 28, 2008 at 01:38:40PM -0500, Mark Lord wrote:

[ 31.195305] ata2.00: ATAPI: LITE-ON DVDRW SHM-165H6S, HS06, max UDMA/66
[ 31.243813] ata2.01: ATA-7: MAXTOR STM3320620A, 3.AAE, max UDMA/100
[ 31.243816] ata2.01: 625142448 sectors, multi 16: LBA48
[ 31.243825] ata2.00: limited to UDMA/33 due to 40-wire cable
[ 31.417074] ata2.00: configured for UDMA/33
[ 31.451769] ata2.01: configured for UDMA/100
..

That looks like an unrelated bug to me: the driver says 40-wire cable but then goes and chooses UDMA/100 on one of the drives.

We currently assume that
- If we have host side detecting 40 that we use 40
- If we have drive side detecting 40 use 40
- If we have drive side detecting 80 and host thinks 80 use 80

The case where the drives disagree isn't currently considered.
..

Ahh. Tricky mess, that stuff. I believe that if we have a drive that only sees 40W, then it is probably best to restrict the other drive as well. Just in case the drive that reports 40W cannot actually keep up with the 80W timings, even when they're for the other drive.

That's my 2p.

Cheers
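To make the difference between the two policies concrete, here is a small Python model of the rules as Alan lists them, next to the more conservative rule Mark proposes. It is a toy, not libata code: the function names are invented, and the host and drive detection results are simply passed in as booleans.

```python
# Toy model of the cable-detection rules discussed above (not libata code).

UDMA_40_WIRE_LIMIT = 33  # a 40-wire cable caps the drive at UDMA/33


def wires_per_drive(host_sees_80, drive_sees_80):
    """Rules as listed: 80 only when host and drive both detect 80."""
    return 80 if (host_sees_80 and drive_sees_80) else 40


def current_policy(host_sees_80, drives_see_80):
    """Each drive is judged on its own; disagreement is not considered."""
    return [wires_per_drive(host_sees_80, d) for d in drives_see_80]


def conservative_policy(host_sees_80, drives_see_80):
    """Proposed: if any drive only sees 40 wires, restrict all of them."""
    if host_sees_80 and all(drives_see_80):
        return [80] * len(drives_see_80)
    return [40] * len(drives_see_80)


def cap_udma(drive_max_udma, wires):
    """Highest UDMA speed allowed for a drive on a cable with `wires` wires."""
    if wires == 80:
        return drive_max_udma
    return min(drive_max_udma, UDMA_40_WIRE_LIMIT)


if __name__ == "__main__":
    # ata2 from the log: the DVD writer (max UDMA/66) sees 40 wires,
    # the Maxtor (max UDMA/100) sees 80, and the host is assumed to see 80.
    drive_max = [66, 100]
    sees_80 = [False, True]
    for policy in (current_policy, conservative_policy):
        wires = policy(True, sees_80)
        print(policy.__name__, [cap_udma(m, w) for m, w in zip(drive_max, wires)])
```

With the values from the log, the first policy yields UDMA/33 and UDMA/100 (what the kernel actually configured), while the conservative one would hold both devices to UDMA/33.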
Re: Problem with ata layer in 2.6.24
On Mon, 28 Jan 2008, Richard Heck wrote:
Daniel Barkalow wrote:

Can you switch back to old IDE to get your work done (and to make sure it's not a hardware issue that's developed recently)?

I think it'd be really, REALLY helpful to a lot of people if you, or someone, could explain in moderate detail how this might be done. I tried doing it myself, but I'm not sufficiently expert at configuring kernels that I was ever able to figure out how to do it.

As far as configuring the kernel, I can help: Go to Device Drivers, ATA/ATAPI/MFM/RLL support, and turn on anything that looks relevant; go to Device Drivers, Serial ATA and Parallel ATA drivers, and turn off anything that's PATA and looks relevant. (Whether a device uses IDE or PATA depends on which driver that supports the device is present and finds it first, not on any sort of global configuration, which is probably what tripped you up.)

Building this and installing it along with the appropriate initrd (which might be handled by Fedora's install scripts) will either get you back to old IDE or will make your kernel panic on boot, depending on whether you got it right (so make sure you can still boot the kernel you're sure of or something from a boot disk). This will also cause your hard drives to show up as different device nodes, so if your boot process doesn't mount by disk uuid but by some other feature (and I don't know what Fedora does), you'll also need to change it to something either stable across access methods or which works for the one you're now using.

Obviously, the short version is: switch back to Fedora 6. But this kind of problem with libata---and yes, you're almost surely right that it's not one problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be really welcome and, I'm guessing, widely used.

Fedora really ought to provide documentation, because there's some distro-specific stuff (like how you deal with the kernel's device node for the root partition changing), and they're using code by default that's at least somewhat documented as experimental (although it doesn't seem to be actually marked as experimental in all cases).

-Daniel

*This .sig left intentionally blank*
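One way to sanity-check the result before rebooting is to look at the generated .config and see whether both the old IDE driver and the libata PATA driver for the same chipset are still enabled, since whichever finds the device first wins. The Python sketch below does that for the AMD/nVidia PATA case in this thread. The option names CONFIG_BLK_DEV_AMD74XX (old IDE) and CONFIG_PATA_AMD (libata) are from my memory of 2.6.24-era Kconfig, so treat them as assumptions and check them against your own tree; the sample config text is made up.

```python
# Sketch: flag a .config where the old IDE driver and the libata PATA driver
# for the same hardware are both enabled. Option names are assumptions taken
# from memory of 2.6.24-era Kconfig; verify against your own kernel tree.

OLD_IDE_DRIVER = "CONFIG_BLK_DEV_AMD74XX"  # assumed: old IDE AMD/nVidia driver
LIBATA_DRIVER = "CONFIG_PATA_AMD"          # assumed: libata pata_amd driver

# Hypothetical .config fragment, for illustration only.
SAMPLE_CONFIG = """\
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_AMD74XX=y
CONFIG_ATA=y
# CONFIG_PATA_AMD is not set
CONFIG_SATA_NV=m
"""


def enabled_options(config_text):
    """Return the set of CONFIG_ names set to y (built in) or m (module)."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            if value in ("y", "m"):
                enabled.add(name)
    return enabled


def both_drivers_enabled(config_text):
    """True if old IDE and libata would compete for the same PATA ports."""
    on = enabled_options(config_text)
    return OLD_IDE_DRIVER in on and LIBATA_DRIVER in on


def uses_old_ide_only(config_text):
    """True if only the old IDE driver is enabled for this chipset."""
    on = enabled_options(config_text)
    return OLD_IDE_DRIVER in on and LIBATA_DRIVER not in on


if __name__ == "__main__":
    print("both enabled:", both_drivers_enabled(SAMPLE_CONFIG))
    print("old IDE only:", uses_old_ide_only(SAMPLE_CONFIG))
```

This says nothing about the device node change (the disks typically move between hd* and sd* names when switching), which still has to be handled in fstab and the bootloader as described above.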
Re: Problem with ata layer in 2.6.24
On Monday 28 January 2008, Gene Heskett wrote: While reading this msg as it came back, I locked up again and rebooted to 2.6.24, and got lucky (maybe) as the attached dmesg will show quite a few instances of this LNNNGG before the nvidia driver is loaded to taint the kernel. Have fun guys! On Monday 28 January 2008, Mikael Pettersson wrote: Gene Heskett writes: On Monday 28 January 2008, Peter Zijlstra wrote: On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote: 1. Wrong mailing list; use linux-ide (@vger) instead. What, and keep all us other interested people in the dark? As a test, I tried rebooting to the latest fedora kernel and found it kills X, so I'm back to the second to last fedora version ATM, and the third 'smartctl -t lng /dev/sda' in 24 hours is running now. The first two completed with no errors. I've added the linux-ide list to refresh those people of the problem, the logs are being spammed by this message stanza: Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out Jan 28 04:46:25 coyote kernel: [26550.290029] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY } Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for UDMA/100 Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB) Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write Protect is off Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA It's not obvious from this incomplete dmesg log what HW or driver is behind ata1, but if the 
2.6.24-rc7 kernel matches the 2.6.24 one, it should be pata_amd driving a WDC disk: [ 30.702887] pata_amd :00:09.0: version 0.3.10 [ 30.703052] PCI: Setting latency timer of device :00:09.0 to 64 [ 30.703188] scsi0 : pata_amd [ 30.709313] scsi1 : pata_amd [ 30.710076] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xf000 irq 14 [ 30.710079] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 15 [ 30.864753] ata1.00: ATA-6: WDC WD2000JB-00EVA0, 15.05R15, max UDMA/100 [ 30.864756] ata1.00: 390721968 sectors, multi 16: LBA48 [ 30.871629] ata1.00: configured for UDMA/100 Unfortunately we also see: [ 48.285456] nvidia: module license 'NVIDIA' taints kernel. [ 48.549725] ACPI: PCI Interrupt :02:00.0[A] - Link [APC4] - GSI 19 (level, high) - IRQ 20 [ 48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module 169.07 Thu Dec 13 18:42:56 PST 2007 We have no way of debugging that module, so please try 2.6.24 without it. Sorry, I can't do this and have a working machine. The nv driver has suffered bit rot or something since the FC2 days when it COULD run a 19 crt at 1600x1200, and will not drive this 20 wide screen lcd 1680x1050 monitor at more than 800x600, which is absolutely butt ugly fuzzy, looking like a jpg compressed to 10%. The system is not usable on a day to basis without the nvidia driver. Fix the nv driver so it will run this screen at its native resolution and I'll be glad to run it even if it won't run google earth, which I do use from time to time. Now, if in all the hits you can get from google on this, currently 14,800 just for 'exception Emask', apparently caused by a timeout, if 100% of the complainers are running nvidia drivers also, then I see a legit complaint. Again, fix the nv driver so it will run my screen I'll be glad to switch. I can see the reason, sure, but the machine must be capable of doing its common day to day stuff, while using that driver, like running kde for kmail, and browsers that work. 
If the problems persist, please try to capture a complete log from the failing kernel -- the interesting bits are everything from initial boot up to and including the first few errors. You may need to increase the kernel's log buffer size if the log gets truncated (CONFIG_LOG_BUF_SHIFT). If by log you mean /var/log/messages, I have several megabytes of those. If you mean a live dmesg capture taken right now, its attached. It contains several of these at the bottom. I long ago made the kernel log buffer bigger, cuz it couldn't even show the start immediately after the boot, and even the dump to syslog was truncated. There are no pata_amd changes from 2.6.24-rc7 to 2.6.24 final. That is what I was afraid of. I've done some limited grepping in that branch of the kernel tree, and cannot seem to locate where this EH handler is being invoked from. There is 2
Re: Problem with ata layer in 2.6.24
Gene Heskett writes:
On Monday 28 January 2008, Peter Zijlstra wrote:
On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote:

1. Wrong mailing list; use linux-ide (@vger) instead.

What, and keep all us other interested people in the dark?

As a test, I tried rebooting to the latest fedora kernel and found it kills X, so I'm back to the second to last fedora version ATM, and the third 'smartctl -t long /dev/sda' in 24 hours is running now. The first two completed with no errors.

I've added the linux-ide list to refresh those people of the problem, the logs are being spammed by this message stanza:

Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out
Jan 28 04:46:25 coyote kernel: [26550.290029] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY }
Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link
Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for UDMA/100
Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete
Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write Protect is off
Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

It's not obvious from this incomplete dmesg log what HW or driver is behind ata1, but if the 2.6.24-rc7 kernel matches the 2.6.24 one, it should be pata_amd driving a WDC disk:

[ 30.702887] pata_amd 0000:00:09.0: version 0.3.10
[ 30.703052] PCI: Setting latency timer of device 0000:00:09.0 to 64
[ 30.703188] scsi0 : pata_amd
[ 30.709313] scsi1 : pata_amd
[ 30.710076] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xf000 irq 14
[ 30.710079] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 15
[ 30.864753] ata1.00: ATA-6: WDC WD2000JB-00EVA0, 15.05R15, max UDMA/100
[ 30.864756] ata1.00: 390721968 sectors, multi 16: LBA48
[ 30.871629] ata1.00: configured for UDMA/100

Unfortunately we also see:

[ 48.285456] nvidia: module license 'NVIDIA' taints kernel.
[ 48.549725] ACPI: PCI Interrupt 0000:02:00.0[A] -> Link [APC4] -> GSI 19 (level, high) -> IRQ 20
[ 48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module 169.07 Thu Dec 13 18:42:56 PST 2007

We have no way of debugging that module, so please try 2.6.24 without it.

If the problems persist, please try to capture a complete log from the failing kernel -- the interesting bits are everything from initial boot up to and including the first few errors. You may need to increase the kernel's log buffer size if the log gets truncated (CONFIG_LOG_BUF_SHIFT).

There are no pata_amd changes from 2.6.24-rc7 to 2.6.24 final.
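On the log buffer size: CONFIG_LOG_BUF_SHIFT is a power-of-two exponent, so the buffer holds 2^shift bytes. A tiny Python sketch of the arithmetic, in case it helps pick a value; the helper names are mine and the 200000-byte figure is just an example, not a measurement from this machine.

```python
# Arithmetic behind CONFIG_LOG_BUF_SHIFT: the kernel log buffer is
# 2**shift bytes. Helper names are illustrative only.


def log_buf_bytes(shift):
    """Size in bytes of a log buffer built with CONFIG_LOG_BUF_SHIFT=shift."""
    return 1 << shift


def min_shift_for(nbytes):
    """Smallest shift whose buffer holds at least nbytes of log text."""
    if nbytes < 1:
        raise ValueError("nbytes must be positive")
    return (nbytes - 1).bit_length()


if __name__ == "__main__":
    for shift in (14, 17, 20):
        print(shift, log_buf_bytes(shift) // 1024, "KiB")
    # Example: a boot log of roughly 200000 bytes needs a shift of 18.
    print(min_shift_for(200000))
```

The same size can usually be requested at boot time with the log_buf_len= kernel parameter instead of rebuilding.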
Re: Problem with ata layer in 2.6.24
Gene Heskett wrote:

Greeting; I had to reboot early this morning due to a freezeup, and I had a bunch of these in the messages log:

==
Jan 27 19:42:11 coyote kernel: [42461.915961] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 27 19:42:11 coyote kernel: [42461.915973] ata1.00: cmd ca/00:08:b1:66:46/00:00:00:00:00/e8 tag 0 dma 4096 out
Jan 27 19:42:11 coyote kernel: [42461.915974] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 27 19:42:11 coyote kernel: [42461.915978] ata1.00: status: { DRDY }
Jan 27 19:42:11 coyote kernel: [42461.916005] ata1: soft resetting link
Jan 27 19:42:12 coyote kernel: [42462.078216] ata1.00: configured for UDMA/100
Jan 27 19:42:12 coyote kernel: [42462.078232] ata1: EH complete
Jan 27 19:42:12 coyote kernel: [42462.090700] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
Jan 27 19:42:12 coyote kernel: [42462.114230] sd 0:0:0:0: [sda] Write Protect is off
Jan 27 19:42:12 coyote kernel: [42462.115079] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
===

That one showed up about 2 hours ago, so I expect I'll be locked up again before I've managed a 24 hour uptime. This drive passed a 'smartctl -t long /dev/sda' with flying colors after the reboot this morning.

Two instances were logged after I had rebooted to 2.6.24 from 2.6.24-rc8:

Jan 24 20:46:33 coyote kernel: [0.00] Linux version 2.6.24 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1 SMP Thu Jan 24 20:17:55 EST 2008

Jan 27 02:28:29 coyote kernel: [193207.445158] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 27 02:28:29 coyote kernel: [193207.445170] ata1.00: cmd 35/00:08:f9:24:0a/00:00:17:00:00/e0 tag 0 dma 4096 out
Jan 27 02:28:29 coyote kernel: [193207.445172] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 27 02:28:29 coyote kernel: [193207.445175] ata1.00: status: { DRDY }
Jan 27 02:28:29 coyote kernel: [193207.445202] ata1: soft resetting link
Jan 27 02:28:29 coyote kernel: [193207.607384] ata1.00: configured for UDMA/100
Jan 27 02:28:29 coyote kernel: [193207.607399] ata1: EH complete
Jan 27 02:28:29 coyote kernel: [193207.609681] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
Jan 27 02:28:29 coyote kernel: [193207.619277] sd 0:0:0:0: [sda] Write Protect is off
Jan 27 02:28:29 coyote kernel: [193207.649041] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Jan 27 02:30:06 coyote kernel: [193304.336929] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 27 02:30:06 coyote kernel: [193304.336940] ata1.00: cmd ca/00:20:69:22:a6/00:00:00:00:00/e7 tag 0 dma 16384 out
Jan 27 02:30:06 coyote kernel: [193304.336942] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 27 02:30:06 coyote kernel: [193304.336945] ata1.00: status: { DRDY }
Jan 27 02:30:06 coyote kernel: [193304.336972] ata1: soft resetting link
Jan 27 02:30:06 coyote kernel: [193304.499210] ata1.00: configured for UDMA/100
Jan 27 02:30:06 coyote kernel: [193304.499226] ata1: EH complete
Jan 27 02:30:06 coyote kernel: [193304.499714] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
Jan 27 02:30:06 coyote kernel: [193304.499857] sd 0:0:0:0: [sda] Write Protect is off
Jan 27 02:30:06 coyote kernel: [193304.502315] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

None were logged during the time I was running an -rc7 or -rc8. The previous hits on this resulted in the udma speed being downgraded till it was actually running in pio just before the freeze that required the hardware reset button.

Unfortunately there are 1001 different causes for timeouts, so we need to drill down into the hardware, libata version, and ACPI version (most notably).

I'll reboot to -rc8 right now and resume. If its the drive, I should see it. If not, then 2.6.24 is where I'll point the finger.

There was also an ACPI update, which always affects interrupt handling (whose symptom can sometimes be a timeout). Definitely interested in test results from what you describe.

Jeff
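When sifting through megabytes of these stanzas it can help to decode the Emask values mechanically. The Python sketch below maps the bits to AC_ERR_* names. The bit assignments are my recollection of the 2.6.24-era libata headers and should be treated as an assumption; they are at least consistent with the logs in this thread, where 0x4 is printed as "timeout" and 0x9 as "media error".

```python
# Sketch: decode libata "Emask" values into AC_ERR_* flag names.
# Bit positions are recalled from include/linux/libata.h of that era and
# are an assumption here; verify against your kernel source.
import re

AC_ERR_BITS = {
    0: "AC_ERR_DEV",       # device reported an error
    1: "AC_ERR_HSM",       # host state machine violation
    2: "AC_ERR_TIMEOUT",   # command timed out
    3: "AC_ERR_MEDIA",     # media error
    4: "AC_ERR_ATA_BUS",   # ATA bus error
    5: "AC_ERR_HOST_BUS",  # host bus error
    6: "AC_ERR_SYSTEM",    # system error
    7: "AC_ERR_INVALID",   # invalid argument
    8: "AC_ERR_OTHER",     # unknown
}

EMASK_RE = re.compile(r"Emask (0x[0-9a-fA-F]+)")


def decode_emask(mask):
    """Return the names of all flags set in mask, lowest bit first."""
    return [name for bit, name in sorted(AC_ERR_BITS.items()) if mask & (1 << bit)]


def emasks_in(line):
    """Extract every Emask value that appears in one log line."""
    return [int(value, 16) for value in EMASK_RE.findall(line)]


if __name__ == "__main__":
    line = "res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)"
    for mask in emasks_in(line):
        print(hex(mask), decode_emask(mask))
```

Note that the "exception" line carries the port-level mask (0x0 in the stanzas above) while the "res" line carries the per-command mask, so both are worth extracting.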
Re: Problem with ata layer in 2.6.24
On Monday 28 January 2008, Mark Lord wrote:

[ 64.037975] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 64.038102] ata1.00: BMDMA stat 0x65
[ 64.038227] ata1.00: cmd c8/00:58:89:3d:07/00:00:00:00:00/e0 tag 0 dma 45056 in
[ 64.038229] res 51/40:58:8b:3d:07/00:00:00:00:00/e0 Emask 0x9 (media error)
[ 64.038432] ata1.00: status: { DRDY ERR }
[ 64.038555] ata1.00: error: { UNC }
[ 64.050125] ata1.00: configured for UDMA/100
[ 64.050134] sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
[ 64.050138] sd 0:0:0:0: [sda] Sense Key : 0x3 [current] [descriptor]
[ 64.050142] Descriptor sense data with sense descriptors (in hex):
[ 64.050143] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 64.050149] 00 07 3d 8b
[ 64.050152] sd 0:0:0:0: [sda] ASC=0x11 ASCQ=0x4
[ 64.050155] end_request: I/O error, dev sda, sector 474507
..

This error looks somewhat different from the samples posted earlier. This one is quite definitively a bad sector. It should also show up in smartctl -a -data /dev/sda (near the bottom) if SMART was enabled on this drive at boot.

It does not unforch.

You could try reading that specific sector again just to make sure. One way is to figure out how to use dd for this.

[EMAIL PROTECTED] ~]# dd if=/dev/sda bs=512 skip=474506 count=3
[binary sector contents printed to the terminal; the legible fragments are shared-library names: libkdecorations.so.1.0.0, libkfontinst.so.0.0.0, libkhotkeys_shared.so.1.0.0, libkickermain.so.1.0.0, libkonq.so.4.2.0, libkonqsidebarplugin.so.1.2.0, libksgrd.so.1.2.0, libksplashthemes.so.0.0.0, libtaskbar.so.1.2.0, libtaskmanager.so.1.0.0]
3+0 records in
3+0 records out
1536 bytes (1.5 kB) copied, 6.1403e-05 s, 25.0 MB/s

Another way is to use the make_bad_sector utility that is included in the source tarball for hdparm-7.7, as follows:

make_bad_sector --readback /dev/sda 474507

Apparently not in the rpm, darnit.

(when invoked as above, it does *not* make a bad sector; no worries). If it reports an I/O error consistently on that, then the sector is indeed faulty, and its contents have long been lost. You can repair the bad sector (but not the original contents) like this:

make_bad_sector --rewrite /dev/sda 474507

Cheers

I'm going up to Clarksburg this afternoon to see if I can find a couple of drives, one a 2.5" bigger than 40Gb for my 2.5" maxtor usb housing, and another pata drive big enough to run this thing, then just re-install the December respin after I save as much of this as I can; there's nearly 50GB here now. Maybe it won't be so fscking picky about the next drive.

I was hoping someone could look at that last dmesg I attached, but apparently everybody is blinded by unrelated details as that bad sector may have been transient, caused by the multiple hardware reset type reboots so far today :( The last 3 reboots have interrupted a 'smartctl -t long /dev/sda' in progress. :(

If I reconvert to non libata, can I do that only for the pata drives of which there are 3 here including the dvd writer, and still use libata for the lone sata drive left? And can I do that without mucking with the device map, which will make amanda/tar attempt to do a level 0 on the whole system if it's changed? I see the drives are at 254 again, when are they going to be given a stable device address out of the LANANA experimental group so we can reboot without mucking with that and driving tar crazy?

Thanks everybody.

--
Cheers, Gene

There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -Ed Howdershelt (Author)

I just had my entire INTESTINAL TRACT coated with TEFLON!
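A note on the dd invocation quoted above: with bs=512 skip=474506 count=3 it covers sectors 474506 through 474508, so the reported sector 474507 is the middle one of the three. The Python sketch below shows the same offset arithmetic and reads sectors with os.pread; the function names are mine. One caveat, offered as a guess rather than a diagnosis: 1536 bytes in about 61 microseconds looks more like a page-cache hit than a trip to the platters, and this sketch has the same limitation, so a clean read here does not prove the sector is good on disk.

```python
# Sketch: read whole sectors from a block device or disk image by LBA.
# Function names are illustrative. Reads go through the page cache, so a
# successful read may reflect cached data rather than the medium itself.
import os

SECTOR_SIZE = 512  # matches "512-byte hardware sectors" in the logs


def byte_offset(lba):
    """Byte offset of the start of sector `lba`."""
    return lba * SECTOR_SIZE


def read_sectors(path, first_lba, count=1):
    """Return the raw bytes of `count` sectors starting at `first_lba`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, count * SECTOR_SIZE, byte_offset(first_lba))
    finally:
        os.close(fd)


def sectors_covered(skip, count):
    """Sector numbers touched by `dd bs=512 skip=<skip> count=<count>`."""
    return list(range(skip, skip + count))


if __name__ == "__main__":
    print(sectors_covered(474506, 3))
    print(byte_offset(474507))
    # On the affected machine (as root) one would call, for example:
    #   data = read_sectors("/dev/sda", 474507)
    # An OSError (EIO) there would correspond to the I/O error in the log.
```

Redirecting dd's output to /dev/null or piping it through hexdump also avoids spraying binary data at the terminal.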
Re: Problem with ata layer in 2.6.24
[ 64.037975] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 64.038102] ata1.00: BMDMA stat 0x65
[ 64.038227] ata1.00: cmd c8/00:58:89:3d:07/00:00:00:00:00/e0 tag 0 dma 45056 in
[ 64.038229] res 51/40:58:8b:3d:07/00:00:00:00:00/e0 Emask 0x9 (media error)
[ 64.038432] ata1.00: status: { DRDY ERR }
[ 64.038555] ata1.00: error: { UNC }
[ 64.050125] ata1.00: configured for UDMA/100
[ 64.050134] sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
[ 64.050138] sd 0:0:0:0: [sda] Sense Key : 0x3 [current] [descriptor]
[ 64.050142] Descriptor sense data with sense descriptors (in hex):
[ 64.050143] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 64.050149] 00 07 3d 8b
[ 64.050152] sd 0:0:0:0: [sda] ASC=0x11 ASCQ=0x4
[ 64.050155] end_request: I/O error, dev sda, sector 474507
..

This error looks somewhat different from the samples posted earlier. This one is quite definitively a bad sector. It should also show up in smartctl -a -data /dev/sda (near the bottom) if SMART was enabled on this drive at boot.

You could try reading that specific sector again just to make sure. One way is to figure out how to use dd for this. Another way is to use the make_bad_sector utility that is included in the source tarball for hdparm-7.7, as follows:

make_bad_sector --readback /dev/sda 474507

(when invoked as above, it does *not* make a bad sector; no worries).

If it reports an I/O error consistently on that, then the sector is indeed faulty, and its contents have long been lost. You can repair the bad sector (but not the original contents) like this:

make_bad_sector --rewrite /dev/sda 474507

Cheers
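The sector number reported by end_request can be cross-checked against the raw registers in the same log. What follows is only a minimal sketch, assuming the usual libata layout of the res field (status/error:nsect:lbal:lbam:lbah, then the five high-order "HOB" bytes, then the device register). The names decode_res and STATUS_BITS are made up for this illustration; they are not part of the kernel, hdparm, or smartmontools.

```python
import re

# Assumed layout of a libata "res" (or "cmd") register dump:
#   STATUS/ERROR:NSECT:LBAL:LBAM:LBAH/HOB_ERR:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEVICE
TASKFILE = re.compile(
    r"([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)

# Status bits that libata prints by name in its "status: { ... }" line.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}


def decode_res(text, lba48=False):
    """Decode the first register dump found in a log line (hypothetical helper)."""
    m = TASKFILE.search(text)
    if m is None:
        raise ValueError("no taskfile dump found in: %r" % text)
    status = int(m.group(1), 16)
    error, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    _, _, hob_lbal, hob_lbam, hob_lbah = (int(b, 16) for b in m.group(3).split(":"))
    device = int(m.group(4), 16)

    lba = lbal | (lbam << 8) | (lbah << 16)
    if lba48:
        # 48-bit commands carry LBA bits 24..47 in the HOB registers.
        lba |= (hob_lbal << 24) | (hob_lbam << 32) | (hob_lbah << 40)
    else:
        # 28-bit commands carry LBA bits 24..27 in the low nibble of DEVICE.
        lba |= (device & 0x0F) << 24

    return {
        "status_flags": [name for bit, name in STATUS_BITS.items() if status & bit],
        "error": error,
        "nsect": nsect,
        "lba": lba,
    }


if __name__ == "__main__":
    line = "res 51/40:58:8b:3d:07/00:00:00:00:00/e0 Emask 0x9 (media error)"
    print(decode_res(line))
```

For the res line in this log the decoded LBA is 0x073d8b = 474507, the same sector that end_request reports, and the last four bytes of the sense data (00 07 3d 8b) encode the same number.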
Re: Problem with ata layer in 2.6.24
On Monday 28 January 2008, Zan Lynx wrote: On Mon, 2008-01-28 at 11:50 -0500, Calvin Walton wrote: On Mon, 2008-01-28 at 11:35 -0500, Gene Heskett wrote: On Monday 28 January 2008, Mikael Pettersson wrote: Unfortunately we also see: [ 48.285456] nvidia: module license 'NVIDIA' taints kernel. [ 48.549725] ACPI: PCI Interrupt :02:00.0[A] - Link [APC4] - GSI 19 (level, high) - IRQ 20 [ 48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module 169.07 Thu Dec 13 18:42:56 PST 2007 We have no way of debugging that module, so please try 2.6.24 without it. Sorry, I can't do this and have a working machine. The nv driver has suffered bit rot or something since the FC2 days when it COULD run a 19 crt at 1600x1200, and will not drive this 20 wide screen lcd 1680x1050 monitor at more than 800x600, which is absolutely butt ugly fuzzy, looking like a jpg compressed to 10%. The system is not usable on a day to basis without the nvidia driver. You should probably give the nouveau[1] driver a try, if only for testing purposes; if you are running an NV4x (G6x or G7x) card in particular, it works a lot better than the nv driver for 2d support. 1. http://nouveau.freedesktop.org/wiki/InstallNouveau But nouveau is much less stable than nv. For testing purposes, go with stable. I believe at this point, its moot. I captured quite a few instances of that error message while rebooting the last time, all of which occurred long before I logged in and did a startx (I boot to runlevel 3 here), so the kernel was NOT tainted at that point. That dmesg has been posted and some questions asked. As this has gone on for a while, it seems to me that with 14,800 google hits on this problem, Linus should call a halt until this is found and fixed. But I'm not Linus. I'm also locking up for 30 at a time, probably ready for reboot #7 today. I'm not sure why it won't run his screen though. I can use nv to run a 1920x1200 laptop LCD. 
It *is* dog slow (although nouveau was not any better with a NV17 / 440-Go -- render support for AA fonts seems to be missing), but it does work. -- Cheers, Gene There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -Ed Howdershelt (Author) There cannot be a crisis next week. My schedule is already full. -- Henry Kissinger - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problem with ata layer in 2.6.24
On Mon, 28 Jan 2008, Gene Heskett wrote:
> I believe at this point, its moot. I captured quite a few instances of that error message while rebooting the last time, all of which occurred long before I logged in and did a startx (I boot to runlevel 3 here), so the kernel was NOT tainted at that point. That dmesg has been posted and some questions asked.
>
> As this has gone on for a while, it seems to me that with 14,800 google hits on this problem, Linus should call a halt until this is found and fixed. But I'm not Linus. I'm also locking up for 30 at a time, probably ready for reboot #7 today.

Can you switch back to old IDE to get your work done (and to make sure it's not a hardware issue that's developed recently)?

I believe libata is just a whole lot pickier about behavior than the IDE subsystem was, so it's more likely to complain about stuff, both for good reasons and when it shouldn't. There are a slew of potential "we have to accept that old PATA hardware does this" bugs that all have the same symptom of "we go into error handling when nothing is actually wrong", hence the vast quantity of hits. I think it's not so much that it's a common problem as that it's a lot of problems that aren't very distinguishable.

-Daniel
*This .sig left intentionally blank*
Re: Problem with ata layer in 2.6.24
On Monday 28 January 2008, Daniel Barkalow wrote: On Mon, 28 Jan 2008, Richard Heck wrote: Daniel Barkalow wrote: Can you switch back to old IDE to get your work done (and to make sure it's not a hardware issue that's developed recently)? I think it'd be really, REALLY helpful to a lot of people if you, or someone, could explain in moderate detail how this might be done. I tried doing it myself, but I'm not sufficiently expert at configuring kernels that I was ever able to figure out how to do it. As far as configuring the kernel, I can help: Go to Device Drivers, ATA/ATAPI/MFM/RLL support, and turn on anything that looks relevant; go to Device Drivers, Serial ATA and Parallel ATA drivers, and turn off anything that's PATA and looks relevant. Done. (Whether a device uses IDE or PATA depends on which driver that supports the device is present and find it first, not on any sort of global configuration, which is probably what tripped you up) Building this and installing it along with the appropriate initrd (which might be handled by Fedora's install scripts) Or mine, which I've been using for years. will either get you back to old IDE or will make your kernel panic on boot, depending on whether you got it right (so make sure you can still boot the kernel you're sure of or something from a boot disk). This will also cause your hard drives to show up as different device nodes, so if your boot process doesn't mount by disk uuid but by some other feature (and I don't know what Fedora does), you'll also need to change it to something either stable across access methods or which works for the one you're now using. It mounts by LABEL=. All of it. Obviously, the short version is: switch back to Fedora 6. But this kind of problem with libata---and yes, you're almost surely right that it's not one problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be really welcome and, I'm guessing, widely used. 
Fedora really ought to provide documentation, because there's some distro-specific stuff (like how you deal with the kernel's device node for the root partition changing), and they're using code by default that's at least somewhat documented as experimental (although it doesn't seem to be actually marked as experimental in all cases). Fedora is not the only people having trouble, name a distro, its probably someplace in that 14,800 hit google returns. -Daniel *This .sig left intentionally blank* Thanks Daniel, try #1 is building now. -- Cheers, Gene There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -Ed Howdershelt (Author) Those who do not understand Unix are condemned to reinvent it, poorly. -- Henry Spencer - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problem with ata layer in 2.6.24
On Mon, 28 Jan 2008, Gene Heskett wrote: On Monday 28 January 2008, Daniel Barkalow wrote: Building this and installing it along with the appropriate initrd (which might be handled by Fedora's install scripts) Or mine, which I've been using for years. You're ahead of a surprising number of people, including me, if you understand making initrds. will either get you back to old IDE or will make your kernel panic on boot, depending on whether you got it right (so make sure you can still boot the kernel you're sure of or something from a boot disk). This will also cause your hard drives to show up as different device nodes, so if your boot process doesn't mount by disk uuid but by some other feature (and I don't know what Fedora does), you'll also need to change it to something either stable across access methods or which works for the one you're now using. It mounts by LABEL=. All of it. That'll save a huge amount of hassle. So long as you manage to get the right drivers included and the wrong drivers not included, you should be pretty much set. Fedora is not the only people having trouble, name a distro, its probably someplace in that 14,800 hit google returns. Yeah, but they each may need different instructions, particularly if they're not mounting by label in general, or not mounting the root partition by label. That was the big hassle going the opposite direction. And the procedure is 4 lines to describe to somebody who knows how to build and install a new kernel for the distro, which is much shorter than the explanation of how you generally build and install a kernel. A real howto would have to explain where to get the distro's kernel sources and default configuration, for example. -Daniel *This .sig left intentionally blank* - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problem with ata layer in 2.6.24
On Monday 28 January 2008, Daniel Barkalow wrote: On Mon, 28 Jan 2008, Gene Heskett wrote: On Monday 28 January 2008, Daniel Barkalow wrote: Building this and installing it along with the appropriate initrd (which might be handled by Fedora's install scripts) Or mine, which I've been using for years. You're ahead of a surprising number of people, including me, if you understand making initrds. In my script, its one line: mkinitrd -f initrd-$VER.img $VER \ where $VER is the shell variable I edit to = the version number, located at the top of the script. Unforch, its failing: No module pata_amd found for kernel 2.6.24, aborting. This is with pata_amd turned off and its counterpart under ATA/RLL/etc turned on. So something is still dependent on it. I do have one sata drive, on an accessory card in the box, so I need the rest of the sata_sil and friends stuff. Its my virtual tapes for amanda. Also home built, the amanda security model cannot be successfully bent into the shape of an rpm. They BTW are #2 on coverity's list of most secure software. So I've rebuilt 2.6.24 as it originally was, and added the acpi timer line to the 2.6.24-rc8 stanza's kernel argument list. It will boot one or the other when I next reboot. Its been about 8 hours since the last error was logged, which is totally weirdsville to this old fart. Phase of the moon maybe? The visit to the sawbones to see about my heart? They are going to fit me with a 30 day recorder tomorrow, my skip a beat problem is getting worse. The sort of stuff that goes with the 7nth decade I guess. Officially, I'm wearing out me, too much sugar, too many times nearly electrocuted=shingles yadda yadda. :-) Oh, and don't forget Arther, he moved in uninvited about 25 years ago too. Those people that talk about the golden years? They're full of excrement... 
will either get you back to old IDE or will make your kernel panic on boot, depending on whether you got it right (so make sure you can still boot the kernel you're sure of or something from a boot disk). This will also cause your hard drives to show up as different device nodes, so if your boot process doesn't mount by disk uuid but by some other feature (and I don't know what Fedora does), you'll also need to change it to something either stable across access methods or which works for the one you're now using. It mounts by LABEL=. All of it. That'll save a huge amount of hassle. So long as you manage to get the right drivers included and the wrong drivers not included, you should be pretty much set. Fedora is not the only people having trouble, name a distro, its probably someplace in that 14,800 hit google returns. Yeah, but they each may need different instructions, particularly if they're not mounting by label in general, or not mounting the root partition by label. That was the big hassle going the opposite direction. And the procedure is 4 lines to describe to somebody who knows how to build and install a new kernel for the distro, which is much shorter than the explanation of how you generally build and install a kernel. A real howto would have to explain where to get the distro's kernel sources and default configuration, for example. -Daniel *This .sig left intentionally blank* -- Cheers, Gene There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -Ed Howdershelt (Author) Never drink from your finger bowl -- it contains only water. - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problem with ata layer in 2.6.24
On Mon, 28 Jan 2008, Gene Heskett wrote:
> On Monday 28 January 2008, Daniel Barkalow wrote:
> > On Mon, 28 Jan 2008, Gene Heskett wrote:
> > > On Monday 28 January 2008, Daniel Barkalow wrote:
> > > > Building this and installing it along with the appropriate initrd (which might be handled by Fedora's install scripts)
> > >
> > > Or mine, which I've been using for years.
> >
> > You're ahead of a surprising number of people, including me, if you understand making initrds.
>
> In my script, its one line:
> mkinitrd -f initrd-$VER.img $VER \
> where $VER is the shell variable I edit to = the version number, located at the top of the script.
>
> Unforch, its failing:
> No module pata_amd found for kernel 2.6.24, aborting.
>
> This is with pata_amd turned off and its counterpart under ATA/RLL/etc turned on. So something is still dependent on it.

That looks like something in the guts of the initrd; it probably thinks you need pata_amd and it's unhappy that you don't have it.

Actually, another thing to try is setting the ATA/etc one to y (built in) and pata_amd to m (module). Most likely, this should lead to the ATA one claiming the drive before the module is loaded (but the module would still be loaded later, to avoid upsetting the initrd). You should be able to tell from dmesg (or /dev, for that matter) which one got it, and I think built-in drivers will claim everything they can before an initrd gets loaded.

> I do have one sata drive, on an accessory card in the box, so I need the rest of the sata_sil and friends stuff.

Assuming it isn't picking up your hard drive, which it isn't, that shouldn't matter.

-Daniel
*This .sig left intentionally blank*
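Checking /dev (or /sys/block) to see which driver claimed a disk comes down to the device name. This is a rough sketch only, assuming the conventional naming: the old IDE subsystem calls disks hdX, while libata registers them through the SCSI layer as sdX (disks) or srN (optical drives). The function names are hypothetical, and an sdX name alone cannot distinguish libata from USB storage or a real SCSI adapter, so dmesg remains the authoritative source.

```python
import os
import re

# Assumption: naming conventions only. hdX = old IDE subsystem,
# sdX / srN = anything registered through the SCSI layer (libata included).
_PATTERNS = [
    (re.compile(r"^hd[a-z]+$"), "old IDE driver"),
    (re.compile(r"^(sd[a-z]+|sr\d+)$"), "SCSI layer (libata, USB storage or real SCSI)"),
]


def classify(name):
    """Guess which subsystem owns a block device, from its name alone."""
    for pattern, owner in _PATTERNS:
        if pattern.match(name):
            return owner
    return "other"


def classify_all(names):
    """Classify several block device names, returned in sorted order."""
    return {name: classify(name) for name in sorted(names)}


if __name__ == "__main__":
    # /sys/block lists every block device the running kernel knows about.
    for name, owner in classify_all(os.listdir("/sys/block")).items():
        print(f"{name}: {owner}")
```

After booting the rebuilt kernel, a PATA disk that shows up as hda rather than sda was claimed by the old IDE driver.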
Re: Problem with ata layer in 2.6.24
Gene Heskett wrote:
> ..
> That's ok, dd seemed to do the job also.
> ..

The two programs operate entirely differently from each other, so it may still be worth trying the make_bad_sector utility there.

dd goes through the regular kernel I/O calls, whereas make_bad_sector sends raw ATA commands directly (more or less) to the drive.

-ml
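To make the "regular kernel I/O calls" path concrete, here is a minimal sketch of what a single-sector dd read amounts to. It assumes 512-byte sectors, as reported for sda earlier in the thread; read_sectors is a name invented for this example. Like plain dd, it goes through the block layer and the page cache, so a successful read may be served from cache rather than from the platter, which is one reason the raw-command route can give a different answer.

```python
import os

SECTOR_SIZE = 512  # assumption: matches the "512-byte hardware sectors" in the log


def read_sectors(path, lba, count=1, sector_size=SECTOR_SIZE):
    """Read `count` sectors starting at `lba` using ordinary file I/O.

    Raises OSError (typically EIO) if the kernel reports a read failure,
    and IOError if fewer bytes than requested come back.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, count * sector_size, lba * sector_size)
    finally:
        os.close(fd)
    if len(data) != count * sector_size:
        raise IOError("short read: got %d bytes" % len(data))
    return data


if __name__ == "__main__":
    # Roughly: dd if=/dev/sda bs=512 skip=474507 count=1 (needs root).
    sector = read_sectors("/dev/sda", 474507)
    print(len(sector), "bytes read")
```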
Re: Problem with ata layer in 2.6.24
On Mon, 2008-01-28 at 11:35 -0500, Gene Heskett wrote: On Monday 28 January 2008, Mikael Pettersson wrote: Gene Heskett writes: On Monday 28 January 2008, Peter Zijlstra wrote: On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote: 1. Wrong mailing list; use linux-ide (@vger) instead. What, and keep all us other interested people in the dark? As a test, I tried rebooting to the latest fedora kernel and found it kills X, so I'm back to the second to last fedora version ATM, and the third 'smartctl -t lng /dev/sda' in 24 hours is running now. The first two completed with no errors. I've added the linux-ide list to refresh those people of the problem, the logs are being spammed by this message stanza: Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out Jan 28 04:46:25 coyote kernel: [26550.290029] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY } Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for UDMA/100 Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB) Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write Protect is off Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA It's not obvious from this incomplete dmesg log what HW or driver is behind ata1, but if the 2.6.24-rc7 kernel matches the 2.6.24 one, it should be pata_amd driving a WDC disk: [ 30.702887] pata_amd :00:09.0: version 0.3.10 [ 30.703052] PCI: Setting latency timer of device :00:09.0 to 64 [ 30.703188] scsi0 : pata_amd [ 30.709313] 
scsi1 : pata_amd [ 30.710076] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xf000 irq 14 [ 30.710079] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 15 [ 30.864753] ata1.00: ATA-6: WDC WD2000JB-00EVA0, 15.05R15, max UDMA/100 [ 30.864756] ata1.00: 390721968 sectors, multi 16: LBA48 [ 30.871629] ata1.00: configured for UDMA/100 Unfortunately we also see: [ 48.285456] nvidia: module license 'NVIDIA' taints kernel. [ 48.549725] ACPI: PCI Interrupt :02:00.0[A] - Link [APC4] - GSI 19 (level, high) - IRQ 20 [ 48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module 169.07 Thu Dec 13 18:42:56 PST 2007 We have no way of debugging that module, so please try 2.6.24 without it. Sorry, I can't do this and have a working machine. The nv driver has suffered bit rot or something since the FC2 days when it COULD run a 19 crt at 1600x1200, and will not drive this 20 wide screen lcd 1680x1050 monitor at more than 800x600, which is absolutely butt ugly fuzzy, looking like a jpg compressed to 10%. The system is not usable on a day to basis without the nvidia driver. Fix the nv driver so it will run this screen at its native resolution and I'll be glad to run it even if it won't run google earth, which I do use from time to time. Now, if in all the hits you can get from google on this, currently 14,800 just for 'exception Emask', apparently caused by a timeout, if 100% of the complainers are running nvidia drivers also, then I see a legit I can invalidate this theory... i helped a guy on irc debug this problem, and he had ati. I tried having him stop using fglrx, and go to r300.. same problem, and same problem even with vesa.. :) also, i have this on my fileserver with .20, which doesent even run X, or module support in kernel :) complaint. Again, fix the nv driver so it will run my screen I'll be glad to switch. 
I can see the reason, sure, but the machine must be capable of doing its common day to day stuff, while using that driver, like running kde for kmail, and browsers that work. If the problems persist, please try to capture a complete log from the failing kernel -- the interesting bits are everything from initial boot up to and including the first few errors. You may need to increase the kernel's log buffer size if the log gets truncated (CONFIG_LOG_BUF_SHIFT). If by log you mean /var/log/messages, I have several megabytes of those. If you mean a live dmesg capture taken right now, its attached. It contains several of these at the bottom. I long ago made the kernel log buffer bigger, cuz it couldn't even show the start immediately after the boot, and even the dump to syslog was truncated. There are no pata_amd changes from 2.6.24-rc7 to 2.6.24 final. That is what I was afraid of. I've done
Re: Problem with ata layer in 2.6.24
On Monday 28 January 2008, Kasper Sandberg wrote: [...] We have no way of debugging that module, so please try 2.6.24 without it. Sorry, I can't do this and have a working machine. The nv driver has suffered bit rot or something since the FC2 days when it COULD run a 19 crt at 1600x1200, and will not drive this 20 wide screen lcd 1680x1050 monitor at more than 800x600, which is absolutely butt ugly fuzzy, looking like a jpg compressed to 10%. The system is not usable on a day to basis without the nvidia driver. Fix the nv driver so it will run this screen at its native resolution and I'll be glad to run it even if it won't run google earth, which I do use from time to time. Now, if in all the hits you can get from google on this, currently 14,800 just for 'exception Emask', apparently caused by a timeout, if 100% of the complainers are running nvidia drivers also, then I see a legit I can invalidate this theory... i helped a guy on irc debug this problem, and he had ati. I tried having him stop using fglrx, and go to r300.. same problem, and same problem even with vesa.. :) No Kasper, you are validating it, that it is not nvidia related, which is what I was also saying. also, i have this on my fileserver with .20, which doesent even run X, or module support in kernel :) That far back? Although ISTR I saw it happen once only when I was running 2.6.18-somethingorother. complaint. Again, fix the nv driver so it will run my screen I'll be glad to switch. I can see the reason, sure, but the machine must be capable of doing its common day to day stuff, while using that driver, like running kde for kmail, and browsers that work. If the problems persist, please try to capture a complete log from the failing kernel -- the interesting bits are everything from initial boot up to and including the first few errors. You may need to increase the kernel's log buffer size if the log gets truncated (CONFIG_LOG_BUF_SHIFT). 
If by log you mean /var/log/messages, I have several megabytes of those. If you mean a live dmesg capture taken right now, its attached. It contains several of these at the bottom. I long ago made the kernel log buffer bigger, cuz it couldn't even show the start immediately after the boot, and even the dump to syslog was truncated. There are no pata_amd changes from 2.6.24-rc7 to 2.6.24 final. That is what I was afraid of. I've done some limited grepping in that branch of the kernel tree, and cannot seem to locate where this EH handler is being invoked from. There is 2 lines of interest in the dmesg: [0.00] Nvidia board detected. Ignoring ACPI timer override. [0.00] If you got timer trouble try acpi_use_timer_override But I have NDI what it means, kernel argument/xconfig option? I've also done some googling, and it appears this problem is fairly widespread since the switchover to libata was encouraged. A stock fedora F8 kernel suffers the same freezes and eventually locks up, but does it without the error messages being logged, it just freezes, feeling identical to this in the minutes before the total freeze. I've tried 2 of those too, but the newest one won't even run X. -- Cheers, Gene There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -Ed Howdershelt (Author) bureaucrat, n: A politician who has tenure. - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problem with ata layer in 2.6.24
On Mon, 2008-01-28 at 23:49 -0500, Gene Heskett wrote:
> On Monday 28 January 2008, Kasper Sandberg wrote:
> > [...]

snip

> > I can invalidate this theory... i helped a guy on irc debug this problem, and he had ati. I tried having him stop using fglrx, and go to r300.. same problem, and same problem even with vesa.. :)
>
> No Kasper, you are validating it, that it is not nvidia related, which is what I was also saying.

yeah thats what i mean - i can invalidate the theory that all the affected boxes run nvidia.

> > also, i have this on my fileserver with .20, which doesent even run X, or module support in kernel :)
>
> That far back? Although ISTR I saw it happen once only when I was running 2.6.18-somethingorother.

Yes im afraid so.. i will now provide some complete details, as i feel they are relevant.

the thing is, i run 6x300gb disks, IDE, in raid5. i have both an onboard via ide controller, and then i bought a promise pdc 202 new thingie. i had a problem however.. after a bit of time, i would get a DMA reset error thing, and it all kind of went NUTS. it was as if all data access was skewed, and as you might imagine, this made everything fail badly. i purchased an ITE based controller for the drives on the promise, but exactly the same thing happened.

the errors i got were:

hdf: dma_intr: bad DMA status (dma_stat=75)
hdf: dma_intr: status=0x50 { DriveReady SeekComplete }
ide: failed opcode was: unknown

---

i then found new hope, when i heard that libata provided much better error handling, so i upgraded to .20. this made my box usable. the error happens once or twice a day, the disk led will turn on constantly, and all IO freezes for about half a minute, after which it returns PROPERLY (thank you libata!). as far as i can tell, the only side effect is that i get those messages like the ones described here, and which google is flooded with.

to put some timeline perspective into this: i believe it was in 2005 i assembled the system, and when i realized it was faulty, on the old ide driver, i stopped using it - that might have been in the beginning of 2006. then for almost a year i wasn't using it, hoping to somehow fix it, but in january 2007 i think it was, at least in the very beginning of 2007, i hit upon the idea of trying libata, and ever since the system has been running 24/7 - doing these errors around 2 times a day.

i have multiple times reported my problems to lkml, but nothing has happened. i also tried to approach jgarzik directly, but he was not interested.

i really hope this can be solved now, its a huge problem.

my fileserver has an asus k8v motherboard, with via chipset (k8t880 i think it is, or something like it). currently using the promise controller again (strangely enough all the timeouts seem to happen here, and when the ITE was on, there, not on the onboard one), in conjunction with the onboard via.

> > > complaint. Again, fix the nv driver so it will run my screen I'll be

snip
Re: Problem with ata layer in 2.6.24
On Mon, Jan 28, 2008 at 08:31:57PM -0500, Gene Heskett wrote:
> In my script, its one line:
> mkinitrd -f initrd-$VER.img $VER \
> where $VER is the shell variable I edit to = the version number, located at the top of the script.
>
> Unforch, its failing:
> No module pata_amd found for kernel 2.6.24, aborting.

mkinitrd is just a shell script. Even if its options, and there are quite a number of these, do not let you influence the choice of modules in the desired manner, it is pretty trivial to make yourself a custom version of it and just hardwire a fixed list of modules there, instead of relying on general mechanisms which try hard to guess what you may need. That way your regular 'mkinitrd' will build something to boot with libata, and 'mkinitrd.ide' will use IDE modules for that purpose, with the same core kernel.

If you are using distribution kernels, as opposed to your own configuration, it is quite likely that you will need to install the 'kernel-devel' package, then recompile and add the required IDE modules yourself, as those may not be provided. This is done the same way as for any other external module.

Michal
Re: Problem with ata layer in 2.6.24
On Mon, 28 Jan 2008 14:13:21 -0500 Gene Heskett [EMAIL PROTECTED] wrote:
> I had to reboot early this morning due to a freezeup, and I had a bunch of these in the messages log:
> ==
> Jan 27 19:42:11 coyote kernel: [42461.915961] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Jan 27 19:42:11 coyote kernel: [42461.915973] ata1.00: cmd ca/00:08:b1:66:46/00:00:00:00:00/e8 tag 0 dma 4096 out
> Jan 27 19:42:11 coyote kernel: [42461.915974] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Jan 27 19:42:11 coyote kernel: [42461.915978] ata1.00: status: { DRDY }
> Jan 27 19:42:11 coyote kernel: [42461.916005] ata1: soft resetting link
> Jan 27 19:42:12 coyote kernel: [42462.078216] ata1.00: configured for UDMA/100
> Jan 27 19:42:12 coyote kernel: [42462.078232] ata1: EH complete
> Jan 27 19:42:12 coyote kernel: [42462.090700] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
> Jan 27 19:42:12 coyote kernel: [42462.114230] sd 0:0:0:0: [sda] Write Protect is off
> Jan 27 19:42:12 coyote kernel: [42462.115079] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> ===

I had this error too, or maybe only a similar one, plus another one. I no longer have the error output for either lying around, so I'm posting both fixes that I found here on lkml:

1) disabling ncq like this:
echo 1 > /sys/block/sda/device/queue_depth

2) this patch: libata_drain_fifo_on_stuck_drq_hsm.patch
( applies to 2.6.24 too )

Signed-off-by: Mark Lord [EMAIL PROTECTED]
---
--- old/drivers/ata/libata-sff.c	2007-09-28 09:29:22.0 -0400
+++ linux/drivers/ata/libata-sff.c	2007-09-28 09:39:44.0 -0400
@@ -420,6 +420,28 @@
 		ap->ops->irq_on(ap);
 }

+static void ata_drain_fifo(struct ata_port *ap, struct ata_queued_cmd *qc)
+{
+	u8 stat = ata_chk_status(ap);
+	/*
+	 * Try to clear stuck DRQ if necessary,
+	 * by reading/discarding up to two sectors worth of data.
+	 */
+	if ((stat & ATA_DRQ) && (!qc || qc->dma_dir != DMA_TO_DEVICE)) {
+		unsigned int i;
+		unsigned int limit = qc ? qc->sect_size : ATA_SECT_SIZE;
+
+		printk(KERN_WARNING "Draining up to %u words from data FIFO.\n",
+				limit);
+		for (i = 0; i < limit ; ++i) {
+			ioread16(ap->ioaddr.data_addr);
+			if (!(ata_chk_status(ap) & ATA_DRQ))
+				break;
+		}
+		printk(KERN_WARNING "Drained %u/%u words.\n", i, limit);
+	}
+}
+
 /**
  * ata_bmdma_drive_eh - Perform EH with given methods for BMDMA controller
  * @ap: port to handle error for
@@ -476,7 +498,7 @@
 	}

 	ata_altstatus(ap);
-	ata_chk_status(ap);
+	ata_drain_fifo(ap, qc);
 	ap->ops->irq_clear(ap);

 	spin_unlock_irqrestore(ap->lock, flags);

--
Florian Attenberger [EMAIL PROTECTED]
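The first fix above is a write to a sysfs attribute, which can also be scripted so that the previous value is kept for restoring later. This is a sketch under stated assumptions: the /sys/block/<disk>/device/queue_depth path is the one given in the message, while set_queue_depth and its sysfs_root parameter are hypothetical names introduced here so the function can be tried against a scratch directory. NCQ is a SATA feature, so this setting may not apply to a PATA disk.

```python
import os


def set_queue_depth(disk, depth, sysfs_root="/sys"):
    """Write `depth` to the disk's queue_depth attribute; return the old value.

    `sysfs_root` exists only so the function can be exercised against a
    temporary directory; on a real system it stays at /sys and root
    privileges are required.
    """
    path = os.path.join(sysfs_root, "block", disk, "device", "queue_depth")
    with open(path) as attr:
        previous = int(attr.read().strip())
    with open(path, "w") as attr:
        attr.write(f"{depth}\n")
    return previous


if __name__ == "__main__":
    # Same effect as: echo 1 > /sys/block/sda/device/queue_depth
    old = set_queue_depth("sda", 1)
    print("queue_depth was", old, "and is now 1")
```

The setting does not survive a reboot, so the returned value is only useful for putting things back within the same session.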