Re: sata_promise SATA300TX4 intermittent problems
Hello,

Peter Favrholdt wrote:

My feeling is this is not caused by 1.5Gbps or 3.0Gbps operation. ...snip My next test will be a plain 2.6.21rc2. Then I'll apply the patches one by one.

I've tested 2.6.21-rc2 which fails (sdc down after 27 minutes, sdd down after 46 minutes). Then I applied just a single patch to 2.6.21-rc2: Mikael Pettersson's patch to force 1.5Gbps operation and tested again - this time no problems at all! (BTW: both kernels are running with IO-APIC disabled). I've put results+dmesg output here: http://sata300tx4.gratiswiki.dk/

I had the opportunity to test Mikael's 1.5Gbps patch yesterday evening and although the system seems to run OK, I still do get the following system log messages:

Mar 13 06:10:22 alderan kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Mar 13 06:10:22 alderan kernel: ata2.00: cmd c8/00:30:0f:e2:86/00:00:00:00:00/e7 tag 0 cdb 0x0 data 24576 in
Mar 13 06:10:22 alderan kernel: res 50/00:00:3e:e2:86/00:00:00:00:00/e7 Emask 0x1 (device error)
Mar 13 06:10:22 alderan kernel: ata2.00: configured for UDMA/133
Mar 13 06:10:22 alderan kernel: ata2: EH complete
Mar 13 06:10:22 alderan kernel: SCSI device sdb: 976773168 512-byte hdwr sectors (500108 MB)
Mar 13 06:10:22 alderan kernel: sdb: Write Protect is off
Mar 13 06:10:22 alderan kernel: SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 13 06:11:23 alderan kernel: possible SYN flooding on port 52223. Sending cookies.
Mar 13 06:13:05 alderan kernel: possible SYN flooding on port 52223. Sending cookies.
Mar 13 06:13:23 alderan kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Mar 13 06:13:23 alderan kernel: ata2.00: cmd 25/00:00:27:29:73/00:02:07:00:00/e0 tag 0 cdb 0x0 data 262144 in
Mar 13 06:13:23 alderan kernel: res 50/00:00:26:2b:73/00:00:00:00:00/e0 Emask 0x1 (device error)
Mar 13 06:13:23 alderan kernel: ata2.00: configured for UDMA/133
Mar 13 06:13:23 alderan kernel: ata2: EH complete
Mar 13 06:13:23 alderan kernel: SCSI device sdb: 976773168 512-byte hdwr sectors (500108 MB)
Mar 13 06:13:23 alderan kernel: sdb: Write Protect is off
Mar 13 06:13:23 alderan kernel: SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

If Mikael does have an updated patch for the more detailed error reporting features, I'll try to run it within a few days of getting my hands on it. I would be really interested to know why the Promise SATA300TX4 doesn't play along with the newer 500GB Seagate 7200.10 disks while the older models are OK (I've already tried with and without 1.5Gbps jumpers and patches).

Regards,
Tomi Orava

--
- To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
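The `cmd` lines in the log above pack the ATA taskfile registers into hex bytes. What follows is a minimal sketch of decoding them, assuming the field order is command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device and that the command was issued in LBA mode. The helper name `decode_cmd` and the short opcode set are illustrative assumptions, not a kernel interface.

```python
import re

# Opcodes that use 48-bit addressing (illustrative subset, not exhaustive).
LBA48_COMMANDS = {0x24, 0x25, 0x34, 0x35}

# Twelve hex bytes separated by '/' or ':' after the word "cmd".
CMD_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def decode_cmd(line, sector_size=512):
    """Decode the taskfile of a libata 'cmd' log line (LBA mode assumed)."""
    match = CMD_RE.search(line)
    if match is None:
        raise ValueError("no 'cmd' taskfile found in line")
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah,
     device) = (int(b, 16) for b in re.split(r"[/:]", match.group(1)))

    if command in LBA48_COMMANDS:
        lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
               | lbah << 16 | lbam << 8 | lbal)
        sectors = (hob_nsect << 8 | nsect) or 65536  # a count of 0 means 65536
    else:
        # 28-bit LBA: the top four bits live in the low nibble of the device register.
        lba = (device & 0x0F) << 24 | lbah << 16 | lbam << 8 | lbal
        sectors = nsect or 256  # a count of 0 means 256
    return {"command": command, "lba": lba,
            "sectors": sectors, "bytes": sectors * sector_size}
```

Applied to the two `cmd` lines above, this gives 48 sectors (24576 bytes) for the `c8` command and 512 sectors (262144 bytes) for the `25` command, which agrees with the `data ... in` figures the kernel printed.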
Re: [PATCH] libata: fix native mode disabled port handling
Jeff Garzik wrote:

Tejun Heo wrote:

Disabled port handling in ata_pci_init_native_mode() is slightly broken in that it may end up using the wrong port_info. This patch updates it such that disabled ports are made dummy as done in the legacy and other cases. While at it, fix indentation in ata_resources_present().

Signed-off-by: Tejun Heo [EMAIL PROTECTED]
---
drivers/ata/libata-sff.c | 62 ++
1 files changed, 35 insertions(+), 27 deletions(-)

what's the extent of the breakage here? I would rather push this into #upstream

Yeap, this is for #upstream. The breakage is theoretical as none hits the path yet. Thanks.

--
tejun
Re: [PATCH] libata: add support for READ/WRITE LONG
Tejun Heo wrote:

Mark Lord wrote:

The READ/WRITE LONG commands are theoretically obsolete, but the majority of drives in existence still implement them. The WRITE_LONG and WRITE_LONG_ONCE commands are of particular interest for fault injection testing -- eg. creating media errors at specific locations on a disk. The fussy bit is that these commands require a non-standard sector size, usually 520 bytes instead of 512. This patch adds support to libata for READ/WRITE LONG commands issued via SG_IO/ATA_16. This patch was generated against a 2.6.21-rc3-git7 base:

I think it would be better if this comes in two patches. One to add qc->sect_size and convert all users of ATA_SECT_SIZE to qc->sect_size and the other one to implement READ/WRITE LONG.

Another question is whether this needs to be included into mainline. This is definitely useful but it is mostly for debugging/testing. Hmmm... But we're gonna need qc->sect_size anyway for devices with larger sector sizes and overhead for supporting READ/WRITE LONG is nearly nil, so I'm voting for inclusion. Thanks.

I just want to add that this patch has been incredibly useful for us in testing the error handling RAID. Nothing like real media errors on demand to validate your assumptions ;-)

ric
Re: regression: ide-floppy doesn't work with IOMEGA IDE ZIP drive
Hi,

On Monday 12 March 2007, Sergei Shtylyov wrote:

Hello.

Tejun Heo wrote:

Stanislav Brabec reported that IOMEGA IDE ZIP drive doesn't work with recent kernels. Low level driver is via82cxxx. Relevant part of 2.6.20.1 boot message follows.

VP_IDE: IDE controller at PCI slot :00:11.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt8233 (rev 00) IDE UDMA100 controller on pci:00:11.1
ide0: BM-DMA at 0xff00-0xff07, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xff08-0xff0f, BIOS settings: hdc:DMA, hdd:DMA
Probing IDE interface ide0...
hda: ST3160812A, ATA DISK drive
hdb: IOMEGA ZIP 100 ATAPI, ATAPI FLOPPY drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
...
hdb: lost interrupt
hdb: status error: status=0x00 { }
ide: failed opcode was: unknown
ide-floppy: Strange, packet command initiated yet DRQ isn't asserted
...
hdb: 98304kB, 96/64/32 CHS, 4096 kBps, 512 sector size, 2941 rpm
hdb: No disk in drive
hdb: lost interrupt
hdb: status error: status=0x00 { }
ide: failed opcode was: unknown
ide-floppy: Strange, packet command initiated yet DRQ isn't asserted
[above repeats several times]
...
hdb: lost interrupt
hdb: status error: status=0x00 { }
ide: failed opcode was: unknown
ide-floppy: Strange, packet command initiated yet DRQ isn't asserted
hdb: 98304kB, 196608 blocks, 512 sector size
hdb: unknown partition table
hdb: unknown partition table
hdb: unknown partition table
hdb: unknown partition table

And the device is inaccessible after boot completed. On suse 10.1 kernel (2.6.16 based), it works better.

VP_IDE: IDE controller at PCI slot :00:11.1
PCI: VIA IRQ fixup for :00:11.1, from 255 to 0
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt8233 (rev 00) IDE UDMA100 controller on pci:00:11.1
ide0: BM-DMA at 0xff00-0xff07, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xff08-0xff0f, BIOS settings: hdc:DMA, hdd:DMA
Probing IDE interface ide0...
hda: ST3160812A, ATA DISK drive
hdb: IOMEGA ZIP 100 ATAPI, ATAPI FLOPPY drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
...
hdb: No disk in drive
hdb: 98304kB, 96/64/32 CHS, 4096 kBps, 512 sector size, 2941 rpm
...
hdb: lost interrupt
hdb: status error: status=0x00 { }
ide: failed opcode was: unknown
ide-floppy: Strange, packet command initiated yet DRQ isn't asserted
hdb: 98304kB, 196608 blocks, 512 sector size
hdb: unknown partition table

There is one lost interrupt message but the drive reportedly works fine after that. Stanislav also seems to recall that ide-floppy worked without any error message with older kernel. I'm attaching full boot log messages for 2.6.20.1 and suse 10.1. Any ideas?

BTW... I've looked at that code last spring and found it strange that ide-floppy is the only driver that still calls dma_start() method *before* issuing a command (while this is not the right thing to do according to spec and is known to not work with some chips, namely Promise). I was going to send a patch then but lacking both time and actual hardware, kept deferring it since... :-)

We are probably hitting two bugs here:
* regression between 2.6.16-2.6.20
* the issue that Sergei described

Stanislav, could you use git bisect to narrow down the problem to the specific patch? Good practical example of using git-bisect is here: http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/

Thanks,
Bart
Re: regression: ide-floppy doesn't work with IOMEGA IDE ZIP drive
Jeff Garzik wrote:

Tejun Heo wrote:

[libata] And, as the device requires custom high level driver, libata fails miserably. Would it be worth trying to support these devices? Or are they just too outdated to put the effort in?

What SCSI peripheral device type does it report, when booted under libata?

Internal IOMEGA ZIP 100 IDE (manufactured by NEC).

ata1.01: ATAPI, max PIO2, CDB intr
ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata1.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x12 data 36 in
         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata1: soft resetting port
ata1.01: configured for PIO2
ata1: EH complete
ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata1.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x12 data 36 in
         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata1.01: configured for PIO2
ata1: EH complete
ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata1.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x12 data 36 in
         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata1: soft resetting port
... and so on

For more see https://bugzilla.novell.com/show_bug.cgi?id=232086 (complete ide-floppy and libata logs are there)

--
Best Regards,
Stanislav Brabec
software developer
-
SUSE LINUX, s. r. o.    e-mail: [EMAIL PROTECTED]
Lihovarska 1060/12      tel: +420 284 028 966
190 00 Praha 9          fax: +420 284 028 951
Czech Republic          http://www.suse.cz/
Re: Timeout error, disk gone
On Tue, 13 Mar 2007 08:31:55 +0000 (UTC) Matthias Urlichs [EMAIL PROTECTED] wrote:

Transient glitch? Major ugliness? For the time being I have not re-added the thing to my RAID, but the three other disks in it are the exact same model...

What model and what firmware? There are some problem firmware releases around with older SATA drives.
Re: regression: ide-floppy doesn't work with IOMEGA IDE ZIP drive
It seems ide-floppy needs some special handling in interrupt handling too, like delaying data transfer by several ticks after device indicates readiness. Apart from separate high level driver, we might have to modify libata HSM implementation if we're gonna support these devices.

The data transfer delay may well be down to the DMA bug.

Can someone more knowledgeable explain what needs to be done differently from standard ATAPI for these devices?

In theory not a lot, if anything. I don't have a ZIP drive but have got an old Iomega Clik! PCMCIA drive somewhere if you need an ide-floppy device, Tejun.

Alan
Re: Timeout error, disk gone
Hi,

Alan Cox:

What model and what firmware ? There are some problem firmware releases around with older SATA drives

Samsung SP2004C, firmware version unknown -- how do I find out?

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | [EMAIL PROTECTED]
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
- -
It is all right to hold a conversation, but you should let go of it now and then.
Re: Timeout error, disk gone
On Tue, 13 Mar 2007 14:21:15 +0100 [EMAIL PROTECTED] wrote:

Hi, Alan Cox:

What model and what firmware ? There are some problem firmware releases around with older SATA drives

Samsung SP2004C, firmware version unknown -- how do I find out?

It's in the identify data, however I'm not aware of any problem Samsung drives in reports so far.
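For readers wondering where the firmware version sits in the identify data, here is a minimal sketch, assuming a raw 512-byte IDENTIFY DEVICE block is already available as bytes (how it was obtained is outside this sketch). It relies on the ATA word layout as I understand it: firmware revision in words 23-26 and model number in words 27-46, each string stored with the two characters of every 16-bit word swapped. The function names are illustrative.

```python
def ata_string(identify, first_word, last_word):
    """Extract an ASCII string field from 512-byte IDENTIFY DEVICE data."""
    raw = identify[first_word * 2:(last_word + 1) * 2]
    # ATA strings keep the first character of each pair in the high byte of
    # the 16-bit word, so in a little-endian dump each byte pair is swapped.
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]
    swapped[1::2] = raw[0::2]
    return swapped.decode("ascii", errors="replace").strip()


def firmware_revision(identify):
    """Words 23-26: firmware revision (8 characters)."""
    return ata_string(identify, 23, 26)


def model_number(identify):
    """Words 27-46: model number (40 characters)."""
    return ata_string(identify, 27, 46)
```

In practice a tool such as `hdparm -I` prints the same fields without any manual parsing.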
Re: Timeout error, disk gone
[EMAIL PROTECTED] wrote:

Hi, Alan Cox:

What model and what firmware ? There are some problem firmware releases around with older SATA drives

Samsung SP2004C, firmware version unknown -- how do I find out?

Please post the result of 'lspci -nn' and full boot log. Samsung firmwares tend to be pretty good but I've seen earlier ones occasionally lock up after certain PHY events. Removing power and reapplying puts it back into sane state.

--
tejun
Re: Timeout error, disk gone
Tejun Heo wrote:

[EMAIL PROTECTED] wrote:

Hi, Alan Cox:

What model and what firmware ? There are some problem firmware releases around with older SATA drives

Samsung SP2004C, firmware version unknown -- how do I find out?

Please post the result of 'lspci -nn' and full boot log. Samsung firmwares tend to be pretty good but I've seen earlier ones occasionally lock up after certain PHY events. Removing power and reapplying puts it back into sane state.

Oh and I'm pretty sure it was the drive which locked up. If you connect the hard drive to a different working port without removing power, the same timeouts happen while the original port detects and works fine with another hotplugged drive. And for the record, among the earlier SATA drives, Samsung ones were definitely in the better group. Also, please include full log of failure. Thanks.

--
tejun
Re: regression: ide-floppy doesn't work with IOMEGA IDE ZIP drive
Alan Cox wrote:

It seems ide-floppy needs some special handlings in interrupt handling too like delaying data transfer by several ticks after device indicates readiness. Apart from separate high level driver, we might have to modify libata HSM implementation if we're gonna support these devices.

The data transfer delay may well be down to the DMA bug

I see.

Can someone more knowledgeable explain what needs to be done differently from standard ATAPI for these devices?

In theory not a lot if anything. I don't have a ZIP drive but have got an old Iomega Clik! PCMCIA drive somewhere if you need an ide-floppy device Tejun.

If you've got a spare one, that would be great but otherwise I think you're much better qualified for libata ide-floppy support. :-) Thanks.

--
tejun
Re: Timeout error, disk gone
Hi,

Alan Cox:

Samsung SP2004C, firmware version unknown -- how do I find out?

Its in the identify data, however I'm not aware of any problem Samsung drives in reports so far.

Looks like a hardware problem. After a hard powerdown got it recognizable again, it now manages a read speed of 1/2 MB/sec -- as opposed to its three brethren, which are approx. 111 times faster.

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | [EMAIL PROTECTED]
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
- -
Brisk talkers are usually slow thinkers. There is, indeed, no wild beast more to be dreaded than a communicative man having nothing to communicate. If you are civil to the voluble, they will abuse your patience; if brusque, your character. -- Jonathan Swift
Re: Timeout error, disk gone
Hi,

Tejun Heo:

Samsung SP2004C, firmware version unknown -- how do I find out?

Please post the result of 'lspci -nn' and full boot log. Samsung firmwares tend to be pretty good but I've seen earlier ones occasionally lock up after certain PHY events. Removing power and reapplying puts it back into sane state.

Not this one, as reported in my earlier mail -- unless you redefine sane. I'll try to elucidate something from it after it's replaced.

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | [EMAIL PROTECTED]
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
- -
When the polls are overwhelmingly unfavorable, (a) ridicule and dismiss them or (b) stress the volatility of public opinion.
Re: Fwd: libata extension
Vitaliyi wrote:

Why is the access to Control register needed?

To execute soft reset for example. In the perfect case I would like to be able to execute vendor command set (reverse engineered).

Sounds interesting. :-) Could you give some more details on what are you going to implement?

Reading/writing service area, uploading, downloading modules, working with flash etc.

SAT (aka ATA passthru) defines how to do soft-reset. SG_IO supports the ATA_12 and ATA_16 commands which permit soft-reset and similar tasks. libata supports this interface, but does not yet support soft-reset and similar non-command-oriented tasks. This would be the best area to add such features, though.

Jeff
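To make the ATA_16 interface mentioned above a little more concrete, here is a hedged sketch that only builds the 16-byte CDB for a 28-bit command and does not issue it. The byte positions follow my reading of the SAT ATA PASS-THROUGH (16) layout and should be checked against the specification. `build_ata16_cdb` and the `PROTO_*` names are illustrative, and the soft-reset case that libata does not yet handle is deliberately left out.

```python
ATA_16 = 0x85          # SCSI opcode for ATA PASS-THROUGH (16)

PROTO_NON_DATA = 3     # SAT protocol field values (assumed from the SAT spec)
PROTO_PIO_IN = 4


def build_ata16_cdb(command, protocol, feature=0, nsect=0, lba=0,
                    device=0, data_in=False):
    """Build an ATA PASS-THROUGH (16) CDB for a 28-bit ATA command."""
    cdb = bytearray(16)
    cdb[0] = ATA_16
    cdb[1] = protocol << 1      # bit 0 (EXTEND) stays clear for 28-bit commands
    if data_in:
        # T_DIR=1 (device to host), BYT_BLOK=1 (length counted in blocks),
        # T_LENGTH=2 (transfer length taken from the sector count field).
        cdb[2] = 0x08 | 0x04 | 0x02
    cdb[4] = feature & 0xFF
    cdb[6] = nsect & 0xFF
    cdb[8] = lba & 0xFF
    cdb[10] = (lba >> 8) & 0xFF
    cdb[12] = (lba >> 16) & 0xFF
    cdb[13] = device & 0xFF
    cdb[14] = command & 0xFF
    return bytes(cdb)


# Example: IDENTIFY DEVICE (0xEC), one sector read via PIO data-in.
cdb = build_ata16_cdb(0xEC, PROTO_PIO_IN, nsect=1, data_in=True)
```

Such a CDB would then be handed to the SG_IO ioctl on the device node together with a 512-byte data buffer; that step is omitted here.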
Re: [3/6] 2.6.21-rc2: known regressions
Can you apply the attached patch and report what the kernel says with ACPI turned on?

--
tejun

diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
index 019d8ff..6a27a7f 100644
--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -473,8 +473,8 @@ static void taskfile_load_raw(struct ata_port *ap,
 	struct ata_taskfile tf;
 	unsigned int err;
 
-	if (ata_msg_probe(ap))
-		ata_dev_printk(atadev, KERN_DEBUG, "%s: (0x1f1-1f7): hex: "
+	if (1 || ata_msg_probe(ap))
+		ata_dev_printk(atadev, KERN_INFO, "%s: (0x1f1-1f7): hex: "
 			"%02x %02x %02x %02x %02x %02x %02x\n",
 			__FUNCTION__,
 			gtf->tfa[0], gtf->tfa[1], gtf->tfa[2],
[5/6] 2.6.21-rc3: known regressions
This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one of the bugs, maintainer of an affected subsystem or driver, a patch of yours caused a breakage or I'm considering you in any other way possibly involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.

Subject    : resume: slab error in verify_redzone_free(): cache `size-512': memory outside object was overwritten
References : http://lkml.org/lkml/2007/2/24/41
Submitter  : Pavel Machek [EMAIL PROTECTED]
Status     : unknown

Subject    : beeps get longer after suspend
References : http://lkml.org/lkml/2007/2/26/276
Submitter  : Pavel Machek [EMAIL PROTECTED]
Status     : unknown

Subject    : suspend/resume hangs until keypress
References : http://bugzilla.kernel.org/show_bug.cgi?id=8181
Submitter  : Tomas Janousek [EMAIL PROTECTED]
Status     : unknown

Subject    : SATA breakage on resume
References : http://lkml.org/lkml/2007/3/7/233
Submitter  : Thomas Gleixner [EMAIL PROTECTED]
             Soeren Sonnenburg [EMAIL PROTECTED]
Status     : unknown

Subject    : first disk access after resume takes several minutes
References : http://lkml.org/lkml/2007/3/8/117
Submitter  : Michael S. Tsirkin [EMAIL PROTECTED]
Status     : unknown

Subject    : after resume: X hangs after drawing a couple of windows
References : http://lkml.org/lkml/2007/3/8/117
Submitter  : Michael S. Tsirkin [EMAIL PROTECTED]
Status     : unknown

Subject    : ThinkPad Z60m: usb mouse stops working after suspend to ram
References : http://lkml.org/lkml/2007/2/21/413
             http://lkml.org/lkml/2007/2/28/172
Submitter  : Arkadiusz Miskiewicz [EMAIL PROTECTED]
Caused-By  : Konstantin Karasyov [EMAIL PROTECTED]
             commit 0a6139027f3986162233adc17285151e78b39cac
Handled-By : Konstantin Karasyov [EMAIL PROTECTED]
Status     : problem is being debugged

Subject    : suspend to disk breaks ACPI
References : http://lkml.org/lkml/2007/3/5/127
Submitter  : Lukas Hejtmanek [EMAIL PROTECTED]
Status     : unknown
[3/6] 2.6.21-rc3: known regressions
This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one of the bugs, maintainer of an affected subsystem or driver, a patch of yours caused a breakage or I'm considering you in any other way possibly involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.

Subject    : AMD Elan: Crash after Allocating PCI resources
References : http://bugzilla.kernel.org/show_bug.cgi?id=8161
Submitter  : Vladimir Brik [EMAIL PROTECTED]
Handled-By : Andi Kleen [EMAIL PROTECTED]
Status     : problem is being debugged

Subject    : x86_64: boot hangs unless CONFIG_PCIEPORTBUS=n and acpi=off
References : http://bugzilla.kernel.org/show_bug.cgi?id=8162
Submitter  : Randy Dunlap [EMAIL PROTECTED]
Status     : unknown

Subject    : ACPI regression with noapic
References : http://lkml.org/lkml/2007/3/8/468
Submitter  : Ray Lee [EMAIL PROTECTED]
Status     : unknown

Subject    : acpi_serialize locks system during boot
References : http://bugzilla.kernel.org/show_bug.cgi?id=8171
Submitter  : Colchao [EMAIL PROTECTED]
Status     : unknown

Subject    : NCQ problem with ahci and Hitachi drive (ACPI related)
References : http://lkml.org/lkml/2007/3/4/178
             http://lkml.org/lkml/2007/3/9/475
Submitter  : Mathieu Bérard [EMAIL PROTECTED]
Handled-By : Tejun Heo [EMAIL PROTECTED]
Status     : unknown

Subject    : kernels fail to boot with drives on ATIIXP controller (ACPI/IRQ related)
References : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229621
             http://lkml.org/lkml/2007/3/4/257
Submitter  : Michal Jaegermann [EMAIL PROTECTED]
Status     : unknown

Subject    : libata: PATA UDMA/100 configured as UDMA/33
References : http://lkml.org/lkml/2007/2/20/294
             http://www.mail-archive.com/linux-ide@vger.kernel.org/msg04115.html
             http://bugzilla.kernel.org/show_bug.cgi?id=8133
             http://bugzilla.kernel.org/show_bug.cgi?id=8164
Submitter  : Fabio Comolli [EMAIL PROTECTED]
             Plamen Petrov [EMAIL PROTECTED]
             Laurent Riffard [EMAIL PROTECTED]
Handled-By : Tejun Heo [EMAIL PROTECTED]
Status     : patch available
Re: [3/6] 2.6.21-rc2: known regressions
Tejun Heo a écrit : Mathieu Bérard wrote: Jeff Garzik a écrit : Adrian Bunk wrote: Subject: NCQ problem with ahci and Hitachi drive References : http://lkml.org/lkml/2007/3/4/178 Submitter : Mathieu Bérard [EMAIL PROTECTED] Status : unknown according to the last message in that thread, it sounds like ACPI and interrupt problems Hi, after more testing with a 2.6.21-rc3, it appears that after several ata errors the boot process somehow continued as normal, after a NCQ disabled due to excessive errors message. pci=noacpi or noacpi parameters workarounds the problem irqpoll does nothing. I was mistaken. It can't be IRQ routing problem. I somehow thought the port was a ata_piix one. Considering the reported broken NCQ feature on the device GTF might be mangling with the drive to disable NCQ or something. Does giving libata.noacpi=1 make any difference? Hi, libata.noacpi=1 worked. The drive is up and running with NCQ on. Here is the PATA/SATA related part of my DSDT table with the _GTF methods: Device (PATA) { Name (_ADR, 0x001F0001) OperationRegion (PACS, PCI_Config, 0x40, 0xC0) Field (PACS, DWordAcc, NoLock, Preserve) { PRIT, 16, Offset (0x04), PSIT, 4, Offset (0x08), SYNC, 4, Offset (0x0A), SDT0, 2, , 2, SDT1, 2, Offset (0x14), ICR0, 4, ICR1, 4, ICR2, 4, ICR3, 4, ICR4, 4, ICR5, 4 } Device (PRID) { Name (_ADR, 0x00) Method (_GTM, 0, NotSerialized) { Name (PBUF, Buffer (0x14) { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }) CreateDWordField (PBUF, 0x00, PIO0) CreateDWordField (PBUF, 0x04, DMA0) CreateDWordField (PBUF, 0x08, PIO1) CreateDWordField (PBUF, 0x0C, DMA1) CreateDWordField (PBUF, 0x10, FLAG) Store (GETP (PRIT), PIO0) Store (GDMA (And (SYNC, 0x01), And (ICR3, 0x01), And (ICR0, 0x01), SDT0, And (ICR1, 0x01)), DMA0) If (LEqual (DMA0, 0x)) { Store (PIO0, DMA0) } If (And (PRIT, 0x4000)) { If (LEqual (And (PRIT, 0x90), 0x80)) { Store (0x0384, PIO1) } Else { Store (GETT (PSIT), PIO1) } } Else { 
Store (0x, PIO1) } Store (GDMA (And (SYNC, 0x02), And (ICR3, 0x02), And (ICR0, 0x02), SDT1, And (ICR1, 0x02)), DMA1) If (LEqual (DMA1, 0x)) { Store (PIO1, DMA1) } Store (GETF (And (SYNC, 0x01), And (SYNC, 0x02), PRIT), FLAG) If (And (LEqual (PIO0, 0x), LEqual (DMA0, 0x))) { Store (0x78, PIO0) Store (0x14, DMA0) Store (0x03, FLAG) } Return (PBUF) } Method (_STM, 3, NotSerialized) { CreateDWordField (Arg0, 0x00, PIO0) CreateDWordField (Arg0, 0x04, DMA0) CreateDWordField (Arg0, 0x08, PIO1) CreateDWordField (Arg0, 0x0C, DMA1) CreateDWordField (Arg0, 0x10, FLAG) If (LEqual (SizeOf (Arg1), 0x0200)) { And (PRIT, 0x40F0, PRIT) And (SYNC, 0x02, SYNC) Store (0x00, SDT0) And (ICR0, 0x02, ICR0) And (ICR1, 0x02, ICR1) And (ICR3, 0x02, ICR3) And (ICR5, 0x02, ICR5) CreateWordField (Arg1, 0x62, W490) CreateWordField (Arg1, 0x6A, W530) CreateWordField (Arg1, 0x7E, W630) CreateWordField (Arg1, 0x80, W640) CreateWordField (Arg1, 0xB0, W880) CreateWordField (Arg1, 0xBA, W930) Or (PRIT, 0x8004, PRIT) If (LAnd (And (FLAG, 0x02), And (W490, 0x0800))) { Or (PRIT, 0x02, PRIT) } Or (PRIT, SETP (PIO0, W530, W640), PRIT) If (And (FLAG, 0x01)) { Or (SYNC, 0x01, SYNC) Store (SDMA (DMA0), SDT0) If (LLess (DMA0, 0x1E)) { Or (ICR3, 0x01, ICR3) } If (LLess (DMA0, 0x3C)) { Or (ICR0, 0x01, ICR0) } If (And (W930, 0x2000)) { Or (ICR1, 0x01, ICR1)
Re: [3/6] 2.6.21-rc3: known regressions
Subject    : libata: PATA UDMA/100 configured as UDMA/33
References : http://lkml.org/lkml/2007/2/20/294
             http://www.mail-archive.com/linux-ide@vger.kernel.org/msg04115.html
             http://bugzilla.kernel.org/show_bug.cgi?id=8133
             http://bugzilla.kernel.org/show_bug.cgi?id=8164
Submitter  : Fabio Comolli [EMAIL PROTECTED]
             Plamen Petrov [EMAIL PROTECTED]
             Laurent Riffard [EMAIL PROTECTED]
Handled-By : Tejun Heo [EMAIL PROTECTED]
Status     : patch available

Some cases should be fixed now but probably not all (eg the Nvidia one)
Re: Asus P5B-VM motherboard: cd drive malfunctions if internal nic in use.
On Mon, Mar 12, 2007 at 06:35:10PM -0400, Mark Lord wrote:

Is that a PATA cd-drive? If so, then you must have hooked it up to the JMicron IDE controller. That driver is just plain buggy. I gave up on it for my own P5B-VM. The libata version works better than the drivers/ide, but I gave up on it and got a SATA DVD/RW drive.

Off topic: do your USB ports power off when the system shuts down? Mine don't -- the +5V continues on them.. I'd love a tip on how to turn them off completely at shutdown.

Most Asus boards have jumpers for the USB ports to select between +5V and +5VSB (standby power). The reason to provide standby power is so that keyboards with power buttons can remain powered so that you can turn the system on using the USB keyboard. If you want to power off the ports entirely, jumper them to the +5V line instead, which only has power when the system is on.

--
Len Sorensen
[PATCH] libata: hardreset on SERR_INTERNAL
There was a rare report where SB600 reported SERR_INTERNAL and SRST couldn't get it out of the failure mode. Hardreset on SERR_INTERNAL. As the problem is intermittent, whether this fixes the problem or not hasn't been verified yet, but hardresetting the channel on internal error is a good idea anyway.

Signed-off-by: Tejun Heo [EMAIL PROTECTED]

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 7349c3d..fc11bb3 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1055,7 +1055,7 @@ static void ata_eh_analyze_serror(struct ata_port *ap)
 	}
 	if (serror & SERR_INTERNAL) {
 		err_mask |= AC_ERR_SYSTEM;
-		action |= ATA_EH_SOFTRESET;
+		action |= ATA_EH_HARDRESET;
 	}
 	if (serror & (SERR_PHYRDY_CHG | SERR_DEV_XCHG))
 		ata_ehi_hotplugged(&ehc->i);
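The behavioural change in this patch is small enough to model outside the kernel. The sketch below is only a toy model in Python, not kernel code: it assumes the internal-error flag is bit 11 of the SError register (my reading of the SATA specification) and uses plain strings in place of the ATA_EH_* action flags.

```python
# Assumed value: SError bit 11 is the "internal error" (E) bit.
SERR_INTERNAL = 1 << 11


def reset_for_internal_error(serror, patched=True):
    """Return the reset action chosen for an internal error, or None.

    patched=False models the old behaviour (soft reset),
    patched=True models the behaviour after this patch (hard reset).
    """
    if not serror & SERR_INTERNAL:
        return None
    return "hardreset" if patched else "softreset"
```

With `serror = 0x800` the patched model picks a hard reset, where the old code would have tried SRST, which is the path that reportedly failed to recover the SB600.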
PATA Sil680 Command Timeout on ARM XScale
Hi Folks,

We have a command timeout with the Sil680 controller on ARM XScale. The kernel is 2.6.18-rc2 with libata 2.00 and kernel preemption enabled. A similar problem was observed with kernel preemption disabled. ATA pass through and sg are used. A heavy IO test was run on both channels of the Sil680 and the system was pretty loaded, with the load average above 1.5.

Two timers are used to track command timeout in our test software. The one in user space is set to 6 seconds using the alarm() call while the one in the kernel (scsi timer) is set to 5 seconds. These timeout values are probably too low to be realistic, but the issue here is not about the timeout itself but to understand why it is always the user space timer that expires before the kernel timer. Since the kernel timer uses jiffies to track time, does this imply a kernel bug where the timer interrupts were lost or delayed somehow? Do you know of any known problems related to command timeout in PATA Sil680?

Thanks,
Fajun

User space trace:

Cmd 4276 timed out after 7.260137 secs: start time 1173775439.409099 secs, timed out at 1173775446.669236 secs
[Tue Mar 13 08:44:06 2007]: Test: Random Write Sectors Extended LBA Low:0 LBA High: 10 ... Num Cmds: 4277 Num_Failed_Cmds:1 ...
Status: Fail [Error 401: Command timeout]

Dmesg log

~ $ dmesg
.77] Calling initcall 0xc001ebb4: inet_diag_init+0x0/0x80()
[42949375.77] Calling initcall 0xc001ec34: tcp_diag_init+0x0/0x1c()
[42949375.77] Calling initcall 0xc001ec50: bictcp_register+0x0/0x1c()
[42949375.77] TCP bic registered
[42949375.77] Calling initcall 0xc001ee2c: af_unix_init+0x0/0x80()
[42949375.77] NET: Registered protocol family 1
[42949375.77] Calling initcall 0xc001eeac: packet_init+0x0/0x70()
[42949375.77] NET: Registered protocol family 17
[42949375.77] Calling initcall 0xc0012a88: clocksource_done_booting+0x0/0x24()
[42949375.77] Calling initcall 0xc0019ed4: seqgen_init+0x0/0x1c()
[42949375.77] Calling initcall 0xc001ba44: early_uart_console_switch+0x0/0x90()
[42949375.77] Calling initcall 0xc013a150: net_random_reseed+0x0/0x38()
[42949375.77] RAMDISK: Compressed image found at block 0
[42949378.95] VFS: Mounted root (ext2 filesystem).
[42949378.96] Freeing init memory: 104K
[42949549.17] ata1: soft resetting port
[42949549.25] ata1.00: ATA-6, max UDMA/100, 78140160 sectors: LBA48
[42949549.25] ata1.00: configured for UDMA/100
[42949549.25] ata1: EH complete
[42949549.25] Vendor: ATA Model: ST940813AMRev: 3.02
[42949549.25] Type: Direct-Access ANSI SCSI revision: 05
[42949549.26] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB)
[42949549.26] sda: Write Protect is off
[42949549.26] sda: Mode Sense: 00 3a 00 00
[42949549.26] SCSI device sda: drive cache: write back
[42949549.27] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB)
[42949549.27] sda: Write Protect is off
[42949549.27] sda: Mode Sense: 00 3a 00 00
[42949549.27] SCSI device sda: drive cache: write back
[42949549.27] sda: unknown partition table
[42949549.29] sd 0:0:0:0: Attached scsi disk sda
[42949549.29] sd 0:0:0:0: Attached scsi generic sg0 type 0
[42949549.32] ata1: soft resetting port
[42949549.38] ata1.00: configured for UDMA/100
[42949549.38] ata1: EH complete
[42949549.38] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB)
[42949549.38] sda: Write Protect is off
[42949549.38] sda: Mode Sense: 00 3a 00 00
[42949549.39] SCSI device sda: drive cache: write back
[42949559.28] ata2: soft resetting port
[42949559.42] ata2.00: ATA-6, max UDMA/100, 78140160 sectors: LBA48
[42949559.42] ata2.00: configured for UDMA/100
[42949559.42] ata2: EH complete
[42949559.42] Vendor: ATA Model: ST94811A Rev: 3.07
[42949559.42] Type: Direct-Access ANSI SCSI revision: 05
[42949559.43] SCSI device sdb: 78140160 512-byte hdwr sectors (40008 MB)
[42949559.43] sdb: Write Protect is off
[42949559.43] sdb: Mode Sense: 00 3a 00 00
[42949559.43] SCSI device sdb: drive cache: write back
[42949559.43] SCSI device sdb: 78140160 512-byte hdwr sectors (40008 MB)
[42949559.44] sdb: Write Protect is off
[42949559.44] sdb: Mode Sense: 00 3a 00 00
[42949559.44] SCSI device sdb: drive cache: write back
[42949559.44] sdb: unknown partition table
[42949559.46] sd 1:0:0:0: Attached scsi disk sdb
[42949559.46] sd 1:0:0:0: Attached scsi generic sg1 type 0
[ 643.23] NWFPE: ntpd[38] takes exception 0001 at c002d514 from 0001d308
[ 711.23] NWFPE: ntpd[38] takes exception 0001 at c002d514 from 0001d308
[ 777.22] NWFPE: ntpd[38] takes exception 0001 at c002d514 from 0001d308
[ 841.30] NWFPE: ntpd[38] takes exception 0001 at c002d514 from
Re: PATA Sil680 Command Timeout on ARM XScale
> above 1.5. Two timers are used to track command timeout in our test software. The one in the user space is set to 6 seconds using alarm() call while the one in the kernel (scsi timer) is set to 5 seconds. These timeout values are probably too low to be realistic, but the issue here is not about the timeout itself but to understand why it is

A lot of drive commands seem to be set up on a seven second worst case

> always user space timer expired before kernel timer. Since kernel timer uses jiffies to track time, does this imply a kernel bug where the time interrupts were lost or delay somehow? Do you know any know problems related to command timeout in PATA Sil680?

Alarm() is also handled by the same jiffies logic, so I suspect a bug in your test environment?
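Since alarm() and the SCSI timer are said above to share the same tick source, one way to sanity-check a test environment is to measure how late SIGALRM actually arrives. The following is only a minimal sketch under that assumption, not the original test software; `measure_alarm_delay` and `elapsed_sec` are hypothetical helper names. For reference, the user space trace in the original report shows 7.260137 s elapsed for a 6 second alarm.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired;

static void on_alarm(int signo)
{
    (void)signo;
    alarm_fired = 1;
}

/* Difference b - a in seconds. */
static double elapsed_sec(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) +
           (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Arm alarm(secs), sleep until SIGALRM is delivered, and return the
 * measured delay in seconds (or -1.0 if the handler cannot be set up).
 * Hypothetical helper for illustration only.
 */
static double measure_alarm_delay(unsigned int secs)
{
    struct sigaction sa;
    struct timespec start, end;
    sigset_t block, old, wait_mask;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1.0;

    /* Block SIGALRM so it can only be delivered inside sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGALRM);

    alarm_fired = 0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    alarm(secs);
    while (!alarm_fired)
        sigsuspend(&wait_mask);
    clock_gettime(CLOCK_MONOTONIC, &end);

    sigprocmask(SIG_SETMASK, &old, NULL);
    return elapsed_sec(&start, &end);
}
```

Calling `measure_alarm_delay(6)` under the same IO load and comparing the result with 6.0 would show how much of the reported delay comes from signal delivery and scheduling rather than from the drive.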
Re: PATA Sil680 Command Timeout on ARM XScale
On 3/13/07, Alan Cox [EMAIL PROTECTED] wrote:
> > above 1.5. Two timers are used to track command timeout in our test software. The one in the user space is set to 6 seconds using alarm() call while the one in the kernel (scsi timer) is set to 5 seconds. These timeout values are probably too low to be realistic, but the issue here is not about the timeout itself but to understand why it is
>
> A lot of drive commands seem to be set up on a seven second worst case
>
> > always user space timer expired before kernel timer. Since kernel timer uses jiffies to track time, does this imply a kernel bug where the time interrupts were lost or delay somehow? Do you know any know problems related to command timeout in PATA Sil680?
>
> Alarm() is also handled by the same jiffies logic, so I suspect a bug in your test environment ?

I enabled ata_irq_trap and did the same test again. The kernel timer caught the timeout (10 seconds) this time along with the irq trap traces below. What's the cause of these idle irqs?

[42949560.15] SCSI device sdb: drive cache: write back
[ 85.57] ata1: irq trap
[ 85.82] ata2: irq trap
[ 92.12] abnormal status 0xD0
[ 92.12] ata1: irq trap
[ 92.92] ata2: irq trap
[ 98.75] ata1: irq trap
[ 100.26] abnormal status 0xD0
[ 100.26] ata2: irq trap
[ 105.54] ata1: irq trap
[ 108.05] ata1: irq trap
[ 110.62] ata1: irq trap
[ 113.13] ata1: irq trap
[ 115.53] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[ 115.53] ata1.00: (BMDMA stat 0x0)
[ 115.53] ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
[ 115.53] ata1: soft resetting port
[ 115.57] ata1.00: configured for UDMA/100
[ 115.57] sg_cmd_done: sg0, pack_id=2706, res=0x802, dur=10040 ms
[ 115.57] ata1: EH complete
[ 115.58] SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
[ 115.58] sda: Write Protect is off
[ 115.58] sda: Mode Sense: 00 3a 00 00
[ 115.58] SCSI device sda: drive cache: write back
...
Thanks,
Fajun
Re: [PATCH 03/12] libata: separate out ata_host_alloc() and ata_host_attach()
Tejun Heo wrote:
> Association to SCSI host is done via pointer now even for native ATA case, so this should be easier for SAS. What I'm worried about is how EH gets invoked. libata depends on EH to do a lot of things including probing, requesting sense data, etc. How should this work?

For SAS, the scsi_host pointer in the ata port is NULL today, since libata is really not managing the scsi host, the LLDD is. I think the initialization model we want for SAS is a little different than the one you are heading towards on SATA. For SAS, I think we just want to be able to alloc/init and delete/destroy a SATA device as they show up on the transport, without tying it to initialization of the ata host. And this set of patches doesn't necessarily prevent that...

> SAS attached libata port shares EH with the SAS SCSI host, right? How can

Right.

> we connect SAS EH with libata EH and would it be okay for libata EH hold the SCSI EH (thus holding all command execution on the host) to handle ATA exceptions?

Currently, ipr calls ata_do_eh from its eh_device_reset_handler function. This seems to work well enough with the testing that I've done, but it would certainly be nice to get to a more layered EH approach, where we could possibly have pluggable error handlers for different device types.

Regarding holding all command execution on the host while performing eh, that doesn't seem to be a huge issue from my perspective, not sure if this would have a larger negative impact on others... Generally speaking, we shouldn't be entering eh very often, and it should only be happening if something went wrong. The biggest issue here might be ATAPI devices, since they tend to report more errors during normal running. The request sense for these devices for SAS is done without entering eh today. Would you want to move this into eh as well?
Brian -- Brian King eServer Storage I/O IBM Linux Technology Center
Re: libata Intel PIIX/ICH fails to detect both PATA drives, spews ACPI errors
Tejun Heo wrote:
> Hello, Berck.
>
> Berck E. Nash wrote:
> > Tejun Heo wrote:
> > > Berck E. Nash wrote:
> > > > Testing the new libata ICH PATA drivers. There's one PATA port on this chip, and I've got two optical drives connected to it. The master drive fails to detect. The slave detects and works properly.
> > >
> > > Can you test 2.6.20.1 and post full dmesg?
> >
> > Here's 2.6.20.2... No ACPI errors, but still doesn't detect both drives.
>
> Please apply the attached patch and see if it works. If it works, please post the result of hdparm -I /dev/srX of the optical drive. Thanks.

Okay, here ya go:

/dev/sr0:

ATAPI CD-ROM, with removable media
	Model Number: LITE-ON LTR-48246S
	Serial Number:
	Firmware Revision: SS0E
Standards:
	Used: ATAPI for CD-ROMs, SFF-8020i, r2.5
	Supported: CD-ROM ATAPI-2
Configuration:
	DRQ response: 50us.
	Packet size: 12 bytes
Capabilities:
	LBA, IORDY(cannot be disabled)
	DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4
	     Cycle time: no flow control=227ns IORDY flow control=120ns
Re: Asus P5B-VM motherboard: cd drive malfunctions if internal nic in use.
On Tue, Mar 13, 2007 at 12:23:06PM -0400, Mark Lord wrote:
> That's nice. But the P5B-VM board does not have any such jumper for USB, nor does it have any obvious combination of BIOS-setup options to accomplish it.

Well it could only be done by hardware. The P5B has those jumpers. I figured the P5B-VM while a budget micro board would still have those. I guess not. Without jumper settings for it there is nothing you can do about it. A quick look through the manual certainly only mentions standby power for the keyboard connector and not for the USB ports.

-- 
Len Sorensen
Re: DVD drive fails in 2.6.20.2
Vlad Codrea wrote:
> Hi,
>
> The DVD-ROM drive on my laptop does not work with the vanilla 2.6.20.2 kernel using drivers/ata. The attached file dmesg.txt contains the full dmesg output including the error messages. I have also attached the .config file I used when compiling the kernel. The DVD device does not appear under /dev (only /dev/sda shows up, which is the hard drive).
>
> The ATA-related errors seem to start with:
>
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata2.00: (BMDMA stat 0x25)
> ata2.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x12 data 36 in
>          res 58/00:02:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
> ata2: soft resetting port
> ata2: port is slow to respond, please be patient (Status 0xd8)
> ata2: port failed to respond (30 secs, Status 0xd8)
> ATA: abnormal status 0xD8 on port 0x177
> ATA: abnormal status 0xD8 on port 0x177
>
> I should point out that this DVD drive hasn't worked with drivers/ide either, but it works perfectly under Windows 98.
>
> For background on this bug, please see:
> https://bugzilla.novell.com/show_bug.cgi?id=177050
> http://bugzilla.kernel.org/show_bug.cgi?id=6710
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=197477
> https://launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/50161

To add more info, the drive is...

 Model=TORiSAN DVD-ROM DRD-N216, FwRev=1.08, SerialNo=0001
 Config={ SpinMotCtl Removeable DTR<=5Mbs DTR>10Mbs nonMagnetic }
 RawCHS=0/0/0, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=0kB, MaxMultSect=0
 (maybe): CurCHS=0/0/0, CurSects=0, LBA=yes, LBAsects=0
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: sdma0 sdma1 sdma2 mdma0 mdma1 *mdma2
 AdvancedPM=no

and as written above it also doesn't work with the ide drivers. If DMA is turned off using hdparm -d 0, it seems to work better but still doesn't seem to work reliably.
-- 
tejun
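The "Status 0xd8" and "abnormal status 0xD8" values in the log above are raw ATA status register reads. As a reading aid, here is a small sketch that names the bits. The bit values follow the classic ATA status register layout; the `ST_*` names and `describe_status` are local to this sketch, and bits 4, 2 and 1 have changed meaning across ATA revisions, so the legacy names used for them are an assumption. Note also that while BSY is set, the other bits are not guaranteed to be meaningful.

```c
#include <stddef.h>
#include <string.h>

/* ATA status register bits (names local to this sketch). */
enum {
    ST_BSY  = 0x80, /* device busy */
    ST_DRDY = 0x40, /* device ready */
    ST_DF   = 0x20, /* device fault */
    ST_DSC  = 0x10, /* legacy: seek complete */
    ST_DRQ  = 0x08, /* data request */
    ST_CORR = 0x04, /* legacy: corrected data */
    ST_IDX  = 0x02, /* legacy: index */
    ST_ERR  = 0x01  /* error */
};

/* Write a space-separated list of set bit names into buf and return buf. */
static const char *describe_status(unsigned char status, char *buf, size_t len)
{
    static const struct {
        unsigned char bit;
        const char *name;
    } bits[] = {
        { ST_BSY, "BSY" }, { ST_DRDY, "DRDY" }, { ST_DF, "DF" },
        { ST_DSC, "DSC" }, { ST_DRQ, "DRQ" }, { ST_CORR, "CORR" },
        { ST_IDX, "IDX" }, { ST_ERR, "ERR" }
    };
    size_t i;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        if (!(status & bits[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, bits[i].name, len - strlen(buf) - 1);
    }
    return buf;
}
```

With this, 0xD8 reads as BSY, DRDY, DSC and DRQ together, and the 0x58 in the `res` field is the same value without BSY.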
Re: DVD drive fails in 2.6.20.2
Vlad Codrea wrote:
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata2.00: (BMDMA stat 0x25)
> ata2.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x12 data 36 in
>          res 58/00:02:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
> ata2: soft resetting port
> ata2: port is slow to respond, please be patient (Status 0xd8)
> ata2: port failed to respond (30 secs, Status 0xd8)
> ATA: abnormal status 0xD8 on port 0x177
> ATA: abnormal status 0xD8 on port 0x177

Okay, now that you're on libata driver, it's easier for me to debug. Can you apply the attached patch over 2.6.20 and report what the kernel says? (the patch will apply with some noise, it's okay)

Thanks.

-- 
tejun

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 14629a3..387235f 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4453,9 +4453,13 @@ fsm_start:
 			if (likely(status & (ATA_ERR | ATA_DF)))
 				/* device stops HSM for abort/error */
 				qc->err_mask |= AC_ERR_DEV;
-			else
+			else {
 				/* HSM violation. Let EH handle this */
+				ata_port_printk(ap, KERN_WARNING,
+						"!DRQ on HSM_ST_FIRST (0x%x)\n",
+						status);
 				qc->err_mask |= AC_ERR_HSM;
+			}
 
 			ap->hsm_task_state = HSM_ST_ERR;
 			goto fsm_start;
@@ -4547,13 +4551,17 @@ fsm_start:
 				if (likely(status & (ATA_ERR | ATA_DF)))
 					/* device stops HSM for abort/error */
 					qc->err_mask |= AC_ERR_DEV;
-				else
+				else {
+					ata_port_printk(ap, KERN_WARNING,
+							"!DRQ on HSM_ST (0x%x)\n",
+							status);
 					/* HSM violation. Let EH handle this.
 					 * Phantom devices also trigger this
 					 * condition. Mark hint.
 					 */
 					qc->err_mask |= AC_ERR_HSM |
 							AC_ERR_NODEV_HINT;
+				}
 
 				ap->hsm_task_state = HSM_ST_ERR;
 				goto fsm_start;
@@ -4579,8 +4587,12 @@ fsm_start:
 					status = ata_wait_idle(ap);
 				}
 
-				if (status & (ATA_BUSY | ATA_DRQ))
+				if (status & (ATA_BUSY | ATA_DRQ)) {
+					ata_port_printk(ap, KERN_WARNING,
+							"BUSY|DRQ on ERR|DF (0x%x)\n",
+							status);
 					qc->err_mask |= AC_ERR_HSM;
+				}
 
 				/* ata_pio_sectors() might change the
 				 * state to HSM_ST_LAST. so, the state
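The `cmd` and `res` strings in the quoted exception message pack the taskfile registers into one slash-and-colon separated field. A rough sketch of pulling them apart follows. The field order used here (command or status, then feature or error, nsect, lbal, lbam, lbah, the five HOB registers, then device) is my understanding of how libata printed these lines in this kernel era and should be treated as an assumption; `struct tf_fields` and `parse_tf` are hypothetical names.

```c
#include <stdio.h>

/* Registers as they appear in a libata "cmd" or "res" field. */
struct tf_fields {
    unsigned int first;   /* command for "cmd", status for "res" */
    unsigned int second;  /* feature for "cmd", error for "res" */
    unsigned int nsect, lbal, lbam, lbah;
    unsigned int hob_second, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
    unsigned int device;
};

/* Parse e.g. "a0/01:00:00:00:00/00:00:00:00:00/a0". Returns 0 on success. */
static int parse_tf(const char *s, struct tf_fields *tf)
{
    int n = sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
                   &tf->first, &tf->second, &tf->nsect,
                   &tf->lbal, &tf->lbam, &tf->lbah,
                   &tf->hob_second, &tf->hob_nsect, &tf->hob_lbal,
                   &tf->hob_lbam, &tf->hob_lbah, &tf->device);

    return n == 12 ? 0 : -1;
}
```

Applied to the `res` field above, this gives status 0x58 and lbam 0x24. If the ATAPI convention of reusing lbam/lbah as the byte count applies here, 0x24 is 36, which matches the "data 36 in" of the INQUIRY (cdb 0x12) in the same message.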
[PATCH] Add ledtrig_ide_activity () to libata
Hi all,

I noticed that ledtrig_ide_activity() was not enabled in libata. I have appended a patch to correct it. Please apply.

Regards,
Nobuhiro

-- 
Nobuhiro Iwamatsu
E-Mail : [EMAIL PROTECTED]
GPG ID : 3170EBE9

Signed-off-by: Nobuhiro Iwamatsu [EMAIL PROTECTED]

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index dc362fa..51930be 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -407,6 +407,7 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct ata_device *dev,
 		tf->lbah = cyl >> 8;
 		tf->device |= head;
 	}
+	ledtrig_ide_activity();
 
 	return 0;
 }
diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
index 80acd08..0b99f57 100644
--- a/drivers/leds/Kconfig
+++ b/drivers/leds/Kconfig
@@ -113,7 +113,7 @@ config LEDS_TRIGGER_TIMER
 
 config LEDS_TRIGGER_IDE_DISK
 	bool "LED IDE Disk Trigger"
-	depends on LEDS_TRIGGERS && BLK_DEV_IDEDISK
+	depends on LEDS_TRIGGERS && ( BLK_DEV_IDEDISK || ATA)
 	help
 	  This allows LEDs to be controlled by IDE disk activity.
 	  If unsure, say Y.
Re: [PATCH 03/12] libata: separate out ata_host_alloc() and ata_host_attach()
Hello, Brian.

Brian King wrote:
> Tejun Heo wrote:
> > Association to SCSI host is done via pointer now even for native ATA case, so this should be easier for SAS. What I'm worried about is how EH gets invoked. libata depends on EH to do a lot of things including probing, requesting sense data, etc. How should this work?
>
> For SAS, the scsi_host pointer in the ata port is NULL today, since libata is really not managing the scsi host, the LLDD is. I think the initialization model we want for SAS is a little different than the one you are heading towards on SATA. For SAS, I think we just want to be able to alloc/init and delete/destroy a SATA device as they show up on the transport, without tying it to initialization of the ata host. And this set of patches doesn't necessarily prevent that...

Yeap, I tried to keep SAS bridge functions working. If SAS doesn't need the host abstraction and wanna do stuff per-port basis, ata_port_alloc() can be directly exported and separating out per-port register routine shouldn't be too difficult, but I do think it would still be beneficial to have ata_host structure in SAS case too for code simplicity if not for anything else.

> > SAS attached libata port shares EH with the SAS SCSI host, right? How can
>
> Right.
>
> > we connect SAS EH with libata EH and would it be okay for libata EH hold the SCSI EH (thus holding all command execution on the host) to handle ATA exceptions?
>
> Currently, ipr calls ata_do_eh from its eh_device_reset_handler function. This seems to work well enough with the testing that I've done, but it would certainly be nice to get to a more layered EH approach, where we could possibly have pluggable error handlers for different device types.

That's an unexpected usage of ata_do_eh() but I can see how that works and using ata_do_eh() for that purpose actually makes sense. Most SCSI related dancing is done before and after ata_do_eh() and ata_do_eh() only deals with ATA qc's (except for scsi_eh_finish_cmd() called to finish failed qc's but these are still for only scmds associated with qcs). In the future, we might need to separate those direct scsi_eh_finish_cmd() calls out of ata_do_eh() so that ata_do_eh() really deals with libata qc proper but that change shouldn't be too difficult for SAS.

> Regarding holding all command execution on the host while performing eh, that doesn't seem to be a huge issue from my perspective, not sure if this would have a larger negative impact on others... Generally speaking, we shouldn't be entering eh very often, and it should only be happening if something went wrong. The biggest issue here might be ATAPI devices, since they tend to report more errors during normal running. The request sense for these devices for SAS is done without entering eh today. Would you want to move this into eh as well?

No, not for SAS. The reasons why I put sense requesting to EH were...

1. to make fast path code straight forward (no qc reusing dance)
2. in native ATA, we have per-port EH thread so sharing is not a problem.

As #2 is not true in SAS case, I think keeping sense requesting out of EH is the right thing to do here. I still think that it's much simpler/reliable to handle any exception case in a separate thread. I think this in the long term should be solved by making EH per-request queue (we of course will need mechanism to synchronize several EHs so that we can take host-wide EH actions).

Thanks.

-- 
tejun
[PATCH] libata: always use polling SETXFER
Several people have reported LITE-ON LTR-48246S detection failed because SETXFER fails. It seems the device raises IRQ too early after SETXFER. This is controller independent. The same problem has been reported for different controllers.

So, now we have pata_via where the controller raises IRQ before it's ready after SETXFER and a device which does similar thing. This patch makes libata always execute SETXFER via polling. As this only happens during EH, performance impact is nil. Setting ATA_TFLAG_POLLING is also moved from issue hot path to ata_dev_set_xfermode() - the only place where SETXFER can be issued.

Jeff Garzik suggests that, in the long term, it might be better to modify libata HSM implementation such that we're more tolerant of erratic ATAPI IRQ behavior - e.g. default to IRQ but falling back to polling if the device doesn't seem ready at the point of interrupt. Such change might be necessary to support ancient/weird ATAPI devices.

Signed-off-by: Tejun Heo [EMAIL PROTECTED]

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 14629a3..3a8da9d 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3575,10 +3575,13 @@ static unsigned int ata_dev_set_xfermode(struct ata_device *dev)
 	/* set up set-features taskfile */
 	DPRINTK("set features - xfer mode\n");
 
+	/* Some controllers and ATAPI devices show flaky interrupt
+	 * behavior after setting xfer mode. Use polling instead.
+	 */
 	ata_tf_init(dev, &tf);
 	tf.command = ATA_CMD_SET_FEATURES;
 	tf.feature = SETFEATURES_XFER;
-	tf.flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
+	tf.flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_POLLING;
 	tf.protocol = ATA_PROT_NODATA;
 	tf.nsect = dev->xfer_mode;
 
@@ -5036,14 +5039,6 @@ unsigned int ata_qc_issue_prot(struct ata_queued_cmd *qc)
 		}
 	}
 
-	/* Some controllers show flaky interrupt behavior after
-	 * setting xfer mode. Use polling instead.
-	 */
-	if (unlikely(qc->tf.command == ATA_CMD_SET_FEATURES &&
-		     qc->tf.feature == SETFEATURES_XFER) &&
-	    (ap->flags & ATA_FLAG_SETXFER_POLLING))
-		qc->tf.flags |= ATA_TFLAG_POLLING;
-
 	/* select the device */
 	ata_dev_select(ap, qc->dev->devno, 1, 0);
 
diff --git a/drivers/ata/pata_via.c b/drivers/ata/pata_via.c
index 96b7179..377e792 100644
--- a/drivers/ata/pata_via.c
+++ b/drivers/ata/pata_via.c
@@ -426,7 +426,7 @@ static int via_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	/* Early VIA without UDMA support */
 	static struct ata_port_info via_mwdma_info = {
 		.sht = &via_sht,
-		.flags = ATA_FLAG_SLAVE_POSS | ATA_FLAG_SETXFER_POLLING,
+		.flags = ATA_FLAG_SLAVE_POSS,
 		.pio_mask = 0x1f,
 		.mwdma_mask = 0x07,
 		.port_ops = &via_port_ops
@@ -434,7 +434,7 @@ static int via_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	/* Ditto with IRQ masking required */
 	static struct ata_port_info via_mwdma_info_borked = {
 		.sht = &via_sht,
-		.flags = ATA_FLAG_SLAVE_POSS | ATA_FLAG_SETXFER_POLLING,
+		.flags = ATA_FLAG_SLAVE_POSS,
 		.pio_mask = 0x1f,
 		.mwdma_mask = 0x07,
 		.port_ops = &via_port_ops_noirq,
@@ -442,7 +442,7 @@ static int via_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	/* VIA UDMA 33 devices (and borked 66) */
 	static struct ata_port_info via_udma33_info = {
 		.sht = &via_sht,
-		.flags = ATA_FLAG_SLAVE_POSS | ATA_FLAG_SETXFER_POLLING,
+		.flags = ATA_FLAG_SLAVE_POSS,
 		.pio_mask = 0x1f,
 		.mwdma_mask = 0x07,
 		.udma_mask = 0x7,
@@ -451,7 +451,7 @@ static int via_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	/* VIA UDMA 66 devices */
 	static struct ata_port_info via_udma66_info = {
 		.sht = &via_sht,
-		.flags = ATA_FLAG_SLAVE_POSS | ATA_FLAG_SETXFER_POLLING,
+		.flags = ATA_FLAG_SLAVE_POSS,
 		.pio_mask = 0x1f,
 		.mwdma_mask = 0x07,
 		.udma_mask = 0x1f,
@@ -460,7 +460,7 @@ static int via_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	/* VIA UDMA 100 devices */
 	static struct ata_port_info via_udma100_info = {
 		.sht = &via_sht,
-		.flags = ATA_FLAG_SLAVE_POSS | ATA_FLAG_SETXFER_POLLING,
+		.flags = ATA_FLAG_SLAVE_POSS,
 		.pio_mask = 0x1f,
 		.mwdma_mask = 0x07,
 		.udma_mask = 0x3f,
@@ -469,7 +469,7 @@ static int via_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	/* UDMA133 with bad AST (All current 133) */
 	static struct ata_port_info
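To illustrate what "execute SETXFER via polling" means in principle, here is a user-space sketch of a polling wait: instead of trusting an interrupt, the caller reads the status register until BSY clears or a poll budget runs out. This is not libata code. `poll_until_idle`, `read_status_fn` and the fake device are hypothetical names, and a real driver would sleep or delay between reads and use a time-based limit rather than a read count.

```c
#include <stddef.h>

#define ST_BSY 0x80 /* ATA status: device busy */

/* Callback that returns the current ATA status register value. */
typedef unsigned char (*read_status_fn)(void *ctx);

/*
 * Poll the status register until BSY clears.
 * Returns the final status (0..255), or -1 if the device is still
 * busy after max_polls reads.
 */
static int poll_until_idle(read_status_fn read_status, void *ctx,
                           unsigned int max_polls)
{
    unsigned int i;

    for (i = 0; i < max_polls; i++) {
        unsigned char status = read_status(ctx);

        if (!(status & ST_BSY))
            return status;
    }
    return -1;
}

/* Hypothetical fake device: reports busy for a fixed number of reads. */
struct fake_dev {
    unsigned int busy_reads;
};

static unsigned char fake_read_status(void *ctx)
{
    struct fake_dev *dev = ctx;

    if (dev->busy_reads > 0) {
        dev->busy_reads--;
        return 0xD0; /* BSY still set */
    }
    return 0x50;     /* idle: DRDY set, BSY clear */
}
```

The long-term idea attributed to Jeff Garzik above would combine the two paths: take the interrupt first, then fall back to a loop like this one if the status read at interrupt time still shows the device as not ready.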
Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)
On 12/03/07, Tejun Heo [EMAIL PROTECTED] wrote:
Stephen Hemminger wrote:
On Tue, 13 Mar 2007 04:03:00 +0900 Tejun Heo [EMAIL PROTECTED] wrote:
Stephen Hemminger wrote:

1. the controller has IRQ stuck high (infrequent but possible)
2. the IRQ is already requested by another device
3. the IRQ gets disabled due to screaming interrupts at the moment ata_piix does pci_enable_device().

I think we can be much more resilient to screaming interrupts if we enable device with IRQ disabled and enable it after the device is initialized to some level, possibly when requesting IRQ.

The first thing the skge driver does is do a chip reset, and that should cause IRQ to be disabled and cleared. The driver has no chance to fix it if the BIOS left the IRQ screaming...

What if we do something like...

pci_intx(pdev, 0);
pci_enable_device(pdev);
/* initialize */
request_irq(blah blah...);
pci_intx(pdev, 1);

Would this work for skge?

Okay for testing, but any change like this should be done in the base PCI layer, not one off in a particular driver.

Yeap, it was a proof-of-concept pseudo code. I attached a patch to do above in skge. Please point out if it is broken (e.g. intx needs to be enabled earlier). Michal, can you apply the attached patch and see whether it fixes the problem.

I think that problem is solved. Thanks.

Thanks.

-- 
tejun

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index eea75a4..2c990f2 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -3585,6 +3585,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
 	struct skge_hw *hw;
 	int err, using_dac = 0;
 
+	pci_intx(pdev, 0);
 	err = pci_enable_device(pdev);
 	if (err) {
 		dev_err(&pdev->dev, "cannot enable PCI device\n");
@@ -3669,6 +3670,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
 		       dev->name, pdev->irq);
 		goto err_out_unregister;
 	}
+	pci_intx(pdev, 1);
 	skge_show_addr(dev);
 
 	if (hw->ports > 1 && (dev1 = skge_devinit(hw, 1, using_dac))) {

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL) (http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN) (http://www.stardust.webpages.pl/linux_testers_group_en/)
Re: [5/6] 2.6.21-rc3: known regressions
Hi!

> This email lists some known regressions in Linus' tree compared to 2.6.20. If you find your name in the Cc header, you are either submitter of one of the bugs, maintainer of an affected subsystem or driver, a patch of you caused a breakage or I'm considering you in any other way possibly involved with one or more of these issues. Due to the huge amount of recipients, please trim the Cc when answering.
>
> Subject: resume: slab error in verify_redzone_free(): cache `size-512': memory outside object was overwritten
> References : http://lkml.org/lkml/2007/2/24/41
> Submitter : Pavel Machek [EMAIL PROTECTED]
> Status : unknown
>
> Subject: beeps get longer after suspend
> References : http://lkml.org/lkml/2007/2/26/276
> Submitter : Pavel Machek [EMAIL PROTECTED]
> Status : unknown

Seems fixed in -rc3.

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [3/6] 2.6.21-rc3: known regressions
On 3/13/07, Alan Cox [EMAIL PROTECTED] wrote:
> > Subject: libata: PATA UDMA/100 configured as UDMA/33
> > References : http://lkml.org/lkml/2007/2/20/294
> >              http://www.mail-archive.com/linux-ide@vger.kernel.org/msg04115.html
> >              http://bugzilla.kernel.org/show_bug.cgi?id=8133
> >              http://bugzilla.kernel.org/show_bug.cgi?id=8164
> > Submitter : Fabio Comolli [EMAIL PROTECTED]
> >             Plamen Petrov [EMAIL PROTECTED]
> >             Laurent Riffard [EMAIL PROTECTED]
> > Handled-By : Tejun Heo [EMAIL PROTECTED]
> > Status : patch available
>
> Some cases should be fixed now but probably not all (eg the Nvidia one)

This regression is still present in 2.6.21-rc3-g8b9909de (pulled from Linus' tree less than one hour ago).

Fabio
Re: [3/6] 2.6.21-rc2: known regressions
Hello,

Mathieu Bérard wrote:
> [ 15.031823] ata1.00: taskfile_load_raw: (0x1f1-1f7): hex: 10 03 00 00 00 a0 ef

Okay, this is interesting. This is Enable Device-Initiated Interface Power State Transitions. So, after this command is executed the device will try to transit to partial/slumber SATA PHY power states at its discretion, which is all cool and dandy in theory but depending on controller and drive firmware can cause all sorts of problems.

The NCQ problem you're seeing probably is some side effect of device initiated link PS. Can't tell whether the controller or the drive's firmware is the problem without further info.

Due to blacklisting, NCQ won't be turned on for your drive in future kernels and link PS doesn't seem to cause any problem on non-NCQ, so your case is taken care of here, but this leaves me a bit worried about what _GTF feeds us. I don't think we can reliably filter out command TFs as they might even contain vendor-specific commands, but it might be better to always log TFs executed for _GTF such that we at least know what's going on with the drive.

Thanks.

-- 
tejun
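For readers who want to see how the seven bytes in the quoted taskfile_load_raw line map to the identification above, here is a small decoding sketch. It assumes the usual legacy port layout (0x1f1 feature, 0x1f2 sector count, 0x1f3 to 0x1f5 LBA low/mid/high, 0x1f6 device, 0x1f7 command) and my reading of SET FEATURES (0xEF) with subcommand 0x10 "enable SATA feature" and sector count 0x03 selecting device-initiated power state transitions. `struct raw_tf`, `parse_raw_tf` and `is_enable_dipm` are hypothetical names for this example.

```c
#include <stdio.h>

/* Taskfile bytes in the order they are written to ports 0x1f1-0x1f7. */
struct raw_tf {
    unsigned char feature; /* 0x1f1 */
    unsigned char nsect;   /* 0x1f2 */
    unsigned char lbal;    /* 0x1f3 */
    unsigned char lbam;    /* 0x1f4 */
    unsigned char lbah;    /* 0x1f5 */
    unsigned char device;  /* 0x1f6 */
    unsigned char command; /* 0x1f7 */
};

/* Parse a string such as "10 03 00 00 00 a0 ef". Returns 0 on success. */
static int parse_raw_tf(const char *hex, struct raw_tf *tf)
{
    unsigned int b[7];

    if (sscanf(hex, "%2x %2x %2x %2x %2x %2x %2x",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &b[6]) != 7)
        return -1;

    tf->feature = (unsigned char)b[0];
    tf->nsect   = (unsigned char)b[1];
    tf->lbal    = (unsigned char)b[2];
    tf->lbam    = (unsigned char)b[3];
    tf->lbah    = (unsigned char)b[4];
    tf->device  = (unsigned char)b[5];
    tf->command = (unsigned char)b[6];
    return 0;
}

/* True for SET FEATURES / enable SATA feature / device-initiated PS. */
static int is_enable_dipm(const struct raw_tf *tf)
{
    return tf->command == 0xef && tf->feature == 0x10 && tf->nsect == 0x03;
}
```

A check like `is_enable_dipm()` is the kind of test a logging pass over _GTF taskfiles could use to flag this particular command, without trying to filter vendor-specific ones.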