date:20070313

Re: sata_promise SATA300TX4 intermittent problems

2007-03-13 Thread Tomi Orava


Hello,

 Peter Favrholdt wrote:
 My feeling is this is not caused by 1.5Gbps or 3.0Gbps operation.
 ...snip
 My next test will be a plain 2.6.21rc2. Then I'll apply the patches one
 by one.

 I've tested 2.6.21-rc2 which fails (sdc down after 27 minutes, sdd down
 after 46 minutes).

 Then I applied just a single patch to 2.6.21-rc2: Mikael Pettersson's
 patch to force 1.5Gbps operation and tested again - this time no
 problems at all!

 (BTW: both kernels are running with IO-APIC disabled).

 I've put results+dmesg output here: http://sata300tx4.gratiswiki.dk/

I had the opportunity to test Mikael's 1.5Gbps patch yesterday evening and,
although the system seems to run OK, I still get the following system
log messages:


Mar 13 06:10:22 alderan kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Mar 13 06:10:22 alderan kernel: ata2.00: cmd c8/00:30:0f:e2:86/00:00:00:00:00/e7 tag 0 cdb 0x0 data 24576 in
Mar 13 06:10:22 alderan kernel:  res 50/00:00:3e:e2:86/00:00:00:00:00/e7 Emask 0x1 (device error)
Mar 13 06:10:22 alderan kernel: ata2.00: configured for UDMA/133
Mar 13 06:10:22 alderan kernel: ata2: EH complete
Mar 13 06:10:22 alderan kernel: SCSI device sdb: 976773168 512-byte hdwr sectors (500108 MB)
Mar 13 06:10:22 alderan kernel: sdb: Write Protect is off
Mar 13 06:10:22 alderan kernel: SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 13 06:11:23 alderan kernel: possible SYN flooding on port 52223. Sending cookies.
Mar 13 06:13:05 alderan kernel: possible SYN flooding on port 52223. Sending cookies.
Mar 13 06:13:23 alderan kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Mar 13 06:13:23 alderan kernel: ata2.00: cmd 25/00:00:27:29:73/00:02:07:00:00/e0 tag 0 cdb 0x0 data 262144 in
Mar 13 06:13:23 alderan kernel:  res 50/00:00:26:2b:73/00:00:00:00:00/e0 Emask 0x1 (device error)
Mar 13 06:13:23 alderan kernel: ata2.00: configured for UDMA/133
Mar 13 06:13:23 alderan kernel: ata2: EH complete
Mar 13 06:13:23 alderan kernel: SCSI device sdb: 976773168 512-byte hdwr sectors (500108 MB)
Mar 13 06:13:23 alderan kernel: sdb: Write Protect is off
Mar 13 06:13:23 alderan kernel: SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
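Each cmd/res pair above is a dump of the ATA taskfile. As we read libata's EH report format, the fields are command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and on a res line the first two bytes are status and error instead. The C sketch below decodes such a string. The struct, the helper names and the short list of 48-bit opcodes are ours (hypothetical), not libata code.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical decoder for one libata taskfile dump, field order as we read it:
 * command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
 * On a "res" line, command and feature hold status and error instead. */
struct tf_dump {
    unsigned command, feature, nsect, lbal, lbam, lbah;
    unsigned hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
    unsigned device;
};

/* Returns 1 if all 12 hex fields were parsed, 0 otherwise. */
static int parse_tf(const char *s, struct tf_dump *tf)
{
    return sscanf(s, "%x/%x:%x:%x:%x:%x/%x:%x:%x:%x:%x/%x",
                  &tf->command, &tf->feature, &tf->nsect, &tf->lbal,
                  &tf->lbam, &tf->lbah, &tf->hob_feature, &tf->hob_nsect,
                  &tf->hob_lbal, &tf->hob_lbam, &tf->hob_lbah,
                  &tf->device) == 12;
}

/* Only a few 48-bit (EXT) opcodes are listed here; extend as needed. */
static int is_lba48(const struct tf_dump *tf)
{
    switch (tf->command) {
    case 0x24: /* READ SECTOR(S) EXT */
    case 0x25: /* READ DMA EXT */
    case 0x34: /* WRITE SECTOR(S) EXT */
    case 0x35: /* WRITE DMA EXT */
        return 1;
    default:
        return 0;
    }
}

static uint64_t tf_lba(const struct tf_dump *tf)
{
    if (is_lba48(tf))
        return ((uint64_t)tf->hob_lbah << 40) | ((uint64_t)tf->hob_lbam << 32) |
               ((uint64_t)tf->hob_lbal << 24) | ((uint64_t)tf->lbah << 16) |
               ((uint64_t)tf->lbam << 8) | tf->lbal;
    /* 28-bit: LBA bits 27:24 sit in the low nibble of the device register */
    return ((uint64_t)(tf->device & 0x0f) << 24) | ((uint64_t)tf->lbah << 16) |
           ((uint64_t)tf->lbam << 8) | tf->lbal;
}

/* A sector count of 0 means 256 (28-bit) or 65536 (48-bit) sectors. */
static uint32_t tf_sectors(const struct tf_dump *tf)
{
    uint32_t n = is_lba48(tf) ? (tf->hob_nsect << 8) | tf->nsect : tf->nsect;

    if (n == 0)
        n = is_lba48(tf) ? 65536 : 256;
    return n;
}
```

Applied to the first failure, c8 is READ DMA for 0x30 = 48 sectors (48 x 512 = 24576 bytes, matching "data 24576 in") starting at LBA 0x0786e20f, and the res LBA 0x0786e23e is the last sector of that request. The second failure is READ DMA EXT for 0x200 sectors (262144 bytes). The sketch picks the addressing mode from the first byte, so it is only reliable for cmd lines and for 28-bit res lines such as the first one.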

If Mikael has an updated patch with the more detailed error reporting
features, I'll try to run it within a few days of getting my hands
on it. I would be really interested to know why the Promise SATA300TX4
doesn't play along with the newer 500GB Seagate 7200.10 disks while the older
models are OK (I've already tried with and without 1.5Gbps jumpers and
patches).
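Forcing 1.5Gbps, whether by jumper or in the driver, caps the negotiated link speed. On the host side, the SATA SControl register has an SPD field (bits 7:4) for this purpose: 0 means no restriction, 1 limits the link to Gen1 (1.5 Gbps) and 2 to Gen2 (3.0 Gbps). We have not checked that Mikael's patch works this way. The hypothetical helpers below only show the bit manipulation. On real hardware a new limit generally takes effect only after the link is reset.

```c
#include <stdint.h>

/* SATA SControl layout: DET bits 3:0, SPD bits 7:4, IPM bits 11:8. */
#define SCONTROL_SPD_SHIFT 4
#define SCONTROL_SPD_MASK  (0xfu << SCONTROL_SPD_SHIFT)

enum spd_limit {
    SPD_NO_LIMIT = 0,  /* no speed restriction */
    SPD_GEN1     = 1,  /* limit to Gen1, 1.5 Gbps */
    SPD_GEN2     = 2,  /* limit to Gen2, 3.0 Gbps */
};

/* Hypothetical helper: return scontrol with the SPD field replaced by
 * limit, leaving the DET and IPM fields untouched. */
static uint32_t scontrol_set_spd(uint32_t scontrol, enum spd_limit limit)
{
    return (scontrol & ~SCONTROL_SPD_MASK) |
           ((uint32_t)limit << SCONTROL_SPD_SHIFT);
}

/* Hypothetical helper: extract the current SPD field. */
static enum spd_limit scontrol_get_spd(uint32_t scontrol)
{
    return (enum spd_limit)((scontrol & SCONTROL_SPD_MASK) >> SCONTROL_SPD_SHIFT);
}
```

For example, an SControl value of 0x300 (IPM transitions disabled, no speed limit) becomes 0x310 with SPD_GEN1.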

Regards,
Tomi Orava


-- 


-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] libata: fix native mode disabled port handling

2007-03-13 Thread Tejun Heo

Jeff Garzik wrote:
 Tejun Heo wrote:
 Disabled port handling in ata_pci_init_native_mode() is slightly
 broken in that it may end up using the wrong port_info.  This patch
 updates it such that disabled ports are made dummy as done in the
 legacy and other cases.

 While at it, fix indentation in ata_resources_present().

 Signed-off-by: Tejun Heo [EMAIL PROTECTED]
 ---
  drivers/ata/libata-sff.c |   62
 ++
  1 files changed, 35 insertions(+), 27 deletions(-)
 
 what's the extent of the breakage here?
 
 I would rather push this into #upstream

Yep, this is for #upstream.  The breakage is theoretical, as nothing hits
that path yet.

Thanks.

-- 
tejun

Re: [PATCH] libata: add support for READ/WRITE LONG

2007-03-13 Thread Ric Wheeler




Tejun Heo wrote:

Mark Lord wrote:
  

The READ/WRITE LONG commands are theoretically obsolete,
but the majority of drives in existence still implement them.

The WRITE_LONG and WRITE_LONG_ONCE commands are of particular
interest for fault injection testing -- e.g. creating media errors
at specific locations on a disk.

The fussy bit is that these commands require a non-standard
sector size, usually 520 bytes instead of 512.

This patch adds support to libata for READ/WRITE LONG commands
issued via SG_IO/ATA_16.

This patch was generated against a 2.6.21-rc3-git7 base:



I think it would be better if this comes in two patches.  One to add
qc->sect_size and convert all users of ATA_SECT_SIZE to qc->sect_size
and the other one to implement READ/WRITE LONG.  Another question is
whether this needs to be included in mainline.  This is definitely
useful but it is mostly for debugging/testing.

Hmmm... But we're gonna need qc->sect_size anyway for devices with
larger sector sizes and the overhead for supporting READ/WRITE LONG is
nearly nil, so I'm voting for inclusion.
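Tejun's suggestion is to carry the sector size per command instead of hard-coding ATA_SECT_SIZE. The following is a hypothetical, cut-down sketch of that idea. The struct and helpers are ours, not libata's struct ata_queued_cmd. The 520-byte value is the typical LONG size mentioned above, but the real number of extra bytes is drive-dependent. The opcodes are the (obsolete) READ/WRITE LONG values from the ATA command set.

```c
#include <stdint.h>

#define DEMO_SECT_SIZE      512u  /* normal ATA sector */
#define DEMO_LONG_SECT_SIZE 520u  /* assumed: 512 data bytes + drive-specific extra bytes */

/* READ/WRITE LONG opcodes (with and without retries) */
#define DEMO_CMD_READ_LONG        0x22
#define DEMO_CMD_READ_LONG_ONCE   0x23
#define DEMO_CMD_WRITE_LONG       0x32
#define DEMO_CMD_WRITE_LONG_ONCE  0x33

/* Hypothetical queued command carrying its own sector size. */
struct demo_qc {
    uint8_t  command;
    uint32_t nsect;
    uint32_t sect_size;
};

static void demo_qc_init(struct demo_qc *qc, uint8_t command, uint32_t nsect)
{
    qc->command = command;
    qc->nsect = nsect;
    switch (command) {
    case DEMO_CMD_READ_LONG:
    case DEMO_CMD_READ_LONG_ONCE:
    case DEMO_CMD_WRITE_LONG:
    case DEMO_CMD_WRITE_LONG_ONCE:
        qc->sect_size = DEMO_LONG_SECT_SIZE;
        break;
    default:
        qc->sect_size = DEMO_SECT_SIZE;
        break;
    }
}

/* Code that used to multiply by a fixed 512 asks the command instead. */
static uint32_t demo_qc_bytes(const struct demo_qc *qc)
{
    return qc->nsect * qc->sect_size;
}
```

With this shape, supporting larger-sector devices later only changes where sect_size is set, not the transfer-length arithmetic.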

Thanks.
  
I just want to add that this patch has been incredibly useful for us in
testing the error handling and RAID. Nothing like real media errors on
demand to validate your assumptions ;-)


ric


Re: regression: ide-floppy doesn't work with IOMEGA IDE ZIP drive

2007-03-13 Thread Bartlomiej Zolnierkiewicz

Hi,

On Monday 12 March 2007, Sergei Shtylyov wrote:
Hello.

Tejun Heo wrote:

Stanislav Brabec reported that IOMEGA IDE ZIP drive doesn't work with
recent kernels. Low level driver is via82cxxx. Relevant part of
2.6.20.1 boot message follows.

VP_IDE: IDE controller at PCI slot :00:11.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt8233 (rev 00) IDE UDMA100 controller on pci:00:11.1
ide0: BM-DMA at 0xff00-0xff07, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xff08-0xff0f, BIOS settings: hdc:DMA, hdd:DMA
Probing IDE interface ide0...
hda: ST3160812A, ATA DISK drive
hdb: IOMEGA ZIP 100 ATAPI, ATAPI FLOPPY drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
...
hdb: lost interrupt
hdb: status error: status=0x00 { }
ide: failed opcode was: unknown
ide-floppy: Strange, packet command initiated yet DRQ isn't asserted
...
hdb: 98304kB, 96/64/32 CHS, 4096 kBps, 512 sector size, 2941 rpm
hdb: No disk in drive
hdb: lost interrupt
hdb: status error: status=0x00 { }
ide: failed opcode was: unknown
ide-floppy: Strange, packet command initiated yet DRQ isn't asserted
[above repeats several times]
...
hdb: lost interrupt
hdb: status error: status=0x00 { }
ide: failed opcode was: unknown
ide-floppy: Strange, packet command initiated yet DRQ isn't asserted
hdb: 98304kB, 196608 blocks, 512 sector size
hdb: unknown partition table
hdb: unknown partition table
hdb: unknown partition table
hdb: unknown partition table

And the device is inaccessible after boot completed. On suse 10.1
kernel (2.6.16 based), it works better.

VP_IDE: IDE controller at PCI slot :00:11.1
PCI: VIA IRQ fixup for :00:11.1, from 255 to 0
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt8233 (rev 00) IDE UDMA100 controller on pci:00:11.1
ide0: BM-DMA at 0xff00-0xff07, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xff08-0xff0f, BIOS settings: hdc:DMA, hdd:DMA
Probing IDE interface ide0...
hda: ST3160812A, ATA DISK drive
hdb: IOMEGA ZIP 100 ATAPI, ATAPI FLOPPY drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
...
hdb: No disk in drive
hdb: 98304kB, 96/64/32 CHS, 4096 kBps, 512 sector size, 2941 rpm
...
hdb: lost interrupt
hdb: status error: status=0x00 { }
ide: failed opcode was: unknown
ide-floppy: Strange, packet command initiated yet DRQ isn't asserted
hdb: 98304kB, 196608 blocks, 512 sector size
hdb: unknown partition table

There is one lost interrupt message but the drive reportedly works
fine after that. Stanislav also seems to recall that ide-floppy
worked without any error message with older kernel.

I'm attaching full boot log messages for 2.6.20.1 and suse 10.1.

Any ideas?

BTW... I looked at that code last spring and found it strange that
ide-floppy is the only driver that still calls the dma_start() method *before*
issuing a command (while this is not the right thing to do according to the spec and
is known not to work with some chips, namely Promise). I was going to send a
patch then but, lacking both time and actual hardware, kept deferring it
ever since... :-)

We are probably hitting two bugs here:

* regression between 2.6.16 and 2.6.20

* the issue that Sergei described

Stanislav, could you use git bisect to narrow down the problem
to the specific patch?

Good practical example of using git-bisect is here:
http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/

Thanks,
Bart

Re: regression: ide-floppy doesn't work with IOMEGA IDE ZIP drive

2007-03-13 Thread Stanislav Brabec

Jeff Garzik wrote:
 Tejun Heo wrote:
  [libata]
  And, as the device requires custom high level driver, libata fails
  miserably.  Would it be worth to try support these devices?  Or are
  they just too outdated to put the effort in?
 
 What SCSI peripheral device type does it report, when booted under libata?

Internal IOMEGA ZIP 100 IDE (manufactured by NEC).

ata1.01: ATAPI, max PIO2, CDB intr
ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata1.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x12 data 36 in
 res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata1: soft resetting port
ata1.01: configured for PIO2
ata1: EH complete
ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata1.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x12 data 36 in
 res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata1.01: configured for PIO2
ata1: EH complete
ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata1.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x12 data 36 in
 res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata1: soft resetting port
... and so on

For more see
https://bugzilla.novell.com/show_bug.cgi?id=232086
(complete ide-floppy and libata logs are there)

-- 
Best Regards,

Stanislav Brabec
software developer
-
SUSE LINUX, s. r. o.    e-mail: [EMAIL PROTECTED]
Lihovarska 1060/12      tel: +420 284 028 966
190 00 Praha 9          fax: +420 284 028 951
Czech Republic          http://www.suse.cz/


Re: Timeout error, disk gone

2007-03-13 Thread Alan Cox

On Tue, 13 Mar 2007 08:31:55 + (UTC)
Matthias Urlichs [EMAIL PROTECTED] wrote:

 Transient glitch? Major ugliness? For the time being I have not re-added
 the thing to my RAID, but the three other disks in it are the exact same
 model...

What model and what firmware? There are some problem firmware releases
around with older SATA drives.

Re: regression: ide-floppy doesn't work with IOMEGA IDE ZIP drive

2007-03-13 Thread Alan Cox

 It seems ide-floppy needs some special handlings in interrupt handling
 too like delaying data transfer by several ticks after device indicates
 readiness.  Apart from separate high level driver, we might have to
 modify libata HSM implementation if we're gonna support these devices.

The data transfer delay may well be down to the DMA bug

 Can someone more knowledgeable explain what needs to be done differently
 from standard ATAPI for these devices?

In theory not a lot, if anything. I don't have a ZIP drive but have got an
old Iomega Clik! PCMCIA drive somewhere if you need an ide-floppy device,
Tejun.

Alan

Re: Timeout error, disk gone

2007-03-13 Thread smurf

Hi,

Alan Cox:
 What model and what firmware ? There are some problem firmware releases
 around with older SATA drives

Samsung SP2004C, firmware version unknown -- how do I find out?

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  [EMAIL PROTECTED]
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
It is all right to hold a conversation, but you should let go of it now
and then.

Re: Timeout error, disk gone

2007-03-13 Thread Alan Cox

On Tue, 13 Mar 2007 14:21:15 +0100
[EMAIL PROTECTED] wrote:

 Hi,
 
 Alan Cox:
  What model and what firmware ? There are some problem firmware releases
  around with older SATA drives
 
 Samsung SP2004C, firmware version unknown -- how do I find out?

It's in the IDENTIFY data; however, I'm not aware of any problem Samsung
drives in reports so far.
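The firmware revision is part of the IDENTIFY DEVICE data. Words 23-26 hold it as 8 ASCII characters, two per 16-bit word, with the first character in the high byte. The model number is in words 27-46 and the serial number in words 10-19. `hdparm -I` on the disk decodes this and prints it as "Firmware Revision". For anyone reading a raw 512-byte IDENTIFY dump, a hypothetical helper could look like the one below. It assumes the words are already in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* IDENTIFY DEVICE word offsets of the ASCII string fields */
#define ID_SERNO  10  /* 10 words, 20 chars */
#define ID_FW_REV 23  /* 4 words, 8 chars */
#define ID_PROD   27  /* 20 words, 40 chars */

/* Hypothetical helper: copy an IDENTIFY string field into out, which must
 * hold at least 2 * nwords + 1 bytes. Each word carries two characters,
 * high byte first; trailing space padding is stripped. */
static void id_string(const uint16_t *id, size_t first, size_t nwords, char *out)
{
    size_t len = 0;

    for (size_t i = 0; i < nwords; i++) {
        out[len++] = (char)(id[first + i] >> 8);
        out[len++] = (char)(id[first + i] & 0xff);
    }
    while (len > 0 && out[len - 1] == ' ')
        len--;
    out[len] = '\0';
}
```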

Re: Timeout error, disk gone

2007-03-13 Thread Tejun Heo

[EMAIL PROTECTED] wrote:
 Hi,
 
 Alan Cox:
 What model and what firmware ? There are some problem firmware releases
 around with older SATA drives
 
 Samsung SP2004C, firmware version unknown -- how do I find out?

Please post the result of 'lspci -nn' and the full boot log.  Samsung
firmwares tend to be pretty good, but I've seen earlier ones occasionally
lock up after certain PHY events.  Removing power and reapplying it puts the drive
back into a sane state.

-- 
tejun

Re: Timeout error, disk gone

2007-03-13 Thread Tejun Heo

Tejun Heo wrote:
 [EMAIL PROTECTED] wrote:
 Hi,

 Alan Cox:
 What model and what firmware ? There are some problem firmware releases
 around with older SATA drives
 Samsung SP2004C, firmware version unknown -- how do I find out?
 
 Please post the result of 'lspci -nn' and full boot log.  Samsung
 firmwares tend to be pretty good but I've seen earlier ones occasionally
 lock up after certain PHY events.  Removing power and reapplying puts it
 back into sane state.

Oh, and I'm pretty sure it was the drive that locked up.  If you connect
the hard drive to a different, working port without removing power, the same
timeouts happen, while the original port detects and works fine with
another hotplugged drive.  And for the record, among the earlier SATA
drives, Samsung ones were definitely in the better group.

Also, please include full log of failure.  Thanks.

-- 
tejun

Re: regression: ide-floppy doesn't work with IOMEGA IDE ZIP drive

2007-03-13 Thread Tejun Heo

Alan Cox wrote:
 It seems ide-floppy needs some special handlings in interrupt handling
 too like delaying data transfer by several ticks after device indicates
 readiness.  Apart from separate high level driver, we might have to
 modify libata HSM implementation if we're gonna support these devices.
 
 The data transfer delay may well be down to the DMA bug

I see.

 Can someone more knowledgeable explain what needs to be done differently
 from standard ATAPI for these devices?
 
 In theory not a lot if anything. I don't have a ZIP drive but have got an
 old Iomega Clik! PCMCIA drive somewhere if you need an ide-floppy device
 Tejun.

If you've got a spare one, that would be great but otherwise I think
you're much better qualified for libata ide-floppy support. :-)

Thanks.

-- 
tejun

Re: Timeout error, disk gone

2007-03-13 Thread smurf

Hi,

Alan Cox:
  Samsung SP2004C, firmware version unknown -- how do I find out?
 
 Its in the identify data, however I'm not aware of any problem Samsung
 drives in reports so far.

Looks like a hardware problem. A hard power-down made it recognizable
again, but it now manages a read speed of only 1/2 MB/sec, as opposed
to its three brethren, which are approx. 111 times faster.

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  [EMAIL PROTECTED]
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
Brisk talkers are usually slow thinkers.  There is, indeed, no wild beast
more to be dreaded than a communicative man having nothing to communicate.
If you are civil to the voluble, they will abuse your patience; if
brusque, your character.
-- Jonathan Swift

Re: Timeout error, disk gone

2007-03-13 Thread smurf

Hi,

Tejun Heo:
  Samsung SP2004C, firmware version unknown -- how do I find out?
 
 Please post the result of 'lspci -nn' and full boot log.  Samsung
 firmwares tend to be pretty good but I've seen earlier ones occasionally
 lock up after certain PHY events.  Removing power and reapplying puts it
 back into sane state.
 
Not this one, as reported in my earlier mail -- unless you redefine sane.

I'll try to extract more information from it after it's replaced.

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  [EMAIL PROTECTED]
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
When the polls are overwhelmingly unfavorable, (a) ridicule and dismiss them
or (b) stress the volatility of public opinion.

Re: Fwd: libata extension

2007-03-13 Thread Jeff Garzik


Vitaliyi wrote:

Why is access to the Control register needed?


To execute soft reset for example.


 In the perfect case i would like to be able to execute vendor command
 set (reverse engineered).

Sounds interesting. :-)

Could you give some more details on what are you going to implement?


Reading/writing service area, uploading, downloading modules, working
with flash etc.


SAT (aka ATA passthru) defines how to do soft-reset.

SG_IO supports the ATA_12 and ATA_16 commands, which permit soft-reset
and similar tasks.  libata supports this interface, but does not yet
support soft-reset and similar non-command-oriented tasks.  This would
be the best area to add such features, though.
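Since soft reset is not yet supported on this path, here is an ordinary command-oriented task instead: a sketch of building an ATA PASS-THROUGH(16) CDB for IDENTIFY DEVICE. The byte layout follows SAT as we understand it. The protocol values (3 = non-data, 4 = PIO data-in) and the helper name are our assumptions, not a libata API.

```c
#include <stdint.h>
#include <string.h>

#define ATA_16_OPCODE      0x85  /* SCSI ATA PASS-THROUGH(16) */
#define SAT_PROTO_NON_DATA 3
#define SAT_PROTO_PIO_IN   4

/* Hypothetical helper: fill a 16-byte ATA PASS-THROUGH(16) CDB for a
 * 28-bit command (EXTEND bit clear, so only the low register bytes are set). */
static void build_ata16(uint8_t cdb[16], uint8_t protocol, uint8_t command,
                        uint8_t feature, uint8_t nsect, int data_in)
{
    memset(cdb, 0, 16);
    cdb[0] = ATA_16_OPCODE;
    cdb[1] = (uint8_t)(protocol << 1);  /* PROTOCOL in bits 4:1 */
    if (data_in)
        /* T_DIR=1 (from device), BYT_BLOK=1 (count in blocks),
         * T_LENGTH=2 (length taken from the sector count field) */
        cdb[2] = (1 << 3) | (1 << 2) | 2;
    cdb[4]  = feature;  /* FEATURES 7:0 */
    cdb[6]  = nsect;    /* SECTOR COUNT 7:0 */
    cdb[14] = command;  /* ATA COMMAND */
}
```

To send it, a caller would point an sg_io_hdr_t (from <scsi/sg.h>) at the CDB (cmdp, cmd_len = 16), set dxfer_direction = SG_DXFER_FROM_DEV with a 512-byte buffer for IDENTIFY (0xEC), and issue ioctl(fd, SG_IO, &hdr).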


Jeff




Re: [3/6] 2.6.21-rc2: known regressions

2007-03-13 Thread Tejun Heo

Can you apply the attached patch and report what the kernel says with
ACPI turned on?

-- 
tejun
diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
index 019d8ff..6a27a7f 100644
--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -473,8 +473,8 @@ static void taskfile_load_raw(struct ata_port *ap,
struct ata_taskfile tf;
unsigned int err;
 
-   if (ata_msg_probe(ap))
-   ata_dev_printk(atadev, KERN_DEBUG, "%s: (0x1f1-1f7): hex: "
+   if (1 || ata_msg_probe(ap))
+   ata_dev_printk(atadev, KERN_INFO, "%s: (0x1f1-1f7): hex: "
"%02x %02x %02x %02x %02x %02x %02x\n",
__FUNCTION__,
gtf->tfa[0], gtf->tfa[1], gtf->tfa[2],
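The debug printk dumps each _GTF taskfile as seven hex bytes, one per legacy register port 0x1f1-0x1f7. Below is a hypothetical helper for reading such a dump by register name. It flags SET FEATURES (0xEF) entries, the kind of _GTF command worth a closer look when a drive behaves differently with ACPI on. The subcommand values 0x10/0x90 (enable/disable a SATA feature) are from our reading of the ATA/SATA specs. The names are ours, not libata's.

```c
#include <stdint.h>

/* One _GTF taskfile: seven register values for ports 0x1f1-0x1f7,
 * in the order the debug printk dumps gtf->tfa[0..6]. */
struct gtf_tf {
    uint8_t feature;  /* 0x1f1 */
    uint8_t nsect;    /* 0x1f2 */
    uint8_t lbal;     /* 0x1f3 */
    uint8_t lbam;     /* 0x1f4 */
    uint8_t lbah;     /* 0x1f5 */
    uint8_t device;   /* 0x1f6 */
    uint8_t command;  /* 0x1f7 */
};

#define DEMO_ATA_SET_FEATURES     0xef
#define DEMO_SETFEAT_SATA_ENABLE  0x10  /* enable SATA feature (count selects which) */
#define DEMO_SETFEAT_SATA_DISABLE 0x90  /* disable SATA feature (count selects which) */

/* Hypothetical helper: map the seven raw bytes to named registers. */
static struct gtf_tf gtf_decode(const uint8_t raw[7])
{
    struct gtf_tf tf = {
        .feature = raw[0], .nsect = raw[1], .lbal = raw[2], .lbam = raw[3],
        .lbah = raw[4], .device = raw[5], .command = raw[6],
    };
    return tf;
}

static int gtf_is_set_features(const struct gtf_tf *tf)
{
    return tf->command == DEMO_ATA_SET_FEATURES;
}

static int gtf_disables_sata_feature(const struct gtf_tf *tf)
{
    return gtf_is_set_features(tf) && tf->feature == DEMO_SETFEAT_SATA_DISABLE;
}
```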

[5/6] 2.6.21-rc3: known regressions

2007-03-13 Thread Adrian Bunk

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject: resume: slab error in verify_redzone_free(): cache `size-512':
 memory outside object was overwritten
References : http://lkml.org/lkml/2007/2/24/41
Submitter  : Pavel Machek [EMAIL PROTECTED]
Status : unknown


Subject: beeps get longer after suspend
References : http://lkml.org/lkml/2007/2/26/276
Submitter  : Pavel Machek [EMAIL PROTECTED]
Status : unknown


Subject: suspend/resume hangs until keypress
References : http://bugzilla.kernel.org/show_bug.cgi?id=8181
Submitter  : Tomas Janousek [EMAIL PROTECTED]
Status : unknown


Subject: SATA breakage on resume
References : http://lkml.org/lkml/2007/3/7/233
Submitter  : Thomas Gleixner [EMAIL PROTECTED]
 Soeren Sonnenburg [EMAIL PROTECTED]
Status : unknown


Subject: first disk access after resume takes several minutes
References : http://lkml.org/lkml/2007/3/8/117
Submitter  : Michael S. Tsirkin [EMAIL PROTECTED]
Status : unknown


Subject: after resume: X hangs after drawing a couple of windows
References : http://lkml.org/lkml/2007/3/8/117
Submitter  : Michael S. Tsirkin [EMAIL PROTECTED]
Status : unknown


Subject: ThinkPad Z60m: usb mouse stops working after suspend to ram
References : http://lkml.org/lkml/2007/2/21/413
 http://lkml.org/lkml/2007/2/28/172
Submitter  : Arkadiusz Miskiewicz [EMAIL PROTECTED]
Caused-By  : Konstantin Karasyov [EMAIL PROTECTED]
 commit 0a6139027f3986162233adc17285151e78b39cac
Handled-By : Konstantin Karasyov [EMAIL PROTECTED]
Status : problem is being debugged


Subject: suspend to disk breaks ACPI
References : http://lkml.org/lkml/2007/3/5/127
Submitter  : Lukas Hejtmanek [EMAIL PROTECTED]
Status : unknown




[3/6] 2.6.21-rc3: known regressions

2007-03-13 Thread Adrian Bunk

This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject: AMD Elan: Crash after Allocating PCI resources
References : http://bugzilla.kernel.org/show_bug.cgi?id=8161
Submitter  : Vladimir Brik [EMAIL PROTECTED]
Handled-By : Andi Kleen [EMAIL PROTECTED]
Status : problem is being debugged


Subject: x86_64: boot hangs unless CONFIG_PCIEPORTBUS=n and acpi=off
References : http://bugzilla.kernel.org/show_bug.cgi?id=8162
Submitter  : Randy Dunlap [EMAIL PROTECTED]
Status : unknown


Subject: ACPI regression with noapic
References : http://lkml.org/lkml/2007/3/8/468
Submitter  : Ray Lee [EMAIL PROTECTED]
Status : unknown


Subject: acpi_serialize locks system during boot
References : http://bugzilla.kernel.org/show_bug.cgi?id=8171
Submitter  : Colchao [EMAIL PROTECTED]
Status : unknown


Subject: NCQ problem with ahci and Hitachi drive  (ACPI related)
References : http://lkml.org/lkml/2007/3/4/178
 http://lkml.org/lkml/2007/3/9/475
Submitter  : Mathieu Bérard [EMAIL PROTECTED]
Handled-By : Tejun Heo [EMAIL PROTECTED]
Status : unknown


Subject: kernels fail to boot with drives on ATIIXP controller
 (ACPI/IRQ related)
References : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229621
 http://lkml.org/lkml/2007/3/4/257
Submitter  : Michal Jaegermann [EMAIL PROTECTED]
Status : unknown


Subject: libata: PATA UDMA/100 configured as UDMA/33
References : http://lkml.org/lkml/2007/2/20/294
 http://www.mail-archive.com/linux-ide@vger.kernel.org/msg04115.html
 http://bugzilla.kernel.org/show_bug.cgi?id=8133
 http://bugzilla.kernel.org/show_bug.cgi?id=8164
Submitter  : Fabio Comolli [EMAIL PROTECTED]
 Plamen Petrov [EMAIL PROTECTED]
 Laurent Riffard [EMAIL PROTECTED]
Handled-By : Tejun Heo [EMAIL PROTECTED]
Status : patch available




Re: [3/6] 2.6.21-rc2: known regressions

2007-03-13 Thread Mathieu Bérard

Tejun Heo a écrit :
  Mathieu Bérard wrote:
  Jeff Garzik a écrit :
  Adrian Bunk wrote:
  Subject: NCQ problem with ahci and Hitachi drive
  References : http://lkml.org/lkml/2007/3/4/178
  Submitter  : Mathieu Bérard [EMAIL PROTECTED]
  Status : unknown
  according to the last message in that thread, it sounds like
  ACPI and interrupt problems
 
  Hi,
  after more testing with 2.6.21-rc3, it appears that after several ata
  errors the boot process somehow continued as normal, after an "NCQ
  disabled due to excessive errors" message.
  The pci=noacpi or noacpi parameters work around the problem; irqpoll
  does nothing.
 
  I was mistaken.  It can't be IRQ routing problem.  I somehow thought the
  port was a ata_piix one.  Considering the reported broken NCQ feature on
  the device GTF might be mangling with the drive to disable NCQ or
  something.  Does giving libata.noacpi=1 make any difference?
 

Hi,
libata.noacpi=1 worked. The drive is up and running with NCQ on.
Here is the PATA/SATA related part of my DSDT table with the _GTF methods:

Device (PATA)
{
Name (_ADR, 0x001F0001)
OperationRegion (PACS, PCI_Config, 0x40, 0xC0)
Field (PACS, DWordAcc, NoLock, Preserve)
{
PRIT,   16,
Offset (0x04),
PSIT,   4,
Offset (0x08),
SYNC,   4,
Offset (0x0A),
SDT0,   2,
,   2,
SDT1,   2,
Offset (0x14),
ICR0,   4,
ICR1,   4,
ICR2,   4,
ICR3,   4,
ICR4,   4,
ICR5,   4
}

Device (PRID)
{
Name (_ADR, 0x00)
Method (_GTM, 0, NotSerialized)
{
Name (PBUF, Buffer (0x14)
{
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00
})
CreateDWordField (PBUF, 0x00, PIO0)
CreateDWordField (PBUF, 0x04, DMA0)
CreateDWordField (PBUF, 0x08, PIO1)
CreateDWordField (PBUF, 0x0C, DMA1)
CreateDWordField (PBUF, 0x10, FLAG)
Store (GETP (PRIT), PIO0)
Store (GDMA (And (SYNC, 0x01), And (ICR3, 0x01), And (ICR0,
0x01), SDT0, And (ICR1, 0x01)), DMA0)
If (LEqual (DMA0, 0x))
{
Store (PIO0, DMA0)
}

If (And (PRIT, 0x4000))
{
If (LEqual (And (PRIT, 0x90), 0x80))
{
Store (0x0384, PIO1)
}
Else
{
Store (GETT (PSIT), PIO1)
}
}
Else
{
Store (0x, PIO1)
}

Store (GDMA (And (SYNC, 0x02), And (ICR3, 0x02), And (ICR0,
0x02), SDT1, And (ICR1, 0x02)), DMA1)
If (LEqual (DMA1, 0x))
{
Store (PIO1, DMA1)
}

Store (GETF (And (SYNC, 0x01), And (SYNC, 0x02), PRIT), FLAG)
If (And (LEqual (PIO0, 0x), LEqual (DMA0, 0x)))
{
Store (0x78, PIO0)
Store (0x14, DMA0)
Store (0x03, FLAG)
}

Return (PBUF)
}

Method (_STM, 3, NotSerialized)
{
CreateDWordField (Arg0, 0x00, PIO0)
CreateDWordField (Arg0, 0x04, DMA0)
CreateDWordField (Arg0, 0x08, PIO1)
CreateDWordField (Arg0, 0x0C, DMA1)
CreateDWordField (Arg0, 0x10, FLAG)
If (LEqual (SizeOf (Arg1), 0x0200))
{
And (PRIT, 0x40F0, PRIT)
And (SYNC, 0x02, SYNC)
Store (0x00, SDT0)
And (ICR0, 0x02, ICR0)
And (ICR1, 0x02, ICR1)
And (ICR3, 0x02, ICR3)
And (ICR5, 0x02, ICR5)
CreateWordField (Arg1, 0x62, W490)
CreateWordField (Arg1, 0x6A, W530)
CreateWordField (Arg1, 0x7E, W630)
CreateWordField (Arg1, 0x80, W640)
CreateWordField (Arg1, 0xB0, W880)
CreateWordField (Arg1, 0xBA, W930)
Or (PRIT, 0x8004, PRIT)
If (LAnd (And (FLAG, 0x02), And (W490, 0x0800)))
{
Or (PRIT, 0x02, PRIT)
}

Or (PRIT, SETP (PIO0, W530, W640), PRIT)
If (And (FLAG, 0x01))
{
Or (SYNC, 0x01, SYNC)
Store (SDMA (DMA0), SDT0)
If (LLess (DMA0, 0x1E))
{
Or (ICR3, 0x01, ICR3)
}

If (LLess (DMA0, 0x3C))
{
Or (ICR0, 0x01, ICR0)
}

If (And (W930, 0x2000))
{
Or (ICR1, 0x01, ICR1)

Re: [3/6] 2.6.21-rc3: known regressions

2007-03-13 Thread Alan Cox

 Subject: libata: PATA UDMA/100 configured as UDMA/33
 References : http://lkml.org/lkml/2007/2/20/294
  
 http://www.mail-archive.com/linux-ide@vger.kernel.org/msg04115.html
  http://bugzilla.kernel.org/show_bug.cgi?id=8133
  http://bugzilla.kernel.org/show_bug.cgi?id=8164
 Submitter  : Fabio Comolli [EMAIL PROTECTED]
  Plamen Petrov [EMAIL PROTECTED]
  Laurent Riffard [EMAIL PROTECTED]
 Handled-By : Tejun Heo [EMAIL PROTECTED]
 Status : patch available

Some cases should be fixed now but probably not all (e.g. the Nvidia one).

Re: Asus P5B-VM motherboard: cd drive malfunctions if internal nic in use.

2007-03-13 Thread Lennart Sorensen

On Mon, Mar 12, 2007 at 06:35:10PM -0400, Mark Lord wrote:
 Is that a PATA cd-drive?  If so, then you must have hooked it up
 to the JMicron IDE controller.  That driver is just plain buggy.
 
 I gave up on it for my own P5B-VM.  The libata version works better
 than the drivers/ide, but I gave up on it and got a SATA DVD/RW drive.
 
 Off topic:  do your USB ports power off when the system shuts down?
 Mine don't -- the +5V continues on them.. I'd love a tip on how to
 turn them off completely at shutdown.

Most Asus boards have jumpers for the USB ports to select between +5V
and +5VSB (stand by power).  The reason to provide standby power is so
that keyboards with power buttons can remain powered so that you can
turn the system on using the USB keyboard.  If you want to power off the
ports entirely, jumper them to the +5V line instead, which only has power
when the system is on.

--
Len Sorensen

[PATCH] libata: hardreset on SERR_INTERNAL

2007-03-13 Thread Tejun Heo

There was a rare report where SB600 reported SERR_INTERNAL and SRST
couldn't get it out of the failure mode.  Hardreset on SERR_INTERNAL.
As the problem is intermittent, whether this fixes the problem or not
hasn't been verified yet, but hardresetting the channel on internal
error is a good idea anyway.

Signed-off-by: Tejun Heo [EMAIL PROTECTED]

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 7349c3d..fc11bb3 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1055,7 +1055,7 @@ static void ata_eh_analyze_serror(struct ata_port *ap)
}
if (serror & SERR_INTERNAL) {
err_mask |= AC_ERR_SYSTEM;
-   action |= ATA_EH_SOFTRESET;
+   action |= ATA_EH_HARDRESET;
}
if (serror & (SERR_PHYRDY_CHG | SERR_DEV_XCHG))
ata_ehi_hotplugged(&ehc->i);
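The hunk above changes what EH requests for a single SError bit. As an illustration only, here is a stand-alone C sketch of the same decision for the two branches visible in the diff. The bit positions are the SATA SError ones that libata names SERR_INTERNAL (bit 11), SERR_PHYRDY_CHG (bit 16) and SERR_DEV_XCHG (bit 26). The DEMO_EH_* action flags and the function are hypothetical, not libata's ATA_EH_* values.

```c
#include <stdint.h>

/* SError bit positions from the SATA spec, using libata's names. */
#define SERR_INTERNAL   (1u << 11)  /* E: host internal error */
#define SERR_PHYRDY_CHG (1u << 16)  /* N: PhyRdy changed */
#define SERR_DEV_XCHG   (1u << 26)  /* X: device exchanged */

/* Hypothetical action bits for this sketch only. */
#define DEMO_EH_HARDRESET  (1u << 0)
#define DEMO_EH_HOTPLUGGED (1u << 1)

/* Mirrors only the two branches shown in the patch: an internal error
 * now asks for a hardreset instead of a softreset, and a PhyRdy change
 * or device exchange marks the link as hotplugged. */
static unsigned demo_serror_actions(uint32_t serror)
{
    unsigned action = 0;

    if (serror & SERR_INTERNAL)
        action |= DEMO_EH_HARDRESET;
    if (serror & (SERR_PHYRDY_CHG | SERR_DEV_XCHG))
        action |= DEMO_EH_HOTPLUGGED;
    return action;
}
```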

PATA Sil680 Command Timeout on ARM XScale

2007-03-13 Thread Fajun Chen


Hi Folks,

We have a command timeout with a Sil680 controller on ARM XScale. The
kernel is 2.6.18-rc2 with libata 2.00 and kernel preemption enabled; a
similar problem was also observed with preemption disabled.  ATA pass
through and sg are used.  A heavy I/O test was run on both channels of
the Sil680, and the system was quite loaded, with a load average
above 1.5.  Two timers are used to track command timeouts in our test
software.  The one in user space is set to 6 seconds using the alarm()
call, while the one in the kernel (the SCSI timer) is set to 5 seconds.
These timeout values are probably too low to be realistic, but the
issue here is not the timeout itself. We want to understand why it is
always the user space timer that expires before the kernel timer.  Since the
kernel timer uses jiffies to track time, does this imply a kernel bug where
timer interrupts were lost or delayed somehow?  Do you know of any
problems related to command timeouts with PATA Sil680?

Thanks,
Fajun

User space trace:
Cmd 4276 timed out after 7.260137 secs: start time 1173775439.409099 secs, timed out at 1173775446.669236 secs
[Tue Mar 13 08:44:06 2007]:
Test:   Random Write Sectors Extended
LBA Low:0
LBA High:   10
...
Num Cmds:   4277
Num_Failed_Cmds:1
...
Status: Fail [Error 401: Command timeout]

Dmesg log

~ $ dmesg
.77] Calling initcall 0xc001ebb4: inet_diag_init+0x0/0x80()
[42949375.77] Calling initcall 0xc001ec34: tcp_diag_init+0x0/0x1c()
[42949375.77] Calling initcall 0xc001ec50: bictcp_register+0x0/0x1c()
[42949375.77] TCP bic registered
[42949375.77] Calling initcall 0xc001ee2c: af_unix_init+0x0/0x80()
[42949375.77] NET: Registered protocol family 1
[42949375.77] Calling initcall 0xc001eeac: packet_init+0x0/0x70()
[42949375.77] NET: Registered protocol family 17
[42949375.77] Calling initcall 0xc0012a88: clocksource_done_booting+0x0/0x24()
[42949375.77] Calling initcall 0xc0019ed4: seqgen_init+0x0/0x1c()
[42949375.77] Calling initcall 0xc001ba44: early_uart_console_switch+0x0/0x90()
[42949375.77] Calling initcall 0xc013a150: net_random_reseed+0x0/0x38()
[42949375.77] RAMDISK: Compressed image found at block 0
[42949378.95] VFS: Mounted root (ext2 filesystem).
[42949378.96] Freeing init memory: 104K
[42949549.17] ata1: soft resetting port
[42949549.25] ata1.00: ATA-6, max UDMA/100, 78140160 sectors: LBA48
[42949549.25] ata1.00: configured for UDMA/100
[42949549.25] ata1: EH complete
[42949549.25]   Vendor: ATA   Model: ST940813AM  Rev: 3.02
[42949549.25]   Type:   Direct-Access  ANSI SCSI revision: 05
[42949549.26] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB)
[42949549.26] sda: Write Protect is off
[42949549.26] sda: Mode Sense: 00 3a 00 00
[42949549.26] SCSI device sda: drive cache: write back
[42949549.27] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB)
[42949549.27] sda: Write Protect is off
[42949549.27] sda: Mode Sense: 00 3a 00 00
[42949549.27] SCSI device sda: drive cache: write back
[42949549.27]  sda: unknown partition table
[42949549.29] sd 0:0:0:0: Attached scsi disk sda
[42949549.29] sd 0:0:0:0: Attached scsi generic sg0 type 0
[42949549.32] ata1: soft resetting port
[42949549.38] ata1.00: configured for UDMA/100
[42949549.38] ata1: EH complete
[42949549.38] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB)
[42949549.38] sda: Write Protect is off
[42949549.38] sda: Mode Sense: 00 3a 00 00
[42949549.39] SCSI device sda: drive cache: write back
[42949559.28] ata2: soft resetting port
[42949559.42] ata2.00: ATA-6, max UDMA/100, 78140160 sectors: LBA48
[42949559.42] ata2.00: configured for UDMA/100
[42949559.42] ata2: EH complete
[42949559.42]   Vendor: ATA   Model: ST94811A  Rev: 3.07
[42949559.42]   Type:   Direct-Access  ANSI SCSI revision: 05
[42949559.43] SCSI device sdb: 78140160 512-byte hdwr sectors (40008 MB)
[42949559.43] sdb: Write Protect is off
[42949559.43] sdb: Mode Sense: 00 3a 00 00
[42949559.43] SCSI device sdb: drive cache: write back
[42949559.43] SCSI device sdb: 78140160 512-byte hdwr sectors (40008 MB)
[42949559.44] sdb: Write Protect is off
[42949559.44] sdb: Mode Sense: 00 3a 00 00
[42949559.44] SCSI device sdb: drive cache: write back
[42949559.44]  sdb: unknown partition table
[42949559.46] sd 1:0:0:0: Attached scsi disk sdb
[42949559.46] sd 1:0:0:0: Attached scsi generic sg1 type 0
[  643.23] NWFPE: ntpd[38] takes exception 0001 at c002d514 from 0001d308
[  711.23] NWFPE: ntpd[38] takes exception 0001 at c002d514 from 0001d308
[  777.22] NWFPE: ntpd[38] takes exception 0001 at c002d514 from 0001d308
[  841.30] NWFPE: ntpd[38] takes exception 0001 at c002d514 from

Re: PATA Sil680 Command Timeout on ARM XScale

2007-03-13 Thread Alan Cox

 above 1.5.   Two timers are used to track command timeout in our test
 software.  The one in the user space is set to 6 seconds using alarm()
 call while the one in the kernel (scsi timer) is set to 5 seconds.
 These timeout values are probably too low to be realistic,  but the
 issue here is not about the timeout itself but to understand why it is

A lot of drive commands seem to be set up on a seven second worst case

 always user space timer expired before kernel timer.   Since kernel
 timer uses jiffies to track time, does this imply a kernel bug where
 the time interrupts were lost or delay somehow?  Do you know any know
 problems related to command timeout in PATA Sil680?

Alarm() is also handled by the same jiffies logic, so I suspect a bug in
your test environment?


Re: PATA Sil680 Command Timeout on ARM XScale

2007-03-13 Thread Fajun Chen


On 3/13/07, Alan Cox [EMAIL PROTECTED] wrote:

 above 1.5.   Two timers are used to track command timeout in our test
 software.  The one in the user space is set to 6 seconds using alarm()
 call while the one in the kernel (scsi timer) is set to 5 seconds.
 These timeout values are probably too low to be realistic,  but the
 issue here is not about the timeout itself but to understand why it is

A lot of drive commands seem to be set up on a seven second worst case

 always user space timer expired before kernel timer.   Since kernel
 timer uses jiffies to track time, does this imply a kernel bug where
 the time interrupts were lost or delay somehow?  Do you know any know
 problems related to command timeout in PATA Sil680?

Alarm() is also handled by the same jiffies logic, so I suspect a bug in
your test environment ?



I enabled ata_irq_trap and did the same test again. The kernel timer
caught the timeout (10 seconds) this time along with the irq trap
traces below.  What's the cause of these idle irqs?

[42949560.15] SCSI device sdb: drive cache: write back
[   85.57] ata1: irq trap
[   85.82] ata2: irq trap
[   92.12] abnormal status 0xD0
[   92.12] ata1: irq trap
[   92.92] ata2: irq trap
[   98.75] ata1: irq trap
[  100.26] abnormal status 0xD0
[  100.26] ata2: irq trap
[  105.54] ata1: irq trap
[  108.05] ata1: irq trap
[  110.62] ata1: irq trap
[  113.13] ata1: irq trap
[  115.53] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[  115.53] ata1.00: (BMDMA stat 0x0)
[  115.53] ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
[  115.53] ata1: soft resetting port
[  115.57] ata1.00: configured for UDMA/100
[  115.57] sg_cmd_done: sg0, pack_id=2706, res=0x802, dur=10040 ms
[  115.57] ata1: EH complete
[  115.58] SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
[  115.58] sda: Write Protect is off
[  115.58] sda: Mode Sense: 00 3a 00 00
[  115.58] SCSI device sda: drive cache: write back
...

Thanks,
Fajun

Re: [PATCH 03/12] libata: separate out ata_host_alloc() and ata_host_attach()

2007-03-13 Thread Brian King

Tejun Heo wrote:
 Association to SCSI host is done via pointer now even for native ATA
 case, so this should be easier for SAS.  What I'm worried about is how
 EH gets invoked.  libata depends on EH to do a lot of things including
 probing, requesting sense data, etc.  How should this work?

For SAS, the scsi_host pointer in the ata port is NULL today, since libata
is really not managing the scsi host, the LLDD is. I think the initialization
model we want for SAS is a little different than the one you are heading
towards on SATA. For SAS, I think we just want to be able to alloc/init
and delete/destroy a SATA device as they show up on the transport,
without tying it to initialization of the ata host. And this set of
patches doesn't necessarily prevent that...

 SAS attached libata port shares EH with the SAS SCSI host, right?  How can

Right.

 we connect SAS EH with libata EH and would it be okay for libata EH hold
 the SCSI EH (thus holding all command execution on the host) to handle
 ATA exceptions?

Currently, ipr calls ata_do_eh from its eh_device_reset_handler function.
This seems to work well enough with the testing that I've done, but it
would certainly be nice to get to a more layered EH approach, where we
could possibly have pluggable error handlers for different device types.

Regarding holding all command execution on the host while performing eh,
that doesn't seem to be a huge issue from my perspective, not sure if
this would have a larger negative impact on others... Generally speaking,
we shouldn't be entering eh very often, and it should only be happening
if something went wrong. The biggest issue here might be ATAPI devices,
since they tend to report more errors during normal running. The request
sense for these devices for SAS is done without entering eh today. Would
you want to move this into eh as well?

Brian


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

Re: libata Intel PIIX/ICH fails to detect both PATA drives, spews ACPI errors

2007-03-13 Thread Berck E. Nash

Tejun Heo wrote:
 Hello, Berck.
 
 Berck E. Nash wrote:
 Tejun Heo wrote:
 Berck E. Nash wrote:
 Testing the new libata ICH PATA drivers.  There's one PATA port on this
 chip, and I've got two optical drives connected to it.  The master drive
 fails to detect.  The slave detects and works properly.
 Can you test 2.6.20.1 and post full dmesg?
 Here's 2.6.20.2...  No ACPI errors, but still doesn't detect both drives.
 
 Please apply the attached patch and see if it works.  If it works,
 please post the result of hdparm -I /dev/srX of the optical drive.  Thanks.
 

Okay, here ya go:

/dev/sr0:

ATAPI CD-ROM, with removable media
Model Number:   LITE-ON LTR-48246S
Serial Number:
Firmware Revision:  SS0E
Standards:
Used: ATAPI for CD-ROMs, SFF-8020i, r2.5
Supported: CD-ROM ATAPI-2
Configuration:
DRQ response: 50us.
Packet size: 12 bytes
Capabilities:
LBA, IORDY(cannot be disabled)
DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2
 Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
 Cycle time: no flow control=227ns  IORDY flow control=120ns



Re: Asus P5B-VM motherboard: cd drive malfunctions if internal nic in use.

2007-03-13 Thread Lennart Sorensen

On Tue, Mar 13, 2007 at 12:23:06PM -0400, Mark Lord wrote:
 That's nice.
 
 But the P5B-VM board does not have any such jumper for USB,
 nor does it have any obvious combination of BIOS-setup options
 to accomplish it.

Well, it could only be done in hardware.  The P5B has those jumpers, and I
figured the P5B-VM, although a budget micro board, would still have them.
I guess not.  Without a jumper setting for it, there is nothing you can do
about it.  A quick look through the manual shows that it only mentions
standby power for the keyboard connector, not for the USB ports.

--
Len Sorensen

Re: DVD drive fails in 2.6.20.2

2007-03-13 Thread Tejun Heo

Vlad Codrea wrote:
 Hi,
 
 The DVD-ROM drive on my laptop does not work with the vanilla 2.6.20.2
 kernel using drivers/ata. The attached file dmesg.txt contains the full
 dmesg output including the error messages. I have also attached the
 .config file I used when compiling the kernel. The DVD device does not
 appear under /dev (only /dev/sda shows up, which is the hard drive).
 
 The ATA-related errors seem to start with:
 
 ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
 ata2.00: (BMDMA stat 0x25)
 ata2.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x12 data 36 in
 res 58/00:02:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
 ata2: soft resetting port
 ata2: port is slow to respond, please be patient (Status 0xd8)
 ata2: port failed to respond (30 secs, Status 0xd8)
 ATA: abnormal status 0xD8 on port 0x177
 ATA: abnormal status 0xD8 on port 0x177
 
 I should point out that this DVD drive hasn't worked with drivers/ide
 either, but it works perfectly under Windows 98. For background on this
 bug, please see:
 
 https://bugzilla.novell.com/show_bug.cgi?id=177050
 http://bugzilla.kernel.org/show_bug.cgi?id=6710
 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=197477
 https://launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/50161

To add more info, the drive is...

 Model=TORiSAN DVD-ROM DRD-N216, FwRev=1.08, SerialNo=0001
 Config={ SpinMotCtl Removeable DTR<=5Mbs DTR>10Mbs nonMagnetic }
 RawCHS=0/0/0, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=0kB, MaxMultSect=0
 (maybe): CurCHS=0/0/0, CurSects=0, LBA=yes, LBAsects=0
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  sdma0 sdma1 sdma2 mdma0 mdma1 *mdma2
 AdvancedPM=no

and as written above it also doesn't work with the ide drivers.  If DMA
is turned off using hdparm -d 0, it seems to work better but still
doesn't seem to work reliably.

-- 
tejun

Re: DVD drive fails in 2.6.20.2

2007-03-13 Thread Tejun Heo

Vlad Codrea wrote:
 ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
 ata2.00: (BMDMA stat 0x25)
 ata2.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x12 data 36 in
 res 58/00:02:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
 ata2: soft resetting port
 ata2: port is slow to respond, please be patient (Status 0xd8)
 ata2: port failed to respond (30 secs, Status 0xd8)
 ATA: abnormal status 0xD8 on port 0x177
 ATA: abnormal status 0xD8 on port 0x177

Okay, now that you're on libata driver, it's easier for me to debug.
Can you apply the attached patch over 2.6.20 and report what the kernel
says? (the patch will apply with some noise, it's okay)

Thanks.

-- 
tejun
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 14629a3..387235f 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4453,9 +4453,13 @@ fsm_start:
 			if (likely(status & (ATA_ERR | ATA_DF)))
 /* device stops HSM for abort/error */
 qc->err_mask |= AC_ERR_DEV;
-			else
+			else {
 /* HSM violation. Let EH handle this */
+ata_port_printk(ap, KERN_WARNING,
+		"!DRQ on HSM_ST_FIRST (0x%x)\n",
+		status);
 qc-err_mask |= AC_ERR_HSM;
+			}
 
 			ap->hsm_task_state = HSM_ST_ERR;
 			goto fsm_start;
@@ -4547,13 +4551,17 @@ fsm_start:
 if (likely(status & (ATA_ERR | ATA_DF)))
 	/* device stops HSM for abort/error */
 	qc->err_mask |= AC_ERR_DEV;
-else
+else {
+	ata_port_printk(ap, KERN_WARNING,
+		"!DRQ on HSM_ST (0x%x)\n",
+		status);
 	/* HSM violation. Let EH handle this.
 	 * Phantom devices also trigger this
 	 * condition.  Mark hint.
 	 */
 	qc->err_mask |= AC_ERR_HSM |
 			AC_ERR_NODEV_HINT;
+}
 
 ap->hsm_task_state = HSM_ST_ERR;
 goto fsm_start;
@@ -4579,8 +4587,12 @@ fsm_start:
 	status = ata_wait_idle(ap);
 }
 
-if (status & (ATA_BUSY | ATA_DRQ))
+if (status & (ATA_BUSY | ATA_DRQ)) {
+	ata_port_printk(ap, KERN_WARNING,
+		"BUSY|DRQ on ERR|DF (0x%x)\n",
+		status);
 	qc->err_mask |= AC_ERR_HSM;
+}
 
 /* ata_pio_sectors() might change the
  * state to HSM_ST_LAST. so, the state

[PATCH] Add ledtrig_ide_activity () to libata

2007-03-13 Thread Nobuhiro Iwamatsu

Hi all,

I noticed that ledtrig_ide_activity() was not enabled in libata.

The attached patch fixes this.
Please apply.

Regards,
 Nobuhiro
-- 
Nobuhiro Iwamatsu
E-Mail : [EMAIL PROTECTED]
GPG ID : 3170EBE9 

Signed-off-by: Nobuhiro Iwamatsu [EMAIL PROTECTED]

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index dc362fa..51930be 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -407,6 +407,7 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct ata_device *dev,
tf->lbah = cyl >> 8;
tf->device |= head;
}
+   ledtrig_ide_activity();
 
return 0;
 }
diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
index 80acd08..0b99f57 100644
--- a/drivers/leds/Kconfig
+++ b/drivers/leds/Kconfig
@@ -113,7 +113,7 @@ config LEDS_TRIGGER_TIMER
 
 config LEDS_TRIGGER_IDE_DISK
bool "LED IDE Disk Trigger"
-   depends on LEDS_TRIGGERS && BLK_DEV_IDEDISK
+   depends on LEDS_TRIGGERS && ( BLK_DEV_IDEDISK || ATA)
help
  This allows LEDs to be controlled by IDE disk activity.
  If unsure, say Y.

Re: [PATCH 03/12] libata: separate out ata_host_alloc() and ata_host_attach()

2007-03-13 Thread Tejun Heo


Hello, Brian.

Brian King wrote:
 Tejun Heo wrote:
 Association to SCSI host is done via pointer now even for native ATA
 case, so this should be easier for SAS.  What I'm worried about is how
 EH gets invoked.  libata depends on EH to do a lot of things including
 probing, requesting sense data, etc.  How should this work?
 
 For SAS, the scsi_host pointer in the ata port is NULL today, since libata
 is really not managing the scsi host, the LLDD is. I think the initialization
 model we want for SAS is a little different than the one you are heading
 towards on SATA. For SAS, I think we just want to be able to alloc/init
 and delete/destroy a SATA device as they show up on the transport,
 without tying it to initialization of the ata host. And this set of
 patches doesn't necessarily prevent that...

Yeap, I tried to keep SAS bridge functions working.  If SAS doesn't need
the host abstraction and wants to do things on a per-port basis,
ata_port_alloc() can be exported directly and separating out a per-port
register routine shouldn't be too difficult, but I do think it would
still be beneficial to have an ata_host structure in the SAS case too,
for code simplicity if not for anything else.



 SAS attached libata port shares EH with the SAS SCSI host, right?  How can
 
 Right.
 
 we connect SAS EH with libata EH and would it be okay for libata EH hold
 the SCSI EH (thus holding all command execution on the host) to handle
 ATA exceptions?
 
 Currently, ipr calls ata_do_eh from its eh_device_reset_handler function.
 This seems to work well enough with the testing that I've done, but it
 would certainly be nice to get to a more layered EH approach, where we
 could possibly have pluggable error handlers for different device types.


That's an unexpected usage of ata_do_eh() but I can see how that works 
and using ata_do_eh() for that purpose actually makes sense.  Most SCSI 
related dancing is done before and after ata_do_eh() and ata_do_eh() 
only deals with ATA qc's (except for scsi_eh_finish_cmd() called to 
finish failed qc's but these are still for only scmds associated with qcs).


In the future, we might need to separate those direct 
scsi_eh_finish_cmd() calls out of ata_do_eh() so that ata_do_eh() really 
deals with libata qc proper but that change shouldn't be too difficult 
for SAS.



 Regarding holding all command execution on the host while performing eh,
 that doesn't seem to be a huge issue from my perspective, not sure if
 this would have a larger negative impact on others... Generally speaking,
 we shouldn't be entering eh very often, and it should only be happening
 if something went wrong. The biggest issue here might be ATAPI devices,
 since they tend to report more errors during normal running. The request
 sense for these devices for SAS is done without entering eh today. Would
 you want to move this into eh as well?


No, not for SAS.  The reasons why I put sense requesting to EH were...

1. to make the fast path code straightforward (no qc reusing dance)

2. in native ATA, we have a per-port EH thread so sharing is not a problem.

As #2 is not true in the SAS case, I think keeping sense requesting out of
EH is the right thing to do here.  I still think that it's much simpler
and more reliable to handle any exception case in a separate thread.  In
the long term, I think this should be solved by making EH per request
queue (we will of course need a mechanism to synchronize several EHs so
that we can take host-wide EH actions).


Thanks.

--
tejun

[PATCH] libata: always use polling SETXFER

2007-03-13 Thread Tejun Heo

Several people have reported that detection of the LITE-ON LTR-48246S
fails because SETXFER fails.  It seems the device raises an IRQ too early
after SETXFER.  This is controller independent; the same problem has been
reported with different controllers.

So now we have pata_via, where the controller raises an IRQ before it's
ready after SETXFER, and a device which does a similar thing.  This patch
makes libata always execute SETXFER via polling.  As this only happens
during EH, the performance impact is nil.  Setting ATA_TFLAG_POLLING is
also moved from the issue hot path to ata_dev_set_xfermode(), the only
place where SETXFER can be issued.

Jeff Garzik suggests that, in the long term, it might be better to
modify the libata HSM implementation so that it is more tolerant of
erratic ATAPI IRQ behavior, e.g. defaulting to IRQ but falling back to
polling if the device doesn't seem ready at the point of interrupt.
Such a change might be necessary to support ancient or weird ATAPI devices.

Signed-off-by: Tejun Heo [EMAIL PROTECTED]

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 14629a3..3a8da9d 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3575,10 +3575,13 @@ static unsigned int ata_dev_set_xfermode(struct ata_device *dev)
/* set up set-features taskfile */
DPRINTK("set features - xfer mode\n");
 
+   /* Some controllers and ATAPI devices show flaky interrupt
+    * behavior after setting xfer mode.  Use polling instead.
+    */
ata_tf_init(dev, &tf);
tf.command = ATA_CMD_SET_FEATURES;
tf.feature = SETFEATURES_XFER;
-   tf.flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
+   tf.flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_POLLING;
tf.protocol = ATA_PROT_NODATA;
tf.nsect = dev->xfer_mode;
 
@@ -5036,14 +5039,6 @@ unsigned int ata_qc_issue_prot(struct ata_queued_cmd *qc)
}
}
 
-   /* Some controllers show flaky interrupt behavior after
-* setting xfer mode.  Use polling instead.
-*/
-   if (unlikely(qc->tf.command == ATA_CMD_SET_FEATURES &&
-qc->tf.feature == SETFEATURES_XFER) &&
-   (ap->flags & ATA_FLAG_SETXFER_POLLING))
-   qc->tf.flags |= ATA_TFLAG_POLLING;
-
/* select the device */
ata_dev_select(ap, qc->dev->devno, 1, 0);
 
diff --git a/drivers/ata/pata_via.c b/drivers/ata/pata_via.c
index 96b7179..377e792 100644
--- a/drivers/ata/pata_via.c
+++ b/drivers/ata/pata_via.c
@@ -426,7 +426,7 @@ static int via_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
/* Early VIA without UDMA support */
static struct ata_port_info via_mwdma_info = {
.sht = &via_sht,
-   .flags = ATA_FLAG_SLAVE_POSS | ATA_FLAG_SETXFER_POLLING,
+   .flags = ATA_FLAG_SLAVE_POSS,
.pio_mask = 0x1f,
.mwdma_mask = 0x07,
.port_ops = &via_port_ops
@@ -434,7 +434,7 @@ static int via_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
/* Ditto with IRQ masking required */
static struct ata_port_info via_mwdma_info_borked = {
.sht = &via_sht,
-   .flags = ATA_FLAG_SLAVE_POSS | ATA_FLAG_SETXFER_POLLING,
+   .flags = ATA_FLAG_SLAVE_POSS,
.pio_mask = 0x1f,
.mwdma_mask = 0x07,
.port_ops = &via_port_ops_noirq,
@@ -442,7 +442,7 @@ static int via_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
/* VIA UDMA 33 devices (and borked 66) */
static struct ata_port_info via_udma33_info = {
.sht = &via_sht,
-   .flags = ATA_FLAG_SLAVE_POSS | ATA_FLAG_SETXFER_POLLING,
+   .flags = ATA_FLAG_SLAVE_POSS,
.pio_mask = 0x1f,
.mwdma_mask = 0x07,
.udma_mask = 0x7,
@@ -451,7 +451,7 @@ static int via_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
/* VIA UDMA 66 devices */
static struct ata_port_info via_udma66_info = {
.sht = &via_sht,
-   .flags = ATA_FLAG_SLAVE_POSS | ATA_FLAG_SETXFER_POLLING,
+   .flags = ATA_FLAG_SLAVE_POSS,
.pio_mask = 0x1f,
.mwdma_mask = 0x07,
.udma_mask = 0x1f,
@@ -460,7 +460,7 @@ static int via_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
/* VIA UDMA 100 devices */
static struct ata_port_info via_udma100_info = {
.sht = &via_sht,
-   .flags = ATA_FLAG_SLAVE_POSS | ATA_FLAG_SETXFER_POLLING,
+   .flags = ATA_FLAG_SLAVE_POSS,
.pio_mask = 0x1f,
.mwdma_mask = 0x07,
.udma_mask = 0x3f,
@@ -469,7 +469,7 @@ static int via_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
/* UDMA133 with bad AST (All current 133) */
static struct ata_port_info

Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)

2007-03-13 Thread Michal Piotrowski


On 12/03/07, Tejun Heo [EMAIL PROTECTED] wrote:

Stephen Hemminger wrote:
 On Tue, 13 Mar 2007 04:03:00 +0900
 Tejun Heo [EMAIL PROTECTED] wrote:

 Stephen Hemminger wrote:
 1. the controller has IRQ stuck high (infrequent but possible)
 2. the IRQ is already requested by another device
 3. the IRQ gets disabled due to screaming interrupts at the moment
 ata_piix does pci_enable_device().

 I think we can be much more resilient to screaming interrupts if we
 enable device with IRQ disabled and enable it after the device is
 initialized to some level, possibly when requesting IRQ.
 The first thing the skge driver does is do a chip reset, and that should
 cause IRQ to be disabled and cleared. The driver has no chance to
 fix it if the BIOS left the IRQ screaming...
 What if we do something like...

  pci_intx(pdev, 0);
  pci_enable_device(pdev);
  /* initialize */
  request_irq(blah blah...);
  pci_intx(pdev, 1);

 Would this work for skge?


 Okay for testing, but any change like this should be done in the base
 PCI layer, not one off in a particular driver.

Yeap, it was a proof-of-concept pseudo code.  I attached a patch to do
above in skge.  Please point out if it is broken (e.g. intx needs to be
enabled earlier).

Michal, can you apply the attached patch and see whether it fixes the
problem.


I think that problem is solved.

Thanks.



Thanks.

--
tejun

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index eea75a4..2c990f2 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -3585,6 +3585,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
struct skge_hw *hw;
int err, using_dac = 0;

+   pci_intx(pdev, 0);
err = pci_enable_device(pdev);
if (err) {
dev_err(&pdev->dev, "cannot enable PCI device\n");
@@ -3669,6 +3670,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
   dev->name, pdev->irq);
goto err_out_unregister;
}
+   pci_intx(pdev, 1);
skge_show_addr(dev);

if (hw->ports > 1 && (dev1 = skge_devinit(hw, 1, using_dac))) {




Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)

Re: [5/6] 2.6.21-rc3: known regressions

2007-03-13 Thread Pavel Machek

Hi!

 This email lists some known regressions in Linus' tree compared to 2.6.20.
 
 If you find your name in the Cc header, you are either submitter of one
 of the bugs, maintainer of an affected subsystem or driver, a patch
 of you caused a breakage or I'm considering you in any other way
 possibly involved with one or more of these issues.
 
 Due to the huge amount of recipients, please trim the Cc when answering.
 
 
 Subject: resume: slab error in verify_redzone_free(): cache `size-512':
  memory outside object was overwritten
 References : http://lkml.org/lkml/2007/2/24/41
 Submitter  : Pavel Machek [EMAIL PROTECTED]
 Status : unknown
 
 
 Subject: beeps get longer after suspend
 References : http://lkml.org/lkml/2007/2/26/276
 Submitter  : Pavel Machek [EMAIL PROTECTED]
 Status : unknown

Seems fixed in -rc3.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [3/6] 2.6.21-rc3: known regressions

2007-03-13 Thread Fabio Comolli


On 3/13/07, Alan Cox [EMAIL PROTECTED] wrote:

 Subject: libata: PATA UDMA/100 configured as UDMA/33
 References : http://lkml.org/lkml/2007/2/20/294
  http://www.mail-archive.com/linux-ide@vger.kernel.org/msg04115.html
  http://bugzilla.kernel.org/show_bug.cgi?id=8133
  http://bugzilla.kernel.org/show_bug.cgi?id=8164
 Submitter  : Fabio Comolli [EMAIL PROTECTED]
  Plamen Petrov [EMAIL PROTECTED]
  Laurent Riffard [EMAIL PROTECTED]
 Handled-By : Tejun Heo [EMAIL PROTECTED]
 Status : patch available

Some cases should be fixed now but probably not all (eg the Nvidia one)



This regression is still present in 2.6.21-rc3-g8b9909de (pulled from
Linus' tree less than one hour ago).

Fabio

Re: [3/6] 2.6.21-rc2: known regressions

2007-03-13 Thread Tejun Heo

Hello,

Mathieu Bérard wrote:
 [   15.031823] ata1.00: taskfile_load_raw: (0x1f1-1f7): hex: 10 03 00 00 00 a0 ef

Okay, this is interesting.  This is Enable Device-Initiated Interface
Power State Transitions.  So, after this command is executed the device
will try to transit to partial/slumber SATA PHY power states at its
discretion, which is all cool and dandy in theory but depending on
controller and drive firmware can cause all sorts of problems.

The NCQ problem you're seeing is probably a side effect of device-initiated
link PS.  I can't tell whether the controller or the drive's firmware is
the problem without further info.  Due to blacklisting, NCQ won't be turned
on for your drive in future kernels, and link PS doesn't seem to cause any
problem on non-NCQ, so your case is taken care of here, but this leaves me
a bit worried about what _GTF feeds us.

I don't think we can reliably filter out command TFs, as _GTF might even
contain vendor-specific commands, but it might be better to always log the
TFs executed for _GTF so that we at least know what's going on with the
drive.

Thanks.

-- 
tejun
