Re: Linux 2.6.24 sata_promise SATA300TX4 problems

2008-01-27 Thread Peter Favrholdt

Hi Mikael,

Thanks!

It works perfectly at 1.5Gbps :-)

I think that because it fails at 3Gbps it might eventually fail at 
1.5Gbps also... And the error handling was not robust (in my setup with 
these drives at 3Gbps).
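
For scale, the two link speeds compared in this thread can be turned into payload rates with a little arithmetic. The sketch below assumes SATA's 8b/10b line encoding (10 line bits per data byte). The PCI figure is purely an illustrative assumption about a conventional 32-bit/33 MHz slot, since the thread does not say how the card is attached.

```python
def sata_payload_mb_per_s(line_rate_gbps):
    """Payload rate in MB/s for a SATA link, assuming 8b/10b encoding."""
    line_bits_per_s = line_rate_gbps * 1e9
    bytes_per_s = line_bits_per_s / 10  # 10 line bits carry one data byte
    return bytes_per_s / 1e6


def pci_peak_mb_per_s(width_bits=32, clock_mhz=33.33):
    """Theoretical peak of a parallel PCI bus (ASSUMED 32-bit/33 MHz here)."""
    return width_bits / 8 * clock_mhz


for gbps in (1.5, 3.0):
    print(f"{gbps} Gbps link: {sata_payload_mb_per_s(gbps):.0f} MB/s payload")
print(f"assumed PCI bus peak: {pci_peak_mb_per_s():.0f} MB/s, shared by all ports")
```

If that bus assumption holds, the shared bus rather than either link speed would cap aggregate throughput, so running at 1.5Gbps should cost little in practice.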


If I can help investigate further, I'm happy to do that. I have a spare 
controller card so I could try swapping them.  Do you have any 
suggestions for what I should try next?


Best regards,

Peter

Mikael Pettersson wrote:

> Peter Favrholdt writes:
>  > If it is not too much of a hassle, could you please make a 1.5Gbps patch 
>  > for 2.6.24 for me to try out? If it solves the problem (without me ever 
>  > touching the cables) we know for sure it is speed-related and not due to 
>  > kernel version.
>
> No problem. I had intended to drop that patch after 2.6.24-rc8 as it
> ought to be obsolete, but then again it might not be. It's available here:



-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux 2.6.24 sata_promise SATA300TX4 problems

2008-01-26 Thread Mikael Pettersson
Peter Favrholdt writes:
 > Hi Mikael,
 > 
 > Thanks for your reply :-)
 > 
 > Mikael Pettersson wrote:
 > > Mysterious. What you have there is a transmission error between the
 > > controller and the disk, which is bad in and by itself, but then there's
 > > a sequence of COMRESETs that fail to bring the port or disk back to life.
 > > 
 > > The original error is not a driver error but something caused by your
 > > system, be it a dodgy cable, a poorly seated cable, or electrical
 > > interference. But the failed COMRESETs are a concern as I've seen them
 > > in other reports as well.
 > 
 > Maybe I should try switching cables (again). Or it could be a 
 > motherboard issue (NFORCE2)?
 > 
 > > Me worried ...
 > > 
 > > So going back to 2.6.21-rc2 makes the system stable again? Can you do some
 > > more testing to see at what point the system becomes less stable? I.e.,
 > > 2.6.21-rcI, 2.6.22, 2.6.22-rcJ, 2.6.23, or 2.6.24-rcJ?
 > 
 > I believe the important part is your 1.5Gbps patch which I applied to 
 > 2.6.21-rc2. Maybe the reason for being stable is that the transmission 
 > error will not show up at that speed - thus not having anything to do 
 > with the kernel version. I'm quite sure the problem is there using 
 > 2.6.21-rc2 at 3Gbps.
 > 
 > > FWIW, I just completed some testing of a 300 TX4 card with kernel 2.6.24,
 > > including dd:s, fscks, mkfs:s, and copying about 400GB of data from one
 > > drive (Samsung) to another (Seagate 7200.10) on that card, and I cannot
 > > seem to break it.
 > 
 > I believe it only happens if I stress all four drives simultaneously. So 
 > maybe the transmission error is somehow related to the overall stress of 
 > the PCI bus/card/chip/whatever?
 > 
 > If it is not too much of a hassle, could you please make a 1.5Gbps patch 
 > for 2.6.24 for me to try out? If it solves the problem (without me ever 
 > touching the cables) we know for sure it is speed-related and not due to 
 > kernel version.

No problem. I had intended to drop that patch after 2.6.24-rc8 as it
ought to be obsolete, but then again it might not be. It's available here:



Re: Linux 2.6.24 sata_promise SATA300TX4 problems

2008-01-26 Thread Peter Favrholdt

Hi Mikael,

Thanks for your reply :-)

Mikael Pettersson wrote:

> Mysterious. What you have there is a transmission error between the
> controller and the disk, which is bad in and by itself, but then there's
> a sequence of COMRESETs that fail to bring the port or disk back to life.
>
> The original error is not a driver error but something caused by your
> system, be it a dodgy cable, a poorly seated cable, or electrical
> interference. But the failed COMRESETs are a concern as I've seen them
> in other reports as well.


Maybe I should try switching cables (again). Or it could be a 
motherboard issue (NFORCE2)?



> Me worried ...
>
> So going back to 2.6.21-rc2 makes the system stable again? Can you do some
> more testing to see at what point the system becomes less stable? I.e.,
> 2.6.21-rcI, 2.6.22, 2.6.22-rcJ, 2.6.23, or 2.6.24-rcJ?


I believe the important part is your 1.5Gbps patch which I applied to 
2.6.21-rc2. Maybe the reason for being stable is that the transmission 
error will not show up at that speed - thus not having anything to do 
with the kernel version. I'm quite sure the problem is there using 
2.6.21-rc2 at 3Gbps.



> FWIW, I just completed some testing of a 300 TX4 card with kernel 2.6.24,
> including dd:s, fscks, mkfs:s, and copying about 400GB of data from one drive
> (Samsung) to another (Seagate 7200.10) on that card, and I cannot seem to
> break it.


I believe it only happens if I stress all four drives simultaneously. So 
maybe the transmission error is somehow related to the overall stress of 
the PCI bus/card/chip/whatever?
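
A minimal sketch of that kind of simultaneous load, for anyone who wants to reproduce it: one reader thread per device, each reading sequentially in 1 MiB chunks and discarding the data, roughly what running four dd processes with bs=1M and of=/dev/null in parallel does. The function names are made up for this example, and it only reads, never writes.

```python
import threading

CHUNK = 1024 * 1024  # 1 MiB per read, mirroring bs=1M


def read_all(path, results):
    """Read `path` to the end, recording (bytes_read, error_or_None)."""
    total = 0
    try:
        with open(path, "rb", buffering=0) as f:
            while True:
                data = f.read(CHUNK)
                if not data:
                    break
                total += len(data)
        results[path] = (total, None)
    except OSError as exc:
        # An I/O error from a failing port surfaces here.
        results[path] = (total, exc)


def stress_read(paths):
    """Read all paths concurrently; return {path: (bytes_read, error)}."""
    results = {}
    threads = [threading.Thread(target=read_all, args=(p, results))
               for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling stress_read(["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]) as root would load all four drives at once. The device names are the ones from this setup and will differ elsewhere.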


If it is not too much of a hassle, could you please make a 1.5Gbps patch 
for 2.6.24 for me to try out? If it solves the problem (without me ever 
touching the cables) we know for sure it is speed-related and not due to 
kernel version.


Still strange that the COMRESETs do not help, though (but maybe it is 
the drive that locks up?) :-/


Best regards,

Peter


Re: Linux 2.6.24 sata_promise SATA300TX4 problems

2008-01-26 Thread Mikael Pettersson
Peter Favrholdt writes:
 > Hi Mikael & list,
 > 
 > I have previously reported problems with my setup:
 > 
 > SATA300TX4 + 4 Seagate Barracuda ES 500GB
 > 
 > I just tested with 2.6.24. After copying approx 25GB of each drive using
 >   dd if=/dev/sd[abcd] of=/dev/null bs=1M
 > sda failed with the following message:
 > 
 > [ 1060.069489] ata1: SError: { 10B8B Dispar BadCRC TrStaTrns }
 > [ 1060.069498] ata1.00: cmd 25/00:00:90:2c:e6/00:02:01:00:00/e0 tag 0 dma 262144 in
 > [ 1060.069501]  res 40/00:28:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
 > 
 > I have included lspci and dmesg output below.
 > 
 > My system is rock solid using 2.6.21-rc2 with Mikael Pettersson's 1.5Gbps 
 > patch.
...
 > [ 1060.069478] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x138 action 0x2 frozen
 > [ 1060.069489] ata1: SError: { 10B8B Dispar BadCRC TrStaTrns }
 > [ 1060.069498] ata1.00: cmd 25/00:00:90:2c:e6/00:02:01:00:00/e0 tag 0 dma 262144 in
 > [ 1060.069501]  res 40/00:28:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
 > [ 1060.069505] ata1.00: status: { DRDY }
 > [ 1065.437567] ata1: port is slow to respond, please be patient (Status 0xff)
 > [ 1070.114210] ata1: device not ready (errno=-16), forcing hardreset
 > [ 1070.114219] ata1: hard resetting link
 > [ 1076.320932] ata1: port is slow to respond, please be patient (Status 0xff)
 > [ 1080.158924] ata1: COMRESET failed (errno=-16)

Mysterious. What you have there is a transmission error between the
controller and the disk, which is bad in and by itself, but then there's
a sequence of COMRESETs that fail to bring the port or disk back to life.

The original error is not a driver error but something caused by your
system, be it a dodgy cable, a poorly seated cable, or electrical
interference. But the failed COMRESETs are a concern as I've seen them
in other reports as well.
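
As a reading aid for the SError line in the quoted log, here is a small sketch that pulls the flag names out of such a line and attaches a plain-language meaning to each. The table covers only the four flags seen in this report, with descriptions following the usual meaning of the SATA SError bits. The function name is made up for this example and is not part of libata.

```python
import re

# Only the flags that appear in this report; other names map to "unknown flag".
SERROR_FLAGS = {
    "10B8B": "10b-to-8b decode error",
    "Dispar": "disparity error",
    "BadCRC": "CRC error",
    "TrStaTrns": "transport state transition error",
}


def decode_serror(line):
    """Return [(flag, meaning), ...] for a dmesg 'SError: { ... }' line."""
    match = re.search(r"SError:\s*\{([^}]*)\}", line)
    if match is None:
        return []
    return [(name, SERROR_FLAGS.get(name, "unknown flag"))
            for name in match.group(1).split()]


log_line = "[ 1060.069489] ata1: SError: { 10B8B Dispar BadCRC TrStaTrns }"
for flag, meaning in decode_serror(log_line):
    print(f"{flag}: {meaning}")
```

All four flags describe faults on the link between controller and disk, which fits the reading of this as a transmission error rather than a driver error.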

Me worried ...

So going back to 2.6.21-rc2 makes the system stable again? Can you do some
more testing to see at what point the system becomes less stable? I.e.,
2.6.21-rcI, 2.6.22, 2.6.22-rcJ, 2.6.23, or 2.6.24-rcJ?
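
One way to organise that testing is a plain binary search over an ordered list of kernels. The sketch below assumes the oldest kernel in the list is known to be stable, the newest is known to be unstable, and instability persists once it appears. `is_stable` is a hypothetical callback standing in for the manual work of building, booting and stress-testing each kernel, and the version list is only an example.

```python
def first_unstable(versions, is_stable):
    """Return the first version in `versions` for which is_stable() is False.

    Assumes versions[0] is stable, versions[-1] is unstable, and that
    instability persists once introduced.
    """
    lo, hi = 0, len(versions) - 1  # lo: known stable, hi: known unstable
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_stable(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]


# Example list only; the endpoints are taken from the report above.
kernels = ["2.6.21-rc2", "2.6.21", "2.6.22", "2.6.23", "2.6.24"]
```

With five candidates this needs at most two test runs between the known-good and known-bad endpoints. If the failure turns out to depend on link speed rather than kernel version, the search will not converge on anything meaningful.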

FWIW, I just completed some testing of a 300 TX4 card with kernel 2.6.24,
including dd:s, fscks, mkfs:s, and copying about 400GB of data from one drive
(Samsung) to another (Seagate 7200.10) on that card, and I cannot seem to
break it.