Re: libata pm

2008-02-05 Thread [EMAIL PROTECTED]
Hi everybody,

It looks like this will be a never-ending story...

 After I got the new PSU and the raid was in full sync without any
 error for 48h, I thought all problems were gone. Today the sata errors
  reappeared and whenever the load is high enough I get the following:

I replaced two (probably failing) of the eight hard drives with new ones.
All remaining disks report a good SMART status and are fully readable when
the raid is not active. As long as I mount the filesystem on the raid
read-only, no errors occur, but the moment I mount it rw and try
to copy something to the fs on the raid I get the already known timeout.
I am getting a little desperate now...

ata10.00: failed to read SCR 1 (Emask=0x40)
ata10.01: failed to read SCR 1 (Emask=0x40)
ata10.02: failed to read SCR 1 (Emask=0x40)
ata10.03: failed to read SCR 1 (Emask=0x40)
ata10.04: failed to read SCR 1 (Emask=0x40)
ata10.05: failed to read SCR 1 (Emask=0x40)
ata10.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.01: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 1 cdb 0x0 data 0
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata10.01: status: { DRDY }
ata10.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.15: hard resetting link
ata10.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata10.00: hard resetting link
ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.01: hard resetting link
ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.02: hard resetting link
ata10.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.03: hard resetting link
ata10.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.04: hard resetting link
ata10.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.05: hard resetting link
ata10.05: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata10.00: configured for UDMA/100
ata10.01: configured for UDMA/100
ata10.03: configured for UDMA/100
ata10.04: configured for UDMA/100
ata10: EH complete
sd 9:0:0:0: [sdl] 976773168 512-byte hardware sectors (500108 MB)
sd 9:0:0:0: [sdl] Write Protect is off
sd 9:0:0:0: [sdl] Mode Sense: 00 3a 00 00
sd 9:0:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:1:0:0: [sdm] 976773168 512-byte hardware sectors (500108 MB)
sd 9:1:0:0: [sdm] Write Protect is off
sd 9:1:0:0: [sdm] Mode Sense: 00 3a 00 00
sd 9:1:0:0: [sdm] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:2:0:0: [sdn] 976773168 512-byte hardware sectors (500108 MB)
sd 9:2:0:0: [sdn] Write Protect is off
sd 9:2:0:0: [sdn] Mode Sense: 00 3a 00 00
sd 9:2:0:0: [sdn] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:3:0:0: [sdo] 976773168 512-byte hardware sectors (500108 MB)
sd 9:3:0:0: [sdo] Write Protect is off
sd 9:3:0:0: [sdo] Mode Sense: 00 3a 00 00
sd 9:3:0:0: [sdo] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:4:0:0: [sdp] 976773168 512-byte hardware sectors (500108 MB)
sd 9:4:0:0: [sdp] Write Protect is off
sd 9:4:0:0: [sdp] Mode Sense: 00 3a 00 00
sd 9:4:0:0: [sdp] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA


-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: libata pm

2008-02-01 Thread Mark Lord

[EMAIL PROTECTED] wrote:

Sorry for my late answer, but i had to sort this out first.
After replacing the first PSU with a new Corsair 650W the power no
longer fluctuated more than 0,01 V (and this only when booting up the
drives...) I did a full resync on both raid arrays and got no more
errors or resets, but there were some inconsitencies during sync and the
xfs filesystem on both arrays had to be repaired. Are these problems
caused by the pm resets ?


libata EH won't lose any data as long as the hardware doesn't.  If power
fluctuates causing your drive to briefly power down - this does happen and
you can hear the drive doing emergency unload when it happens, the data in
write buffer can be lost.  On coming back, all that libata can know is
that the PHY suffered brief connection loss, so it resets the device and
goes on, so the data in the cache is lost now.  It's basically pulling the
power plug from the harddrive while write is going on and connecting it
back quickly.  You're bound to lose data.


After I got the new PSU and the raid was in full sync without any error
for 48h, I thought all problems were gone. Today the sata errors
reappeared and whenever the load is high enough I get the following:

..

What exact brand/model drives are those again (hdparm --Istdout, please) ?

If I have a similar unit here, I may try to reproduce this.

Cheers


Re: libata pm

2008-02-01 Thread [EMAIL PROTECTED]
 Sorry for my late answer, but i had to sort this out first.
 After replacing the first PSU with a new Corsair 650W the power no
 longer fluctuated more than 0,01 V (and this only when booting up the
 drives...) I did a full resync on both raid arrays and got no more
 errors or resets, but there were some inconsitencies during sync and the
 xfs filesystem on both arrays had to be repaired. Are these problems
 caused by the pm resets ?


 libata EH won't lose any data as long as the hardware doesn't.  If power
 fluctuates causing your drive to briefly power down - this does happen and
 you can hear the drive doing emergency unload when it happens, the data in
 write buffer can be lost.  On coming back, all that libata can know is
 that the PHY suffered brief connection loss, so it resets the device and
 goes on, so the data in the cache is lost now.  It's basically pulling the
 power plug from the harddrive while write is going on and connecting it
 back quickly.  You're bound to lose data.

After I got the new PSU and the raid was in full sync without any error
for 48h, I thought all problems were gone. Today the sata errors
reappeared and whenever the load is high enough I get the following:

ata10.00: failed to read SCR 1 (Emask=0x40)
ata10.01: failed to read SCR 1 (Emask=0x40)
ata10.02: failed to read SCR 1 (Emask=0x40)
ata10.03: failed to read SCR 1 (Emask=0x40)
ata10.04: failed to read SCR 1 (Emask=0x40)
ata10.05: failed to read SCR 1 (Emask=0x40)
ata10.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.02: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata10.02: status: { DRDY }
ata10.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.15: hard resetting link
ata10.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata10.00: hard resetting link
ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.01: hard resetting link
ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.02: hard resetting link
ata10.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.03: hard resetting link
ata10.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.04: hard resetting link
ata10.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.05: hard resetting link
ata10.05: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata10.00: configured for UDMA/100
ata10.01: configured for UDMA/100
ata10.02: configured for UDMA/100
ata10.03: configured for UDMA/100
ata10.04: configured for UDMA/100
ata10: EH complete
sd 9:0:0:0: [sdl] 976773168 512-byte hardware sectors (500108 MB)
sd 9:0:0:0: [sdl] Write Protect is off
sd 9:0:0:0: [sdl] Mode Sense: 00 3a 00 00
sd 9:0:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:1:0:0: [sdm] 976773168 512-byte hardware sectors (500108 MB)
sd 9:1:0:0: [sdm] Write Protect is off
sd 9:1:0:0: [sdm] Mode Sense: 00 3a 00 00
sd 9:1:0:0: [sdm] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:2:0:0: [sdn] 976773168 512-byte hardware sectors (500108 MB)
sd 9:2:0:0: [sdn] Write Protect is off
sd 9:2:0:0: [sdn] Mode Sense: 00 3a 00 00
sd 9:2:0:0: [sdn] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:3:0:0: [sdo] 976773168 512-byte hardware sectors (500108 MB)
sd 9:3:0:0: [sdo] Write Protect is off
sd 9:3:0:0: [sdo] Mode Sense: 00 3a 00 00
sd 9:3:0:0: [sdo] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:4:0:0: [sdp] 976773168 512-byte hardware sectors (500108 MB)
sd 9:4:0:0: [sdp] Write Protect is off
sd 9:4:0:0: [sdp] Mode Sense: 00 3a 00 00
sd 9:4:0:0: [sdp] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:0:0:0: [sdl] 976773168 512-byte hardware sectors (500108 MB)
sd 9:0:0:0: [sdl] Write Protect is off
sd 9:0:0:0: [sdl] Mode Sense: 00 3a 00 00
sd 9:0:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:1:0:0: [sdm] 976773168 512-byte hardware sectors (500108 MB)
sd 9:1:0:0: [sdm] Write Protect is off
sd 9:1:0:0: [sdm] Mode Sense: 00 3a 00 00
sd 9:1:0:0: [sdm] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:2:0:0: [sdn] 976773168 512-byte hardware sectors (500108 MB)
sd 9:2:0:0: [sdn] Write Protect is off
sd 9:2:0:0: [sdn] Mode Sense: 00 3a 00 00
sd 9:2:0:0: [sdn] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:3:0:0: [sdo] 976773168 512-byte hardware sectors (500108 MB)
sd 9:3:0:0: [sdo] Write Protect is off
sd 9:3:0:0: [sdo] Mode Sense: 00 3a 00 00
sd 9:3:0:0: [sdo] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:4:0:0: [sdp] 

Re: libata pm

2008-02-01 Thread Tejun Heo
Mark Lord wrote:
 After I got the new PSU and the raid was in full sync without any error
 for 48h, I thought all problems were gone. Today the sata errors
 reappeared and whenever the load is high enough I get the following:
 ..
 
 What exact brand/model drives are those again (hdparm --Istdout, please) ?
 
 If I have a similar unit here, I may try to reproduce this.

Dusty, can you please provide the info for Mark?  Let's see if Mark can
reproduce this.

Thanks.

-- 
tejun


Re: libata pm

2008-01-30 Thread [EMAIL PROTECTED]
 Shuffling the drives did not change anything to the linkspeed of the 3
 ports running with 1.5 Gbps. Looks like the problem is port-related.

 Hmm... Okay.  Maybe the signal traces or connectors have some problems.
 But 3.0Gbps on downstream port doesn't make any difference anyway so
 unless it leads to errors, it should be fine.

 The first PSU (Multilane) offers (measured) 12,20 V on both lanes. This
  falls during read/write access on all drives attached down to 12,10 V.


 That explains the PHY dropouts.  I've even seen drives to do hard reset
 accompanying emergency head unload due to voltage dropping when high IO
 load hits.  Of course, data in cache is lost.

 The second PSU (Singlelane) offers (measured) 11,75 V. This doesn't
 change during read/write access on all drives attached. I tested the
 second PSU without anything attached and it still offered 11,75 V.

 11.75 is still in spec and it doesn't fluctuate.  I like this power much
 better.

 I will try to add another PSU this evening and remeasure.
 Currently the whole machine needs about 350W.
 The PSUs are Targan 500W (Multilane - 2x12V 10A)


 Maybe one of the lane is only connected to video power connectors?
 IIRC, that was the idea of multilane power anyway.


 Please lemme know your test result.

Sorry for my late answer, but I had to sort this out first.
After replacing the first PSU with a new Corsair 650W the power no longer
fluctuated by more than 0.01 V (and this only when booting up the drives...).
I did a full resync on both raid arrays and got no more errors or resets,
but there were some inconsistencies during sync and the xfs filesystem on
both arrays had to be repaired. Are these problems caused by the pm resets?

Thanks for your patience.



Re: libata pm

2008-01-30 Thread Tejun Heo
[EMAIL PROTECTED] wrote:
 Sorry for my late answer, but i had to sort this out first.
 After replacing the first PSU with a new Corsair 650W the power no longer
 fluctuated more than 0,01 V (and this only when booting up the drives...)
 I did a full resync on both raid arrays and got no more errors or resets,
 but there were some inconsitencies during sync and the xfs filesystem on
 both arrays had to be repaired. Are these problems caused by the pm resets
 ?

libata EH won't lose any data as long as the hardware doesn't.  If power
fluctuates causing your drive to briefly power down - this does happen
and you can hear the drive doing emergency unload when it happens, the
data in write buffer can be lost.  On coming back, all that libata can
know is that the PHY suffered brief connection loss, so it resets the
device and goes on, so the data in the cache is lost now.  It's
basically pulling the power plug from the harddrive while write is going
on and connecting it back quickly.  You're bound to lose data.
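
The sequence described here can be pictured with a toy model in Python. It is only an illustration of a volatile write cache, with made-up class and method names, and does not reflect how libata or any drive firmware is actually implemented:

```python
class ToyDrive:
    """Drive with a volatile write cache in front of persistent media."""

    def __init__(self):
        self.media = {}   # sector -> data that survives power loss
        self.cache = {}   # sector -> data acknowledged but not yet on media

    def write(self, sector, data):
        # The drive acknowledges the write as soon as it is cached.
        self.cache[sector] = data

    def flush(self):
        # Cache flush: commit everything cached to the media.
        self.media.update(self.cache)
        self.cache.clear()

    def power_glitch(self):
        # Brief power loss: the cache content is gone, the media is untouched.
        self.cache.clear()

    def read(self, sector):
        # Serve from the cache if present, otherwise from the media.
        return self.cache.get(sector, self.media.get(sector))


drive = ToyDrive()
drive.write(0, "old")
drive.flush()
drive.write(0, "new")     # acknowledged, but still only in the cache
drive.power_glitch()      # afterwards the host only sees a link reset
print(drive.read(0))      # the acknowledged "new" data is gone
```

In this model the host was told that "new" had been written, yet after the glitch the drive returns "old". A resync or filesystem check is what later notices the mismatch.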

-- 
tejun


Re: libata pm

2008-01-28 Thread [EMAIL PROTECTED]
Hello,

On Mon, 28.01.2008, 01:17, Tejun Heo wrote:
 Hello,


 [EMAIL PROTECTED] wrote:
 With one, two and three drives on the pm I got no errors, so I tried to
  change the power supply. I got two power supplies for the 16
 harddrives and the second one (with all Maxtor drives and the first pm)
 was failing. After changing the power supply all problems where gone.
 Sorry for the
 confusion.

 It's amazing that, in SATA, PSU-problem-verified / PSU-problem-suggested
 ratio is significant.  I don't think this ever was the case with PATA. SATA
 link seems much more susceptible to power quality and allows hooking up
 whole lot of drives.


 The only remaining issue is that some sata links are only working with
 1.5
 gbps and I can't figure out why. (ata3,4,5,6.x are the Maxtor drives and
  ata7,8,9,10.x are the Samsung ones)


 Hmm... That's how the hardware negotiated transfer rate w/ each other.
 SControl is telling the hardware that there's no speed limit and to go
 as high as it can but somehow 3.0Gbps negotiation failed and the link speed
 is limited to 1.5Gbps for some of the ports.  Is the result always the
 same?  What happens if you swap drives between ports?  Also, if you have
 an extra PSU lying around, can you hook it up with some of the drives such
 that the load is more distributed.

 Oh.. Another note on PSU.  I don't know whether this still holds for
 more recent ones but mid-to-high range multi-lane PSUs sometimes have
 lesser juice on 12v rail available for disks than low cost single lane
 PSUs.  This is because high power 12v lanes are allocated to power video
 cards and disks can only pull power from (usually) single weak 12v lane.

 Thanks.


 --
 tejun



Shuffling the drives did not change the link speed of the 3
ports running at 1.5 Gbps. Looks like the problem is port-related.
The first PSU (multilane) delivers (measured) 12.20 V on both lanes. This
drops to 12.10 V during read/write access on all attached drives.
The second PSU (singlelane) delivers (measured) 11.75 V. This doesn't change
during read/write access on all attached drives. I tested the second PSU
without anything attached and it still delivered 11.75 V.

I will try to add another PSU this evening and remeasure.
Currently the whole machine needs about 350W.
The PSUs are Targan 500W (Multilane - 2x12V 10A) and Noname 350W
(Singlelane - 1x12V 10A) so this 'should' be enough...
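
As a rough cross-check of "should be enough", the rail ratings quoted above can be turned into a per-drive budget. This Python sketch assumes the unfavourable case Tejun describes, where all eight drives of one array draw from a single 12 V rail rated at 10 A. How the drives are really distributed over the rails is not stated in this thread, so the drive count per rail is an assumption:

```python
RAIL_VOLTS = 12.0
RAIL_AMPS = 10.0      # per-rail rating quoted for both PSUs
DRIVES_ON_RAIL = 8    # assumption: every drive of one array on the same rail


def per_drive_budget(rail_amps, drives):
    """Current available to each drive if the rail is shared evenly."""
    return rail_amps / drives


rail_watts = RAIL_VOLTS * RAIL_AMPS
amps_each = per_drive_budget(RAIL_AMPS, DRIVES_ON_RAIL)
print(f"rail capacity: {rail_watts:.0f} W")
print(f"per-drive budget: {amps_each:.2f} A ({amps_each * RAIL_VOLTS:.0f} W)")
```

Under these assumptions the rail offers 120 W, or 1.25 A per drive. That figure is what needs comparing against the 12 V current on the drive label, in particular the spin-up value. The total wattage of the PSU says little if one rail is the bottleneck.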

Last night I copied the backup back to the first raid (the Maxtor drives)
without any error, but during the transfer I got this on the second
(mounted but unused) raid:
ata10.00: failed to read SCR 1 (Emask=0x40)
ata10.01: failed to read SCR 1 (Emask=0x40)
ata10.02: failed to read SCR 1 (Emask=0x40)
ata10.03: failed to read SCR 1 (Emask=0x40)
ata10.04: failed to read SCR 1 (Emask=0x40)
ata10.05: failed to read SCR 1 (Emask=0x40)
ata10.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.02: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 1 cdb 0x0 data 0
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata10.02: status: { DRDY }
ata10.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.15: hard resetting link
ata10.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata10.00: hard resetting link
ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.01: hard resetting link
ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.02: hard resetting link
ata10.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.03: hard resetting link
ata10.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.04: hard resetting link
ata10.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.05: hard resetting link
ata10.05: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata10.00: configured for UDMA/100
ata10.01: configured for UDMA/100
ata10.02: configured for UDMA/100
ata10.03: configured for UDMA/100
ata10.04: configured for UDMA/100
ata10: EH complete
sd 9:0:0:0: [sdp] 976773168 512-byte hardware sectors (500108 MB)
sd 9:0:0:0: [sdp] Write Protect is off
sd 9:0:0:0: [sdp] Mode Sense: 00 3a 00 00
sd 9:0:0:0: [sdp] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:1:0:0: [sdq] 976773168 512-byte hardware sectors (500108 MB)
sd 9:1:0:0: [sdq] Write Protect is off
sd 9:1:0:0: [sdq] Mode Sense: 00 3a 00 00
sd 9:1:0:0: [sdq] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:2:0:0: [sdr] 976773168 512-byte hardware sectors (500108 MB)
sd 9:2:0:0: [sdr] Write Protect is off
sd 9:2:0:0: [sdr] Mode Sense: 00 3a 00 00
sd 9:2:0:0: [sdr] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 9:3:0:0: [sds] 976773168 512-byte hardware sectors 

Re: libata pm

2008-01-28 Thread Tejun Heo
Hello, Dusty.

[EMAIL PROTECTED] wrote:
 Shuffling the drives did not change anything to the linkspeed of the 3
 ports running with 1.5 Gbps. Looks like the problem is port-related.

Hmm... Okay.  Maybe the signal traces or connectors have some problems.
But 3.0Gbps on downstream port doesn't make any difference anyway so
unless it leads to errors, it should be fine.

 The first PSU (Multilane) offers (measured) 12,20 V on both lanes. This
 falls during read/write access on all drives attached down to 12,10 V.

That explains the PHY dropouts.  I've even seen drives to do hard reset
accompanying emergency head unload due to voltage dropping when high IO
load hits.  Of course, data in cache is lost.

 The second PSU (Singlelane) offers (measured) 11,75 V. This doesn't change
 during read/write access on all drives attached. I tested the second PSU
 without anything attached and it still offered 11,75 V.

11.75 is still in spec and it doesn't fluctuate.  I like this power much
better.
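
To put the measured figures side by side, here is a small Python check against a ±5 % window on the 12 V rail. The ±5 % value is the tolerance commonly quoted from the ATX12V design guide and is an assumption here, not something stated in this thread. A multimeter also averages out short dips, so a reading inside the window does not rule out transients:

```python
NOMINAL_V = 12.0
TOLERANCE = 0.05  # assumed ±5 % window for the +12 V rail


def in_spec(measured_v, nominal_v=NOMINAL_V, tolerance=TOLERANCE):
    """True if the reading lies within nominal ± tolerance."""
    low = nominal_v * (1 - tolerance)
    high = nominal_v * (1 + tolerance)
    return low <= measured_v <= high


readings = {
    "multilane PSU, no disk access": 12.20,
    "multilane PSU, during disk access": 12.10,
    "singlelane PSU, with and without load": 11.75,
}
for label, volts in readings.items():
    print(f"{label}: {volts:.2f} V, in spec: {in_spec(volts)}")
```

All three readings fall inside the assumed 11.40 V to 12.60 V window. What separates the two supplies is not the absolute level but that the first one sags when the drives become active.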

 I will try to add another PSU this evening and remeasure.
 Currently the whole machine needs about 350W.
 The PSUs are Targan 500W (Multilane - 2x12V 10A)

Maybe one of the lane is only connected to video power connectors?
IIRC, that was the idea of multilane power anyway.

 and Noname 350W (Singlelane - 1x12V 10A) so this 'should' be enough...
 
 This night i copied the backup back to the first raid (the Maxtor drives)
 without any error but during the transfer i got this on the second
 (mounted but unused) raid:
 ata10.00: failed to read SCR 1 (Emask=0x40)
 ata10.01: failed to read SCR 1 (Emask=0x40)
 ata10.02: failed to read SCR 1 (Emask=0x40)
 ata10.03: failed to read SCR 1 (Emask=0x40)
 ata10.04: failed to read SCR 1 (Emask=0x40)
 ata10.05: failed to read SCR 1 (Emask=0x40)
 ata10.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
 ata10.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
 ata10.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
 ata10.02: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 1 cdb 0x0 data 0
  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

Timeouts during heavy IO can be caused by a number of things, and bad
power is one of them.
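
For reading the Emask values in the quoted log, a small Python decoder may help. The bit table is reproduced from memory of enum ata_completion_errors in include/linux/libata.h, so treat it as an assumption and check it against the kernel source actually in use. The function name decode_emask is purely illustrative:

```python
# Bit names as recalled from enum ata_completion_errors
# (include/linux/libata.h); an assumption, verify against your kernel tree.
AC_ERR_BITS = [
    (0x001, "DEV"),         # device reported error
    (0x002, "HSM"),         # host state machine violation
    (0x004, "TIMEOUT"),     # command timed out
    (0x008, "MEDIA"),       # media error
    (0x010, "ATA_BUS"),     # ATA bus error
    (0x020, "HOST_BUS"),    # host bus error
    (0x040, "SYSTEM"),      # system error
    (0x080, "INVALID"),     # invalid argument
    (0x100, "OTHER"),       # unknown / unclassified
    (0x200, "NODEV_HINT"),  # polling device detection hint
    (0x400, "NCQ"),         # marker for offending NCQ command
]


def decode_emask(mask):
    """Return the names of all error bits set in an Emask value."""
    return [name for bit, name in AC_ERR_BITS if mask & bit]


for mask in (0x40, 0x100, 0x4):
    print(hex(mask), decode_emask(mask))
```

Under that table, Emask 0x4 is the timeout the log already spells out, 0x40 on the SCR read is a system error, and 0x100 is the catch-all "other" bit. None of them names a cause by itself.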

Please lemme know your test result.

-- 
tejun


Re: libata pm

2008-01-27 Thread [EMAIL PROTECTED]
On Sun, 27.01.2008, 01:38, [EMAIL PROTECTED] wrote:
 On Sat, 26.01.2008, 23:33, Tejun Heo wrote:

 [EMAIL PROTECTED] wrote:
 I could solve the problem by ncq blacklisting the Maxtor 6V300F0
 devices in libata-core.c

 That's a strange story.  The reported error was PHY readiness changed
 and Device exchanged, both of which point to hotplug event.  Maybe
 something causes the firmware to reset?  What happen if you only leave
 two drives on the port multiplier and remove the NCQ blacklist?  Also,
 does the error ever occur on the drives which are connected directly to
 the controller?


 --
 tejun



 No the error only occures on the drives on the pm. I can test the pm with
  two Maxtor drives tomorrow (currently backing up the raid).



With one, two and three drives on the pm I got no errors, so I tried
changing the power supply. I have two power supplies for the 16 hard drives
and the second one (with all Maxtor drives and the first pm) was failing.
After changing the power supply all problems were gone. Sorry for the
confusion.

The only remaining issue is that some SATA links only run at 1.5
Gbps and I can't figure out why. (ata3,4,5,6.x are the Maxtor drives and
ata7,8,9,10.x are the Samsung ones)

jasmin ~ # dmesg | grep Gbps
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata6.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.05: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
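
For anyone decoding the register values in these lines: libata prints SStatus and SControl in hexadecimal, and in the Serial ATA register layout each of them packs three 4-bit fields (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). A minimal Python sketch follows. The helper name decode_scr is illustrative and not part of any kernel tool:

```python
# Field layout follows the Serial ATA SStatus/SControl registers;
# decode_scr is a hypothetical helper, not a kernel or libata API.
SPD_NAMES = {0: "none / no limit", 1: "1.5 Gbps", 2: "3.0 Gbps"}


def decode_scr(hex_text):
    """Split a hex SStatus/SControl value into its DET, SPD and IPM nibbles."""
    value = int(hex_text, 16)
    return {
        "det": value & 0xF,          # bits 0-3: device detection
        "spd": (value >> 4) & 0xF,   # bits 4-7: negotiated speed / speed limit
        "ipm": (value >> 8) & 0xF,   # bits 8-11: interface power management
    }


for label, text in [("SStatus", "123"), ("SStatus", "113"), ("SControl", "300")]:
    fields = decode_scr(text)
    print(label, text, fields, SPD_NAMES.get(fields["spd"], "unknown"))
```

SStatus 123 has SPD = 2 (3.0 Gbps) and SStatus 113 has SPD = 1 (1.5 Gbps), matching the speeds printed next to them. SControl 300 has SPD = 0, meaning the host requests no speed limit, so the 1.5 Gbps on ata6.00, ata6.05 and ata10.05 is the result of link negotiation and not of a configured cap.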



libata pm

2008-01-26 Thread [EMAIL PROTECTED]
I'm currently running

Linux *** 2.6.23 #1 SMP PREEMPT Sat Jan 26 17:59:58 CET 2008 x86_64
Intel(R) Pentium(R) D CPU 3.20GHz GenuineIntel GNU/Linux
with the libata-tj-2.6.23-20071011 patch for 2.6.23

and for testing purposes

linux-2.6.24-rc6-mm1


on an Asus P5WDG2-WS with two PCI-X Dawicontrol DC 4300 controllers and two
Dawicontrol DC 6510 PM port multipliers to get 16 SATA ports. (The Linux
partition is on an IDE drive.)

The second raid array with 8 Samsung HD-LJ 501 works great without any
errors, but the first raid with 8 Maxtor 6V300F0 produces one error after
another after mounting the filesystem:

Filesystem dm-0: Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-0
Ending clean XFS mount for filesystem: dm-0
Filesystem dm-1: Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-1
Ending clean XFS mount for filesystem: dm-1
ata6.04: exception Emask 0x10 SAct 0x0 SErr 0x401 action 0xb
ata6: SError: { PHYRdyChg DevExch }
ata6.04: hard resetting link
ata6.04: SATA link down (SStatus 0 SControl 300)
ata6: failed to recover some devices, retrying in 5 secs
ata6.04: hard resetting link
ata6.04: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.04: configured for UDMA/100
ata6: EH complete
sd 5:0:0:0: [sdh] 586114704 512-byte hardware sectors (300091 MB)
sd 5:0:0:0: [sdh] Write Protect is off
sd 5:0:0:0: [sdh] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 5:1:0:0: [sdi] 586114704 512-byte hardware sectors (300091 MB)
sd 5:1:0:0: [sdi] Write Protect is off
sd 5:1:0:0: [sdi] Mode Sense: 00 3a 00 00
sd 5:1:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 5:2:0:0: [sdj] 586114704 512-byte hardware sectors (300091 MB)
sd 5:2:0:0: [sdj] Write Protect is off
sd 5:2:0:0: [sdj] Mode Sense: 00 3a 00 00
sd 5:2:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 5:3:0:0: [sdk] 586114704 512-byte hardware sectors (300091 MB)
sd 5:3:0:0: [sdk] Write Protect is off
sd 5:3:0:0: [sdk] Mode Sense: 00 3a 00 00
sd 5:3:0:0: [sdk] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 5:4:0:0: [sdl] 586114704 512-byte hardware sectors (300091 MB)
sd 5:4:0:0: [sdl] Write Protect is off
sd 5:4:0:0: [sdl] Mode Sense: 00 3a 00 00
sd 5:4:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
ata6.00: failed to read SCR 1 (Emask=0x40)
ata6.01: failed to read SCR 1 (Emask=0x40)
ata6.02: failed to read SCR 1 (Emask=0x40)
ata6.03: failed to read SCR 1 (Emask=0x40)
ata6.04: failed to read SCR 1 (Emask=0x40)
ata6.05: failed to read SCR 1 (Emask=0x40)
ata6.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.01: cmd c8/00:10:97:26:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
 res 50/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata6.01: status: { DRDY }
ata6.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.15: hard resetting link
ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata6.00: hard resetting link
ata6.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.01: hard resetting link
ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.02: hard resetting link
ata6.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.03: hard resetting link
ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.04: hard resetting link
ata6.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.05: hard resetting link
ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.00: configured for UDMA/100
ata6.01: configured for UDMA/100
ata6.02: configured for UDMA/100
ata6.03: configured for UDMA/100
ata6.04: configured for UDMA/100
ata6: EH complete
sd 5:0:0:0: [sdh] 586114704 512-byte hardware sectors (300091 MB)
sd 5:0:0:0: [sdh] Write Protect is off
sd 5:0:0:0: [sdh] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 5:1:0:0: [sdi] 586114704 512-byte hardware sectors (300091 MB)
sd 5:1:0:0: [sdi] Write Protect is off
sd 5:1:0:0: [sdi] Mode Sense: 00 3a 00 00
sd 5:1:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 5:2:0:0: [sdj] 586114704 512-byte hardware sectors (300091 MB)
sd 5:2:0:0: [sdj] Write Protect is off
sd 5:2:0:0: [sdj] Mode Sense: 00 3a 00 00
sd 5:2:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 5:3:0:0: [sdk] 586114704 512-byte hardware sectors (300091 MB)
sd 5:3:0:0: [sdk] Write Protect is off
sd 5:3:0:0: [sdk] Mode Sense: 00 3a 00 00
sd 5:3:0:0: [sdk] Write cache: enabled, read cache: enabled, 

Re: libata pm

2008-01-26 Thread [EMAIL PROTECTED]
I was able to solve the problem by NCQ-blacklisting the Maxtor 6V300F0
devices in libata-core.c


On Sat, 26.01.2008, 18:03, [EMAIL PROTECTED] wrote:
 I'm currently running


 Linux *** 2.6.23 #1 SMP PREEMPT Sat Jan 26 17:59:58 CET 2008 x86_64
 Intel(R) Pentium(R) D CPU 3.20GHz GenuineIntel GNU/Linux
 with the libata-tj-2.6.23-20071011 path for 2.6.23

 and for testing purposes

 linux-2.6.24-rc6-mm1


 on a Asus P5WDG2-WS with two PCI-X Dawicontrol DC 4300 Controllers and
 two Dawicontrol DC 6510 PM port multipliers to get 16 sata ports. (the
 linux partition is on a ide drive)

 the second raid array with 8 Samsung HD-LJ 501 works great without any
 errors. but the first raid with 8 Maxtor 6V300F0 produces one error after
  another after mounting the filesystem:

 Filesystem dm-0: Disabling barriers, not supported by the underlying device
 XFS mounting filesystem dm-0
 Ending clean XFS mount for filesystem: dm-0
 Filesystem dm-1: Disabling barriers, not supported by the underlying device
 XFS mounting filesystem dm-1
 Ending clean XFS mount for filesystem: dm-1
 ata6.04: exception Emask 0x10 SAct 0x0 SErr 0x401 action 0xb
 ata6: SError: { PHYRdyChg DevExch }
 ata6.04: hard resetting link
 ata6.04: SATA link down (SStatus 0 SControl 300)
 ata6: failed to recover some devices, retrying in 5 secs
 ata6.04: hard resetting link
 ata6.04: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 ata6.04: configured for UDMA/100
 ata6: EH complete
 sd 5:0:0:0: [sdh] 586114704 512-byte hardware sectors (300091 MB)
 sd 5:0:0:0: [sdh] Write Protect is off
 sd 5:0:0:0: [sdh] Mode Sense: 00 3a 00 00
 sd 5:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't
 support DPO or FUA
 sd 5:1:0:0: [sdi] 586114704 512-byte hardware sectors (300091 MB)
 sd 5:1:0:0: [sdi] Write Protect is off
 sd 5:1:0:0: [sdi] Mode Sense: 00 3a 00 00
 sd 5:1:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't
 support DPO or FUA
 sd 5:2:0:0: [sdj] 586114704 512-byte hardware sectors (300091 MB)
 sd 5:2:0:0: [sdj] Write Protect is off
 sd 5:2:0:0: [sdj] Mode Sense: 00 3a 00 00
 sd 5:2:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't
 support DPO or FUA
 sd 5:3:0:0: [sdk] 586114704 512-byte hardware sectors (300091 MB)
 sd 5:3:0:0: [sdk] Write Protect is off
 sd 5:3:0:0: [sdk] Mode Sense: 00 3a 00 00
 sd 5:3:0:0: [sdk] Write cache: enabled, read cache: enabled, doesn't
 support DPO or FUA
 sd 5:4:0:0: [sdl] 586114704 512-byte hardware sectors (300091 MB)
 sd 5:4:0:0: [sdl] Write Protect is off
 sd 5:4:0:0: [sdl] Mode Sense: 00 3a 00 00
 sd 5:4:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't
 support DPO or FUA
 ata6.00: failed to read SCR 1 (Emask=0x40)
 ata6.01: failed to read SCR 1 (Emask=0x40)
 ata6.02: failed to read SCR 1 (Emask=0x40)
 ata6.03: failed to read SCR 1 (Emask=0x40)
 ata6.04: failed to read SCR 1 (Emask=0x40)
 ata6.05: failed to read SCR 1 (Emask=0x40)
 ata6.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
 ata6.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
 ata6.01: cmd c8/00:10:97:26:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
  res 50/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
 ata6.01: status: { DRDY }
 ata6.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
 ata6.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
 ata6.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
 ata6.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
 ata6.15: hard resetting link
 ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
 ata6.00: hard resetting link
 ata6.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 ata6.01: hard resetting link
 ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 ata6.02: hard resetting link
 ata6.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 ata6.03: hard resetting link
 ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 ata6.04: hard resetting link
 ata6.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 ata6.05: hard resetting link
 ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 ata6.00: configured for UDMA/100
 ata6.01: configured for UDMA/100
 ata6.02: configured for UDMA/100
 ata6.03: configured for UDMA/100
 ata6.04: configured for UDMA/100
 ata6: EH complete
 sd 5:0:0:0: [sdh] 586114704 512-byte hardware sectors (300091 MB)
 sd 5:0:0:0: [sdh] Write Protect is off
 sd 5:0:0:0: [sdh] Mode Sense: 00 3a 00 00
 sd 5:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't
 support DPO or FUA
 sd 5:1:0:0: [sdi] 586114704 512-byte hardware sectors (300091 MB)
 sd 5:1:0:0: [sdi] Write Protect is off
 sd 5:1:0:0: [sdi] Mode Sense: 00 3a 00 00
 sd 5:1:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't
 support DPO or FUA
 sd 5:2:0:0: [sdj] 586114704 512-byte hardware sectors (300091 MB)
 sd 5:2:0:0: [sdj] Write Protect is off
 sd 5:2:0:0: [sdj] Mode Sense: 00 3a 00 00
 sd 5:2:0:0: [sdj] Write cache: enabled, read cache: enabled, 

Re: libata pm

2008-01-26 Thread [EMAIL PROTECTED]
On Sat, 26.01.2008, 23:33, Tejun Heo wrote:
 [EMAIL PROTECTED] wrote:
 I could solve the problem by ncq blacklisting the Maxtor 6V300F0
 devices in libata-core.c

 That's a strange story.  The reported error was PHY readiness changed
 and Device exchanged, both of which point to hotplug event.  Maybe
 something causes the firmware to reset?  What happen if you only leave two
 drives on the port multiplier and remove the NCQ blacklist?  Also, does
 the error ever occur on the drives which are connected directly to the
 controller?


 --
 tejun



No, the error only occurs on the drives on the pm. I can test the pm with
two Maxtor drives tomorrow (currently backing up the raid).


