Re: libata pm
Hi everybody,

It looks like this will be a never-ending story... After I got the new PSU and the RAID had been in full sync without any error for 48h, I thought all problems were gone. Today the SATA errors reappeared, and whenever the load is high enough I get the following:

I exchanged two (probably failing) of the eight hard drives for new ones. All remaining disks have a good SMART state and are fully readable when the RAID is not active. As long as I mount the filesystem on the RAID read-only, no error happens, but the moment I mount it rw and try to copy something to the filesystem on the RAID, I get the already-known timeout. I'm starting to get a little desperate now...

ata10.00: failed to read SCR 1 (Emask=0x40)
ata10.01: failed to read SCR 1 (Emask=0x40)
ata10.02: failed to read SCR 1 (Emask=0x40)
ata10.03: failed to read SCR 1 (Emask=0x40)
ata10.04: failed to read SCR 1 (Emask=0x40)
ata10.05: failed to read SCR 1 (Emask=0x40)
ata10.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.01: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 1 cdb 0x0 data 0 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata10.01: status: { DRDY }
ata10.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.15: hard resetting link
ata10.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata10.00: hard resetting link
ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.01: hard resetting link
ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.02: hard resetting link
ata10.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.03: hard resetting link
ata10.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.04: hard resetting link
ata10.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.05: hard resetting link
ata10.05: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata10.00: configured for UDMA/100
ata10.01: configured for UDMA/100
ata10.03: configured for UDMA/100
ata10.04: configured for UDMA/100
ata10: EH complete
sd 9:0:0:0: [sdl] 976773168 512-byte hardware sectors (500108 MB)
sd 9:0:0:0: [sdl] Write Protect is off
sd 9:0:0:0: [sdl] Mode Sense: 00 3a 00 00
sd 9:0:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:1:0:0: [sdm] 976773168 512-byte hardware sectors (500108 MB)
sd 9:1:0:0: [sdm] Write Protect is off
sd 9:1:0:0: [sdm] Mode Sense: 00 3a 00 00
sd 9:1:0:0: [sdm] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:2:0:0: [sdn] 976773168 512-byte hardware sectors (500108 MB)
sd 9:2:0:0: [sdn] Write Protect is off
sd 9:2:0:0: [sdn] Mode Sense: 00 3a 00 00
sd 9:2:0:0: [sdn] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:3:0:0: [sdo] 976773168 512-byte hardware sectors (500108 MB)
sd 9:3:0:0: [sdo] Write Protect is off
sd 9:3:0:0: [sdo] Mode Sense: 00 3a 00 00
sd 9:3:0:0: [sdo] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:4:0:0: [sdp] 976773168 512-byte hardware sectors (500108 MB)
sd 9:4:0:0: [sdp] Write Protect is off
sd 9:4:0:0: [sdp] Mode Sense: 00 3a 00 00
sd 9:4:0:0: [sdp] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

- To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
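The Emask values in these logs (Emask=0x40 on the SCR reads, Emask 0x100 on the exception lines, Emask 0x4 on the timed-out command) are bit masks of libata's completion-error flags. The following is a minimal sketch for decoding them, assuming the AC_ERR_* bit assignments of enum ata_completion_errors in include/linux/libata.h of the 2.6.2x kernels used here; decode_emask is a hypothetical helper, so check the table against the headers of the kernel you actually run.

```python
# Hypothetical helper for reading libata EH logs.
# Assumption: bit numbers follow enum ata_completion_errors (AC_ERR_*)
# in include/linux/libata.h of the 2.6.2x kernels used in this thread.
AC_ERR_BITS = {
    0: "AC_ERR_DEV",         # device reported an error
    1: "AC_ERR_HSM",         # host state machine violation
    2: "AC_ERR_TIMEOUT",     # command timed out
    3: "AC_ERR_MEDIA",       # media error
    4: "AC_ERR_ATA_BUS",     # ATA bus error
    5: "AC_ERR_HOST_BUS",    # host bus error
    6: "AC_ERR_SYSTEM",      # system error
    7: "AC_ERR_INVALID",     # invalid argument
    8: "AC_ERR_OTHER",       # unknown / other
    9: "AC_ERR_NODEV_HINT",  # polling device detection hint
    10: "AC_ERR_NCQ",        # marker for the offending NCQ command
}
KNOWN_MASK = sum(1 << bit for bit in AC_ERR_BITS)


def decode_emask(emask: int) -> list[str]:
    """Return the AC_ERR_* names for every bit set in an Emask value."""
    names = [name for bit, name in AC_ERR_BITS.items() if emask & (1 << bit)]
    leftover = emask & ~KNOWN_MASK
    if leftover:
        # Do not silently drop bits this table does not describe.
        names.append(f"unknown(0x{leftover:x})")
    return names


if __name__ == "__main__":
    # Values taken from the logs in this thread.
    for value in (0x40, 0x100, 0x4, 0x10):
        print(f"Emask 0x{value:x}: {', '.join(decode_emask(value))}")
```

With that table, 0x40 decodes to AC_ERR_SYSTEM, 0x100 to AC_ERR_OTHER, 0x4 to AC_ERR_TIMEOUT (matching libata's own "(timeout)" annotation) and the Emask 0x10 in the original report to AC_ERR_ATA_BUS.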
Re: libata pm
[EMAIL PROTECTED] wrote:

Sorry for my late answer, but I had to sort this out first. After replacing the first PSU with a new Corsair 650W, the power no longer fluctuated by more than 0.01 V (and even that only when powering up the drives...). I did a full resync on both RAID arrays and got no more errors or resets, but there were some inconsistencies during the sync and the XFS filesystem on both arrays had to be repaired. Are these problems caused by the PM resets?

libata EH won't lose any data as long as the hardware doesn't. If the power fluctuates and causes your drive to briefly power down (this does happen, and you can hear the drive doing an emergency unload when it happens), the data in the write buffer can be lost. When the drive comes back, all libata can know is that the PHY suffered a brief connection loss, so it resets the device and goes on; the data that was in the cache is lost by then. It's basically pulling the power plug from the hard drive while a write is going on and plugging it back in quickly. You're bound to lose data.

After I got the new PSU and the RAID had been in full sync without any error for 48h, I thought all problems were gone. Today the SATA errors reappeared, and whenever the load is high enough I get the following: ..

What exact brand/model drives are those again (hdparm --Istdout, please)? If I have a similar unit here, I may try to reproduce this.

Cheers
Re: libata pm
[EMAIL PROTECTED] wrote:

Sorry for my late answer, but I had to sort this out first. After replacing the first PSU with a new Corsair 650W, the power no longer fluctuated by more than 0.01 V (and even that only when powering up the drives...). I did a full resync on both RAID arrays and got no more errors or resets, but there were some inconsistencies during the sync and the XFS filesystem on both arrays had to be repaired. Are these problems caused by the PM resets?

libata EH won't lose any data as long as the hardware doesn't. If the power fluctuates and causes your drive to briefly power down (this does happen, and you can hear the drive doing an emergency unload when it happens), the data in the write buffer can be lost. When the drive comes back, all libata can know is that the PHY suffered a brief connection loss, so it resets the device and goes on; the data that was in the cache is lost by then. It's basically pulling the power plug from the hard drive while a write is going on and plugging it back in quickly. You're bound to lose data.

After I got the new PSU and the RAID had been in full sync without any error for 48h, I thought all problems were gone. Today the SATA errors reappeared, and whenever the load is high enough I get the following:

ata10.02: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata10.02: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 2 cdb 0x0 data 0 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

Those are timeouts for FLUSH (command 0xea, FLUSH CACHE EXT). libata told the drive to flush its cache onto the media, and the drive failed to finish that within 30 seconds. As FLUSH itself is a non-data command, this problem usually isn't caused by transport-layer problems. This is a pretty good clue that the drive is knee-deep in s**t. What does 'smartctl -a' tell you about the drive?

-- tejun
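Tejun's reading of the log hinges on the first byte of the "cmd" field: ea is the ATA FLUSH CACHE EXT opcode, while the timeout in the first report (cmd c8) was a READ DMA. The sketch below pulls the opcode, sector count and 28-bit LBA out of such a line. It assumes the field order libata of this era uses when printing a taskfile (command/feature:nsect:lbal:lbam:lbah/hob bytes/device) and names only the two opcodes seen in this thread; decode_cmd is a hypothetical helper.

```python
import re

# Hypothetical helper: decode the "cmd" taskfile that libata EH prints for a
# failed command. Assumed field order (as printed by libata of this era):
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:...:hob_lbah/device
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})/"                    # command (opcode)
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"     # feature:nsect:lbal:lbam:lbah
    r"(?:[0-9a-f]{2}:){4}[0-9a-f]{2}/"       # hob_* bytes (ignored here)
    r"([0-9a-f]{2})"                         # device register
)

# Only the two opcodes that appear in this thread (ATA command set).
OPCODES = {0xC8: "READ DMA", 0xEA: "FLUSH CACHE EXT"}


def decode_cmd(line: str) -> dict:
    """Extract opcode, sector count and 28-bit LBA from a libata 'cmd' field."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata 'cmd' taskfile found in line")
    command = int(m.group(1), 16)
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    device = int(m.group(3), 16)
    return {
        "opcode": command,
        "name": OPCODES.get(command, "unknown"),
        "nsect": nsect,  # raw count; 0 would mean 256 for 28-bit data commands
        # Low nibble of the device register holds LBA bits 27:24.
        # Only meaningful for commands that carry an LBA (not for FLUSH).
        "lba28": ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal,
    }


if __name__ == "__main__":
    for sample in (
        "ata10.02: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0",
        "ata6.01: cmd c8/00:10:97:26:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in",
    ):
        print(decode_cmd(sample))
```

For the READ DMA sample, nsect is 0x10, i.e. 16 sectors of 512 bytes, which matches the "data 8192 in" libata printed, and the LBA is 0x2697. For the FLUSH CACHE EXT lines all count and address bytes are zero, as expected for a non-data command.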
Re: libata pm
Sorry for my late answer, but I had to sort this out first. After replacing the first PSU with a new Corsair 650W, the power no longer fluctuated by more than 0.01 V (and even that only when powering up the drives...). I did a full resync on both RAID arrays and got no more errors or resets, but there were some inconsistencies during the sync and the XFS filesystem on both arrays had to be repaired. Are these problems caused by the PM resets?

libata EH won't lose any data as long as the hardware doesn't. If the power fluctuates and causes your drive to briefly power down (this does happen, and you can hear the drive doing an emergency unload when it happens), the data in the write buffer can be lost. When the drive comes back, all libata can know is that the PHY suffered a brief connection loss, so it resets the device and goes on; the data that was in the cache is lost by then. It's basically pulling the power plug from the hard drive while a write is going on and plugging it back in quickly. You're bound to lose data.

After I got the new PSU and the RAID had been in full sync without any error for 48h, I thought all problems were gone. Today the SATA errors reappeared, and whenever the load is high enough I get the following:

ata10.00: failed to read SCR 1 (Emask=0x40)
ata10.01: failed to read SCR 1 (Emask=0x40)
ata10.02: failed to read SCR 1 (Emask=0x40)
ata10.03: failed to read SCR 1 (Emask=0x40)
ata10.04: failed to read SCR 1 (Emask=0x40)
ata10.05: failed to read SCR 1 (Emask=0x40)
ata10.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.02: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata10.02: status: { DRDY }
ata10.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.15: hard resetting link
ata10.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata10.00: hard resetting link
ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.01: hard resetting link
ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.02: hard resetting link
ata10.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.03: hard resetting link
ata10.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.04: hard resetting link
ata10.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.05: hard resetting link
ata10.05: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata10.00: configured for UDMA/100
ata10.01: configured for UDMA/100
ata10.02: configured for UDMA/100
ata10.03: configured for UDMA/100
ata10.04: configured for UDMA/100
ata10: EH complete
sd 9:0:0:0: [sdl] 976773168 512-byte hardware sectors (500108 MB)
sd 9:0:0:0: [sdl] Write Protect is off
sd 9:0:0:0: [sdl] Mode Sense: 00 3a 00 00
sd 9:0:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:1:0:0: [sdm] 976773168 512-byte hardware sectors (500108 MB)
sd 9:1:0:0: [sdm] Write Protect is off
sd 9:1:0:0: [sdm] Mode Sense: 00 3a 00 00
sd 9:1:0:0: [sdm] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:2:0:0: [sdn] 976773168 512-byte hardware sectors (500108 MB)
sd 9:2:0:0: [sdn] Write Protect is off
sd 9:2:0:0: [sdn] Mode Sense: 00 3a 00 00
sd 9:2:0:0: [sdn] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:3:0:0: [sdo] 976773168 512-byte hardware sectors (500108 MB)
sd 9:3:0:0: [sdo] Write Protect is off
sd 9:3:0:0: [sdo] Mode Sense: 00 3a 00 00
sd 9:3:0:0: [sdo] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:4:0:0: [sdp] 976773168 512-byte hardware sectors (500108 MB)
sd 9:4:0:0: [sdp] Write Protect is off
sd 9:4:0:0: [sdp] Mode Sense: 00 3a 00 00
sd 9:4:0:0: [sdp] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:0:0:0: [sdl] 976773168 512-byte hardware sectors (500108 MB)
sd 9:0:0:0: [sdl] Write Protect is off
sd 9:0:0:0: [sdl] Mode Sense: 00 3a 00 00
sd 9:0:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:1:0:0: [sdm] 976773168 512-byte hardware sectors (500108 MB)
sd 9:1:0:0: [sdm] Write Protect is off
sd 9:1:0:0: [sdm] Mode Sense: 00 3a 00 00
sd 9:1:0:0: [sdm] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:2:0:0: [sdn] 976773168 512-byte hardware sectors (500108 MB)
sd 9:2:0:0: [sdn] Write Protect is off
sd 9:2:0:0: [sdn] Mode Sense: 00 3a 00 00
sd 9:2:0:0: [sdn] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:3:0:0: [sdo] 976773168 512-byte hardware sectors (500108 MB)
sd 9:3:0:0: [sdo] Write Protect is off
sd 9:3:0:0: [sdo] Mode Sense: 00 3a 00 00
sd 9:3:0:0: [sdo] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:4:0:0: [sdp]
Re: libata pm
Mark Lord wrote:

After I got the new PSU and the RAID had been in full sync without any error for 48h, I thought all problems were gone. Today the SATA errors reappeared, and whenever the load is high enough I get the following: ..

What exact brand/model drives are those again (hdparm --Istdout, please)? If I have a similar unit here, I may try to reproduce this.

Dusty, can you please provide the info for Mark? Let's see if Mark can reproduce this.

Thanks.

-- tejun
Re: libata pm
Shuffling the drives did not change anything about the link speed of the 3 ports running at 1.5 Gbps. Looks like the problem is port-related.

Hmm... Okay. Maybe the signal traces or connectors have some problems. But 3.0 Gbps on a downstream port doesn't make any difference anyway, so unless it leads to errors, it should be fine.

The first PSU (multi-lane) delivers (measured) 12.20 V on both lanes. During read/write access on all attached drives, this falls to 12.10 V.

That explains the PHY dropouts. I've even seen drives do a hard reset accompanied by an emergency head unload due to the voltage dropping when high IO load hits. Of course, data in the cache is lost.

The second PSU (single-lane) delivers (measured) 11.75 V. This doesn't change during read/write access on all attached drives. I tested the second PSU with nothing attached and it still delivered 11.75 V.

11.75 V is still in spec and it doesn't fluctuate. I like this power much better.

I will try to add another PSU this evening and remeasure. Currently the whole machine needs about 350W. The PSUs are a Targan 500W (multi-lane, 2x12V 10A)

Maybe one of the lanes is only connected to the video power connectors? IIRC, that was the idea of multi-lane power anyway.

Please lemme know your test result.

Sorry for my late answer, but I had to sort this out first. After replacing the first PSU with a new Corsair 650W, the power no longer fluctuated by more than 0.01 V (and even that only when powering up the drives...). I did a full resync on both RAID arrays and got no more errors or resets, but there were some inconsistencies during the sync and the XFS filesystem on both arrays had to be repaired. Are these problems caused by the PM resets?

Thanks for your patience.
Re: libata pm
[EMAIL PROTECTED] wrote:

Sorry for my late answer, but I had to sort this out first. After replacing the first PSU with a new Corsair 650W, the power no longer fluctuated by more than 0.01 V (and even that only when powering up the drives...). I did a full resync on both RAID arrays and got no more errors or resets, but there were some inconsistencies during the sync and the XFS filesystem on both arrays had to be repaired. Are these problems caused by the PM resets?

libata EH won't lose any data as long as the hardware doesn't. If the power fluctuates and causes your drive to briefly power down (this does happen, and you can hear the drive doing an emergency unload when it happens), the data in the write buffer can be lost. When the drive comes back, all libata can know is that the PHY suffered a brief connection loss, so it resets the device and goes on; the data that was in the cache is lost by then. It's basically pulling the power plug from the hard drive while a write is going on and plugging it back in quickly. You're bound to lose data.

-- tejun
Re: libata pm
Hello,

On Mon, 28.01.2008, 01:17, Tejun Heo wrote:

Hello,

[EMAIL PROTECTED] wrote:

With one, two and three drives on the PM I got no errors, so I tried changing the power supply. I have two power supplies for the 16 hard drives, and the second one (powering all the Maxtor drives and the first PM) was failing. After changing the power supply, all problems were gone. Sorry for the confusion.

It's amazing that, in SATA, the ratio of verified PSU problems to suggested PSU problems is so significant. I don't think this was ever the case with PATA. The SATA link seems much more susceptible to power quality, and SATA allows hooking up a whole lot of drives.

The only remaining issue is that some SATA links only work at 1.5 Gbps and I can't figure out why. (ata3,4,5,6.x are the Maxtor drives and ata7,8,9,10.x are the Samsung ones)

Hmm... That's the transfer rate the hardware negotiated between both ends of the link. SControl is telling the hardware that there's no speed limit and to go as high as it can, but somehow the 3.0 Gbps negotiation failed and the link speed is limited to 1.5 Gbps for some of the ports. Is the result always the same? What happens if you swap drives between ports? Also, if you have an extra PSU lying around, can you hook it up to some of the drives so that the load is more distributed?

Oh.. Another note on PSUs. I don't know whether this still holds for more recent ones, but mid-to-high range multi-lane PSUs sometimes have less juice available for disks on the 12 V rail than low-cost single-lane PSUs. This is because the high-power 12 V lanes are allocated to power video cards, and disks can only pull power from a (usually) single weak 12 V lane.

Thanks.

-- tejun

Shuffling the drives did not change anything about the link speed of the 3 ports running at 1.5 Gbps. Looks like the problem is port-related. The first PSU (multi-lane) delivers (measured) 12.20 V on both lanes. During read/write access on all attached drives, this falls to 12.10 V. The second PSU (single-lane) delivers (measured) 11.75 V. This doesn't change during read/write access on all attached drives. I tested the second PSU with nothing attached and it still delivered 11.75 V. I will try to add another PSU this evening and remeasure. Currently the whole machine needs about 350W. The PSUs are a Targan 500W (multi-lane, 2x12V 10A) and a Noname 350W (single-lane, 1x12V 10A), so this 'should' be enough...

Last night I copied the backup back to the first RAID (the Maxtor drives) without any error, but during the transfer I got this on the second (mounted but unused) RAID:

ata10.00: failed to read SCR 1 (Emask=0x40)
ata10.01: failed to read SCR 1 (Emask=0x40)
ata10.02: failed to read SCR 1 (Emask=0x40)
ata10.03: failed to read SCR 1 (Emask=0x40)
ata10.04: failed to read SCR 1 (Emask=0x40)
ata10.05: failed to read SCR 1 (Emask=0x40)
ata10.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.02: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 1 cdb 0x0 data 0 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata10.02: status: { DRDY }
ata10.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.15: hard resetting link
ata10.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata10.00: hard resetting link
ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.01: hard resetting link
ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.02: hard resetting link
ata10.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.03: hard resetting link
ata10.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.04: hard resetting link
ata10.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.05: hard resetting link
ata10.05: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata10.00: configured for UDMA/100
ata10.01: configured for UDMA/100
ata10.02: configured for UDMA/100
ata10.03: configured for UDMA/100
ata10.04: configured for UDMA/100
ata10: EH complete
sd 9:0:0:0: [sdp] 976773168 512-byte hardware sectors (500108 MB)
sd 9:0:0:0: [sdp] Write Protect is off
sd 9:0:0:0: [sdp] Mode Sense: 00 3a 00 00
sd 9:0:0:0: [sdp] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:1:0:0: [sdq] 976773168 512-byte hardware sectors (500108 MB)
sd 9:1:0:0: [sdq] Write Protect is off
sd 9:1:0:0: [sdq] Mode Sense: 00 3a 00 00
sd 9:1:0:0: [sdq] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:2:0:0: [sdr] 976773168 512-byte hardware sectors (500108 MB)
sd 9:2:0:0: [sdr] Write Protect is off
sd 9:2:0:0: [sdr] Mode Sense: 00 3a 00 00
sd 9:2:0:0: [sdr] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 9:3:0:0: [sds] 976773168 512-byte hardware sectors
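Tejun's point that SControl imposes no speed limit can be checked by splitting the printed values into their fields. A minimal sketch, assuming libata prints SStatus and SControl in hex and that they follow the SATA register layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8); only the field values relevant here are named, and fields/describe are hypothetical helpers.

```python
# Sketch: split the SStatus / SControl values libata prints in
# "SATA link up ... (SStatus 123 SControl 300)" into their fields.
# Assumptions: the values are printed in hex, and the register layout is the
# SATA one: DET = bits 3:0, SPD = bits 7:4, IPM = bits 11:8.

SSTATUS_SPD = {0: "no speed negotiated", 1: "1.5 Gbps (Gen1)", 2: "3.0 Gbps (Gen2)"}
SSTATUS_IPM = {0: "not present / no communication", 1: "active",
               2: "partial", 6: "slumber"}
SCONTROL_SPD = {0: "no speed limit", 1: "limit to 1.5 Gbps", 2: "limit to 3.0 Gbps"}
SCONTROL_IPM = {0: "no IPM restriction", 1: "partial disabled",
                2: "slumber disabled", 3: "partial and slumber disabled"}


def fields(value: int) -> tuple[int, int, int]:
    """Return (DET, SPD, IPM) from a SATA SStatus or SControl value."""
    return value & 0xF, (value >> 4) & 0xF, (value >> 8) & 0xF


def describe(sstatus_hex: str, scontrol_hex: str) -> str:
    """Human-readable summary of one libata 'SATA link up' message."""
    det, spd, ipm = fields(int(sstatus_hex, 16))
    _, spd_limit, ipm_limit = fields(int(scontrol_hex, 16))
    return (f"DET={det}, link at {SSTATUS_SPD.get(spd, spd)}, "
            f"state {SSTATUS_IPM.get(ipm, ipm)}; SControl: "
            f"{SCONTROL_SPD.get(spd_limit, spd_limit)}, "
            f"{SCONTROL_IPM.get(ipm_limit, ipm_limit)}")


if __name__ == "__main__":
    for sstatus, scontrol in (("123", "300"), ("113", "300"), ("123", "0")):
        print(f"SStatus {sstatus} SControl {scontrol}: {describe(sstatus, scontrol)}")
```

For the slow ports, SStatus 113 gives SPD=1 (1.5 Gbps) while SControl 300 gives SPD=0, so nothing on the host side capped the speed; the link simply negotiated down. The 3 in SControl's IPM field only disables partial/slumber power-management transitions.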
Re: libata pm
Hello, Dusty.

[EMAIL PROTECTED] wrote:

Shuffling the drives did not change anything about the link speed of the 3 ports running at 1.5 Gbps. Looks like the problem is port-related.

Hmm... Okay. Maybe the signal traces or connectors have some problems. But 3.0 Gbps on a downstream port doesn't make any difference anyway, so unless it leads to errors, it should be fine.

The first PSU (multi-lane) delivers (measured) 12.20 V on both lanes. During read/write access on all attached drives, this falls to 12.10 V.

That explains the PHY dropouts. I've even seen drives do a hard reset accompanied by an emergency head unload due to the voltage dropping when high IO load hits. Of course, data in the cache is lost.

The second PSU (single-lane) delivers (measured) 11.75 V. This doesn't change during read/write access on all attached drives. I tested the second PSU with nothing attached and it still delivered 11.75 V.

11.75 V is still in spec and it doesn't fluctuate. I like this power much better.

I will try to add another PSU this evening and remeasure. Currently the whole machine needs about 350W. The PSUs are a Targan 500W (multi-lane, 2x12V 10A)

Maybe one of the lanes is only connected to the video power connectors? IIRC, that was the idea of multi-lane power anyway.

and a Noname 350W (single-lane, 1x12V 10A), so this 'should' be enough... Last night I copied the backup back to the first RAID (the Maxtor drives) without any error, but during the transfer I got this on the second (mounted but unused) RAID:

ata10.00: failed to read SCR 1 (Emask=0x40)
ata10.01: failed to read SCR 1 (Emask=0x40)
ata10.02: failed to read SCR 1 (Emask=0x40)
ata10.03: failed to read SCR 1 (Emask=0x40)
ata10.04: failed to read SCR 1 (Emask=0x40)
ata10.05: failed to read SCR 1 (Emask=0x40)
ata10.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.02: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 1 cdb 0x0 data 0 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

Timeouts during heavy IO can be caused by a number of things, and bad power is one of them.

Please lemme know your test result.

-- tejun
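Tejun calls 11.75 V "still in spec". A back-of-the-envelope check of that and of the 12 V budget, assuming the commonly cited ATX12V tolerance of ±5 % on the +12 V rail and, hypothetically, eight drives sharing one 10 A rail. The thread does not give the drives' spin-up or seek current, so the per-drive figure has to be compared against the drive datasheets.

```python
# Back-of-the-envelope 12 V check using numbers quoted in this thread.
# Assumptions (not from the thread): ATX12V allows +/-5 % on the +12 V rail,
# and eight drives share one 12 V rail rated at 10 A (hypothetical split).
NOMINAL_V = 12.0
TOLERANCE = 0.05


def in_spec(volts: float) -> bool:
    """True if a measured +12 V reading lies within the assumed +/-5 % window."""
    return abs(volts - NOMINAL_V) <= NOMINAL_V * TOLERANCE


def per_drive_budget(rail_amps: float, drives: int,
                     volts: float = NOMINAL_V) -> tuple[float, float]:
    """Split a rail's rated current evenly: returns (amps, watts) per drive."""
    amps = rail_amps / drives
    return amps, amps * volts


if __name__ == "__main__":
    low, high = NOMINAL_V * (1 - TOLERANCE), NOMINAL_V * (1 + TOLERANCE)
    print(f"+12 V window: {low:.2f} V .. {high:.2f} V")
    for reading in (12.20, 12.10, 11.75):  # multimeter readings from the thread
        print(f"{reading:.2f} V in spec: {in_spec(reading)}")
    print(f"droop on first PSU under load: {12.20 - 12.10:.2f} V")
    amps, watts = per_drive_budget(10, 8)
    print(f"10 A rail / 8 drives: {amps:.2f} A ({watts:.1f} W) per drive")
```

All three readings fall inside the window, so the static value alone doesn't disqualify either PSU; what set the first one apart was that it moved under load. Keep in mind that a multimeter averages over its update interval, so very short dips may not show up in such readings at all.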
Re: libata pm
On Sun, 27.01.2008, 01:38, [EMAIL PROTECTED] wrote:

On Sat, 26.01.2008, 23:33, Tejun Heo wrote:

[EMAIL PROTECTED] wrote:

I could solve the problem by NCQ-blacklisting the Maxtor 6V300F0 devices in libata-core.c

That's a strange story. The reported errors were "PHY readiness changed" and "device exchanged", both of which point to a hotplug event. Maybe something causes the firmware to reset? What happens if you leave only two drives on the port multiplier and remove the NCQ blacklist? Also, does the error ever occur on the drives that are connected directly to the controller?

-- tejun

No, the error only occurs on the drives on the PM. I can test the PM with two Maxtor drives tomorrow (currently backing up the RAID).

With one, two and three drives on the PM I got no errors, so I tried changing the power supply. I have two power supplies for the 16 hard drives, and the second one (powering all the Maxtor drives and the first PM) was failing. After changing the power supply, all problems were gone. Sorry for the confusion. The only remaining issue is that some SATA links only work at 1.5 Gbps and I can't figure out why. (ata3,4,5,6.x are the Maxtor drives and ata7,8,9,10.x are the Samsung ones)

jasmin ~ # dmesg | grep Gbps
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata6.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.05: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
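The same check can be automated. This sketch scans dmesg text for libata's "SATA link up ... Gbps" messages, keeps the most recent speed per link (so a later EH reset overrides the boot-time value) and prints the links below 3.0 Gbps. The regex is written against the message format shown above; slow_links and the file argument are hypothetical.

```python
import re
import sys

# Sketch: report SATA links that negotiated below 3.0 Gbps, given dmesg text.
# The regex matches libata's
#   "ataN[.MM]: SATA link up X Gbps (SStatus S SControl C)" messages.
LINK_RE = re.compile(
    r"(ata\d+(?:\.\d+)?): SATA link up ([\d.]+) Gbps "
    r"\(SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)\)"
)


def slow_links(text: str, expected_gbps: float = 3.0) -> list[str]:
    """Return links whose most recently reported speed is below expected_gbps."""
    latest: dict[str, float] = {}
    for m in LINK_RE.finditer(text):
        # Later messages (e.g. after an EH hard reset) overwrite earlier ones.
        latest[m.group(1)] = float(m.group(2))
    return [link for link, speed in latest.items() if speed < expected_gbps]


if __name__ == "__main__":
    # Usage: pass a file containing saved dmesg output as the first argument.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for link in slow_links(fh.read()):
                print(link)
```

For example, save the output of "dmesg | grep Gbps" to a file such as links.txt and pass its name as the argument; for the output above this prints ata6.00, ata6.05 and ata10.05, the three ports stuck at 1.5 Gbps.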
libata pm
I'm currently running Linux *** 2.6.23 #1 SMP PREEMPT Sat Jan 26 17:59:58 CET 2008 x86_64 Intel(R) Pentium(R) D CPU 3.20GHz GenuineIntel GNU/Linux with the libata-tj-2.6.23-20071011 patch for 2.6.23, and for testing purposes linux-2.6.24-rc6-mm1, on an Asus P5WDG2-WS with two PCI-X Dawicontrol DC 4300 controllers and two Dawicontrol DC 6510 PM port multipliers to get 16 SATA ports (the Linux partition is on an IDE drive). The second RAID array, with 8 Samsung HD-LJ 501 drives, works great without any errors, but the first RAID, with 8 Maxtor 6V300F0 drives, produces one error after another after mounting the filesystem:

Filesystem dm-0: Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-0
Ending clean XFS mount for filesystem: dm-0
Filesystem dm-1: Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-1
Ending clean XFS mount for filesystem: dm-1
ata6.04: exception Emask 0x10 SAct 0x0 SErr 0x401 action 0xb
ata6: SError: { PHYRdyChg DevExch }
ata6.04: hard resetting link
ata6.04: SATA link down (SStatus 0 SControl 300)
ata6: failed to recover some devices, retrying in 5 secs
ata6.04: hard resetting link
ata6.04: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.04: configured for UDMA/100
ata6: EH complete
sd 5:0:0:0: [sdh] 586114704 512-byte hardware sectors (300091 MB)
sd 5:0:0:0: [sdh] Write Protect is off
sd 5:0:0:0: [sdh] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:1:0:0: [sdi] 586114704 512-byte hardware sectors (300091 MB)
sd 5:1:0:0: [sdi] Write Protect is off
sd 5:1:0:0: [sdi] Mode Sense: 00 3a 00 00
sd 5:1:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:2:0:0: [sdj] 586114704 512-byte hardware sectors (300091 MB)
sd 5:2:0:0: [sdj] Write Protect is off
sd 5:2:0:0: [sdj] Mode Sense: 00 3a 00 00
sd 5:2:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:3:0:0: [sdk] 586114704 512-byte hardware sectors (300091 MB)
sd 5:3:0:0: [sdk] Write Protect is off
sd 5:3:0:0: [sdk] Mode Sense: 00 3a 00 00
sd 5:3:0:0: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:4:0:0: [sdl] 586114704 512-byte hardware sectors (300091 MB)
sd 5:4:0:0: [sdl] Write Protect is off
sd 5:4:0:0: [sdl] Mode Sense: 00 3a 00 00
sd 5:4:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata6.00: failed to read SCR 1 (Emask=0x40)
ata6.01: failed to read SCR 1 (Emask=0x40)
ata6.02: failed to read SCR 1 (Emask=0x40)
ata6.03: failed to read SCR 1 (Emask=0x40)
ata6.04: failed to read SCR 1 (Emask=0x40)
ata6.05: failed to read SCR 1 (Emask=0x40)
ata6.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.01: cmd c8/00:10:97:26:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in res 50/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata6.01: status: { DRDY }
ata6.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.15: hard resetting link
ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata6.00: hard resetting link
ata6.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.01: hard resetting link
ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.02: hard resetting link
ata6.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.03: hard resetting link
ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.04: hard resetting link
ata6.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.05: hard resetting link
ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.00: configured for UDMA/100
ata6.01: configured for UDMA/100
ata6.02: configured for UDMA/100
ata6.03: configured for UDMA/100
ata6.04: configured for UDMA/100
ata6: EH complete
sd 5:0:0:0: [sdh] 586114704 512-byte hardware sectors (300091 MB)
sd 5:0:0:0: [sdh] Write Protect is off
sd 5:0:0:0: [sdh] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:1:0:0: [sdi] 586114704 512-byte hardware sectors (300091 MB)
sd 5:1:0:0: [sdi] Write Protect is off
sd 5:1:0:0: [sdi] Mode Sense: 00 3a 00 00
sd 5:1:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:2:0:0: [sdj] 586114704 512-byte hardware sectors (300091 MB)
sd 5:2:0:0: [sdj] Write Protect is off
sd 5:2:0:0: [sdj] Mode Sense: 00 3a 00 00
sd 5:2:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:3:0:0: [sdk] 586114704 512-byte hardware sectors (300091 MB)
sd 5:3:0:0: [sdk] Write Protect is off
sd 5:3:0:0: [sdk] Mode Sense: 00 3a 00 00
sd 5:3:0:0: [sdk] Write cache: enabled, read cache: enabled,
Re: libata pm
I could solve the problem by NCQ-blacklisting the Maxtor 6V300F0 devices in libata-core.c.

On Sat, 26.01.2008, 18:03, [EMAIL PROTECTED] wrote:
I'm currently running

Linux *** 2.6.23 #1 SMP PREEMPT Sat Jan 26 17:59:58 CET 2008 x86_64 Intel(R) Pentium(R) D CPU 3.20GHz GenuineIntel GNU/Linux

with the libata-tj-2.6.23-20071011 patch for 2.6.23 (and, for testing purposes, linux-2.6.24-rc6-mm1). The board is an Asus P5WDG2-WS with two PCI-X Dawicontrol DC 4300 controllers and two Dawicontrol DC 6510 port multipliers, which gives 16 SATA ports. The Linux partition is on an IDE drive.

The second RAID array, with 8 Samsung HD-LJ 501 drives, works great without any errors. The first RAID, with 8 Maxtor 6V300F0 drives, produces one error after another once the filesystem is mounted:

Filesystem dm-0: Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-0
Ending clean XFS mount for filesystem: dm-0
Filesystem dm-1: Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-1
Ending clean XFS mount for filesystem: dm-1
ata6.04: exception Emask 0x10 SAct 0x0 SErr 0x401 action 0xb
ata6: SError: { PHYRdyChg DevExch }
ata6.04: hard resetting link
ata6.04: SATA link down (SStatus 0 SControl 300)
ata6: failed to recover some devices, retrying in 5 secs
ata6.04: hard resetting link
ata6.04: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.04: configured for UDMA/100
ata6: EH complete
sd 5:0:0:0: [sdh] 586114704 512-byte hardware sectors (300091 MB)
sd 5:0:0:0: [sdh] Write Protect is off
sd 5:0:0:0: [sdh] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:1:0:0: [sdi] 586114704 512-byte hardware sectors (300091 MB)
sd 5:1:0:0: [sdi] Write Protect is off
sd 5:1:0:0: [sdi] Mode Sense: 00 3a 00 00
sd 5:1:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:2:0:0: [sdj] 586114704 512-byte hardware sectors (300091 MB)
sd 5:2:0:0: [sdj] Write Protect is off
sd 5:2:0:0: [sdj] Mode Sense: 00 3a 00 00
sd 5:2:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:3:0:0: [sdk] 586114704 512-byte hardware sectors (300091 MB)
sd 5:3:0:0: [sdk] Write Protect is off
sd 5:3:0:0: [sdk] Mode Sense: 00 3a 00 00
sd 5:3:0:0: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:4:0:0: [sdl] 586114704 512-byte hardware sectors (300091 MB)
sd 5:4:0:0: [sdl] Write Protect is off
sd 5:4:0:0: [sdl] Mode Sense: 00 3a 00 00
sd 5:4:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata6.00: failed to read SCR 1 (Emask=0x40)
ata6.01: failed to read SCR 1 (Emask=0x40)
ata6.02: failed to read SCR 1 (Emask=0x40)
ata6.03: failed to read SCR 1 (Emask=0x40)
ata6.04: failed to read SCR 1 (Emask=0x40)
ata6.05: failed to read SCR 1 (Emask=0x40)
ata6.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.01: cmd c8/00:10:97:26:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
         res 50/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata6.01: status: { DRDY }
ata6.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.15: hard resetting link
ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata6.00: hard resetting link
ata6.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.01: hard resetting link
ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.02: hard resetting link
ata6.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.03: hard resetting link
ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.04: hard resetting link
ata6.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.05: hard resetting link
ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.00: configured for UDMA/100
ata6.01: configured for UDMA/100
ata6.02: configured for UDMA/100
ata6.03: configured for UDMA/100
ata6.04: configured for UDMA/100
ata6: EH complete
sd 5:0:0:0: [sdh] 586114704 512-byte hardware sectors (300091 MB)
sd 5:0:0:0: [sdh] Write Protect is off
sd 5:0:0:0: [sdh] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:1:0:0: [sdi] 586114704 512-byte hardware sectors (300091 MB)
sd 5:1:0:0: [sdi] Write Protect is off
sd 5:1:0:0: [sdi] Mode Sense: 00 3a 00 00
sd 5:1:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:2:0:0: [sdj] 586114704 512-byte hardware sectors (300091 MB)
sd 5:2:0:0: [sdj] Write Protect is off
sd 5:2:0:0: [sdj] Mode Sense: 00 3a 00 00
sd 5:2:0:0: [sdj] Write cache: enabled, read cache: enabled,
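The fix reported at the top of this message is NCQ-blacklisting the Maxtor 6V300F0 in libata-core.c. That means adding the model to libata's table of per-drive quirks so the driver stops issuing NCQ commands to it. The actual patch was not posted in the thread. In kernels of that era the entry would presumably take the C form { "Maxtor 6V300F0", NULL, ATA_HORKAGE_NONCQ }, inside ata_device_blacklist[]. The Python sketch below only models how such a table lookup behaves. The names HORKAGE_NONCQ, BlacklistEntry, lookup_horkage and ncq_allowed are hypothetical, and a None revision stands for "every firmware revision".

```python
from typing import NamedTuple, Optional

# Hypothetical stand-in for the kernel's ATA_HORKAGE_NONCQ quirk flag.
HORKAGE_NONCQ = 1 << 0


class BlacklistEntry(NamedTuple):
    model_num: str
    model_rev: Optional[str]  # None means: match every firmware revision
    horkage: int              # bitmask of quirk flags


# Simplified model of the quirk table, holding only the entry described
# in the thread: never use NCQ on a Maxtor 6V300F0.
DEVICE_BLACKLIST = [
    BlacklistEntry("Maxtor 6V300F0", None, HORKAGE_NONCQ),
]


def lookup_horkage(model: str, rev: str) -> int:
    """Return the quirk flags for a drive, or 0 if it is not listed."""
    for entry in DEVICE_BLACKLIST:
        if entry.model_num != model:
            continue
        if entry.model_rev is None or entry.model_rev == rev:
            return entry.horkage
    return 0


def ncq_allowed(model: str, rev: str) -> bool:
    """True unless the drive's table entry carries the NONCQ quirk."""
    return (lookup_horkage(model, rev) & HORKAGE_NONCQ) == 0


if __name__ == "__main__":
    # "Samsung HD-LJ 501" is the name as written in the thread; the drive's
    # real IDENTIFY model string may differ. The revision is a placeholder.
    for model in ("Maxtor 6V300F0", "Samsung HD-LJ 501"):
        print(model, "NCQ allowed:", ncq_allowed(model, "placeholder-rev"))
```

Running it reports NCQ as disallowed for the Maxtor model and allowed for the Samsung drives of the second array. The sketch matches on the model string alone and ignores the firmware revision, which mirrors the thread's description of blacklisting all 6V300F0 devices.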
Re: libata pm
On Sat, 26.01.2008, 23:33, Tejun Heo wrote:
[EMAIL PROTECTED] wrote:
I could solve the problem by NCQ-blacklisting the Maxtor 6V300F0 devices in libata-core.c.

That's a strange story. The reported errors were "PHY readiness changed" and "device exchanged", both of which point to a hotplug event. Maybe something causes the firmware to reset? What happens if you leave only two drives on the port multiplier and remove the NCQ blacklist entry? Also, does the error ever occur on the drives that are connected directly to the controller?

-- 
tejun

No, the error only occurs on the drives behind the port multiplier. I can test the port multiplier with two Maxtor drives tomorrow (I'm currently backing up the RAID).

-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
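Tejun's reading of the log rests on two bits of the SATA SError register:
- PHYRdyChg (bit 16, PHY readiness changed)
- DevExch (bit 26, device exchanged)

The Python sketch below decodes an SError value into the abbreviations libata prints in its "SError: { ... }" line, and flags the two hotplug-related bits. The bit-to-name table was transcribed by hand and should be checked against libata-eh.c of the kernel in use. decode_serror and looks_like_hotplug are hypothetical helper names. The example builds the value from the two named bits and does not try to interpret the raw "SErr 0x401" field on the exception line.

```python
# SATA SError bit positions and the short names libata prints for them
# (hand-transcribed for illustration; verify against libata-eh.c).
SERROR_NAMES = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}

PHYRDY_CHG = 1 << 16  # PHY readiness changed
DEV_XCHG = 1 << 26    # device exchanged (device presence changed)


def decode_serror(serror: int) -> list[str]:
    """Return libata-style names for every set SError bit, lowest bit first."""
    return [name for bit, name in sorted(SERROR_NAMES.items())
            if serror & (1 << bit)]


def looks_like_hotplug(serror: int) -> bool:
    """True if either hotplug-related bit is set."""
    return bool(serror & (PHYRDY_CHG | DEV_XCHG))


if __name__ == "__main__":
    value = PHYRDY_CHG | DEV_XCHG  # == 0x04010000
    print("SError: {", " ".join(decode_serror(value)), "}")
    print("hotplug-like:", looks_like_hotplug(value))
```

Decoding 0x04010000 (bits 16 and 26 set) prints "SError: { PHYRdyChg DevExch }". That is the same pair reported for ata6 in the earlier log, and looks_like_hotplug() returns True for it. A CRC-only error such as bit 21 (BadCRC) would not be classified as hotplug-like.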