Re: SATA problems
Andrew Morton wrote:
> On Thu, 30 Aug 2007 09:24:18 + Nigel Kukard <[EMAIL PROTECTED]> wrote:
>
>> Hrmmm,
>>
> > > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> > > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> >
> > Unrelated to the other error, but I've been meaning to ask for a while..
> > If this is 'abnormal', why does every SATA box I've seen do it?
>
> *crickets*

It's removed (finally). :-)

-- 
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: SATA problems
On Thu, 30 Aug 2007 09:24:18 + Nigel Kukard <[EMAIL PROTECTED]> wrote:

> Hrmmm,
>
> >> > > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> >> > > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> >> >
> >> > Unrelated to the other error, but I've been meaning to ask for a while..
> >> > If this is 'abnormal', why does every SATA box I've seen do it?
> >>
> >> *crickets*

chirp, chirp.

> >> Should we check for this case explicitly, and not print this?
> >>
> > After I get the above errors, my entire SATA bus crashes and I need to
> > hard reset the box ... not sure we can just ignore the errors?
> >
>
> Appears even with the patch provided a few months ago I'm getting
> freezes. Replaced the HDD & all cables, same errors ... especially
> whilst doing heavy IO.
>
> Can anyone shed some light?

I think I was told last week that copying the appropriate mailing list will
at least prevent chirping, so let's try that.

Original thread here: http://lkml.org/lkml/2007/6/14/154

> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata2.00: cmd c8/00:00:9f:c9:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata2: soft resetting port
> ATA: abnormal status 0x7F on port 0x0001c807
> ATA: abnormal status 0x7F on port 0x0001c807
> ata2.00: configured for UDMA/133
> ata2: EH complete
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata2.00: cmd c8/00:00:9f:c9:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata2: soft resetting port
> ATA: abnormal status 0x7F on port 0x0001c807
> ATA: abnormal status 0x7F on port 0x0001c807
> ata2.00: configured for UDMA/133
> ata2: EH complete
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata2.00: cmd c8/00:00:9f:c9:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata2: soft resetting port
> ATA: abnormal status 0x7F on port 0x0001c807
> ATA: abnormal status 0x7F on port 0x0001c807
> ata2.00: configured for UDMA/133
> ata2: EH complete
> ata2.00: limiting speed to UDMA/100:PIO4
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata2.00: cmd c8/00:00:9f:c9:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata2: soft resetting port
> ATA: abnormal status 0x7F on port 0x0001c807
> ATA: abnormal status 0x7F on port 0x0001c807
> ata2.00: configured for UDMA/100
> ata2: EH complete
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata2.00: cmd c8/00:00:9f:c9:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata2: soft resetting port
> ATA: abnormal status 0x7F on port 0x0001c807
> ATA: abnormal status 0x7F on port 0x0001c807
> ata2.00: configured for UDMA/100
> ata2: EH complete
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata2.00: cmd c8/00:00:9f:c9:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata2: soft resetting port
> ATA: abnormal status 0x7F on port 0x0001c807
> ATA: abnormal status 0x7F on port 0x0001c807
> ata2.00: configured for UDMA/100
> sd 3:0:0:0: SCSI error: return code = 0x0802
> sda: Current [descriptor]: sense key=0xb
>     ASC=0x0 ASCQ=0x0
> Descriptor sense data with sense descriptors (in hex):
>         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
>         00 00 00 00
> end_request: I/O error, dev sda, sector 30132639
> ata2: EH complete
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata2.00: cmd c8/00:00:9f:ca:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata2: soft resetting port
> ATA: abnormal status 0x7F on port 0x0001c807
> ATA: abnormal status 0x7F on port 0x0001c807
> ata2.00: configured for UDMA/100
> ata2: EH complete
> ata2.00: limiting speed to UDMA/33:PIO4
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata2.00: cmd c8/00:00:9f:ca:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata2: soft resetting port
> ATA: abnormal status 0x7F on port 0x0001c807
> ATA: abnormal status 0x7F on port 0x0001c807
> ata2.00: configured for UDMA/33
> ata2: EH complete
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata2.00: cmd c8/00:00:9f:ca:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata2: soft resetting port
> ATA: abnormal status 0x7F on port 0x0001c807
> ATA:
Re: SATA problems
Hrmmm,

>> > > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
>> > > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
>> >
>> > Unrelated to the other error, but I've been meaning to ask for a while..
>> > If this is 'abnormal', why does every SATA box I've seen do it?
>>
>> *crickets*
>>
>> Should we check for this case explicitly, and not print this?
>>
> After I get the above errors, my entire SATA bus crashes and I need to
> hard reset the box ... not sure we can just ignore the errors?
>

Appears even with the patch provided a few months ago I'm getting freezes.
Replaced the HDD & all cables, same errors ... especially whilst doing heavy IO.

Can anyone shed some light?

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd c8/00:00:9f:c9:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ATA: abnormal status 0x7F on port 0x0001c807
ATA: abnormal status 0x7F on port 0x0001c807
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd c8/00:00:9f:c9:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ATA: abnormal status 0x7F on port 0x0001c807
ATA: abnormal status 0x7F on port 0x0001c807
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd c8/00:00:9f:c9:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ATA: abnormal status 0x7F on port 0x0001c807
ATA: abnormal status 0x7F on port 0x0001c807
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: limiting speed to UDMA/100:PIO4
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd c8/00:00:9f:c9:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ATA: abnormal status 0x7F on port 0x0001c807
ATA: abnormal status 0x7F on port 0x0001c807
ata2.00: configured for UDMA/100
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd c8/00:00:9f:c9:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ATA: abnormal status 0x7F on port 0x0001c807
ATA: abnormal status 0x7F on port 0x0001c807
ata2.00: configured for UDMA/100
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd c8/00:00:9f:c9:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ATA: abnormal status 0x7F on port 0x0001c807
ATA: abnormal status 0x7F on port 0x0001c807
ata2.00: configured for UDMA/100
sd 3:0:0:0: SCSI error: return code = 0x0802
sda: Current [descriptor]: sense key=0xb
    ASC=0x0 ASCQ=0x0
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
        00 00 00 00
end_request: I/O error, dev sda, sector 30132639
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd c8/00:00:9f:ca:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ATA: abnormal status 0x7F on port 0x0001c807
ATA: abnormal status 0x7F on port 0x0001c807
ata2.00: configured for UDMA/100
ata2: EH complete
ata2.00: limiting speed to UDMA/33:PIO4
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd c8/00:00:9f:ca:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ATA: abnormal status 0x7F on port 0x0001c807
ATA: abnormal status 0x7F on port 0x0001c807
ata2.00: configured for UDMA/33
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd c8/00:00:9f:ca:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ATA: abnormal status 0x7F on port 0x0001c807
ATA: abnormal status 0x7F on port 0x0001c807
ata2.00: configured for UDMA/33
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd c8/00:00:9f:ca:cb/00:00:00:00:00/e1 tag 0 cdb 0x0 data 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ATA: abnormal status 0x7F on port 0x0001c807
ATA: abnormal status 0x7F on port 0x0001c807
ata2.00: configured for UDMA/33
ata2: EH complete
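A note for readers of the log above: the "cmd" line carries the ATA taskfile, so the failing sector can be recovered from it. The sketch below assumes the field order command/feature:nsect:lbal:lbam:lbah/(five high-order bytes)/device and treats the command as a 28-bit LBA one. Both are assumptions, not something stated in the thread, although the decoded value does agree with the sector printed by end_request. The function name decode_lba28 and the small command-name table are made up for illustration.

```python
import re

# Assumed layout of the libata "cmd" line (not confirmed in the thread):
#   cmd CMD/FEAT:NSECT:LBAL:LBAM:LBAH/<five high-order bytes>/DEV
TF_RE = re.compile(
    r"cmd ([0-9a-f]{2})/([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2}):"
    r"([0-9a-f]{2}):([0-9a-f]{2})/(?:[0-9a-f]{2}:){4}[0-9a-f]{2}/([0-9a-f]{2})"
)

# The two ATA opcodes that occur in this thread's logs.
COMMAND_NAMES = {0xC8: "READ DMA", 0xCA: "WRITE DMA"}


def decode_lba28(line):
    """Decode a 28-bit LBA taskfile from one 'cmd' log line (hypothetical helper)."""
    m = TF_RE.search(line)
    if m is None:
        raise ValueError("no taskfile found in line")
    cmd, _feat, nsect, lbal, lbam, lbah, dev = (int(x, 16) for x in m.groups())
    # LBA28: the low nibble of the device register holds LBA bits 27..24.
    lba = ((dev & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    # In 28-bit commands a sector count of 0 means 256 sectors.
    sectors = nsect or 256
    return {
        "command": COMMAND_NAMES.get(cmd, hex(cmd)),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,
    }


if __name__ == "__main__":
    line = ("ata2.00: cmd c8/00:00:9f:c9:cb/00:00:00:00:00/e1 "
            "tag 0 cdb 0x0 data 131072 in")
    print(decode_lba28(line))
```

Decoding the first command gives LBA 30132639 and a 131072-byte transfer, which lines up with "data 131072 in" and with "end_request: I/O error, dev sda, sector 30132639" further down the same log.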
Re: SATA problems
> > > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> > > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> >
> > Unrelated to the other error, but I've been meaning to ask for a while..
> > If this is 'abnormal', why does every SATA box I've seen do it?
>
> *crickets*
>
> Should we check for this case explicitly, and not print this?

After I get the above errors, my entire SATA bus crashes and I need to hard
reset the box ... not sure we can just ignore the errors?
Re: SATA problems
On Thu, Jun 14, 2007 at 02:28:54PM -0400, Dave Jones wrote:
> On Thu, Jun 14, 2007 at 12:21:49PM -0400, Jeff Garzik wrote:
>
> > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
>
> Unrelated to the other error, but I've been meaning to ask for a while..
> If this is 'abnormal', why does every SATA box I've seen do it?

*crickets*

Should we check for this case explicitly, and not print this?

Dave

-- 
http://www.codemonkey.org.uk
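As background to the "abnormal status 0x7F" question, it can help to see which status-register bits that value sets. The sketch below uses the classic ATA status bit names (BSY, DRDY, DF, DSC, DRQ, CORR, IDX, ERR); newer revisions of the standard rename some of the low bits, so treat the names as an assumption. The helper decode_status is hypothetical and is not taken from the kernel.

```python
# Classic ATA status register layout, most significant bit first.
# Names for bits 4, 2 and 1 differ between revisions of the standard.
STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORR"),
    (0x02, "IDX"),
    (0x01, "ERR"),
]


def decode_status(value):
    """Return the names of the status bits set in `value` (hypothetical helper)."""
    return [name for mask, name in STATUS_BITS if value & mask]


if __name__ == "__main__":
    print(hex(0x7F), decode_status(0x7F))
    print(hex(0x40), decode_status(0x40))
```

0x7F has every bit set except BSY, so DRQ and ERR are asserted at the same time as DRDY, which is not a state a working device would normally report. One common reading is that nothing is answering on that port; that interpretation is an assumption here and is not confirmed by anyone in the thread. For comparison, the "res 40/..." status in the timeout logs decodes to DRDY alone.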
Re: SATA problems
Hi Jeff,

Ok ... second part of my problem. Where should I look in trying to debug
the below problem...

Regards
Nigel

Jun 18 07:59:56 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 18 07:59:56 nigel-m2v kernel: ata2.00: cmd ca/00:08:bf:ab:68/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 out
Jun 18 07:59:56 nigel-m2v kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 18 07:59:56 nigel-m2v kernel: ata2: soft resetting port
Jun 18 07:59:56 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
Jun 18 07:59:56 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
Jun 18 07:59:56 nigel-m2v kernel: ata2.00: configured for UDMA/133
Jun 18 07:59:56 nigel-m2v kernel: ata2: EH complete
Jun 18 08:00:26 nigel-m2v kernel: rtc: lost 7740 interrupts
Jun 18 08:00:26 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 18 08:00:26 nigel-m2v kernel: ata2.00: cmd ca/00:08:bf:ab:68/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 out
Jun 18 08:00:26 nigel-m2v kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 18 08:00:26 nigel-m2v kernel: ata2: soft resetting port
Jun 18 08:00:26 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
Jun 18 08:00:27 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
Jun 18 08:00:27 nigel-m2v kernel: ata2.00: configured for UDMA/133
Jun 18 08:00:27 nigel-m2v kernel: ata2: EH complete
Jun 18 08:00:57 nigel-m2v kernel: rtc: lost 7741 interrupts
Jun 18 08:00:57 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 18 08:00:57 nigel-m2v kernel: ata2.00: cmd ca/00:08:bf:ab:68/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 out
Jun 18 08:00:57 nigel-m2v kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 18 08:00:57 nigel-m2v kernel: ata2: soft resetting port
Jun 18 08:00:57 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
Jun 18 08:00:57 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
Jun 18 08:00:57 nigel-m2v kernel: ata2.00: configured for UDMA/133
Jun 18 08:00:57 nigel-m2v kernel: ata2: EH complete
Jun 18 08:01:27 nigel-m2v kernel: rtc: lost 7740 interrupts
Jun 18 08:01:27 nigel-m2v kernel: ata2.00: limiting speed to UDMA/100:PIO4
Jun 18 08:01:27 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 18 08:01:27 nigel-m2v kernel: ata2.00: cmd ca/00:08:bf:ab:68/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 out
Jun 18 08:01:27 nigel-m2v kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 18 08:01:27 nigel-m2v kernel: ata2: soft resetting port
Jun 18 08:01:27 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
Jun 18 08:01:27 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
Jun 18 08:01:27 nigel-m2v kernel: ata2.00: configured for UDMA/100
Jun 18 08:01:27 nigel-m2v kernel: ata2: EH complete
Jun 18 08:01:57 nigel-m2v kernel: rtc: lost 7740 interrupts
Jun 18 08:01:57 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 18 08:01:57 nigel-m2v kernel: ata2.00: cmd ca/00:08:bf:ab:68/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 out
Jun 18 08:01:57 nigel-m2v kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 18 08:01:57 nigel-m2v kernel: ata2: soft resetting port
Jun 18 08:01:57 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
Jun 18 08:01:57 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
Jun 18 08:01:57 nigel-m2v kernel: ata2.00: configured for UDMA/100
Jun 18 08:01:57 nigel-m2v kernel: ata2: EH complete
Jun 18 08:02:27 nigel-m2v kernel: rtc: lost 7741 interrupts
Jun 18 08:02:27 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 18 08:02:27 nigel-m2v kernel: ata2.00: cmd ca/00:08:bf:ab:68/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 out
Jun 18 08:02:27 nigel-m2v kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 18 08:02:27 nigel-m2v kernel: ata2: soft resetting port
Jun 18 08:02:27 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
Jun 18 08:02:27 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
Jun 18 08:02:28 nigel-m2v kernel: ata2.00: configured for UDMA/100
Jun 18 08:02:28 nigel-m2v kernel: sd 3:0:0:0: SCSI error: return code = 0x0802
Jun 18 08:02:28 nigel-m2v kernel: sda: Current [descriptor]: sense key=0xb
Jun 18 08:02:28 nigel-m2v kernel:     ASC=0x0 ASCQ=0x0
Jun 18 08:02:28 nigel-m2v kernel: Descriptor sense data with sense descriptors (in hex):
Jun 18 08:02:28 nigel-m2v kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Jun 18 08:02:28 nigel-m2v kernel:         00 00 00 00
Jun 18 08:02:28 nigel-m2v kernel: end_request: I/O error, dev sda, sector 141077439
Jun 18 08:02:28 nigel-m2v kernel: Buffer I/O error on device sda1, logical block 17634672
Jun 18 08:02:28 nigel-m2v kernel:
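When a log like the one above runs to hundreds of lines, a small summary can make the pattern easier to see: repeated timeouts, the step down in transfer mode, then the I/O error. The following is only a sketch. SAMPLE is an abridged excerpt of the lines in this message, and the function name summarize and the counter keys are invented for illustration.

```python
import re
from collections import Counter

# Abridged excerpt of the syslog lines posted above.
SAMPLE = """\
Jun 18 07:59:56 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 18 07:59:56 nigel-m2v kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 18 07:59:56 nigel-m2v kernel: ata2.00: configured for UDMA/133
Jun 18 08:01:27 nigel-m2v kernel: ata2.00: limiting speed to UDMA/100:PIO4
Jun 18 08:01:27 nigel-m2v kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 18 08:01:27 nigel-m2v kernel: ata2.00: configured for UDMA/100
Jun 18 08:02:28 nigel-m2v kernel: end_request: I/O error, dev sda, sector 141077439
"""


def summarize(log_text):
    """Count timeouts and I/O errors and list transfer modes in order of appearance."""
    counts = Counter()
    modes = []
    for line in log_text.splitlines():
        if "(timeout)" in line:
            counts["timeouts"] += 1
        if "I/O error" in line:
            counts["io_errors"] += 1
        m = re.search(r"configured for (\S+)", line)
        # Record a mode only when it differs from the previous one.
        if m and (not modes or modes[-1] != m.group(1)):
            modes.append(m.group(1))
    return dict(counts), modes


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Run against the full log rather than the excerpt, the same function would show how many timeouts occurred at each speed before the link was stepped down.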
Re: SATA problems
Hi Jeff, Ok ... second part of my problem. Where should I look in trying to debug the below problem... Regards Nigel Jun 18 07:59:56 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Jun 18 07:59:56 nigel-m2v kernel: ata2.00: cmd ca/00:08:bf:ab:68/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 out Jun 18 07:59:56 nigel-m2v kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Jun 18 07:59:56 nigel-m2v kernel: ata2: soft resetting port Jun 18 07:59:56 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807 Jun 18 07:59:56 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807 Jun 18 07:59:56 nigel-m2v kernel: ata2.00: configured for UDMA/133 Jun 18 07:59:56 nigel-m2v kernel: ata2: EH complete Jun 18 08:00:26 nigel-m2v kernel: rtc: lost 7740 interrupts Jun 18 08:00:26 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Jun 18 08:00:26 nigel-m2v kernel: ata2.00: cmd ca/00:08:bf:ab:68/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 out Jun 18 08:00:26 nigel-m2v kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Jun 18 08:00:26 nigel-m2v kernel: ata2: soft resetting port Jun 18 08:00:26 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807 Jun 18 08:00:27 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807 Jun 18 08:00:27 nigel-m2v kernel: ata2.00: configured for UDMA/133 Jun 18 08:00:27 nigel-m2v kernel: ata2: EH complete Jun 18 08:00:57 nigel-m2v kernel: rtc: lost 7741 interrupts Jun 18 08:00:57 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Jun 18 08:00:57 nigel-m2v kernel: ata2.00: cmd ca/00:08:bf:ab:68/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 out Jun 18 08:00:57 nigel-m2v kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Jun 18 08:00:57 nigel-m2v kernel: ata2: soft resetting port Jun 18 08:00:57 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807 Jun 18 08:00:57 nigel-m2v kernel: 
ATA: abnormal status 0x7F on port 0x0001c807 Jun 18 08:00:57 nigel-m2v kernel: ata2.00: configured for UDMA/133 Jun 18 08:00:57 nigel-m2v kernel: ata2: EH complete Jun 18 08:01:27 nigel-m2v kernel: rtc: lost 7740 interrupts Jun 18 08:01:27 nigel-m2v kernel: ata2.00: limiting speed to UDMA/100:PIO4 Jun 18 08:01:27 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Jun 18 08:01:27 nigel-m2v kernel: ata2.00: cmd ca/00:08:bf:ab:68/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 out Jun 18 08:01:27 nigel-m2v kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Jun 18 08:01:27 nigel-m2v kernel: ata2: soft resetting port Jun 18 08:01:27 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807 Jun 18 08:01:27 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807 Jun 18 08:01:27 nigel-m2v kernel: ata2.00: configured for UDMA/100 Jun 18 08:01:27 nigel-m2v kernel: ata2: EH complete Jun 18 08:01:57 nigel-m2v kernel: rtc: lost 7740 interrupts Jun 18 08:01:57 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Jun 18 08:01:57 nigel-m2v kernel: ata2.00: cmd ca/00:08:bf:ab:68/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 out Jun 18 08:01:57 nigel-m2v kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Jun 18 08:01:57 nigel-m2v kernel: ata2: soft resetting port Jun 18 08:01:57 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807 Jun 18 08:01:57 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807 Jun 18 08:01:57 nigel-m2v kernel: ata2.00: configured for UDMA/100 Jun 18 08:01:57 nigel-m2v kernel: ata2: EH complete Jun 18 08:02:27 nigel-m2v kernel: rtc: lost 7741 interrupts Jun 18 08:02:27 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Jun 18 08:02:27 nigel-m2v kernel: ata2.00: cmd ca/00:08:bf:ab:68/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 out Jun 18 08:02:27 nigel-m2v kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 
(timeout) Jun 18 08:02:27 nigel-m2v kernel: ata2: soft resetting port Jun 18 08:02:27 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807 Jun 18 08:02:27 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807 Jun 18 08:02:28 nigel-m2v kernel: ata2.00: configured for UDMA/100 Jun 18 08:02:28 nigel-m2v kernel: sd 3:0:0:0: SCSI error: return code = 0x0802 Jun 18 08:02:28 nigel-m2v kernel: sda: Current [descriptor]: sense key=0xb Jun 18 08:02:28 nigel-m2v kernel: ASC=0x0 ASCQ=0x0 Jun 18 08:02:28 nigel-m2v kernel: Descriptor sense data with sense descriptors (in hex): Jun 18 08:02:28 nigel-m2v kernel: 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 Jun 18 08:02:28 nigel-m2v kernel: 00 00 00 00 Jun 18 08:02:28 nigel-m2v kernel: end_request: I/O error, dev sda, sector 141077439 Jun 18 08:02:28 nigel-m2v kernel: Buffer I/O error on device sda1, logical block 17634672 Jun 18 08:02:28 nigel-m2v kernel:
Re: SATA problems
On Thu, Jun 14, 2007 at 02:28:54PM -0400, Dave Jones wrote:
> On Thu, Jun 14, 2007 at 12:21:49PM -0400, Jeff Garzik wrote:
> > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
>
> Unrelated to the other error, but I've been meaning to ask for a while..
> If this is 'abnormal', why does every SATA box I've seen do it?

*crickets*

Should we check for this case explicitly, and not print this?

Dave

-- 
http://www.codemonkey.org.uk
Re: SATA problems
> > > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> > > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> >
> > Unrelated to the other error, but I've been meaning to ask for a while..
> > If this is 'abnormal', why does every SATA box I've seen do it?
>
> *crickets*
>
> Should we check for this case explicitly, and not print this?

After I get the above errors, my entire SATA bus crashes and I need to
hard reset the box ... not sure we can just ignore the errors?
Re: SATA problems
On Thu, Jun 14, 2007 at 12:21:49PM -0400, Jeff Garzik wrote:
> > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807

Unrelated to the other error, but I've been meaning to ask for a while..
If this is 'abnormal', why does every SATA box I've seen do it?

Dave

-- 
http://www.codemonkey.org.uk
Re: SATA problems
Nigel Kukard wrote:
> > > I'm stumped trying to track down the below intermittent problem.
> > >
> > > I've confirmed this problem on 2.6.19, 2.6.20 and 2.6.21.
> > >
> > > Jun 14 07:55:52 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> > > Jun 14 07:55:52 nigel-m2v kernel: ata2.00: cmd ca/00:18:87:e7:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 12288 out
> > > Jun 14 07:55:52 nigel-m2v kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> > > Jun 14 07:55:52 nigel-m2v kernel: ata2: soft resetting port
> > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> > > Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> > > Jun 14 07:56:22 nigel-m2v kernel: ata2.00: qc timeout (cmd 0xef)
> > > Jun 14 07:56:22 nigel-m2v kernel: ata2.00: failed to set xfermode (err_mask=0x4)
> >
> > Try 2.6.22-rc4-gitX...
>
> Is there a patch in particular I can maybe apply? I see you made a
> couple of commits ... my problem is this is also happening on one of my
> production boxes which has a few other patches applied, I'm a bit scared
> of conflicts ... I don't really want to break anything by upgrading the
> kernel.

The two most relevant git commits:

commit 51b94d2a5a90d4800e74d7348bcde098a28f4fb3
Author: Tejun Heo <[EMAIL PROTECTED]>
Date:   Fri Jun 8 13:46:55 2007 -0700

    sata_promise: use TF interface for polling NODATA commands

commit 464cf177df7727efcc5506322fc5d0c8b896f545
Author: Tejun Heo <[EMAIL PROTECTED]>
Date:   Sun May 27 15:10:40 2007 +0200

    libata: always use polling SETXFER

If you have a git tree local to you, "git-diff-tree -p $COMMIT" will
extract a patch, otherwise click "raw" after surfing to
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=$COMMIT

Regards,

Jeff
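As a small illustration of Jeff's instructions, here is a Python sketch that fills in $COMMIT for the two commits he names. It only builds the gitweb URL and the command line, and does not run git or fetch anything. The function names are made up for this example, and "git diff-tree" (with a space) is used as the current spelling of the "git-diff-tree" command quoted in the message. The URL prefix is copied from the message and may no longer resolve.

```python
# URL prefix copied verbatim from the message above; it may be stale today.
GITWEB_COMMITDIFF = (
    "http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git"
    ";a=commitdiff;h="
)

# The two commits named in the message, keyed by hash.
COMMITS = {
    "51b94d2a5a90d4800e74d7348bcde098a28f4fb3":
        "sata_promise: use TF interface for polling NODATA commands",
    "464cf177df7727efcc5506322fc5d0c8b896f545":
        "libata: always use polling SETXFER",
}


def commitdiff_url(commit):
    """Hypothetical helper: gitweb page whose "raw" link yields the patch."""
    return GITWEB_COMMITDIFF + commit


def diff_tree_argv(commit):
    """Hypothetical helper: argv that prints the commit as a patch
    when run inside a local kernel git tree."""
    return ["git", "diff-tree", "-p", commit]


for commit, subject in COMMITS.items():
    print(subject)
    print("  " + " ".join(diff_tree_argv(commit)))
    print("  " + commitdiff_url(commit))
```

Redirecting the output of the printed git command to a file gives a patch that can then be test-applied to the production tree (for instance with "patch -p1 --dry-run") before committing to it, which addresses the worry about conflicts with the other local patches.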
Re: SATA problems
>> I'm stumped trying to track down the below intermittent problem.
>>
>> I've confirmed this problem on 2.6.19, 2.6.20 and 2.6.21.
>>
>> Any help greatly appreciated!
>>
>> Regards
>> Nigel
>>
>>
>> Jun 14 07:55:52 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>> Jun 14 07:55:52 nigel-m2v kernel: ata2.00: cmd ca/00:18:87:e7:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 12288 out
>> Jun 14 07:55:52 nigel-m2v kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>> Jun 14 07:55:52 nigel-m2v kernel: ata2: soft resetting port
>> Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
>> Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
>> Jun 14 07:56:22 nigel-m2v kernel: ata2.00: qc timeout (cmd 0xef)
>> Jun 14 07:56:22 nigel-m2v kernel: ata2.00: failed to set xfermode (err_mask=0x4)
>
> Try 2.6.22-rc4-gitX...
>
> Jeff

Hi Jeff,

Is there a patch in particular I can maybe apply? I see you made a
couple of commits ... my problem is this is also happening on one of my
production boxes which has a few other patches applied, I'm a bit scared
of conflicts ... I don't really want to break anything by upgrading the
kernel.

Kind Regards
Nigel
Re: SATA problems
Nigel Kukard wrote:
> I'm stumped trying to track down the below intermittent problem.
>
> I've confirmed this problem on 2.6.19, 2.6.20 and 2.6.21.
>
> Any help greatly appreciated!
>
> Regards
> Nigel
>
> Jun 14 07:55:52 nigel-m2v kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Jun 14 07:55:52 nigel-m2v kernel: ata2.00: cmd ca/00:18:87:e7:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 12288 out
> Jun 14 07:55:52 nigel-m2v kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Jun 14 07:55:52 nigel-m2v kernel: ata2: soft resetting port
> Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> Jun 14 07:55:52 nigel-m2v kernel: ATA: abnormal status 0x7F on port 0x0001c807
> Jun 14 07:56:22 nigel-m2v kernel: ata2.00: qc timeout (cmd 0xef)
> Jun 14 07:56:22 nigel-m2v kernel: ata2.00: failed to set xfermode (err_mask=0x4)

Try 2.6.22-rc4-gitX...

Jeff
Re: SATA problems in 2.6.20.3
Christian wrote:
> I'm seeing the same here since a few days. Before it worked great (even
> with NCQ). I've been getting those messages since 2.6.21-rc3-mm1 and with
> the latest Ubuntu feisty kernel (2.6.20-11-generic #2 SMP Thu Mar 15
> 03:43:56 UTC 2007 x86_64 GNU/Linux)
>
> System is Athlon64 X2, Nforce4, 3x Samsung SATA II NCQ discs.

Can you try 2.6.21-rc4? There was a change that went in between rc3 and
rc4 to revert a previous change which seemed to be problematic.

As far as 2.6.20, I'm somewhat tempted to submit this patch to -stable:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5e5c74a5e11d1e2a99d03132cc6c4455016db6c2

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: SATA problems in 2.6.20.3
I'm seeing the same here since a few days. Before it worked great (even with NCQ). I've been getting those messages since 2.6.21-rc3-mm1 and with the latest Ubuntu feisty kernel (2.6.20-11-generic #2 SMP Thu Mar 15 03:43:56 UTC 2007 x86_64 GNU/Linux) System is Athlon64 X2, Nforce4, 3x Samsung SATA II NCQ discs. [10802.844891] ata1: soft resetting port [10802.922845] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [10802.966231] ata1.00: Host Protected Area detected: [10802.966232] current size: 781422768 sectors (400 GB) [10802.966233] native size: -1349283664 sectors (18446743382 GB) [10802.966237] ata1.00: configured for UDMA/133 [10802.966265] ata1: EH complete [10817.958196] ata1: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 [10817.958201] ata1: CPB 0: ctl_flags 0x1f, resp_flags 0x0 [10817.958203] ata1: CPB 1: ctl_flags 0x1f, resp_flags 0x0 [10817.958205] ata1: CPB 2: ctl_flags 0x1f, resp_flags 0x0 [10817.958206] ata1: CPB 3: ctl_flags 0x1f, resp_flags 0x0 [10817.958208] ata1: CPB 4: ctl_flags 0x1f, resp_flags 0x0 [10817.958210] ata1: CPB 5: ctl_flags 0x1f, resp_flags 0x0 [10817.958211] ata1: CPB 6: ctl_flags 0x1f, resp_flags 0x0 [10817.958213] ata1: CPB 7: ctl_flags 0x1f, resp_flags 0x0 [10817.958215] ata1: CPB 8: ctl_flags 0x1f, resp_flags 0x0 [10817.958216] ata1: CPB 9: ctl_flags 0x1f, resp_flags 0x0 [10817.958218] ata1: CPB 10: ctl_flags 0x1f, resp_flags 0x0 [10817.958220] ata1: CPB 11: ctl_flags 0x1f, resp_flags 0x0 [10817.958222] ata1: CPB 12: ctl_flags 0x1f, resp_flags 0x0 [10817.958224] ata1: CPB 13: ctl_flags 0x1f, resp_flags 0x0 [10817.958225] ata1: CPB 14: ctl_flags 0x1f, resp_flags 0x0 [10817.958227] ata1: CPB 15: ctl_flags 0x1f, resp_flags 0x0 [10817.958229] ata1: CPB 16: ctl_flags 0x1f, resp_flags 0x0 [10817.958231] ata1: CPB 17: ctl_flags 0x1f, resp_flags 0x0 [10817.958233] ata1: CPB 18: ctl_flags 0x1f, resp_flags 0x0 [10817.958235] ata1: CPB 19: ctl_flags 0x1f, resp_flags 0x0 [10817.958236] ata1: CPB 20: 
ctl_flags 0x1f, resp_flags 0x0 [10817.958238] ata1: CPB 21: ctl_flags 0x1f, resp_flags 0x0 [10817.958240] ata1: CPB 22: ctl_flags 0x1f, resp_flags 0x0 [10817.958242] ata1: CPB 23: ctl_flags 0x1f, resp_flags 0x0 [10817.958244] ata1: CPB 24: ctl_flags 0x1f, resp_flags 0x0 [10817.958245] ata1: CPB 25: ctl_flags 0x1f, resp_flags 0x0 [10817.958247] ata1: CPB 26: ctl_flags 0x1f, resp_flags 0x0 [10817.958249] ata1: CPB 27: ctl_flags 0x1f, resp_flags 0x0 [10817.958250] ata1: CPB 28: ctl_flags 0x1f, resp_flags 0x0 [10817.958252] ata1: CPB 29: ctl_flags 0x1f, resp_flags 0x0 [10817.958254] ata1: CPB 30: ctl_flags 0x1f, resp_flags 0x0 [10817.958256] ata1: Resetting port [10817.958262] ata1.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 0x2 frozen [10817.958267] ata1.00: cmd 61/00:00:c7:6b:46/02:00:02:00:00/40 tag 0 cdb 0x0 data 262144 out [10817.958268] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [10817.958272] ata1.00: cmd 61/00:08:c7:6d:46/02:00:02:00:00/40 tag 1 cdb 0x0 data 262144 out [10817.958274] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [10817.958278] ata1.00: cmd 61/00:10:c7:6f:46/04:00:02:00:00/40 tag 2 cdb 0x0 data 524288 out [10817.958279] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [10817.958283] ata1.00: cmd 61/00:18:c7:73:46/02:00:02:00:00/40 tag 3 cdb 0x0 data 262144 out [10817.958285] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [10817.958289] ata1.00: cmd 61/00:20:c7:75:46/04:00:02:00:00/40 tag 4 cdb 0x0 data 524288 out [10817.958290] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [10817.958294] ata1.00: cmd 61/00:28:c7:79:46/02:00:02:00:00/40 tag 5 cdb 0x0 data 262144 out [10817.958296] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [10817.958300] ata1.00: cmd 61/00:30:c7:7b:46/02:00:02:00:00/40 tag 6 cdb 0x0 data 262144 out [10817.958301] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [10817.958305] ata1.00: cmd 61/08:38:c7:7f:46/00:00:02:00:00/40 
tag 7 cdb 0x0 data 4096 out [10817.958307] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [10817.958311] ata1.00: cmd 61/00:40:c7:7d:46/02:00:02:00:00/40 tag 8 cdb 0x0 data 262144 out [10817.958312] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [10817.958316] ata1.00: cmd 61/00:48:cf:7f:46/02:00:02:00:00/40 tag 9 cdb 0x0 data 262144 out [10817.958317] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [10817.958322] ata1.00: cmd 61/00:50:cf:81:46/02:00:02:00:00/40 tag 10 cdb 0x0 data 262144 out [10817.958323] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [10817.958327] ata1.00: cmd 61/00:58:cf:83:46/02:00:02:00:00/40 tag 11 cdb 0x0 data 262144 out [10817.958328] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [10817.958333] ata1.00: cmd 61/00:60:cf:85:46/02:00:02:00:00/40 tag 12 cdb
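A note on reading the exception line in the log above: SAct is the SATA SActive bitmask, with one bit set per outstanding NCQ tag. The short Python sketch below expands such a mask into a list of tag numbers. The function name active_tags is made up for this illustration; the two masks are the SAct values that appear in this thread.

```python
def active_tags(sact):
    """Return the NCQ tag numbers whose bits are set in an SActive mask.

    Hypothetical helper for reading libata exception lines; SActive is a
    32-bit register, so tags 0..31 are checked.
    """
    return [tag for tag in range(32) if sact & (1 << tag)]


# "SAct 0x7fff" from the ata1.00 exception above: tags 0 through 14.
print(active_tags(0x7FFF))

# "SAct 0x1" from the ata3.00 exception in the original report: tag 0 only.
print(active_tags(0x1))
```

Read this way, SAct 0x7fff says fifteen queued commands were in flight when the port froze, which fits the run of per-tag "cmd 61/..." timeouts that follows the exception line, while SAct 0x0 in the ata2 logs earlier in the thread means no NCQ command was outstanding there.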
Re: SATA problems in 2.6.20.3
On Fri, 16 Mar 2007 17:44:25 -0600 Robert Hancock <[EMAIL PROTECTED]> wrote:

> Charles Shannon Hendrix wrote:
> > I normally run a modified 2.6.19 kernel and it works great.
> >
> > I recently tried 2.6.20 and had severe SATA problems with it.
> >
> > Yesterday I tried 2.6.20.3, and the problems are still there.
>
> Can you try 2.6.21-rc and see if the problem is fixed in those kernels?

OK.

sata_nv.adma=0 lets me run 2.6.20.3 for now.

I'll test 2.6.21-rc tomorrow some time.

-- 
shannon | Work for something because it is good, not just because
        | it stands a chance to succeed.
        |         -- Vaclav Havel
Re: SATA problems in 2.6.20.3
On Friday 16 March 2007 23:44, you wrote:
> Charles Shannon Hendrix wrote:
> > I normally run a modified 2.6.19 kernel and it works great.
> >
> > I recently tried 2.6.20 and had severe SATA problems with it.
> >
> > Yesterday I tried 2.6.20.3, and the problems are still there.
>
> Can you try 2.6.21-rc and see if the problem is fixed in those kernels?

-rc4 specifically, it's the first one that's worked for me (possibly
related).

(BTW Robert, the sata_nv shadow registers patch has been fine here with a
patched -rc3 for just over a week now.)

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
Re: SATA problems in 2.6.20.3
Charles Shannon Hendrix wrote:
> I normally run a modified 2.6.19 kernel and it works great.
>
> I recently tried 2.6.20 and had severe SATA problems with it.
>
> Yesterday I tried 2.6.20.3, and the problems are still there.

Can you try 2.6.21-rc and see if the problem is fixed in those kernels?

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: SATA problems in 2.6.20.3
On Fri, 16 Mar 2007 11:58:21 -0400 Jeff Garzik <[EMAIL PROTECTED]> wrote:

> Charles Shannon Hendrix wrote:
> > I normally run a modified 2.6.19 kernel and it works great.
> >
> > I recently tried 2.6.20 and had severe SATA problems with it.
> >
> > Yesterday I tried 2.6.20.3, and the problems are still there.
>
> Setting the module parameter 'adma' to zero fixes this, yes?

Seems to.

NCQ would be nice of course, but this is usable.

-- 
The strength of the Constitution lies entirely in the determination of each
citizen to defend it. Only if every single citizen feels duty bound to do
his share in this defense are the constitutional rights secure.
-- Albert Einstein
Re: SATA problems in 2.6.20.3
Charles Shannon Hendrix wrote:
> I normally run a modified 2.6.19 kernel and it works great.
>
> I recently tried 2.6.20 and had severe SATA problems with it.
>
> Yesterday I tried 2.6.20.3, and the problems are still there.

Setting the module parameter 'adma' to zero fixes this, yes?

Jeff
SATA problems in 2.6.20.3
I normally run a modified 2.6.19 kernel and it works great.

I recently tried 2.6.20 and had severe SATA problems with it.

Yesterday I tried 2.6.20.3, and the problems are still there.

Relevant /var/log/messages entries:

Mar 14 20:45:11 daydream kernel: ata3: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400
Mar 14 20:45:11 daydream kernel: ata3: CPB 0: ctl_flags 0x1f, resp_flags 0x0
Mar 14 20:45:11 daydream kernel: ata3: CPB 1: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:11 daydream kernel: ata3: CPB 2: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:11 daydream kernel: ata3: CPB 3: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:11 daydream kernel: ata3: CPB 4: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:11 daydream kernel: ata3: CPB 5: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:11 daydream kernel: ata3: CPB 6: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:11 daydream kernel: ata3: CPB 7: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:11 daydream kernel: ata3: CPB 8: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:11 daydream kernel: ata3: CPB 9: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:11 daydream kernel: ata3: CPB 10: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:11 daydream kernel: ata3: CPB 11: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:11 daydream kernel: ata3: CPB 12: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:12 daydream kernel: ata3: CPB 13: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:12 daydream kernel: ata3: CPB 14: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:12 daydream kernel: ata3: CPB 15: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:12 daydream kernel: ata3: CPB 16: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:12 daydream kernel: ata3: CPB 17: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:12 daydream kernel: ata3: CPB 18: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:12 daydream kernel: ata3: CPB 19: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:12 daydream kernel: ata3: CPB 20: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:12 daydream kernel: ata3: CPB 21: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:12 daydream kernel: ata3: CPB 22: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:12 daydream kernel: ata3: CPB 23: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:12 daydream kernel: ata3: CPB 24: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:12 daydream kernel: ata3: CPB 25: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:12 daydream kernel: ata3: CPB 26: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:13 daydream kernel: ata3: CPB 27: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:13 daydream kernel: ata3: CPB 28: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:13 daydream kernel: ata3: CPB 29: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:13 daydream kernel: ata3: CPB 30: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:45:13 daydream kernel: ata3: Resetting port
Mar 14 20:45:13 daydream kernel: ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
Mar 14 20:45:13 daydream kernel: ata3.00: cmd 61/40:00:ed:39:b1/00:00:04:00:00/40 tag 0 cdb 0x0 data 32768 out
Mar 14 20:45:13 daydream kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 14 20:45:13 daydream kernel: ata3: soft resetting port
Mar 14 20:45:13 daydream kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 14 20:45:13 daydream kernel: ata3.00: configured for UDMA/133
Mar 14 20:45:13 daydream kernel: ata3: EH complete
Mar 14 20:45:13 daydream kernel: SCSI device sdc: 156301488 512-byte hdwr sectors (80026 MB)
Mar 14 20:45:13 daydream kernel: sdc: Write Protect is off
Mar 14 20:45:13 daydream kernel: sdc: Mode Sense: 00 3a 00 00
Mar 14 20:45:14 daydream kernel: SCSI device sdc: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 14 20:49:12 daydream kernel: ata3: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400
Mar 14 20:49:12 daydream kernel: ata3: CPB 0: ctl_flags 0x1f, resp_flags 0x0
Mar 14 20:49:12 daydream kernel: ata3: CPB 0: ctl_flags 0x1f, resp_flags 0x0
Mar 14 20:49:12 daydream kernel: ata3: CPB 1: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:49:12 daydream kernel: ata3: CPB 2: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:49:12 daydream kernel: ata3: CPB 3: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:49:12 daydream kernel: ata3: CPB 4: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:49:12 daydream kernel: ata3: CPB 5: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:49:12 daydream kernel: ata3: CPB 6: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:49:12 daydream kernel: ata3: CPB 7: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:49:12 daydream kernel: ata3: CPB 8: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:49:12 daydream kernel: ata3: CPB 9: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:49:12 daydream kernel: ata3: CPB 10: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:49:12 daydream kernel: ata3: CPB 11: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:49:18 daydream kernel: ata3: CPB 12: ctl_flags 0x1f, resp_flags 0x1
Mar 14 20:49:23 daydream kernel: ata3: CPB 13: ctl_flags 0x1f, resp_flags
Re: SATA problems in 2.6.20.3
Charles Shannon Hendrix wrote:
> I normally run a modified 2.6.19 kernel and it works great. I recently tried 2.6.20 and had severe SATA problems with it. Yesterday I tried 2.6.20.3, and the problems are still there.

Setting the module parameter 'adma' to zero fixes this, yes?

Jeff
Re: SATA problems in 2.6.20.3
On Fri, 16 Mar 2007 11:58:21 -0400 Jeff Garzik [EMAIL PROTECTED] wrote:
> Charles Shannon Hendrix wrote:
> > I normally run a modified 2.6.19 kernel and it works great. I recently tried 2.6.20 and had severe SATA problems with it. Yesterday I tried 2.6.20.3, and the problems are still there.
>
> Setting the module parameter 'adma' to zero fixes this, yes?

Seems to. NCQ would be nice of course, but this is usable.

--
The strength of the Constitution lies entirely in the determination of each citizen to defend it. Only if every single citizen feels duty bound to do his share in this defense are the constitutional rights secure. -- Albert Einstein
Re: SATA problems in 2.6.20.3
Charles Shannon Hendrix wrote:
> I normally run a modified 2.6.19 kernel and it works great. I recently tried 2.6.20 and had severe SATA problems with it. Yesterday I tried 2.6.20.3, and the problems are still there.

Can you try 2.6.21-rc and see if the problem is fixed in those kernels?

--
Robert Hancock Saskatoon, SK, Canada
To email, remove nospam from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: SATA problems in 2.6.20.3
On Friday 16 March 2007 23:44, you wrote:
> Charles Shannon Hendrix wrote:
> > I normally run a modified 2.6.19 kernel and it works great. I recently tried 2.6.20 and had severe SATA problems with it. Yesterday I tried 2.6.20.3, and the problems are still there.
>
> Can you try 2.6.21-rc and see if the problem is fixed in those kernels?

-rc4 specifically, it's the first one that's worked for me (possibly related).

(BTW Robert, the sata_nv shadow registers patch has been fine here with a patched -rc3 for just over a week now.)

--
Cheers, Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
Re: SATA problems in 2.6.20.3
On Fri, 16 Mar 2007 17:44:25 -0600 Robert Hancock [EMAIL PROTECTED] wrote:
> Charles Shannon Hendrix wrote:
> > I normally run a modified 2.6.19 kernel and it works great. I recently tried 2.6.20 and had severe SATA problems with it. Yesterday I tried 2.6.20.3, and the problems are still there.
>
> Can you try 2.6.21-rc and see if the problem is fixed in those kernels?

OK. sata_nv.adma=0 lets me run 2.6.20.3 for now. I'll test 2.6.21-rcwhatever tomorrow some time.

--
shannon
| Work for something because it is good, not just because
| it stands a chance to succeed.
| -- Vaclav Havel
Re: SATA problems
Tejun Heo wrote:
> Pablo Sebastian Greco wrote:
> > Tejun Heo wrote:
> > > * Pablo, the bug you saw was bad interaction between blacklisted NCQ device and dynamic queue depth adjustment. Patches are submitted to fix the problem. Just drop the blacklist patch. Your drives should work fine in NCQ mode. My gut feeling is that your problem is power related from the beginning.
> >
> > I had the same problems with a new Power Supply. Now everything is ok with the old Power Supply and the new drives.
>
> So, it was bad drives? Are you using the same model or different ones? NCQ works okay now?

All I can say is that it is working now. Other things changed with the new drives: 1.5Gbps instead of 3Gbps, and the new drives don't use NCQ (I'm reattaching a full dmesg).

I've also found this firmware upgrade (http://www.samsung.com/Products/HardDiskDrive/support/faqs/faqs_20060414_246673.htm) for the old drives, but couldn't confirm whether it should be applied because the server is in Brazil and I live in Argentina. I won't be there until April to test.

Thanks.
Pablo.
Linux version 2.6.19-1.2895.fc6 ([EMAIL PROTECTED]) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)) #1 SMP Wed Jan 10 18:50:56 EST 2007
Command line: ro root=LABEL=/
BIOS-provided physical RAM map:
 BIOS-e820: - 0009ec00 (usable)
 BIOS-e820: 0009ec00 - 0010 (reserved)
 BIOS-e820: 0010 - df938000 (usable)
 BIOS-e820: df938000 - df9d2000 (ACPI NVS)
 BIOS-e820: df9d2000 - dfa42000 (usable)
 BIOS-e820: dfa42000 - dfa9a000 (reserved)
 BIOS-e820: dfa9a000 - dfab8000 (usable)
 BIOS-e820: dfab8000 - dfb1a000 (ACPI NVS)
 BIOS-e820: dfb1a000 - dfb2c000 (usable)
 BIOS-e820: dfb2c000 - dfb3a000 (ACPI data)
 BIOS-e820: dfb3a000 - dfc0 (usable)
 BIOS-e820: ffc0 - ffc0c000 (reserved)
 BIOS-e820: 0001 - 00012000 (usable)
Entering add_active_range(0, 0, 158) 0 entries of 3200 used
Entering add_active_range(0, 256, 915768) 1 entries of 3200 used
Entering add_active_range(0, 915922, 916034) 2 entries of 3200 used
Entering add_active_range(0, 916122, 916152) 3 entries of 3200 used
Entering add_active_range(0, 916250, 916268) 4 entries of 3200 used
Entering add_active_range(0, 916282, 916480) 5 entries of 3200 used
Entering add_active_range(0, 1048576, 1179648) 6 entries of 3200 used
end_pfn_map = 1179648
DMI 2.4 present.
ACPI: RSDP (v002 INTEL ) @ 0x000f0350
ACPI: XSDT (v001 INTEL S5000VSA 0x INTL 0x0113) @ 0xdfb39120
ACPI: FADT (v003 INTEL S5000VSA 0x INTL 0x0113) @ 0xdfb36000
ACPI: MADT (v001 INTEL S5000VSA 0x INTL 0x0113) @ 0xdfb35000
ACPI: SPCR (v001 INTEL S5000VSA 0x INTL 0x0113) @ 0xdfb2f000
ACPI: HPET (v001 INTEL S5000VSA 0x0001 INTL 0x0113) @ 0xdfb2e000
ACPI: MCFG (v001 INTEL S5000VSA 0x0001 INTL 0x0113) @ 0xdfb2d000
ACPI: SSDT (v002 INTEL S5000VSA 0x4000 INTL 0x0113) @ 0xdfb2c000
ACPI: DSDT (v002 INTEL S5000VSA 0x0008 INTL 0x0113) @ 0x
No NUMA configuration found
Faking a node at -00012000
Entering add_active_range(0, 0, 158) 0 entries of 3200 used
Entering add_active_range(0, 256, 915768) 1 entries of 3200 used
Entering add_active_range(0, 915922, 916034) 2 entries of 3200 used
Entering add_active_range(0, 916122, 916152) 3 entries of 3200 used
Entering add_active_range(0, 916250, 916268) 4 entries of 3200 used
Entering add_active_range(0, 916282, 916480) 5 entries of 3200 used
Entering add_active_range(0, 1048576, 1179648) 6 entries of 3200 used
Bootmem setup node 0 -00012000
Zone PFN ranges:
  DMA 0 -> 4096
  DMA32 4096 -> 1048576
  Normal 1048576 -> 1179648
early_node_map[7] active PFN ranges
  0: 0 -> 158
  0: 256 -> 915768
  0: 915922 -> 916034
  0: 916122 -> 916152
  0: 916250 -> 916268
  0: 916282 -> 916480
  0: 1048576 -> 1179648
On node 0 totalpages: 1047100
  DMA zone: 64 pages used for memmap
  DMA zone: 1450 pages reserved
  DMA zone: 2484 pages, LIFO batch:0
  DMA32 zone: 16320 pages used for memmap
  DMA32 zone: 895710 pages, LIFO batch:31
  Normal zone: 2048 pages used for memmap
  Normal zone: 129024 pages, LIFO batch:31
ACPI: PM-Timer IO Port: 0x408
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
Processor #2
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
Processor #3
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x84] disabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x85] disabled)
ACPI: LAPIC
RE: SATA problems
Hi Marcus,

Could you give more details? I'm stuck with a boot problem on an Asus P5W that also includes a JMicron behind a SATA port on ICH8, and the kernel boot times out when probing it... I've been trying various kernels, including some patches from Tejun (thanks!), but no luck to date...

Mind sharing your .config so that I can check if I missed something obvious?

Regards,
Paul

Paul Rolland, rol(at)as2917.net
ex-AS2917 Network administrator and Peering Coordinator

--
Please no HTML, I'm not a browser - Pas d'HTML, je ne suis pas un navigateur

"Some people dream of success... while others wake up and work hard at it"

"I worry about my child and the Internet all the time, even though she's too young to have logged on yet. Here's what I worry about. I worry that 10 or 15 years from now, she will come to me and say 'Daddy, where were you when they took freedom of the press away from the Internet?'" --Mike Godwin, Electronic Frontier Foundation

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marcus Haebler
> Sent: Wednesday, February 21, 2007 7:24 AM
> To: Tejun Heo
> Cc: Pablo Sebastian Greco; linux-kernel@vger.kernel.org
> Subject: Re: SATA problems
>
> Tejun,
>
> I checked out the kernel 2.6.19 to 2.6.20 Changelog. Seems like you fixed a problem with the JMB363. The Asus P5B-Deluxe I am using has a JMB363 - besides an Intel ICH8R - with the SATA ports set to AHCI as well. Looks like that might have been the source of the problem in 2.6.19.
>
> Thanks,
>
> Marcus
>
> On 2/21/07, Marcus Haebler <[EMAIL PROTECTED]> wrote:
> > Tejun,
> >
> > thanks. In preparation of your patch I installed a vanilla 2.6.20.1 kernel on my FC6 system. Amazingly the problem went away with the vanilla(!) kernel and NCQ is enabled at boot time (queue_depth is 31). I guess I should have tried that kernel earlier.
> >
> > The patches you sent earlier apply w/o problems against the 2.6.20.1 vanilla kernel which is expected. I will test drive those patches tomorrow.
> >
> > BTW thanks for saving me the 'cat' on the 3 patches. ;)
> >
> > Thanks,
> >
> > Marcus
Re: SATA problems
Tejun,

I checked out the kernel 2.6.19 to 2.6.20 Changelog. Seems like you fixed a problem with the JMB363. The Asus P5B-Deluxe I am using has a JMB363 - besides an Intel ICH8R - with the SATA ports set to AHCI as well. Looks like that might have been the source of the problem in 2.6.19.

Thanks,

Marcus

On 2/21/07, Marcus Haebler <[EMAIL PROTECTED]> wrote:
> Tejun,
>
> thanks. In preparation of your patch I installed a vanilla 2.6.20.1 kernel on my FC6 system. Amazingly the problem went away with the vanilla(!) kernel and NCQ is enabled at boot time (queue_depth is 31). I guess I should have tried that kernel earlier.
>
> The patches you sent earlier apply w/o problems against the 2.6.20.1 vanilla kernel which is expected. I will test drive those patches tomorrow.
>
> BTW thanks for saving me the 'cat' on the 3 patches. ;)
>
> Thanks,
>
> Marcus
Re: SATA problems
Tejun,

thanks. In preparation of your patch I installed a vanilla 2.6.20.1 kernel on my FC6 system. Amazingly the problem went away with the vanilla(!) kernel and NCQ is enabled at boot time (queue_depth is 31). I guess I should have tried that kernel earlier.

The patches you sent earlier apply w/o problems against the 2.6.20.1 vanilla kernel which is expected. I will test drive those patches tomorrow.

BTW thanks for saving me the 'cat' on the 3 patches. ;)

Thanks,

Marcus

On 2/20/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> Marcus Haebler wrote:
> > thanks for the patches! I am on an Intel P965/ICH8R.
>
> I see. That can happen too. There was a race window where an in-flight r/w command which had left the SCSI midlayer but was still pending in libata got executed in the wrong mode. If possible, please verify that it doesn't happen with the patches applied. I'm attaching a combined patch against v2.6.20.
>
> Thanks.
>
> --
> tejun
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 667acd2..348cc02 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -308,9 +308,7 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct ata_device *dev,
>  	tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
>  	tf->flags |= tf_flags;
>
> -	if ((dev->flags & (ATA_DFLAG_PIO | ATA_DFLAG_NCQ_OFF |
> -			   ATA_DFLAG_NCQ)) == ATA_DFLAG_NCQ &&
> -	    likely(tag != ATA_TAG_INTERNAL)) {
> +	if (ata_ncq_enabled(dev) && likely(tag != ATA_TAG_INTERNAL)) {
>  		/* yay, NCQ */
>  		if (!lba_48_ok(block, n_block))
>  			return -ERANGE;
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 73902d3..ebb9185 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -945,29 +945,32 @@ int ata_scsi_change_queue_depth(struct scsi_device *sdev, int queue_depth)
>  	struct ata_port *ap = ata_shost_to_port(sdev->host);
>  	struct ata_device *dev;
>  	unsigned long flags;
> -	int max_depth;
>
> -	if (queue_depth < 1)
> +	if (queue_depth < 1 || queue_depth == sdev->queue_depth)
>  		return sdev->queue_depth;
>
>  	dev = ata_scsi_find_dev(ap, sdev);
>  	if (!dev || !ata_dev_enabled(dev))
>  		return sdev->queue_depth;
>
> -	max_depth = min(sdev->host->can_queue, ata_id_queue_depth(dev->id));
> -	max_depth = min(ATA_MAX_QUEUE - 1, max_depth);
> -	if (queue_depth > max_depth)
> -		queue_depth = max_depth;
> -
> -	scsi_adjust_queue_depth(sdev, MSG_SIMPLE_TAG, queue_depth);
> -
> +	/* NCQ enabled? */
>  	spin_lock_irqsave(ap->lock, flags);
> -	if (queue_depth > 1)
> -		dev->flags &= ~ATA_DFLAG_NCQ_OFF;
> -	else
> +	dev->flags &= ~ATA_DFLAG_NCQ_OFF;
> +	if (queue_depth == 1 || !ata_ncq_enabled(dev)) {
>  		dev->flags |= ATA_DFLAG_NCQ_OFF;
> +		queue_depth = 1;
> +	}
>  	spin_unlock_irqrestore(ap->lock, flags);
>
> +	/* limit and apply queue depth */
> +	queue_depth = min(queue_depth, sdev->host->can_queue);
> +	queue_depth = min(queue_depth, ata_id_queue_depth(dev->id));
> +	queue_depth = min(queue_depth, ATA_MAX_QUEUE - 1);
> +
> +	if (sdev->queue_depth == queue_depth)
> +		return -EINVAL;
> +
> +	scsi_adjust_queue_depth(sdev, MSG_SIMPLE_TAG, queue_depth);
>  	return queue_depth;
>  }
>
> @@ -1454,11 +1457,9 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
>  static int ata_scmd_need_defer(struct ata_device *dev, int is_io)
>  {
>  	struct ata_port *ap = dev->ap;
> +	int is_ncq = is_io && ata_ncq_enabled(dev);
>
> -	if (!(dev->flags & ATA_DFLAG_NCQ))
> -		return 0;
> -
> -	if (is_io) {
> +	if (is_ncq) {
>  		if (!ata_tag_valid(ap->active_tag))
>  			return 0;
>  	} else {
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 91bb8ce..4e4e365 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -1035,6 +1035,21 @@ static inline u8 ata_chk_status(struct ata_port *ap)
>  	return ap->ops->check_status(ap);
>  }
>
> +/**
> + *	ata_ncq_enabled - Test whether NCQ is enabled
> + *	@dev: ATA device to test for
> + *
> + *	LOCKING:
> + *	spin_lock_irqsave(host lock)
> + *
> + *	RETURNS:
> + *	1 if NCQ is enabled for @dev, 0 otherwise.
> + */
> +static inline int ata_ncq_enabled(struct ata_device *dev)
> +{
> +	return (dev->flags & (ATA_DFLAG_PIO | ATA_DFLAG_NCQ_OFF |
> +			      ATA_DFLAG_NCQ)) == ATA_DFLAG_NCQ;
> +}
>
>  /**
>   *	ata_pause - Flush writes and pause 400 nanoseconds.
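For readers following the queue-depth part of the quoted patch, here is a minimal user-space sketch of the limiting order it appears to use: force the depth to 1 when NCQ is not usable, then clamp against the host limit, the device's advertised depth, and the libata maximum minus one. The names `demo_limit_queue_depth`, `demo_min` and `DEMO_MAX_QUEUE` are hypothetical and exist only for this illustration; the value 32 is an assumption standing in for `ATA_MAX_QUEUE`, chosen because it matches the queue_depth of 31 reported above. This is not the kernel code, only a sketch of the arithmetic.

```c
/* Assumed stand-in for ATA_MAX_QUEUE; not taken from the kernel headers. */
#define DEMO_MAX_QUEUE 32

/* Small helper, since the kernel's min() macro is not available in user space. */
int demo_min(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper mirroring the order of operations in the patched
 * ata_scsi_change_queue_depth(): first decide whether NCQ can be used,
 * then apply the three upper limits in sequence.
 */
int demo_limit_queue_depth(int requested, int host_can_queue,
			   int dev_queue_depth, int ncq_enabled)
{
	int depth = requested;

	/* Without usable NCQ only one command may be outstanding. */
	if (depth == 1 || !ncq_enabled)
		depth = 1;

	/* Limit by host, by device, and by the (assumed) libata maximum. */
	depth = demo_min(depth, host_can_queue);
	depth = demo_min(depth, dev_queue_depth);
	depth = demo_min(depth, DEMO_MAX_QUEUE - 1);

	return depth;
}
```

Under these assumptions, a request for depth 64 on a host and device that both advertise 32 ends up at 31, and any request on a device without usable NCQ ends up at 1.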
Re: SATA problems
Marcus Haebler wrote:
> thanks for the patches! I am on an Intel P965/ICH8R.

I see. That can happen too. There was a race window where an in-flight r/w command which had left the SCSI midlayer but was still pending in libata got executed in the wrong mode. If possible, please verify that it doesn't happen with the patches applied. I'm attaching a combined patch against v2.6.20.

Thanks.

--
tejun

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 667acd2..348cc02 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -308,9 +308,7 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct ata_device *dev,
 	tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
 	tf->flags |= tf_flags;
 
-	if ((dev->flags & (ATA_DFLAG_PIO | ATA_DFLAG_NCQ_OFF |
-			   ATA_DFLAG_NCQ)) == ATA_DFLAG_NCQ &&
-	    likely(tag != ATA_TAG_INTERNAL)) {
+	if (ata_ncq_enabled(dev) && likely(tag != ATA_TAG_INTERNAL)) {
 		/* yay, NCQ */
 		if (!lba_48_ok(block, n_block))
 			return -ERANGE;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 73902d3..ebb9185 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -945,29 +945,32 @@ int ata_scsi_change_queue_depth(struct scsi_device *sdev, int queue_depth)
 	struct ata_port *ap = ata_shost_to_port(sdev->host);
 	struct ata_device *dev;
 	unsigned long flags;
-	int max_depth;
 
-	if (queue_depth < 1)
+	if (queue_depth < 1 || queue_depth == sdev->queue_depth)
 		return sdev->queue_depth;
 
 	dev = ata_scsi_find_dev(ap, sdev);
 	if (!dev || !ata_dev_enabled(dev))
 		return sdev->queue_depth;
 
-	max_depth = min(sdev->host->can_queue, ata_id_queue_depth(dev->id));
-	max_depth = min(ATA_MAX_QUEUE - 1, max_depth);
-	if (queue_depth > max_depth)
-		queue_depth = max_depth;
-
-	scsi_adjust_queue_depth(sdev, MSG_SIMPLE_TAG, queue_depth);
-
+	/* NCQ enabled? */
 	spin_lock_irqsave(ap->lock, flags);
-	if (queue_depth > 1)
-		dev->flags &= ~ATA_DFLAG_NCQ_OFF;
-	else
+	dev->flags &= ~ATA_DFLAG_NCQ_OFF;
+	if (queue_depth == 1 || !ata_ncq_enabled(dev)) {
 		dev->flags |= ATA_DFLAG_NCQ_OFF;
+		queue_depth = 1;
+	}
 	spin_unlock_irqrestore(ap->lock, flags);
 
+	/* limit and apply queue depth */
+	queue_depth = min(queue_depth, sdev->host->can_queue);
+	queue_depth = min(queue_depth, ata_id_queue_depth(dev->id));
+	queue_depth = min(queue_depth, ATA_MAX_QUEUE - 1);
+
+	if (sdev->queue_depth == queue_depth)
+		return -EINVAL;
+
+	scsi_adjust_queue_depth(sdev, MSG_SIMPLE_TAG, queue_depth);
 	return queue_depth;
 }
 
@@ -1454,11 +1457,9 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
 static int ata_scmd_need_defer(struct ata_device *dev, int is_io)
 {
 	struct ata_port *ap = dev->ap;
+	int is_ncq = is_io && ata_ncq_enabled(dev);
 
-	if (!(dev->flags & ATA_DFLAG_NCQ))
-		return 0;
-
-	if (is_io) {
+	if (is_ncq) {
 		if (!ata_tag_valid(ap->active_tag))
 			return 0;
 	} else {
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 91bb8ce..4e4e365 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -1035,6 +1035,21 @@ static inline u8 ata_chk_status(struct ata_port *ap)
 	return ap->ops->check_status(ap);
 }
 
+/**
+ *	ata_ncq_enabled - Test whether NCQ is enabled
+ *	@dev: ATA device to test for
+ *
+ *	LOCKING:
+ *	spin_lock_irqsave(host lock)
+ *
+ *	RETURNS:
+ *	1 if NCQ is enabled for @dev, 0 otherwise.
+ */
+static inline int ata_ncq_enabled(struct ata_device *dev)
+{
+	return (dev->flags & (ATA_DFLAG_PIO | ATA_DFLAG_NCQ_OFF |
+			      ATA_DFLAG_NCQ)) == ATA_DFLAG_NCQ;
+}
 
 /**
  *	ata_pause - Flush writes and pause 400 nanoseconds.
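The flag test that the patch factors out into `ata_ncq_enabled()` can be tried outside the kernel. The sketch below is a user-space illustration only: the `DEMO_DFLAG_*` bit values are made up for the example (the real `ATA_DFLAG_*` definitions live in include/linux/libata.h and may differ), and `demo_ncq_enabled` is a hypothetical name. It shows the idiom of masking three bits and comparing against exactly one of them, so that NCQ counts as enabled only when the NCQ bit is set and neither the PIO bit nor the NCQ_OFF bit is.

```c
/* Illustrative bit positions only; NOT the kernel's ATA_DFLAG_* values. */
enum {
	DEMO_DFLAG_NCQ     = 1u << 0,	/* device supports NCQ */
	DEMO_DFLAG_PIO     = 1u << 1,	/* device is limited to PIO */
	DEMO_DFLAG_NCQ_OFF = 1u << 2,	/* NCQ turned off, e.g. depth set to 1 */
};

/*
 * Hypothetical user-space counterpart of ata_ncq_enabled(): mask the
 * three relevant bits and require that only the NCQ bit remains set.
 */
int demo_ncq_enabled(unsigned long flags)
{
	const unsigned long mask =
		DEMO_DFLAG_PIO | DEMO_DFLAG_NCQ_OFF | DEMO_DFLAG_NCQ;

	return (flags & mask) == DEMO_DFLAG_NCQ;
}
```

With these demo values, `demo_ncq_enabled(DEMO_DFLAG_NCQ)` yields 1, while adding either `DEMO_DFLAG_NCQ_OFF` or `DEMO_DFLAG_PIO` to the flags yields 0. Having one helper means every call site applies the same three-bit test, which seems to be the point of the patch: the old `ata_scmd_need_defer()` checked only the NCQ bit.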
Re: SATA problems
Pablo Sebastian Greco wrote:
> Tejun Heo wrote:
>> * Pablo, the bug you saw was bad interaction between blacklisted NCQ device and dynamic queue depth adjustment. Patches are submitted to fix the problem. Just drop the blacklist patch. Your drives should work fine in NCQ mode. My gut feeling is that your problem is power related from the beginning.
>>
> I had the same problems with a new Power Supply. Now everything is ok with the old Power Supply and the new drives.

So, it was bad drives? Are you using the same model or different ones? NCQ works okay now?

--
tejun
Re: SATA problems
Tejun,

thanks for the patches! I am on an Intel P965/ICH8R.

Best,

Marcus

On 2/20/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> * Pablo, the bug you saw was bad interaction between blacklisted NCQ device and dynamic queue depth adjustment. Patches are submitted to fix the problem. Just drop the blacklist patch. Your drives should work fine in NCQ mode. My gut feeling is that your problem is power related from the beginning.
>
> * Marcus, you're on via's ahci controller, right? The problem you saw was bad interaction between blacklisted NCQ _controller_ and dynamic queue depth adjustment. Patches submitted.
>
> Thanks.
>
> --
> tejun
Re: SATA problems
Tejun Heo wrote:
> * Pablo, the bug you saw was bad interaction between blacklisted NCQ device and dynamic queue depth adjustment. Patches are submitted to fix the problem. Just drop the blacklist patch. Your drives should work fine in NCQ mode. My gut feeling is that your problem is power related from the beginning.
>
> * Marcus, you're on via's ahci controller, right? The problem you saw was bad interaction between blacklisted NCQ _controller_ and dynamic queue depth adjustment. Patches submitted.
>
> Thanks.

I had the same problems with a new Power Supply. Now everything is ok with the old Power Supply and the new drives.

Pablo.
Re: SATA problems
* Pablo, the bug you saw was bad interaction between blacklisted NCQ device and dynamic queue depth adjustment. Patches are submitted to fix the problem. Just drop the blacklist patch. Your drives should work fine in NCQ mode. My gut feeling is that your problem is power related from the beginning.

* Marcus, you're on via's ahci controller, right? The problem you saw was bad interaction between blacklisted NCQ _controller_ and dynamic queue depth adjustment. Patches submitted.

Thanks.

--
tejun
Re: SATA problems
Pablo Sebastian Greco wrote:
> Tejun Heo wrote:
>> * Pablo, the bug you saw was bad interaction between blacklisted NCQ
>>   device and dynamic queue depth adjustment.  Patches are submitted to
>>   fix the problem.  Just drop the blacklist patch.  Your drives should
>>   work fine in NCQ mode.  My gut feeling is that your problem is power
>>   related from the beginning.
>
> I had the same problems with a new power supply. Now everything is OK
> with the old power supply and the new drives.

So, it was bad drives?  Are you using the same model or different ones?
NCQ works okay now?

--
tejun
Re: SATA problems
Marcus Haebler wrote:
> thanks for the patches! I am on an Intel P965/ICH8R.

I see.  That can happen too.  There was a race window where in-flight
r/w command which left SCSI midlayer but pending on libata gets executed
in the wrong mode.  If possible, please verify that it doesn't happen
with the patches applied.  I'm attaching combined patch against v2.6.20.

Thanks.

--
tejun

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 667acd2..348cc02 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -308,9 +308,7 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct ata_device *dev,
 	tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
 	tf->flags |= tf_flags;
 
-	if ((dev->flags & (ATA_DFLAG_PIO | ATA_DFLAG_NCQ_OFF |
-			   ATA_DFLAG_NCQ)) == ATA_DFLAG_NCQ &&
-	    likely(tag != ATA_TAG_INTERNAL)) {
+	if (ata_ncq_enabled(dev) && likely(tag != ATA_TAG_INTERNAL)) {
 		/* yay, NCQ */
 		if (!lba_48_ok(block, n_block))
 			return -ERANGE;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 73902d3..ebb9185 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -945,29 +945,32 @@ int ata_scsi_change_queue_depth(struct scsi_device *sdev, int queue_depth)
 	struct ata_port *ap = ata_shost_to_port(sdev->host);
 	struct ata_device *dev;
 	unsigned long flags;
-	int max_depth;
 
-	if (queue_depth < 1)
+	if (queue_depth < 1 || queue_depth == sdev->queue_depth)
 		return sdev->queue_depth;
 
 	dev = ata_scsi_find_dev(ap, sdev);
 	if (!dev || !ata_dev_enabled(dev))
 		return sdev->queue_depth;
 
-	max_depth = min(sdev->host->can_queue, ata_id_queue_depth(dev->id));
-	max_depth = min(ATA_MAX_QUEUE - 1, max_depth);
-	if (queue_depth > max_depth)
-		queue_depth = max_depth;
-
-	scsi_adjust_queue_depth(sdev, MSG_SIMPLE_TAG, queue_depth);
-
+	/* NCQ enabled? */
 	spin_lock_irqsave(ap->lock, flags);
-	if (queue_depth > 1)
-		dev->flags &= ~ATA_DFLAG_NCQ_OFF;
-	else
+	dev->flags &= ~ATA_DFLAG_NCQ_OFF;
+	if (queue_depth == 1 || !ata_ncq_enabled(dev)) {
 		dev->flags |= ATA_DFLAG_NCQ_OFF;
+		queue_depth = 1;
+	}
 	spin_unlock_irqrestore(ap->lock, flags);
 
+	/* limit and apply queue depth */
+	queue_depth = min(queue_depth, sdev->host->can_queue);
+	queue_depth = min(queue_depth, ata_id_queue_depth(dev->id));
+	queue_depth = min(queue_depth, ATA_MAX_QUEUE - 1);
+
+	if (sdev->queue_depth == queue_depth)
+		return -EINVAL;
+
+	scsi_adjust_queue_depth(sdev, MSG_SIMPLE_TAG, queue_depth);
 	return queue_depth;
 }
 
@@ -1454,11 +1457,9 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
 static int ata_scmd_need_defer(struct ata_device *dev, int is_io)
 {
 	struct ata_port *ap = dev->ap;
+	int is_ncq = is_io && ata_ncq_enabled(dev);
 
-	if (!(dev->flags & ATA_DFLAG_NCQ))
-		return 0;
-
-	if (is_io) {
+	if (is_ncq) {
 		if (!ata_tag_valid(ap->active_tag))
 			return 0;
 	} else {
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 91bb8ce..4e4e365 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -1035,6 +1035,21 @@ static inline u8 ata_chk_status(struct ata_port *ap)
 	return ap->ops->check_status(ap);
 }
 
+/**
+ *	ata_ncq_enabled - Test whether NCQ is enabled
+ *	@dev: ATA device to test for
+ *
+ *	LOCKING:
+ *	spin_lock_irqsave(host lock)
+ *
+ *	RETURNS:
+ *	1 if NCQ is enabled for @dev, 0 otherwise.
+ */
+static inline int ata_ncq_enabled(struct ata_device *dev)
+{
+	return (dev->flags & (ATA_DFLAG_PIO | ATA_DFLAG_NCQ_OFF |
+			      ATA_DFLAG_NCQ)) == ATA_DFLAG_NCQ;
+}
 
 /**
  *	ata_pause - Flush writes and pause 400 nanoseconds.
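
For readers following the patch without a kernel tree at hand, here is a minimal user-space sketch of the two ideas in it: the three-flag test behind ata_ncq_enabled(), and the order in which the patched ata_scsi_change_queue_depth() settles the NCQ state before clamping the depth. The bit values, the helper names (ncq_enabled, effective_depth, min_int) and MAX_QUEUE = 32 are assumptions made for illustration; the real definitions live in include/linux/libata.h. Locking and the early -EINVAL return are left out.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. */
#define DFLAG_NCQ      (1u << 0)  /* device is configured for NCQ */
#define DFLAG_PIO      (1u << 1)  /* device is limited to PIO */
#define DFLAG_NCQ_OFF  (1u << 2)  /* NCQ switched off via queue_depth == 1 */
#define MAX_QUEUE      32         /* assumed stand-in for ATA_MAX_QUEUE */

/* Same shape as the test in the patch: of the three flags, only NCQ
 * may be set for NCQ to count as enabled. */
static int ncq_enabled(unsigned int flags)
{
	return (flags & (DFLAG_PIO | DFLAG_NCQ_OFF | DFLAG_NCQ)) == DFLAG_NCQ;
}

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Mirrors the patched ordering: decide the NCQ state first, then clamp
 * the depth against host, device and library limits. */
static int effective_depth(unsigned int *flags, int requested,
			   int host_can_queue, int dev_depth)
{
	assert(requested >= 1);

	*flags &= ~DFLAG_NCQ_OFF;
	if (requested == 1 || !ncq_enabled(*flags)) {
		/* no NCQ: force a single outstanding command */
		*flags |= DFLAG_NCQ_OFF;
		requested = 1;
	}

	requested = min_int(requested, host_can_queue);
	requested = min_int(requested, dev_depth);
	return min_int(requested, MAX_QUEUE - 1);
}
```

Under these assumed values, a device that carries only the NCQ flag and is asked for depth 64 ends up at 31, while a device without the NCQ flag is forced to depth 1 and marked NCQ_OFF whatever depth was requested.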
Re: SATA problems
Tejun,

thanks. In preparation of your patch I installed a vanilla 2.6.20.1
kernel on my FC6 system. Amazingly the problem went away with the
vanilla(!) kernel and NCQ is enabled at boot time (queue_depth is 31).
I guess I should have tried that kernel earlier.

The patches you sent earlier apply w/o problems against the 2.6.20.1
vanilla kernel which is expected. I will test drive those patches
tomorrow.

BTW thanks for saving me the 'cat' on the 3 patches. ;)

Thanks,
Marcus

On 2/20/07, Tejun Heo [EMAIL PROTECTED] wrote:
> Marcus Haebler wrote:
> > thanks for the patches! I am on an Intel P965/ICH8R.
>
> I see.  That can happen too.  There was a race window where in-flight
> r/w command which left SCSI midlayer but pending on libata gets executed
> in the wrong mode.  If possible, please verify that it doesn't happen
> with the patches applied.  I'm attaching combined patch against v2.6.20.
>
> Thanks.
>
> --
> tejun
Re: SATA problems
Tejun,

I checked out the kernel 2.6.19 to 2.6.20 Changelog. Seems like you
fixed a problem with the JMB363. The Asus P5B-Deluxe I am using has a
JMB363 - besides an Intel ICH8R - with the SATA ports set to AHCI as
well. Looks like that might have been the source of the problem in
2.6.19.

Thanks,
Marcus

On 2/21/07, Marcus Haebler [EMAIL PROTECTED] wrote:
> Tejun,
>
> thanks. In preparation of your patch I installed a vanilla 2.6.20.1
> kernel on my FC6 system. Amazingly the problem went away with the
> vanilla(!) kernel and NCQ is enabled at boot time (queue_depth is 31).
> I guess I should have tried that kernel earlier.
>
> The patches you sent earlier apply w/o problems against the 2.6.20.1
> vanilla kernel which is expected. I will test drive those patches
> tomorrow.
>
> BTW thanks for saving me the 'cat' on the 3 patches. ;)
>
> Thanks,
> Marcus
RE: SATA problems
Hi Marcus,

Could you give more details?

I'm stuck with a boot problem on an Asus P5W that also includes a
JMicron behind a SATA port on ICH8, and the kernel boot times out when
probing it...

I've been trying various kernels, including some patches from Tejun
(thanks!), but no luck to date...

Mind sharing your .config so that I can check if I missed something
obvious?

Regards,
Paul

Paul Rolland, rol(at)as2917.net
ex-AS2917 Network administrator and Peering Coordinator

--

Please no HTML, I'm not a browser - Pas d'HTML, je ne suis pas un navigateur

"Some people dream of success... while others wake up and work hard at it"

"I worry about my child and the Internet all the time, even though she's
too young to have logged on yet. Here's what I worry about. I worry that
10 or 15 years from now, she will come to me and say 'Daddy, where were
you when they took freedom of the press away from the Internet?'"
--Mike Godwin, Electronic Frontier Foundation

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Marcus Haebler
Sent: Wednesday, February 21, 2007 7:24 AM
To: Tejun Heo
Cc: Pablo Sebastian Greco; linux-kernel@vger.kernel.org
Subject: Re: SATA problems

Tejun,

I checked out the kernel 2.6.19 to 2.6.20 Changelog. Seems like you
fixed a problem with the JMB363. The Asus P5B-Deluxe I am using has a
JMB363 - besides an Intel ICH8R - with the SATA ports set to AHCI as
well. Looks like that might have been the source of the problem in
2.6.19.

Thanks,
Marcus

On 2/21/07, Marcus Haebler [EMAIL PROTECTED] wrote:
> Tejun,
>
> thanks. In preparation of your patch I installed a vanilla 2.6.20.1
> kernel on my FC6 system. Amazingly the problem went away with the
> vanilla(!) kernel and NCQ is enabled at boot time (queue_depth is 31).
> I guess I should have tried that kernel earlier.
>
> The patches you sent earlier apply w/o problems against the 2.6.20.1
> vanilla kernel which is expected. I will test drive those patches
> tomorrow.
>
> BTW thanks for saving me the 'cat' on the 3 patches. ;)
>
> Thanks,
> Marcus
Re: SATA problems
Marcus Haebler wrote:
> I opened a bug report (228979) on bugzilla.redhat.com on this one
> because I have the same issue under FC6 2.6.19-1.2895. Here is the
> link:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228979
>
> Do you have any more updates on this problem? Is there a way I can
> help by providing debug data?
>
> Thanks,
> Marcus
>
> On 1/23/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> > Pablo Sebastian Greco wrote:
> > > Well, it took me a few days, but I think I'm ready to report back. One
> > > of the drives was failing, and it stopped after rewiring power supply so
> > > the last problem seems to be corrected.
> > > OTOH, your blacklist seems to be needed too, now I'm running FC6
> > > distribution kernel 2.6.19-1.2895.fc6 (2.6.19.2 + some patches by
> > > fedora) and setting
> > > echo 1 >/sys/block/sdX/device/queue_depth
> > > on all the SAMSUNG drives (sdb, sdc and sdd)
> > > The second I type
> > > echo 31 >/sys/block/sdX/device/queue_depth
> > > on any of the drives I get these messages
> > >
> > > Jan 23 12:36:30 squid kernel: BUG: warning: (ap->ops->error_handler &&
> > > ata_tag_valid(ap->active_tag)) at
> > > drivers/ata/libata-core.c:4602/ata_qc_issue() (Not tainted)
> >
> > This is kernel bug that needs fixing.  I'll investigate.
> >
> > --
> > tejun

On my side, all the problems disappeared on all the kernels after
changing all 3 drives to non-NCQ drives. I was going crazy.
New dmesg attached.

Pablo.

Linux version 2.6.19-1.2895.fc6 ([EMAIL PROTECTED]) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)) #1 SMP Wed Jan 10 18:50:56 EST 2007
Command line: ro root=LABEL=/
BIOS-provided physical RAM map:
 BIOS-e820: - 0009ec00 (usable)
 BIOS-e820: 0009ec00 - 0010 (reserved)
 BIOS-e820: 0010 - df938000 (usable)
 BIOS-e820: df938000 - df9d2000 (ACPI NVS)
 BIOS-e820: df9d2000 - dfa42000 (usable)
 BIOS-e820: dfa42000 - dfa9a000 (reserved)
 BIOS-e820: dfa9a000 - dfab8000 (usable)
 BIOS-e820: dfab8000 - dfb1a000 (ACPI NVS)
 BIOS-e820: dfb1a000 - dfb2c000 (usable)
 BIOS-e820: dfb2c000 - dfb3a000 (ACPI data)
 BIOS-e820: dfb3a000 - dfc0 (usable)
 BIOS-e820: ffc0 - ffc0c000 (reserved)
 BIOS-e820: 0001 - 00012000 (usable)
Entering add_active_range(0, 0, 158) 0 entries of 3200 used
Entering add_active_range(0, 256, 915768) 1 entries of 3200 used
Entering add_active_range(0, 915922, 916034) 2 entries of 3200 used
Entering add_active_range(0, 916122, 916152) 3 entries of 3200 used
Entering add_active_range(0, 916250, 916268) 4 entries of 3200 used
Entering add_active_range(0, 916282, 916480) 5 entries of 3200 used
Entering add_active_range(0, 1048576, 1179648) 6 entries of 3200 used
end_pfn_map = 1179648
DMI 2.4 present.
ACPI: RSDP (v002 INTEL ) @ 0x000f0350
ACPI: XSDT (v001 INTEL S5000VSA 0x INTL 0x0113) @ 0xdfb39120
ACPI: FADT (v003 INTEL S5000VSA 0x INTL 0x0113) @ 0xdfb36000
ACPI: MADT (v001 INTEL S5000VSA 0x INTL 0x0113) @ 0xdfb35000
ACPI: SPCR (v001 INTEL S5000VSA 0x INTL 0x0113) @ 0xdfb2f000
ACPI: HPET (v001 INTEL S5000VSA 0x0001 INTL 0x0113) @ 0xdfb2e000
ACPI: MCFG (v001 INTEL S5000VSA 0x0001 INTL 0x0113) @ 0xdfb2d000
ACPI: SSDT (v002 INTEL S5000VSA 0x4000 INTL 0x0113) @ 0xdfb2c000
ACPI: DSDT (v002 INTEL S5000VSA 0x0008 INTL 0x0113) @ 0x
No NUMA configuration found
Faking a node at -00012000
Entering add_active_range(0, 0, 158) 0 entries of 3200 used
Entering add_active_range(0, 256, 915768) 1 entries of 3200 used
Entering add_active_range(0, 915922, 916034) 2 entries of 3200 used
Entering add_active_range(0, 916122, 916152) 3 entries of 3200 used
Entering add_active_range(0, 916250, 916268) 4 entries of 3200 used
Entering add_active_range(0, 916282, 916480) 5 entries of 3200 used
Entering add_active_range(0, 1048576, 1179648) 6 entries of 3200 used
Bootmem setup node 0 -00012000
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1179648
early_node_map[7] active PFN ranges
    0:        0 ->      158
    0:      256 ->   915768
    0:   915922 ->   916034
    0:   916122 ->   916152
    0:   916250 ->   916268
    0:   916282 ->   916480
    0:  1048576 ->  1179648
On node 0 totalpages: 1047100
  DMA zone: 64 pages used for memmap
  DMA zone: 1450 pages reserved
  DMA zone: 2484 pages, LIFO batch:0
  DMA32 zone: 16320 pages used for memmap
  DMA32 zone: 895710 pages, LIFO batch:31
  Normal zone: 2048 pages used for memmap
Re: SATA problems
I opened a bug report (228979) on bugzilla.redhat.com on this one
because I have the same issue under FC6 2.6.19-1.2895. Here is the
link:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228979

Do you have any more updates on this problem? Is there a way I can
help by providing debug data?

Thanks,
Marcus

On 1/23/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> Pablo Sebastian Greco wrote:
> > Well, it took me a few days, but I think I'm ready to report back. One
> > of the drives was failing, and it stopped after rewiring power supply so
> > the last problem seems to be corrected.
> > OTOH, your blacklist seems to be needed too, now I'm running FC6
> > distribution kernel 2.6.19-1.2895.fc6 (2.6.19.2 + some patches by
> > fedora) and setting
> > echo 1 >/sys/block/sdX/device/queue_depth
> > on all the SAMSUNG drives (sdb, sdc and sdd)
> > The second I type
> > echo 31 >/sys/block/sdX/device/queue_depth
> > on any of the drives I get these messages
> >
> > Jan 23 12:36:30 squid kernel: BUG: warning: (ap->ops->error_handler &&
> > ata_tag_valid(ap->active_tag)) at
> > drivers/ata/libata-core.c:4602/ata_qc_issue() (Not tainted)
>
> This is kernel bug that needs fixing.  I'll investigate.
>
> --
> tejun
Re: SATA problems
Pablo Sebastian Greco wrote:
> Well, it took me a few days, but I think I'm ready to report back. One
> of the drives was failing, and it stopped after rewiring power supply so
> the last problem seems to be corrected.
> OTOH, your blacklist seems to be needed too, now I'm running FC6
> distribution kernel 2.6.19-1.2895.fc6 (2.6.19.2 + some patches by
> fedora) and setting
> echo 1 >/sys/block/sdX/device/queue_depth
> on all the SAMSUNG drives (sdb, sdc and sdd)
> The second I type
> echo 31 >/sys/block/sdX/device/queue_depth
> on any of the drives I get these messages
>
> Jan 23 12:36:30 squid kernel: BUG: warning: (ap->ops->error_handler &&
> ata_tag_valid(ap->active_tag)) at
> drivers/ata/libata-core.c:4602/ata_qc_issue() (Not tainted)

This is kernel bug that needs fixing.  I'll investigate.

--
tejun
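
The warning quoted above is keyed on ata_tag_valid(ap->active_tag) being true at issue time. A toy user-space model of that check follows, on the assumption (my reading of the condition, not something stated in this thread) that active_tag records the one non-NCQ command currently in flight, so a valid tag at issue time means a second non-NCQ command is being started on top of the first. MAX_QUEUE, TAG_NONE, struct port_model and both helper functions are hypothetical names for illustration.

```c
#include <assert.h>

#define MAX_QUEUE  32u          /* assumed stand-in for ATA_MAX_QUEUE */
#define TAG_NONE   0xfafbfcfdu  /* hypothetical "nothing active" marker */

struct port_model {
	unsigned int active_tag;  /* tag of the in-flight non-NCQ command */
};

/* A tag counts as valid when it falls inside the queue range. */
static int tag_valid(unsigned int tag)
{
	return tag < MAX_QUEUE;
}

/* Issue a non-NCQ command.  Returns 1 if the modelled warning would
 * fire, i.e. another non-NCQ command was still marked active. */
static int issue_non_ncq(struct port_model *ap, unsigned int tag)
{
	int would_warn;

	assert(tag_valid(tag));
	would_warn = tag_valid(ap->active_tag);
	ap->active_tag = tag;
	return would_warn;
}

/* Completion clears the active slot again. */
static void complete_non_ncq(struct port_model *ap)
{
	ap->active_tag = TAG_NONE;
}
```

In this model, raising the SCSI-level queue depth on a drive that is still driven through the non-NCQ path lets a second command reach issue_non_ncq() before the first completes, which is the situation the model flags.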
Re: SATA problems
Tejun Heo wrote:
> Hello, Pablo.
>
> Please apply common hardware debugging method.  You know, swap drives.
> Use separate power supply for disks, swap cables, etc...  It seems more
> like a hardware problem at this point.
>
> Thanks.

Well, it took me a few days, but I think I'm ready to report back. One
of the drives was failing, and it stopped after rewiring power supply so
the last problem seems to be corrected.
OTOH, your blacklist seems to be needed too, now I'm running FC6
distribution kernel 2.6.19-1.2895.fc6 (2.6.19.2 + some patches by
fedora) and setting
echo 1 >/sys/block/sdX/device/queue_depth
on all the SAMSUNG drives (sdb, sdc and sdd)
The second I type
echo 31 >/sys/block/sdX/device/queue_depth
on any of the drives I get these messages

Jan 23 12:36:30 squid kernel: BUG: warning: (ap->ops->error_handler && ata_tag_valid(ap->active_tag)) at drivers/ata/libata-core.c:4602/ata_qc_issue() (Not tainted)
Jan 23 12:36:30 squid kernel:
Jan 23 12:36:30 squid kernel: Call Trace:
Jan 23 12:36:30 squid kernel:  [<8026999a>] show_trace+0x34/0x47
Jan 23 12:36:30 squid kernel:  [<802699bf>] dump_stack+0x12/0x17
Jan 23 12:36:30 squid kernel:  [<88092d50>] :libata:ata_qc_issue+0x61/0x551
Jan 23 12:36:30 squid kernel:  [<88097bc8>] :libata:ata_scsi_translate+0xd1/0x11a
Jan 23 12:36:30 squid kernel:  [<88098b87>] :libata:ata_scsi_queuecmd+0x103/0x122
Jan 23 12:36:30 squid kernel:  [<8805cbc1>] :scsi_mod:scsi_dispatch_cmd+0x27c/0x30d
Jan 23 12:36:30 squid kernel:  [<88061dbe>] :scsi_mod:scsi_request_fn+0x2ca/0x395
Jan 23 12:36:30 squid kernel:  [<8033844e>] elv_insert+0x15a/0x226
Jan 23 12:36:30 squid kernel:  [<8020bcc2>] __make_request+0x439/0x487
Jan 23 12:36:30 squid kernel:  [<8021bf12>] generic_make_request+0x207/0x21e
Jan 23 12:36:30 squid kernel:  [<80232f7d>] submit_bio+0xee/0xf7
Jan 23 12:36:30 squid kernel:  [<8021a4f0>] submit_bh+0x130/0x150
Jan 23 12:36:30 squid kernel:  [<80217187>] ll_rw_block+0x9d/0xc0
Jan 23 12:36:30 squid kernel:  [<881adf63>] :reiserfs:search_by_key+0x13d/0xce7
Jan 23 12:36:30 squid kernel:  [<881aee54>] :reiserfs:search_for_position_by_key+0x34/0x2ad
Jan 23 12:36:30 squid kernel:  [<8819bd48>] :reiserfs:_get_block_create_0+0x86/0x544
Jan 23 12:36:30 squid kernel:  [<8819d508>] :reiserfs:reiserfs_get_block+0xcd/0xfdd
Jan 23 12:36:30 squid kernel:  [<80228c34>] do_mpage_readpage+0x16d/0x4b0
Jan 23 12:36:30 squid kernel:  [<802388df>] mpage_readpages+0xb3/0x146
Jan 23 12:36:30 squid kernel:  [<80212a81>] __do_page_cache_readahead+0x119/0x209
Jan 23 12:36:30 squid kernel:  [<80231fed>] blockable_page_cache_readahead+0x56/0xb5
Jan 23 12:36:30 squid kernel:  [<80213b7a>] page_cache_readahead+0xd6/0x1af
Jan 23 12:36:30 squid kernel:  [<8020be39>] do_generic_mapping_read+0x129/0x40b
Jan 23 12:36:30 squid kernel:  [<80216a02>] generic_file_aio_read+0x15f/0x1b1
Jan 23 12:36:30 squid kernel:  [<8020c92b>] do_sync_read+0xc9/0x10c
Jan 23 12:36:30 squid kernel:  [<8020b226>] vfs_read+0xcb/0x170
Jan 23 12:36:30 squid kernel:  [<80211731>] sys_read+0x45/0x6e
Jan 23 12:36:30 squid kernel:  [<8025c11e>] system_call+0x7e/0x83
Jan 23 12:36:30 squid kernel:  [<00359ccbfb80>]
Jan 23 12:36:30 squid kernel:

Thanks for everything.

Pablo.
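
The two echo commands in the report can also be driven from a small program, which is handy when toggling the depth on several drives while reproducing the warning. This is a minimal sketch using only standard C file I/O; the helper names read_queue_depth and write_queue_depth are made up here, and the caller supplies the attribute path (for example the /sys/block/sdX/device/queue_depth path quoted above, with sdX filled in by the user).

```c
#include <assert.h>
#include <stdio.h>

/* Read the integer stored in a sysfs-style attribute file.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
static int read_queue_depth(const char *path)
{
	FILE *fp;
	int depth;

	assert(path != NULL);
	fp = fopen(path, "r");
	if (!fp)
		return -1;
	if (fscanf(fp, "%d", &depth) != 1)
		depth = -1;
	fclose(fp);
	return depth;
}

/* Write a new depth, much as `echo N > path` would.
 * Returns 0 on success, -1 on failure. */
static int write_queue_depth(const char *path, int depth)
{
	FILE *fp;
	int ok;

	assert(path != NULL);
	fp = fopen(path, "w");
	if (!fp)
		return -1;
	ok = fprintf(fp, "%d\n", depth) > 0;
	/* stdio buffers the write, so a rejected value only shows up
	 * as an error when the stream is flushed on close */
	if (fclose(fp) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Writing to the real attribute needs root, and reading the value back afterwards is worthwhile because the kernel may clamp or reject the requested depth.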
Re: SATA problems
Tejun Heo wrote:
> Hello, Pablo.
>
> Please apply the common hardware debugging methods. You know, swap drives.
> Use a separate power supply for the disks, swap cables, etc... It seems
> more like a hardware problem at this point.
>
> Thanks.

Well, it took me a few days, but I think I'm ready to report back.
One of the drives was failing, and it stopped after rewiring the power supply, so the last problem seems to be corrected.
OTOH, your blacklist seems to be needed too. I'm now running the FC6 distribution kernel 2.6.19-1.2895.fc6 (2.6.19.2 + some patches by Fedora) and setting

echo 1 > /sys/block/sdX/device/queue_depth

on all the SAMSUNG drives (sdb, sdc and sdd).
The second I type

echo 31 > /sys/block/sdX/device/queue_depth

on any of the drives I get these messages:

Jan 23 12:36:30 squid kernel: BUG: warning: (ap->ops->error_handler && ata_tag_valid(ap->active_tag)) at drivers/ata/libata-core.c:4602/ata_qc_issue() (Not tainted)
Jan 23 12:36:30 squid kernel:
Jan 23 12:36:30 squid kernel: Call Trace:
Jan 23 12:36:30 squid kernel: [8026999a] show_trace+0x34/0x47
Jan 23 12:36:30 squid kernel: [802699bf] dump_stack+0x12/0x17
Jan 23 12:36:30 squid kernel: [88092d50] :libata:ata_qc_issue+0x61/0x551
Jan 23 12:36:30 squid kernel: [88097bc8] :libata:ata_scsi_translate+0xd1/0x11a
Jan 23 12:36:30 squid kernel: [88098b87] :libata:ata_scsi_queuecmd+0x103/0x122
Jan 23 12:36:30 squid kernel: [8805cbc1] :scsi_mod:scsi_dispatch_cmd+0x27c/0x30d
Jan 23 12:36:30 squid kernel: [88061dbe] :scsi_mod:scsi_request_fn+0x2ca/0x395
Jan 23 12:36:30 squid kernel: [8033844e] elv_insert+0x15a/0x226
Jan 23 12:36:30 squid kernel: [8020bcc2] __make_request+0x439/0x487
Jan 23 12:36:30 squid kernel: [8021bf12] generic_make_request+0x207/0x21e
Jan 23 12:36:30 squid kernel: [80232f7d] submit_bio+0xee/0xf7
Jan 23 12:36:30 squid kernel: [8021a4f0] submit_bh+0x130/0x150
Jan 23 12:36:30 squid kernel: [80217187] ll_rw_block+0x9d/0xc0
Jan 23 12:36:30 squid kernel: [881adf63] :reiserfs:search_by_key+0x13d/0xce7
Jan 23 12:36:30 squid kernel: [881aee54] :reiserfs:search_for_position_by_key+0x34/0x2ad
Jan 23 12:36:30 squid kernel: [8819bd48] :reiserfs:_get_block_create_0+0x86/0x544
Jan 23 12:36:30 squid kernel: [8819d508] :reiserfs:reiserfs_get_block+0xcd/0xfdd
Jan 23 12:36:30 squid kernel: [80228c34] do_mpage_readpage+0x16d/0x4b0
Jan 23 12:36:30 squid kernel: [802388df] mpage_readpages+0xb3/0x146
Jan 23 12:36:30 squid kernel: [80212a81] __do_page_cache_readahead+0x119/0x209
Jan 23 12:36:30 squid kernel: [80231fed] blockable_page_cache_readahead+0x56/0xb5
Jan 23 12:36:30 squid kernel: [80213b7a] page_cache_readahead+0xd6/0x1af
Jan 23 12:36:30 squid kernel: [8020be39] do_generic_mapping_read+0x129/0x40b
Jan 23 12:36:30 squid kernel: [80216a02] generic_file_aio_read+0x15f/0x1b1
Jan 23 12:36:30 squid kernel: [8020c92b] do_sync_read+0xc9/0x10c
Jan 23 12:36:30 squid kernel: [8020b226] vfs_read+0xcb/0x170
Jan 23 12:36:30 squid kernel: [80211731] sys_read+0x45/0x6e
Jan 23 12:36:30 squid kernel: [8025c11e] system_call+0x7e/0x83
Jan 23 12:36:30 squid kernel: [00359ccbfb80]
Jan 23 12:36:30 squid kernel:

Thanks for everything.
Pablo.
Re: SATA problems
Pablo Sebastian Greco wrote:
> Well, it took me a few days, but I think I'm ready to report back.
> One of the drives was failing, and it stopped after rewiring the power
> supply, so the last problem seems to be corrected.
> OTOH, your blacklist seems to be needed too. I'm now running the FC6
> distribution kernel 2.6.19-1.2895.fc6 (2.6.19.2 + some patches by Fedora)
> and setting
> echo 1 > /sys/block/sdX/device/queue_depth
> on all the SAMSUNG drives (sdb, sdc and sdd).
> The second I type
> echo 31 > /sys/block/sdX/device/queue_depth
> on any of the drives I get these messages:
>
> Jan 23 12:36:30 squid kernel: BUG: warning: (ap->ops->error_handler && ata_tag_valid(ap->active_tag)) at drivers/ata/libata-core.c:4602/ata_qc_issue() (Not tainted)

This is a kernel bug that needs fixing. I'll investigate.

--
tejun
Re: SATA problems
Hello, Pablo.

Please apply the common hardware debugging methods. You know, swap drives. Use a separate power supply for the disks, swap cables, etc... It seems more like a hardware problem at this point.

Thanks.

--
tejun
Re: SATA problems
Pablo Sebastian Greco wrote:
> Tejun Heo wrote:
>> Pablo Sebastian Greco wrote:
>>> After an uptime of 13:34 under heavy load and no errors, I'm pretty
>>> sure your patch is correct. Is there a way to backport this to 2.6.18.x?
>>
>> I forgot this (even though I implemented it) but you can turn off NCQ
>> by doing the following.
>>
>> # echo 1 > /sys/block/sdX/device/queue_depth
>>
>> Can you put the seagate drive under load to verify that it's the
>> samsung drive's problem not the controller's?
>>
>>> Just an off topic question, does anyone know why I get so uneven IRQ
>>> handling on 2.6.19-20 and almost perfect on 2.6.20-rc2-mm1?
>>
>> I dunno. You have a much better chance of getting a useful answer by
>> asking it on a separate thread with a proper subject line. People usually
>> screen threads by subject. There are just too many messages in LKML for
>> anyone to follow all the messages.
>>
>> Thanks.
>
> Guess I spoke too soon :(
> Today I found this
>
> Jan 8 04:01:40 squid kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Jan 8 04:01:40 squid kernel: ata2.00: cmd 25/00:08:49:ee:e8/00:00:16:00:00/e0 tag 0 cdb 0x0 data 4096 in
> Jan 8 04:01:40 squid kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Jan 8 04:01:40 squid kernel: ata2: soft resetting port
> Jan 8 04:01:40 squid kernel: ata2: softreset failed (port busy but CLO unavailable)
> Jan 8 04:01:40 squid kernel: ata2: softreset failed, retrying in 5 secs
> Jan 8 04:01:45 squid kernel: ata2: hard resetting port
> Jan 8 04:01:53 squid kernel: ata2: port is slow to respond, please be patient (Status 0x80)
> Jan 8 04:02:16 squid kernel: ata2: port failed to respond (30 secs, Status 0x80)
> Jan 8 04:02:16 squid kernel: ata2: COMRESET failed (device not ready)
> Jan 8 04:02:16 squid kernel: ata2: hardreset failed, retrying in 5 secs
> Jan 8 04:02:21 squid kernel: ata2: hard resetting port
> Jan 8 04:02:21 squid kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Jan 8 04:02:21 squid kernel: ata2.00: configured for UDMA/133
> Jan 8 04:02:21 squid kernel: ata2: EH complete
> Jan 8 04:02:21 squid kernel: SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
> Jan 8 04:02:21 squid kernel: sdb: Write Protect is off
> Jan 8 04:02:21 squid kernel: SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> #uptime
> 10:10:12 up 3 days, 22:48, 1 user, load average: 0.22, 0.19, 0.18
>
> 4 am is the lowest load ever, so I don't get it.
> I've found two differences from the older errors:
> SAct is now 0x0, where before it was 0x7fff.
> And the cmd/res list used to be really long; now it's just one command.
>
> About heavy loading the seagate, I've tested as suggested on the other
> thread, dd if=<drive> of=/dev/null for all 4 drives simultaneously, on top
> of the usual load, and all was perfect with the current kernel
> (2.6.20-rc3 + blacklist).
> Don't know what to do to help
>
> Thanks.
> Pablo.

And now this :( , still running rc3+blacklist without rebooting

Jan 9 05:30:36 squid kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 9 05:30:36 squid kernel: ata2.00: cmd c8/00:08:87:83:85/00:00:00:00:00/e2 tag 0 cdb 0x0 data 4096 in
Jan 9 05:30:36 squid kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 9 05:30:36 squid kernel: ata2: soft resetting port
Jan 9 05:30:36 squid kernel: ata2: softreset failed (port busy but CLO unavailable)
Jan 9 05:30:36 squid kernel: ata2: softreset failed, retrying in 5 secs
Jan 9 05:30:41 squid kernel: ata2: hard resetting port
Jan 9 05:30:49 squid kernel: ata2: port is slow to respond, please be patient (Status 0x80)
Jan 9 05:31:12 squid kernel: ata2: port failed to respond (30 secs, Status 0x80)
Jan 9 05:31:12 squid kernel: ata2: COMRESET failed (device not ready)
Jan 9 05:31:12 squid kernel: ata2: hardreset failed, retrying in 5 secs
Jan 9 05:31:17 squid kernel: ata2: hard resetting port
Jan 9 05:31:17 squid kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 9 05:31:17 squid kernel: ata2.00: configured for UDMA/133
Jan 9 05:31:17 squid kernel: ata2: EH complete
Jan 9 05:31:17 squid kernel: SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
Jan 9 05:31:17 squid kernel: sdb: Write Protect is off
Jan 9 05:31:17 squid kernel: SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 9 05:32:17 squid kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 9 05:32:17 squid kernel: ata2.00: cmd c8/00:08:37:ac:04/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
Jan 9 05:32:17 squid kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 9 05:32:18 squid kernel: ata2: soft resetting port
Jan 9 05:32:18
Re: SATA problems
Tejun Heo wrote:
> Pablo Sebastian Greco wrote:
>> After an uptime of 13:34 under heavy load and no errors, I'm pretty
>> sure your patch is correct. Is there a way to backport this to 2.6.18.x?
>
> I forgot this (even though I implemented it) but you can turn off NCQ
> by doing the following.
>
> # echo 1 > /sys/block/sdX/device/queue_depth
>
> Can you put the seagate drive under load to verify that it's the
> samsung drive's problem not the controller's?
>
>> Just an off topic question, does anyone know why I get so uneven IRQ
>> handling on 2.6.19-20 and almost perfect on 2.6.20-rc2-mm1?
>
> I dunno. You have a much better chance of getting a useful answer by
> asking it on a separate thread with a proper subject line. People usually
> screen threads by subject. There are just too many messages in LKML for
> anyone to follow all the messages.
>
> Thanks.

Guess I spoke too soon :(
Today I found this

Jan 8 04:01:40 squid kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 8 04:01:40 squid kernel: ata2.00: cmd 25/00:08:49:ee:e8/00:00:16:00:00/e0 tag 0 cdb 0x0 data 4096 in
Jan 8 04:01:40 squid kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 8 04:01:40 squid kernel: ata2: soft resetting port
Jan 8 04:01:40 squid kernel: ata2: softreset failed (port busy but CLO unavailable)
Jan 8 04:01:40 squid kernel: ata2: softreset failed, retrying in 5 secs
Jan 8 04:01:45 squid kernel: ata2: hard resetting port
Jan 8 04:01:53 squid kernel: ata2: port is slow to respond, please be patient (Status 0x80)
Jan 8 04:02:16 squid kernel: ata2: port failed to respond (30 secs, Status 0x80)
Jan 8 04:02:16 squid kernel: ata2: COMRESET failed (device not ready)
Jan 8 04:02:16 squid kernel: ata2: hardreset failed, retrying in 5 secs
Jan 8 04:02:21 squid kernel: ata2: hard resetting port
Jan 8 04:02:21 squid kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 8 04:02:21 squid kernel: ata2.00: configured for UDMA/133
Jan 8 04:02:21 squid kernel: ata2: EH complete
Jan 8 04:02:21 squid kernel: SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
Jan 8 04:02:21 squid kernel: sdb: Write Protect is off
Jan 8 04:02:21 squid kernel: SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

#uptime
10:10:12 up 3 days, 22:48, 1 user, load average: 0.22, 0.19, 0.18

4 am is the lowest load ever, so I don't get it.
I've found two differences from the older errors:
SAct is now 0x0, where before it was 0x7fff.
And the cmd/res list used to be really long; now it's just one command.

About heavy loading the seagate, I've tested as suggested on the other thread, dd if=<drive> of=/dev/null for all 4 drives simultaneously, on top of the usual load, and all was perfect with the current kernel (2.6.20-rc3 + blacklist).
Don't know what to do to help

Thanks.
Pablo.
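The two differences Pablo points out can be read straight off the log lines. SAct is a bitmask with one bit per outstanding NCQ tag, and the first byte of the cmd field is the ATA opcode. The sketch below is only an illustration of that reading, not a libata tool: the function names and the small opcode table are assumptions made for this example, and only the four opcodes that appear in this thread are listed.

```python
import re

# ATA opcodes that show up in this thread, with a flag saying whether
# the command is an NCQ (queued) command. Table is illustrative only.
OPCODES = {
    0x25: ("READ DMA EXT", False),
    0xC8: ("READ DMA", False),
    0x60: ("READ FPDMA QUEUED", True),
    0x61: ("WRITE FPDMA QUEUED", True),
}

SACT_RE = re.compile(r"SAct (0x[0-9a-fA-F]+)")
CMD_RE = re.compile(r"cmd ([0-9a-fA-F]{2})/")


def active_tags(line):
    """Return the NCQ tags set in the SAct mask of an exception line."""
    match = SACT_RE.search(line)
    if match is None:
        return None
    sact = int(match.group(1), 16)
    return [tag for tag in range(32) if sact & (1 << tag)]


def describe_cmd(line):
    """Return (name, is_ncq) for the opcode of a 'cmd xx/...' line."""
    match = CMD_RE.search(line)
    if match is None:
        return None
    opcode = int(match.group(1), 16)
    return OPCODES.get(opcode, ("unknown", False))


if __name__ == "__main__":
    print(active_tags("ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"))
    print(describe_cmd("ata2.00: cmd 25/00:08:49:ee:e8/00:00:16:00:00/e0 tag 0 cdb 0x0 data 4096 in"))
```

Read this way, the January 8 timeout hit a single non-queued READ DMA EXT with no NCQ tags outstanding, which is what one would expect once the blacklist entry has turned NCQ off for the drive. The earlier reports showed queued commands (opcodes 60 and 61) with many tags in flight.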
Re: SATA problems
Tejun Heo wrote:
> Pablo Sebastian Greco wrote:
>> After an uptime of 13:34 under heavy load and no errors, I'm pretty
>> sure your patch is correct. Is there a way to backport this to 2.6.18.x?
>
> I forgot this (even though I implemented it) but you can turn off NCQ
> by doing the following.
>
> # echo 1 > /sys/block/sdX/device/queue_depth

Thanks, I had forgotten this, too :)

Added to the libata FAQ: http://linux-ata.org/faq.html

Jeff
Re: SATA problems
Pablo Sebastian Greco wrote:
> After an uptime of 13:34 under heavy load and no errors, I'm pretty
> sure your patch is correct. Is there a way to backport this to 2.6.18.x?

I forgot this (even though I implemented it) but you can turn off NCQ by doing the following.

# echo 1 > /sys/block/sdX/device/queue_depth

Can you put the seagate drive under load to verify that it's the samsung drive's problem not the controller's?

> Just an off topic question, does anyone know why I get so uneven IRQ
> handling on 2.6.19-20 and almost perfect on 2.6.20-rc2-mm1?

I dunno. You have a much better chance of getting a useful answer by asking it on a separate thread with a proper subject line. People usually screen threads by subject. There are just too many messages in LKML for anyone to follow all the messages.

Thanks.

--
tejun
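For a box with several affected drives, the sysfs write above can be repeated per device. The following is a minimal sketch under stated assumptions: the helper name and the `sysfs_root` parameter are made up for this example (the parameter exists only so the function can be tried against a scratch directory), and on a real system the write needs root and the attribute must already exist.

```python
from pathlib import Path


def set_queue_depth(devices, depth=1, sysfs_root="/sys"):
    """Write `depth` to each device's queue_depth and return what reads back.

    A depth of 1 turns NCQ off for the device, as described in the thread.
    `sysfs_root` is a hypothetical knob for testing; it defaults to /sys.
    """
    result = {}
    for dev in devices:
        attr = Path(sysfs_root) / "block" / dev / "device" / "queue_depth"
        attr.write_text(f"{depth}\n")
        # Read the value back so the caller sees what was actually accepted.
        result[dev] = int(attr.read_text().strip())
    return result


if __name__ == "__main__":
    # The three Samsung drives from this thread; run as root.
    print(set_queue_depth(["sdb", "sdc", "sdd"], depth=1))
```

Reading the value back is a deliberate choice here, since it shows the depth that is actually in effect after the write.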
Re: SATA problems
Pablo Sebastian Greco wrote:
> Tejun Heo wrote:
>> Pablo Sebastian Greco wrote:
>>> By crash I mean the whole system going down, having to reset the
>>> entire machine.
>>> I'm sending you 4 files:
>>> dmesg: current boot dmesg, just a boot, because no errors appeared after
>>> last crash, since the server is out of production right now (errors
>>> usually appear under heavy load, and this is primarily a transparent
>>> proxy for about 1000 simultaneous users)
>>> lspci: the way you asked for it
>>> messages and messages.1: files where you can see old boots and crashes
>>> (even a soft lockup).
>>> If there is anything else I can do, let me know. If you need direct
>>> access to the server, I can arrange that too.
>>
>> Can you try 2.6.20-rc3 and see if 'CLO not available' message goes away
>> (please post boot dmesg)? The crash/lock is because filesystem code does
>> not cope with IO errors very well. I can't tell why timeouts are
>> occurring in the first place. It seems that only samsung drives are
>> affected (sda2, 3, 4). Hmmm... Please apply the attached patch to
>> 2.6.20-rc3 and test it.
>>
>> Thanks.
>
> Here's boot dmesg with 2.6.20-rc3 + blacklist. And you are right about
> only affecting samsung drives, but since only those drives get all the
> heavy load, couldn't tell exactly.
> I'm putting the server in production right now, so I think in a few hours
> I'll have more info.
>
> Thanks.
> Pablo.

After an uptime of 13:34 under heavy load and no errors, I'm pretty sure your patch is correct. Is there a way to backport this to 2.6.18.x?
Just an off topic question, does anyone know why I get so uneven IRQ handling on 2.6.19-20 and almost perfect on 2.6.20-rc2-mm1?

Thanks for everything.
Pablo.
Re: SATA problems
Tejun Heo wrote:
> Pablo Sebastian Greco wrote:
>> By crash I mean the whole system going down, having to reset the
>> entire machine.
>> I'm sending you 4 files:
>> dmesg: current boot dmesg, just a boot, because no errors appeared after
>> last crash, since the server is out of production right now (errors
>> usually appear under heavy load, and this is primarily a transparent
>> proxy for about 1000 simultaneous users)
>> lspci: the way you asked for it
>> messages and messages.1: files where you can see old boots and crashes
>> (even a soft lockup).
>> If there is anything else I can do, let me know. If you need direct
>> access to the server, I can arrange that too.
>
> Can you try 2.6.20-rc3 and see if 'CLO not available' message goes away
> (please post boot dmesg)? The crash/lock is because filesystem code does
> not cope with IO errors very well. I can't tell why timeouts are
> occurring in the first place. It seems that only samsung drives are
> affected (sda2, 3, 4). Hmmm... Please apply the attached patch to
> 2.6.20-rc3 and test it.
>
> Thanks.

Here's boot dmesg with 2.6.20-rc3 + blacklist. And you are right about only affecting samsung drives, but since only those drives get all the heavy load, couldn't tell exactly.
I'm putting the server in production right now, so I think in a few hours I'll have more info.

Thanks.
Pablo.

dmesg.bz2
Description: Binary data
Re: SATA problems
Pablo Sebastian Greco wrote:
> By crash I mean the whole system going down, having to reset the entire
> machine.
> I'm sending you 4 files:
> dmesg: current boot dmesg, just a boot, because no errors appeared after
> last crash, since the server is out of production right now (errors
> usually appear under heavy load, and this is primarily a transparent proxy
> for about 1000 simultaneous users)
> lspci: the way you asked for it
> messages and messages.1: files where you can see old boots and crashes
> (even a soft lockup).
> If there is anything else I can do, let me know. If you need direct
> access to the server, I can arrange that too.

Can you try 2.6.20-rc3 and see if 'CLO not available' message goes away (please post boot dmesg)? The crash/lock is because filesystem code does not cope with IO errors very well. I can't tell why timeouts are occurring in the first place. It seems that only samsung drives are affected (sda2, 3, 4). Hmmm... Please apply the attached patch to 2.6.20-rc3 and test it.

Thanks.

--
tejun

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 0d51d13..f8cf349 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3327,6 +3327,8 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
 	/* NCQ is slow */
 	{ "WDC WD740ADFD-00", NULL, ATA_HORKAGE_NONCQ },
 
+	{ "SAMSUNG SP2504C", NULL, ATA_HORKAGE_NONCQ },
+
 	/* Devices with NCQ limits */
 
 	/* End Marker */
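The patch works by adding the drive's model string to a table that the driver consults when it configures a device. The sketch below models that lookup in Python purely as an illustration. It assumes an exact match on the model string and treats a missing revision (NULL in the patch) as "any firmware revision"; the function names and the string flag are hypothetical stand-ins, and the kernel's real matching code may differ in detail.

```python
# (model, revision or None, flags). None stands in for the NULL revision
# in the patch and is assumed to match any firmware revision.
BLACKLIST = [
    ("WDC WD740ADFD-00", None, {"NONCQ"}),
    ("SAMSUNG SP2504C", None, {"NONCQ"}),
]


def horkage_for(model, revision):
    """Return the set of quirk flags for a drive, empty if not listed."""
    for bl_model, bl_rev, flags in BLACKLIST:
        if model == bl_model and (bl_rev is None or bl_rev == revision):
            return set(flags)
    return set()


def ncq_allowed(model, revision):
    """NCQ is used only when the drive carries no NONCQ quirk."""
    return "NONCQ" not in horkage_for(model, revision)


if __name__ == "__main__":
    # Revision strings below are placeholders, not real firmware versions.
    print(ncq_allowed("SAMSUNG SP2504C", "placeholder-rev"))
    print(ncq_allowed("ST380211AS", "placeholder-rev"))
```

With the entry in place every SP2504C in the machine falls back to non-queued commands, while the Seagate drive is unaffected.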
Re: SATA problems
Pablo Sebastian Greco wrote:
> First of all, thanks for everything, and my apologies if I'm doing
> anything wrong; this is my first lkml mail, but I've read the whole FAQ,
> so it should be OK.
> This is the machine with the problem:
>
> Intel ServerBoard S5000VSA
> Dual Core Xeon 2.66 (Intel(R) Xeon(TM) CPU 2.66GHz stepping 04)
> 4G Kingston
> 1 Seagate 80G sata (ST380211AS) (sda)
> 3 Samsung 250G sata (SAMSUNG SP2504C) (sdb,c,d)
>
> Installed distribution is FC6 x86_64
>
> I've been getting these messages with distribution and vanilla kernels
>
> Jan 1 16:29:08 squid kernel: ata4.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 0x2 frozen
> Jan 1 16:29:08 squid kernel: ata4.00: cmd 61/60:00:c9:6d:8e/00:00:0e:00:00/40 tag 0 cdb 0x0 data 49152 out
> Jan 1 16:29:08 squid kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Jan 1 16:29:08 squid kernel: ata4.00: cmd 60/08:08:f7:7d:56/00:00:0e:00:00/40 tag 1 cdb 0x0 data 4096 in
> Jan 1 16:29:08 squid kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> <snip>
> Jan 1 16:29:08 squid kernel: ata4: soft resetting port
> Jan 1 16:29:08 squid kernel: ata4: softreset failed (port busy but CLO unavailable)
> Jan 1 16:29:08 squid kernel: ata4: softreset failed, retrying in 5 secs
> Jan 1 16:29:13 squid kernel: ata4: hard resetting port
> Jan 1 16:29:21 squid kernel: ata4: port is slow to respond, please be patient (Status 0x80)
> Jan 1 16:29:43 squid kernel: ata4: port failed to respond (30 secs, Status 0x80)
> Jan 1 16:29:43 squid kernel: ata4: COMRESET failed (device not ready)
> Jan 1 16:29:43 squid kernel: ata4: hardreset failed, retrying in 5 secs
> Jan 1 16:29:48 squid kernel: ata4: hard resetting port
> Jan 1 16:29:49 squid kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Jan 1 16:29:49 squid kernel: ata4.00: configured for UDMA/133
> Jan 1 16:29:49 squid kernel: ata4: EH complete
> Jan 1 16:29:49 squid kernel: SCSI device sdd: 488397168 512-byte hdwr sectors (250059 MB)
> Jan 1 16:29:49 squid kernel: sdd: Write Protect is off
> Jan 1 16:29:49 squid kernel: SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> There are lots of them, and they eventually crash the system.
> Tested from fc6 2.6.18 kernel to vanilla 2.6.20-rc2-mm1. Old kernels
> just crash, newer ones log these things and then crash.
> I don't want to flood this mail with useless info, so please tell
> me what to send and I'll do it (dmesg, smartctl... you name it).
> BTW, memtest was running for about 2 days without errors, and
> badblocks on all 4 drives returned nothing. Reallocated_Sector_Ct
> raw_value was 0 on all 4 drives.

Please post full dmesg and the result of 'lspci -nnvvv'. And what do you mean by 'crash'?

--
tejun
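For readers following the quoted log: each `cmd` line packs the ATA taskfile registers into twelve hex bytes. The following is a minimal sketch of how those bytes can be unpacked, assuming the field order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device and assuming NCQ commands (0x60 read, 0x61 write), where the sector count travels in the feature registers and the tag in bits 7:3 of nsect. Neither assumption is stated in the thread, and `struct tf_bytes` and the `tf_*` helpers are hypothetical names, not libata API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the twelve bytes on a "cmd" or "res" log line. */
struct tf_bytes {
    uint8_t command, feature, nsect, lbal, lbam, lbah;
    uint8_t hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
    uint8_t device;
};

/* Parse e.g. "61/60:00:c9:6d:8e/00:00:0e:00:00/40". Returns 0 on success, -1 otherwise. */
static int tf_parse(const char *s, struct tf_bytes *tf)
{
    unsigned int v[12];

    assert(s != NULL && tf != NULL);
    if (sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
               &v[0], &v[1], &v[2], &v[3], &v[4], &v[5],
               &v[6], &v[7], &v[8], &v[9], &v[10], &v[11]) != 12)
        return -1;

    tf->command = (uint8_t)v[0];
    tf->feature = (uint8_t)v[1];
    tf->nsect = (uint8_t)v[2];
    tf->lbal = (uint8_t)v[3];
    tf->lbam = (uint8_t)v[4];
    tf->lbah = (uint8_t)v[5];
    tf->hob_feature = (uint8_t)v[6];
    tf->hob_nsect = (uint8_t)v[7];
    tf->hob_lbal = (uint8_t)v[8];
    tf->hob_lbam = (uint8_t)v[9];
    tf->hob_lbah = (uint8_t)v[10];
    tf->device = (uint8_t)v[11];
    return 0;
}

/* Assemble the 48-bit LBA from the low and high-order ("hob") LBA bytes. */
static uint64_t tf_lba48(const struct tf_bytes *tf)
{
    assert(tf != NULL);
    return ((uint64_t)tf->hob_lbah << 40) | ((uint64_t)tf->hob_lbam << 32) |
           ((uint64_t)tf->hob_lbal << 24) | ((uint64_t)tf->lbah << 16) |
           ((uint64_t)tf->lbam << 8) | (uint64_t)tf->lbal;
}

/* For NCQ commands only: sector count from the feature registers
 * (the special "0 means 65536" encoding is not handled here). */
static unsigned int tf_ncq_sectors(const struct tf_bytes *tf)
{
    assert(tf != NULL);
    return ((unsigned int)tf->hob_feature << 8) | tf->feature;
}

/* For NCQ commands only: the queue tag sits in bits 7:3 of nsect. */
static unsigned int tf_ncq_tag(const struct tf_bytes *tf)
{
    assert(tf != NULL);
    return tf->nsect >> 3;
}
```

Under these assumptions, the first quoted command (61/60:00:...) decodes to a write of 0x60 = 96 sectors, i.e. 96 * 512 = 49152 bytes, with tag 0, which agrees with the "tag 0 ... data 49152 out" text on the same log line.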
Re: SATA problems
Pablo Sebastian Greco wrote:
> By crash I mean the whole system going down, having to reset the entire
> machine.
> I'm sending you 4 files:
> dmesg: current boot dmesg, just a boot, because no errors appeared after
> the last crash, since the server is out of production right now (errors
> usually appear under heavy load, and this is primarily a transparent
> proxy for about 1000 simultaneous users)
> lspci: the way you asked for it
> messages and messages.1: files where you can see old boots and crashes
> (even a soft lockup).
>
> If there is anything else I can do, let me know. If you need direct
> access to the server, I can arrange that too.

Can you try 2.6.20-rc3 and see if the 'CLO not available' message goes away (please post boot dmesg)? The crash/lock is because filesystem code does not cope with IO errors very well. I can't tell why timeouts are occurring in the first place. It seems that only samsung drives are affected (sda2, 3, 4). Hmmm... Please apply the attached patch to 2.6.20-rc3 and test it.

Thanks.

--
tejun

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 0d51d13..f8cf349 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3327,6 +3327,8 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
 	/* NCQ is slow */
 	{ "WDC WD740ADFD-00",	NULL,		ATA_HORKAGE_NONCQ },
+	{ "SAMSUNG SP2504C",	NULL,		ATA_HORKAGE_NONCQ },
+
 	/* Devices with NCQ limits */
 	/* End Marker */
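The patch above adds the Samsung model to a table of devices for which NCQ should be turned off. As a rough illustration of how such a table lookup might behave, here is a simplified sketch. It is not the libata implementation: the struct layout, the `HORKAGE_NONCQ` value, and `lookup_horkage()` are hypothetical, and the sketch assumes exact string matching with NULL in the revision column meaning "any firmware revision".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk flag; the value is for illustration only. */
#define HORKAGE_NONCQ (1u << 2)

struct blacklist_entry {
    const char *model;     /* exact model string to match */
    const char *revision;  /* firmware revision, or NULL for any revision */
    unsigned int horkage;  /* quirk flags applied on a match */
};

static const struct blacklist_entry blacklist[] = {
    /* NCQ is slow */
    { "WDC WD740ADFD-00", NULL, HORKAGE_NONCQ },
    { "SAMSUNG SP2504C",  NULL, HORKAGE_NONCQ },
    /* End marker */
    { NULL, NULL, 0 }
};

/* Return the quirk flags for a device, or 0 if it is not listed. */
static unsigned int lookup_horkage(const char *model, const char *revision)
{
    const struct blacklist_entry *e;

    assert(model != NULL);
    for (e = blacklist; e->model != NULL; e++) {
        if (strcmp(e->model, model) != 0)
            continue;
        if (e->revision == NULL)
            return e->horkage;  /* entry applies to every revision */
        if (revision != NULL && strcmp(e->revision, revision) == 0)
            return e->horkage;
    }
    return 0;
}
```

In this sketch a driver would call `lookup_horkage("SAMSUNG SP2504C", NULL)` during device configuration and skip enabling NCQ when the `HORKAGE_NONCQ` bit is set, which is the effect the patch is meant to test on the affected drives.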
SATA problems
First of all, thanks for everything, and my apologies if I'm doing anything wrong; this is my first lkml mail, but I've read the whole FAQ, so it should be OK.
This is the machine with the problem:

Intel ServerBoard S5000VSA
Dual Core Xeon 2.66 (Intel(R) Xeon(TM) CPU 2.66GHz stepping 04)
4G Kingston
1 Seagate 80G sata (ST380211AS) (sda)
3 Samsung 250G sata (SAMSUNG SP2504C) (sdb,c,d)

Installed distribution is FC6 x86_64

I've been getting these messages with distribution and vanilla kernels

Jan 1 16:29:08 squid kernel: ata4.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 0x2 frozen
Jan 1 16:29:08 squid kernel: ata4.00: cmd 61/60:00:c9:6d:8e/00:00:0e:00:00/40 tag 0 cdb 0x0 data 49152 out
Jan 1 16:29:08 squid kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 1 16:29:08 squid kernel: ata4.00: cmd 60/08:08:f7:7d:56/00:00:0e:00:00/40 tag 1 cdb 0x0 data 4096 in
Jan 1 16:29:08 squid kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
<snip>
Jan 1 16:29:08 squid kernel: ata4: soft resetting port
Jan 1 16:29:08 squid kernel: ata4: softreset failed (port busy but CLO unavailable)
Jan 1 16:29:08 squid kernel: ata4: softreset failed, retrying in 5 secs
Jan 1 16:29:13 squid kernel: ata4: hard resetting port
Jan 1 16:29:21 squid kernel: ata4: port is slow to respond, please be patient (Status 0x80)
Jan 1 16:29:43 squid kernel: ata4: port failed to respond (30 secs, Status 0x80)
Jan 1 16:29:43 squid kernel: ata4: COMRESET failed (device not ready)
Jan 1 16:29:43 squid kernel: ata4: hardreset failed, retrying in 5 secs
Jan 1 16:29:48 squid kernel: ata4: hard resetting port
Jan 1 16:29:49 squid kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 1 16:29:49 squid kernel: ata4.00: configured for UDMA/133
Jan 1 16:29:49 squid kernel: ata4: EH complete
Jan 1 16:29:49 squid kernel: SCSI device sdd: 488397168 512-byte hdwr sectors (250059 MB)
Jan 1 16:29:49 squid kernel: sdd: Write Protect is off
Jan 1 16:29:49 squid kernel: SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

There are lots of them, and they eventually crash the system.
Tested from fc6 2.6.18 kernel to vanilla 2.6.20-rc2-mm1. Old kernels just crash, newer ones log these things and then crash.
I don't want to flood this mail with useless info, so please tell me what to send and I'll do it (dmesg, smartctl... you name it).
BTW, memtest was running for about 2 days without errors, and badblocks on all 4 drives returned nothing. Reallocated_Sector_Ct raw_value was 0 on all 4 drives.

Thanks in advance.
Pablo.
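Two of the raw values in this log can be read with a little bit arithmetic. The sketch below assumes the SATA SStatus register layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8), assumes the log prints SStatus in hexadecimal, and assumes each set bit in SAct marks one outstanding NCQ tag. The struct and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the three SStatus subfields. */
struct sstatus_fields {
    unsigned int det;  /* device detection, bits 3:0 */
    unsigned int spd;  /* negotiated speed, bits 7:4 */
    unsigned int ipm;  /* interface power management state, bits 11:8 */
};

/* Split a raw SStatus value into its subfields. */
static void decode_sstatus(uint32_t sstatus, struct sstatus_fields *out)
{
    assert(out != NULL);
    out->det = sstatus & 0xf;
    out->spd = (sstatus >> 4) & 0xf;
    out->ipm = (sstatus >> 8) & 0xf;
}

/* Map the SPD subfield to a human-readable link speed. */
static const char *spd_name(unsigned int spd)
{
    switch (spd) {
    case 1: return "1.5 Gbps";
    case 2: return "3.0 Gbps";
    case 3: return "6.0 Gbps";
    default: return "unknown";
    }
}

/* Count the set bits in SAct, i.e. the number of outstanding tags. */
static unsigned int count_active_tags(uint32_t sact)
{
    unsigned int n = 0;

    while (sact != 0) {
        sact &= sact - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

With these assumptions, "SStatus 123" splits into DET=3 (device present, link established), SPD=2 (3.0 Gbps, matching the "SATA link up 3.0 Gbps" text) and IPM=1 (active), while "SAct 0x7fff" corresponds to 15 commands in flight when the port froze.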