Re: SATA timeouts on two disks

2008-01-24 Thread rgheck

Tejun Heo wrote:
> Hello,
>
> Jim MacBaine wrote:
>> A co-worker, to whom I explained my problem, asked me whether I had
>> properly grounded my drives. In fact I had not: The drives resided in
>> a vibration-absorbing frame through which their exterior had no
>> electrical contact with the grounded case. Since I grounded the drives
>> two days ago, I got no new errors.  So maybe my problem is solved.
>
> Hmmm... Grounding. Interesting.

  
Can you say more about this, Jim? This may also be my problem, or
part of it, as my drives too are mounted in such a way as not to be in
physical contact with the case. How did you go about grounding them? I
suppose one test would be just to remove the washers.


That said, in my case, 2.6.24 seems to make a big difference, too. I 
accidentally booted into 2.6.23 today and, boom.


Richard

-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SATA timeouts on two disks

2008-01-24 Thread Tejun Heo
Hello,

Jim MacBaine wrote:
> A co-worker, to whom I explained my problem, asked me whether I had
> properly grounded my drives. In fact I had not: The drives resided in
> a vibration-absorbing frame through which their exterior had no
> electrical contact with the grounded case. Since I grounded the drives
> two days ago, I got no new errors.  So maybe my problem is solved.

Hmmm... Grounding. Interesting.

> If not, I will happily try out your suggestion. Would you be so kind
> to explain in a few words, what connecting one drive to a second
> (supposedly good) PSU will show?

It's just a good way to isolate problems.  For example, the motherboard
could be doing something strange on the 12v rail and the PSU could be
too sensitive causing the whole rail to fluctuate slightly leading to
occasional transmission errors.  SATA links are the first to be affected
by those kinds of electrical problems.

> (Is this still on-topic on this list?)

Yeah, sure.

-- 
tejun


Re: SATA timeouts on two disks

2008-01-24 Thread Jim MacBaine
Hi,

On Jan 21, 2008 8:47 AM, Tejun Heo <[EMAIL PROTECTED]> wrote:

> If you still have the old PSU lying around, please try to power one of
> the failing drives with the old PSU.  Just leave everything else as-is,
> power up the old PSU by itself as described in the following web page and
> connect only one of the failing drives to the old PSU.
>
>   http://modtown.co.uk/mt/article2.php?id=psumod
>
> And see whether the problem continues and if so on which drives.
> Connecting SATA drives to separate power is completely safe even if they
> don't have a common ground because SATA connections never directly connect
> to each other.

Yes, I still have the old PSU lying around.

A co-worker, to whom I explained my problem, asked me whether I had
properly grounded my drives. In fact I had not: The drives resided in
a vibration-absorbing frame through which their exterior had no
electrical contact with the grounded case. Since I grounded the drives
two days ago, I got no new errors.  So maybe my problem is solved.

If not, I will happily try out your suggestion. Would you be so kind
to explain in a few words, what connecting one drive to a second
(supposedly good) PSU will show?

(Is this still on-topic on this list?)

Thanks a lot,
Jim


Re: SATA timeouts on two disks

2008-01-20 Thread Tejun Heo
Jim MacBaine wrote:
> On Jan 13, 2008 1:07 PM, Mikael Pettersson <[EMAIL PROTECTED]> wrote:
> 
>> The fact that the problems occur on different disks on
>> different controllers driven by different drivers indicates
>> that it's not a disk, controller, or driver problem.
>>
>> I strongly suspect an underdimensioned or failing PSU.
> 
> Thanks a lot for your clues.
> 
> I bought a new PSU on Monday and didn't get any new disk failures for
> days.  But last night the same time-outs occurred again on two disks.
> I guess I will try to replace the motherboard including the two SATA
> controllers next.

If you still have the old PSU lying around, please try to power one of
the failing drives with the old PSU.  Just leave everything else as-is,
power up the old PSU by itself as described in the following web page and
connect only one of the failing drives to the old PSU.

  http://modtown.co.uk/mt/article2.php?id=psumod

And see whether the problem continues and if so on which drives.
Connecting SATA drives to separate power is completely safe even if they
don't have a common ground because SATA connections never directly connect
to each other.

-- 
tejun


Re: SATA timeouts on two disks

2008-01-19 Thread Jim MacBaine
On Jan 19, 2008 5:50 PM, rgheck <[EMAIL PROTECTED]> wrote:

> I don't know if your problems are similar to mine or not. But I have
> been having extensive problems for quite some time now. Do you get these
> timeouts when using optical drives?

No, I don't see any connection to optical drives here.  I have a DVD
drive and a DVDRW drive on a PATA controller in the failing system but
I have not used them for weeks.

Regards,
Jim


Re: SATA timeouts on two disks

2008-01-19 Thread rgheck

Jim MacBaine wrote:
> On Jan 13, 2008 1:07 PM, Mikael Pettersson <[EMAIL PROTECTED]> wrote:
>
>> The fact that the problems occur on different disks on
>> different controllers driven by different drivers indicates
>> that it's not a disk, controller, or driver problem.
>>
>> I strongly suspect an underdimensioned or failing PSU.
>
> Thanks a lot for your clues.
>
> I bought a new PSU on Monday and didn't get any new disk failures for
> days.  But last night the same time-outs occurred again on two disks.
> I guess I will try to replace the motherboard including the two SATA
> controllers next.
  
I don't know if your problems are similar to mine or not. But I have 
been having extensive problems for quite some time now. Do you get these 
timeouts when using optical drives? That's what seems to trigger it in 
my case: If I'm using the optical drives, I'll often see the errors with 
them first, and then the whole ATA subsystem seems to go down. Then I 
get journal commit errors, general read errors, etc, until the system 
basically locks up. Worst case, it all happens very suddenly, and 
there's not even anything in the logs. Just a couple messages to the 
terminal, usually a journal commit error.


In my case, the optical drives are a brand new Pioneer DVD-RW on SATA 
and an old Plextor on PATA. I mostly see the errors with the latter but 
have also seen them with the former. I'd thought I'd fixed it by adding 
pnpacpi=off and pci=nomsi,nommconf to the kernel boot options, as well 
as libata noacpi=1 to modules.conf, but now I've just had the problem 
again. I'm now thinking I should try eliminating the Plextor drive. It 
may be that it's the PATA drive that is causing all the trouble. I'll 
report if so.
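
For anyone wanting to confirm that such workaround options actually reached
the running kernel, here is a minimal sketch. It only handles the two boot
options named above (pnpacpi=off and pci=nomsi,nommconf); the sample command
line, the function names, and the WANTED table are hypothetical, and the
libata module option in modules.conf is not covered. On a live system the
string would typically come from /proc/cmdline.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value-or-None}.

    Simplification: tokens are split on whitespace, so quoted values
    containing spaces are not handled, and a repeated parameter keeps
    only its last value.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_workarounds(cmdline, wanted):
    """Return the wanted 'name=value' options not fully present in cmdline."""
    params = parse_cmdline(cmdline)
    missing = []
    for name, value in wanted.items():
        have = params.get(name)
        have_set = set(have.split(",")) if have else set()
        # Comma-separated values are compared as sets, so order is ignored.
        if not set(value.split(",")) <= have_set:
            missing.append(f"{name}={value}")
    return missing


# Options mentioned in the message above.
WANTED = {"pnpacpi": "off", "pci": "nomsi,nommconf"}

# Hypothetical command line for illustration only.
SAMPLE_CMDLINE = "ro root=/dev/sda2 pnpacpi=off pci=nomsi"

if __name__ == "__main__":
    print(missing_workarounds(SAMPLE_CMDLINE, WANTED))
```

With the sample line, only the pci option is reported as incomplete because
nommconf is absent.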


FYI, here are the relevant modules being loaded:

[EMAIL PROTECTED] rgheck]# lsmod | grep ata
pata_amd               20293  0
pata_pdc2027x          17477  0
sata_nv                25157  8
ata_generic            14405  0
libata                114673  4 pata_amd,pata_pdc2027x,sata_nv,ata_generic
scsi_mod              145657  5 sr_mod,sg,usb_storage,libata,sd_mod

The IDE interface is an nVidia MCP55, apparently, on an ASUS P5N32-E mb.

I doubt very much it's a PSU issue in my case. There's not that much in 
the box.


Richard



Re: SATA timeouts on two disks

2008-01-19 Thread Jim MacBaine
On Jan 13, 2008 1:07 PM, Mikael Pettersson <[EMAIL PROTECTED]> wrote:

> The fact that the problems occur on different disks on
> different controllers driven by different drivers indicates
> that it's not a disk, controller, or driver problem.
>
> I strongly suspect an underdimensioned or failing PSU.
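
The reasoning quoted above can be checked against the boot log from the
original report. The sketch below groups the failing ports by driver. It
assumes that ata port N corresponds to SCSI host N-1, which matches the
numbering in that log (ata2 is host 1, ata5 is host 4) but is not guaranteed
in general; the function names are made up for this illustration.

```python
import re
from collections import defaultdict

# Matches lines such as "scsi3 : sata_via" from the boot log.
HOST_RE = re.compile(r"^scsi(\d+) : (\w+)")


def drivers_by_port(boot_log):
    """Map 'ataN' -> driver name using the 'scsiM : driver' lines.

    Assumption: ata port N belongs to SCSI host N-1, as seen in the
    log from this thread.
    """
    mapping = {}
    for line in boot_log.splitlines():
        m = HOST_RE.match(line)
        if m:
            mapping[f"ata{int(m.group(1)) + 1}"] = m.group(2)
    return mapping


def failing_ports_by_driver(boot_log, failing_ports):
    """Group the given failing ports under the driver that owns them."""
    port_driver = drivers_by_port(boot_log)
    grouped = defaultdict(list)
    for port in failing_ports:
        grouped[port_driver.get(port, "unknown")].append(port)
    return dict(grouped)


# Host lines copied from the boot log in the original report.
BOOT_LOG = """\
scsi0 : sata_promise
scsi1 : sata_promise
scsi2 : sata_promise
scsi3 : sata_via
scsi4 : sata_via
"""

if __name__ == "__main__":
    print(failing_ports_by_driver(BOOT_LOG, ["ata2", "ata5"]))
```

Each failing port lands under a different driver, which is the observation
behind looking for a cause the two controllers share, such as power.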

Thanks a lot for your clues.

I bought a new PSU on Monday and didn't get any new disk failures for
days.  But last night the same time-outs occurred again on two disks.
I guess I will try to replace the motherboard including the two SATA
controllers next.

Regards,
Jim


Re: SATA timeouts on two disks

2008-01-13 Thread Mikael Pettersson
Jim MacBaine writes:
 > Hi,
 > 
 > Recently I've been experiencing strange SATA errors on my desktop system.
 > The system was recently equipped with three 250 GB SATA drives from

Clue #1: added drives

 > three different manufacturers and I'm having an identical problem on
 > two of them.  The drives are connected to two on-board controllers on
 > an Asus A8V board, which were both running with Linux for more than
 > two years with older SATA disks without problems. A hardware failure
 > seems unlikely to me as the same error occurs on two brand new disks
 > from two different manufacturers.  I'm running a vanilla 2.6.23.12
 > kernel.
 > 
 > Error on sdc happened about 10 times tonight, each time I could hear
 > the disk spin down and up again, while the system was frozen for
 > several seconds:
 > 
 > ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x18 action 0x2 frozen
 > ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
 >  res 40/00:00:00:00:40/00:00:00:00:00/00 Emask 0x4 (timeout)
 > ata2: soft resetting port
 > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 > ata2.00: configured for UDMA/133
 > ata2: EH complete
 > sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
 > sd 1:0:0:0: [sdb] Write Protect is off
 > sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
 > sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
 > support DPO or FUA
 > 
 > In the log I also found several identical errors on one other drive:
 > 
 > ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
 > ata5.00: cmd 25/00:08:b7:f2:11/00:00:13:00:00/e0 tag 0 cdb 0x0 data 4096 in
 >  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
 > ata5: soft resetting port
 > ata5.00: configured for UDMA/33
 > ata5: EH complete
 > sd 4:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
 > sd 4:0:0:0: [sdc] Write Protect is off
 > sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
 > sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't
 > support DPO or FUA

Clue #2: both ata2 and ata5 are having problems

 > 
 > Can this be the result of a hardware failure?  I've seen several
 > drives being added to an NCQ blacklist during the last weeks.  Is it
 > possible that my drives need to be added here, too?  Or have I just
 > two failing drives?
 > 
 > Thanks a lot for any clues,
 > Jim
 > 
 > 
 > System boot log extract:
 > 
 > sata_promise :00:08.0: version 2.10
 > ACPI: PCI Interrupt :00:08.0[A] -> GSI 18 (level, low) -> IRQ 18
 > scsi0 : sata_promise
 > scsi1 : sata_promise
 > scsi2 : sata_promise
 > ata1: SATA max UDMA/133 cmd 0xf882e200 ctl 0xf882e238 bmdma 0x irq 18
 > ata2: SATA max UDMA/133 cmd 0xf882e280 ctl 0xf882e2b8 bmdma 0x irq 18
 > ata3: PATA max UDMA/133 cmd 0xf882e300 ctl 0xf882e338 bmdma 0x irq 18
 > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 > ata1.00: ATA-8: SAMSUNG HD252KJ, CM100-12, max UDMA7
 > ata1.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 0/32)
 > ata1.00: configured for UDMA/133
 > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 > ata2.00: ATA-7: WDC WD2500JS-55NCB1, 10.02E01, max UDMA/133
 > ata2.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 0/32)
 > ata2.00: configured for UDMA/133

Clue #3: ata2 is driven by sata_promise (lspci says it's a 20378, they're good)

 > scsi 0:0:0:0: Direct-Access ATA  SAMSUNG HD252KJ  CM10 PQ: 0 ANSI: 5
 > sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
 > sd 0:0:0:0: [sda] Write Protect is off
 > sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
 > sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
 > support DPO or FUA
 > sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
 > sd 0:0:0:0: [sda] Write Protect is off
 > sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
 > sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
 > support DPO or FUA
 >  sda: sda2 sda3
 > sd 0:0:0:0: [sda] Attached SCSI disk
 > scsi 1:0:0:0: Direct-Access ATA  WDC WD2500JS-55N 10.0 PQ: 0 ANSI: 5
 > sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
 > sd 1:0:0:0: [sdb] Write Protect is off
 > sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
 > sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
 > support DPO or FUA
 > sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
 > sd 1:0:0:0: [sdb] Write Protect is off
 > sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
 > sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
 > support DPO or FUA
 >  sdb: sdb2 sdb3
 > sd 1:0:0:0: [sdb] Attached SCSI disk
 > sata_via :00:0f.0: version 2.3
 > ACPI: PCI Interrupt :00:0f.0[B] -> GSI 20 (level, low) -> IRQ 17
 > sata_via :00:0f.0: routed to hard irq line 10
 > scsi3 : sata_via
 > scsi4 : sata_via
 > ata4: SATA max UDMA/133 cmd 0x0001d000 ctl 0x0001c802 bmdma 0x0001b800 irq 17
 > ata5: SATA max UDMA/133 cmd 0x0001c400 ctl 0x0001c002 bmdma 0x00

SATA timeouts on two disks

2008-01-12 Thread Jim MacBaine
Hi,

Recently I've been experiencing strange SATA errors on my desktop system.
The system was recently equipped with three 250 GB SATA drives from
three different manufacturers and I'm having an identical problem on
two of them.  The drives are connected to two on-board controllers on
an Asus A8V board, which were both running with Linux for more than
two years with older SATA disks without problems. A hardware failure
seems unlikely to me as the same error occurs on two brand new disks
from two different manufacturers.  I'm running a vanilla 2.6.23.12
kernel.

Error on sdc happened about 10 times tonight, each time I could hear
the disk spin down and up again, while the system was frozen for
several seconds:

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x18 action 0x2 frozen
ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
 res 40/00:00:00:00:40/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
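
The "SStatus 113" value in the link-up line can be read field by field. The
sketch below decodes it using the SStatus layout from the Serial ATA
specification (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). It
assumes the kernel prints the register in hexadecimal without a 0x prefix;
the function and table names are invented for this example.

```python
# Field meanings per the SATA SStatus register definition.
DET = {
    0: "no device detected",
    1: "device detected, no PHY communication",
    3: "device detected, PHY communication established",
    4: "PHY offline",
}
SPD = {
    0: "no speed negotiated",
    1: "Gen1 (1.5 Gbps)",
    2: "Gen2 (3.0 Gbps)",
    3: "Gen3 (6.0 Gbps)",
}
IPM = {
    0: "not present or no communication",
    1: "active",
    2: "partial",
    6: "slumber",
}


def decode_sstatus(hex_text):
    """Decode an SStatus value given as a hex string such as '113'."""
    value = int(hex_text, 16)
    det = value & 0xF          # device detection
    spd = (value >> 4) & 0xF   # negotiated interface speed
    ipm = (value >> 8) & 0xF   # interface power management state
    return {
        "det": DET.get(det, f"reserved ({det})"),
        "spd": SPD.get(spd, f"reserved ({spd})"),
        "ipm": IPM.get(ipm, f"reserved ({ipm})"),
    }


if __name__ == "__main__":
    print(decode_sstatus("113"))
```

For 113 this gives an established link at Gen1 speed in the active power
state, consistent with the "link up 1.5 Gbps" text on the same line.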

In the log I also found several identical errors on one other drive:

ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata5.00: cmd 25/00:08:b7:f2:11/00:00:13:00:00/e0 tag 0 cdb 0x0 data 4096 in
 res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5: soft resetting port
ata5.00: configured for UDMA/33
ata5: EH complete
sd 4:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
sd 4:0:0:0: [sdc] Write Protect is off
sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
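
To compare the two excerpts above, here is a small sketch that counts
timed-out commands per port and names the ATA command involved. The opcode
table is a short subset of the ATA command set, the regular expressions
assume log lines without timestamps as shown here, and the function name is
made up for illustration.

```python
import re
from collections import Counter

# Subset of ATA opcodes; the first byte after "cmd" in the log is the opcode.
ATA_OPCODES = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
}

CMD_RE = re.compile(r"^(ata\d+)\.\d+: cmd ([0-9a-f]{2})/")
RES_RE = re.compile(r"^\s*res .* Emask 0x[0-9a-f]+ \((\w+)\)")


def summarize_timeouts(log_text):
    """Count timed-out commands per (port, command name)."""
    counts = Counter()
    pending = None  # (port, command) from the most recent "cmd" line
    for line in log_text.splitlines():
        m = CMD_RE.match(line)
        if m:
            opcode = int(m.group(2), 16)
            name = ATA_OPCODES.get(opcode, f"unknown 0x{opcode:02x}")
            pending = (m.group(1), name)
            continue
        m = RES_RE.match(line)
        if m and pending is not None:
            if m.group(1) == "timeout":
                counts[pending] += 1
            pending = None
    return counts


# Lines copied from the two excerpts above.
SAMPLE = """\
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x18 action 0x2 frozen
ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
 res 40/00:00:00:00:40/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata5.00: cmd 25/00:08:b7:f2:11/00:00:13:00:00/e0 tag 0 cdb 0x0 data 4096 in
 res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
"""

if __name__ == "__main__":
    for (port, cmd), n in sorted(summarize_timeouts(SAMPLE).items()):
        print(port, cmd, n)
```

On these lines the ata2 timeout is a cache flush and the ata5 timeout is a
DMA read, so the two ports fail on different commands.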

Can this be the result of a hardware failure?  I've seen several
drives being added to an NCQ blacklist over the last few weeks.  Is it
possible that my drives need to be added here, too?  Or do I just
have two failing drives?

Thanks a lot for any clues,
Jim


System boot log extract:

sata_promise :00:08.0: version 2.10
ACPI: PCI Interrupt :00:08.0[A] -> GSI 18 (level, low) -> IRQ 18
scsi0 : sata_promise
scsi1 : sata_promise
scsi2 : sata_promise
ata1: SATA max UDMA/133 cmd 0xf882e200 ctl 0xf882e238 bmdma 0x irq 18
ata2: SATA max UDMA/133 cmd 0xf882e280 ctl 0xf882e2b8 bmdma 0x irq 18
ata3: PATA max UDMA/133 cmd 0xf882e300 ctl 0xf882e338 bmdma 0x irq 18
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-8: SAMSUNG HD252KJ, CM100-12, max UDMA7
ata1.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-7: WDC WD2500JS-55NCB1, 10.02E01, max UDMA/133
ata2.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 0/32)
ata2.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access ATA  SAMSUNG HD252KJ  CM10 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
 sda: sda2 sda3
sd 0:0:0:0: [sda] Attached SCSI disk
scsi 1:0:0:0: Direct-Access ATA  WDC WD2500JS-55N 10.0 PQ: 0 ANSI: 5
sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
 sdb: sdb2 sdb3
sd 1:0:0:0: [sdb] Attached SCSI disk
sata_via :00:0f.0: version 2.3
ACPI: PCI Interrupt :00:0f.0[B] -> GSI 20 (level, low) -> IRQ 17
sata_via :00:0f.0: routed to hard irq line 10
scsi3 : sata_via
scsi4 : sata_via
ata4: SATA max UDMA/133 cmd 0x0001d000 ctl 0x0001c802 bmdma 0x0001b800 irq 17
ata5: SATA max UDMA/133 cmd 0x0001c400 ctl 0x0001c002 bmdma 0x0001b808 irq 17
ata4: SATA link down 1.5 Gbps (SStatus 0 SControl 300)
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata5.00: ATA-7: MAXTOR STM3250820AS, 3.AAE, max UDMA/133
ata5.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata5.00: configured for UDMA/133
scsi 4:0:0:0: Direct-Access ATA  MAXTOR STM325082 3.AA PQ: 0 ANSI: 5
sd 4:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
sd 4:0:0:0: [sdc] Write Protect is off
sd 4:0:0:0: [sdc] Mode Sense: