Re: SATA timeouts on two disks
Hi,

On Jan 21, 2008 8:47 AM, Tejun Heo [EMAIL PROTECTED] wrote:
> If you still have the old PSU lying around, please try to power one of the failing drives with the old PSU. Just leave everything else as-is, power up the old PSU by itself as described in the following web page and connect only one of the failing drives to the old PSU.
>
> http://modtown.co.uk/mt/article2.php?id=psumod
>
> And see whether the problem continues and if so on which drives. Connecting SATA drives to separate power is completely safe even if they don't have common ground because SATA connections never directly connect to each other.

Yes, I still have the old PSU lying around.

A co-worker, to whom I explained my problem, asked me whether I had properly grounded my drives. In fact I had not: the drives resided in a vibration-absorbing frame through which their exterior had no electrical contact with the grounded case. Since I grounded the drives two days ago, I have had no new errors. So maybe my problem is solved.

If not, I will happily try out your suggestion. Would you be so kind as to explain in a few words what connecting one drive to a second (supposedly good) PSU will show?

(Is this still on-topic on this list?)

Thanks a lot,
Jim

-
To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SATA timeouts on two disks
Hello,

Jim MacBaine wrote:
> A co-worker, to whom I explained my problem, asked me whether I had properly grounded my drives. In fact I had not: the drives resided in a vibration-absorbing frame through which their exterior had no electrical contact with the grounded case. Since I grounded the drives two days ago, I have had no new errors. So maybe my problem is solved.

Hmmm... Grounding. Interesting.

> If not, I will happily try out your suggestion. Would you be so kind as to explain in a few words what connecting one drive to a second (supposedly good) PSU will show?

It's just a good way to isolate problems. For example, the motherboard could be doing something strange on the 12 V rail and the PSU could be too sensitive, causing the whole rail to fluctuate slightly and leading to occasional transmission errors. SATA links are the first to be affected by those kinds of electrical problems.

> (Is this still on-topic on this list?)

Yeah, sure.

--
tejun
Re: SATA timeouts on two disks
Tejun Heo wrote:
> Hello,
>
> Jim MacBaine wrote:
>> A co-worker, to whom I explained my problem, asked me whether I had properly grounded my drives. In fact I had not: the drives resided in a vibration-absorbing frame through which their exterior had no electrical contact with the grounded case. Since I grounded the drives two days ago, I have had no new errors. So maybe my problem is solved.
>
> Hmmm... Grounding. Interesting.

Can you say more about this, Jim? This may also be my problem, or part of it, as my drives too are mounted in such a way as not to be in physical contact with the case. How did you go about grounding them? I suppose one test would be just to remove the washers.

That said, in my case, 2.6.24 seems to make a big difference, too. I accidentally booted into 2.6.23 today and, boom.

Richard
Re: SATA timeouts on two disks
Jim MacBaine wrote:
> On Jan 13, 2008 1:07 PM, Mikael Pettersson [EMAIL PROTECTED] wrote:
>> The fact that the problems occur on different disks on different controllers driven by different drivers indicates that it's not a disk, controller, or driver problem. I strongly suspect an underdimensioned or failing PSU.
>
> Thanks a lot for your clues. I bought a new PSU on Monday and didn't get any new disk failures for days. But last night the same time-outs occurred again on two disks. I guess I will try to replace the motherboard including the two SATA controllers next.

If you still have the old PSU lying around, please try to power one of the failing drives with the old PSU. Just leave everything else as-is, power up the old PSU by itself as described in the following web page and connect only one of the failing drives to the old PSU.

http://modtown.co.uk/mt/article2.php?id=psumod

And see whether the problem continues and if so on which drives. Connecting SATA drives to separate power is completely safe even if they don't have common ground because SATA connections never directly connect to each other.

--
tejun
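For the "on which drives" part of the test above, it may help to tally the timeouts per ATA port from a saved kernel log rather than eyeballing it. The following is only a minimal sketch: it assumes the libata message layout quoted in the original report in this thread (an "exception" line naming the port, followed by a "res ... Emask 0x4 (timeout)" line), and the function name `count_timeouts` is made up for illustration.

```python
import re
from collections import Counter

# "ata2.00: exception Emask 0x0 ..." names the port that hit an error.
# search() is used so a syslog timestamp/host prefix does not matter.
EXCEPTION_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: exception Emask")
# The result line that follows carries the "(timeout)" tag.
TIMEOUT_RE = re.compile(r"\bres\b.*Emask 0x[0-9a-f]+ \(timeout\)")


def count_timeouts(log_text):
    """Return a Counter mapping port name (e.g. 'ata2') to timeout count."""
    counts = Counter()
    current_port = None  # port named by the most recent exception line
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = EXCEPTION_RE.search(line)
        if match:
            current_port = match.group(1)
        if current_port and TIMEOUT_RE.search(line):
            counts[current_port] += 1
            current_port = None  # wait for the next exception line
    return counts


# Sample lines copied from the error report in this thread.
SAMPLE = """\
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x18 action 0x2 frozen
ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
         res 40/00:00:00:00:40/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata5.00: cmd 25/00:08:b7:f2:11/00:00:13:00:00/e0 tag 0 cdb 0x0 data 4096 in
         res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
"""

if __name__ == "__main__":
    for port, n in sorted(count_timeouts(SAMPLE).items()):
        print(port, n)
```

Feeding it the log from before and after moving one drive to the old PSU should show whether the timeouts follow the drive, the port, or the power source. Other kernel versions may format these messages differently, so the two patterns may need adjusting.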
Re: SATA timeouts on two disks
On Jan 13, 2008 1:07 PM, Mikael Pettersson [EMAIL PROTECTED] wrote:
> The fact that the problems occur on different disks on different controllers driven by different drivers indicates that it's not a disk, controller, or driver problem. I strongly suspect an underdimensioned or failing PSU.

Thanks a lot for your clues. I bought a new PSU on Monday and didn't get any new disk failures for days. But last night the same time-outs occurred again on two disks. I guess I will try to replace the motherboard including the two SATA controllers next.

Regards,
Jim
Re: SATA timeouts on two disks
Jim MacBaine wrote:
> On Jan 13, 2008 1:07 PM, Mikael Pettersson [EMAIL PROTECTED] wrote:
>> The fact that the problems occur on different disks on different controllers driven by different drivers indicates that it's not a disk, controller, or driver problem. I strongly suspect an underdimensioned or failing PSU.
>
> Thanks a lot for your clues. I bought a new PSU on Monday and didn't get any new disk failures for days. But last night the same time-outs occurred again on two disks. I guess I will try to replace the motherboard including the two SATA controllers next.

I don't know if your problems are similar to mine or not. But I have been having extensive problems for quite some time now. Do you get these timeouts when using optical drives? That's what seems to trigger it in my case: if I'm using the optical drives, I'll often see the errors with them first, and then the whole ATA subsystem seems to go down. Then I get journal commit errors, general read errors, etc., until the system basically locks up. Worst case, it all happens very suddenly, and there's not even anything in the logs. Just a couple of messages to the terminal, usually a journal commit error.

In my case, the optical drives are a brand new Pioneer DVD-RW on SATA and an old Plextor on PATA. I mostly see the errors with the latter but have also seen them with the former. I'd thought I'd fixed it by adding pnpacpi=off and pci=nomsi,nommconf to the kernel boot options, as well as libata noacpi=1 to modules.conf, but now I've just had the problem again.

I'm now thinking I should try eliminating the Plextor drive. It may be that it's the PATA drive that is causing all the trouble. I'll report if so.

FYI, here are the relevant modules being loaded:

[EMAIL PROTECTED] rgheck]# lsmod | grep ata
pata_amd               20293  0
pata_pdc2027x          17477  0
sata_nv                25157  8
ata_generic            14405  0
libata                114673  4 pata_amd,pata_pdc2027x,sata_nv,ata_generic
scsi_mod              145657  5 sr_mod,sg,usb_storage,libata,sd_mod

The IDE interface is an nVidia MCP55, apparently, on an ASUS P5N32-E motherboard.

I doubt very much it's a PSU issue in my case. There's not that much in the box.

Richard
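Before ruling the boot options in or out, it may be worth confirming that they actually reached the running kernel. The sketch below is one hedged way to do that: it compares the contents of /proc/cmdline against the two options named above. The helper names (`parse_cmdline`, `missing_options`) and the `WANTED` table are made up for illustration, and the parser ignores quoted values containing spaces. The libata noacpi=1 setting is a module option in this setup, so the sketch does not cover it.

```python
def parse_cmdline(cmdline):
    """Map each parameter name to the list of comma-separated values seen.

    Flags without '=' map to an empty list. Repeated parameters
    (e.g. two separate pci= entries) are accumulated.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        values = params.setdefault(name, [])
        if sep:
            values.extend(value.split(","))
    return params


def missing_options(cmdline, wanted):
    """Return the 'name=value' pairs from wanted that cmdline lacks."""
    params = parse_cmdline(cmdline)
    missing = []
    for name, values in wanted.items():
        present = params.get(name, [])
        missing.extend(f"{name}={v}" for v in values if v not in present)
    return missing


# The boot options mentioned in the message above.
WANTED = {"pnpacpi": ["off"], "pci": ["nomsi", "nommconf"]}

if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            running = f.read()
    except OSError:
        running = ""  # not on Linux, or /proc not mounted
    for opt in missing_options(running, WANTED):
        print("not on the running kernel's command line:", opt)
```

If it prints nothing, both options were passed to the kernel, which would suggest they simply do not cure the problem rather than having been silently dropped by the bootloader configuration.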
Re: SATA timeouts on two disks
Jim MacBaine writes:
> Hi,
>
> Recently I'm experiencing strange SATA errors on my desktop system. The system was recently equipped with three 250 GB SATA drives from

Clue #1: added drives

> three different manufacturers and I'm having an identical problem on two of them. The drives are connected to two on-board controllers on an Asus A8V board, which were both running with Linux for more than two years with older SATA disks without problems. A hardware failure seems unlikely to me as the same error occurs on two brand new disks from two different manufacturers. I'm running a vanilla 2.6.23.12 kernel.
>
> Error on sdc happened about 10 times tonight, each time I could hear the disk spin down and up again, while the system was frozen for several seconds:
>
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x18 action 0x2 frozen
> ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
>          res 40/00:00:00:00:40/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata2: soft resetting port
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata2.00: configured for UDMA/133
> ata2: EH complete
> sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
> sd 1:0:0:0: [sdb] Write Protect is off
> sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> In the log I also found several identical errors on one other drive:
>
> ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata5.00: cmd 25/00:08:b7:f2:11/00:00:13:00:00/e0 tag 0 cdb 0x0 data 4096 in
>          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata5: soft resetting port
> ata5.00: configured for UDMA/33
> ata5: EH complete
> sd 4:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
> sd 4:0:0:0: [sdc] Write Protect is off
> sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Clue #2: both ata2 and ata5 are having problems

> Can this be the result of a hardware failure? I've seen several drives being added to an NCQ blacklist during the last weeks. Is it possible that my drives need to be added here, too? Or have I just two failing drives?
>
> Thanks a lot for any clues,
> Jim
>
> System boot log extract:
>
> sata_promise :00:08.0: version 2.10
> ACPI: PCI Interrupt :00:08.0[A] - GSI 18 (level, low) - IRQ 18
> scsi0 : sata_promise
> scsi1 : sata_promise
> scsi2 : sata_promise
> ata1: SATA max UDMA/133 cmd 0xf882e200 ctl 0xf882e238 bmdma 0x irq 18
> ata2: SATA max UDMA/133 cmd 0xf882e280 ctl 0xf882e2b8 bmdma 0x irq 18
> ata3: PATA max UDMA/133 cmd 0xf882e300 ctl 0xf882e338 bmdma 0x irq 18
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata1.00: ATA-8: SAMSUNG HD252KJ, CM100-12, max UDMA7
> ata1.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 0/32)
> ata1.00: configured for UDMA/133
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata2.00: ATA-7: WDC WD2500JS-55NCB1, 10.02E01, max UDMA/133
> ata2.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 0/32)
> ata2.00: configured for UDMA/133

Clue #3: ata2 is driven by sata_promise (lspci says it's a 20378, they're good)

> scsi 0:0:0:0: Direct-Access ATA SAMSUNG HD252KJ CM10 PQ: 0 ANSI: 5
> sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sda: sda2 sda3
> sd 0:0:0:0: [sda] Attached SCSI disk
> scsi 1:0:0:0: Direct-Access ATA WDC WD2500JS-55N 10.0 PQ: 0 ANSI: 5
> sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
> sd 1:0:0:0: [sdb] Write Protect is off
> sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
> sd 1:0:0:0: [sdb] Write Protect is off
> sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sdb: sdb2 sdb3
> sd 1:0:0:0: [sdb] Attached SCSI disk
> sata_via :00:0f.0: version 2.3
> ACPI: PCI Interrupt :00:0f.0[B] - GSI 20 (level, low) - IRQ 17
> sata_via :00:0f.0: routed to hard irq line 10
> scsi3 : sata_via
> scsi4 : sata_via
> ata4: SATA max UDMA/133 cmd 0x0001d000 ctl 0x0001c802 bmdma 0x0001b800 irq 17
> ata5: SATA max UDMA/133 cmd 0x0001c400 ctl 0x0001c002 bmdma 0x0001b808 irq 17
> ata4: SATA link down 1.5 Gbps (SStatus 0 SControl 300)
> ata5: SATA link up 1.5 Gbps