Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-16 Thread Pieter de Boer
Hi Jeremy, SNIP: both old disks were fine Anyway, if heavy disk/controller load appears to be causing these problems, you could have power-related issues. Possibly the combination of two disks + heavy I/O causes enough power draw that the ICH9 starts to behave oddly. Voltages which deviate

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer
Hi Jeremy, Lots to say about all of this. Thanks for your elaborate reply, it was very useful to see smartctl output explained a bit :) I still think there's something else in play besides disk failure. I've checked one of the drives I replaced earlier, but that one doesn't have any of the

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer
Hi Terry, I have a bunch of R300's here. From one that is using the on-board SATA and 2 drives in a gmirror setup (very similar to the OP) after 18 hours of uptime:
[0:2] speedtest:~ vmstat -i
interrupt total rate
irq23: atapci0 254116

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer
Hi there, what kind of disk I/O is going on. If actual I/O is very little, then something weird is going on with regards to the number of interrupts being seen on IRQ 23. mav@ might have some ideas, otherwise I'd recommend rebooting the machine and seeing if the number drops. If so, it may

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Terry Kennedy
Interesting. Which version of FreeBSD is this system running? I guess you didn't experience any of the timeouts I'm seeing? 8-STABLE as of the 11th of this month, or thereabouts. No, I've never seen a disk timeout on that box. Yeah, this R300 was bought second-hand and unfortunately the

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Miroslav Lachman
Pieter de Boer wrote: Hi there, what kind of disk I/O is going on. If actual I/O is very little, then something weird is going on with regards to the number of interrupts being seen on IRQ 23. mav@ might have some ideas, otherwise I'd recommend rebooting the machine and seeing if the number

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Jeremy Chadwick
On Sat, May 15, 2010 at 09:04:11AM +0200, Pieter de Boer wrote: Thanks for your elaborate reply, it was very useful to see smartctl output explained a bit :) I still think there's something else in play besides disk failure. I've checked one of the drives I replaced earlier, but that one

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer
Hi, SNIP: disk without errors timing out That could be caused by a multitude of other known things. For example, some Western Digital Green drives (including the Enterprise class ones) are known to perform head parking/offloading excessively, which could result in the drive spending more time

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer
Attached the SMART output of both disks I replaced about a month ago. It appears I replaced perfectly fine drives with the current disks with errors ;( One of the old disks is in a USB-enclosure now, so 'da0'. Let's send those attachments, then. -- Pieter smartctl 5.39 2009-12-09 r2995

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Jeremy Chadwick
On Sat, May 15, 2010 at 11:16:33PM +0200, Pieter de Boer wrote: Attached the SMART output of both disks I replaced about a month ago. It appears I replaced perfectly fine drives with the current disks with errors ;( One of the old disks is in a USB-enclosure now, so 'da0'. Regarding the

Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Pieter de Boer
Hi list, I'm running FreeBSD 8.0-RELEASE-p1 on a Dell R300 which has an ICH9 SATA controller on-board (do not have the RAID controller). The system has 2 disks in a gmirror setup. Every now and then, probably under some load, one of the disks gets read or write timeouts like: May 5 03:01:37

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Adam Vande More
On Fri, May 14, 2010 at 12:42 PM, Pieter de Boer pie...@os3.nl wrote: I'm running FreeBSD 8.0-RELEASE-p1 on a Dell R300 which has an ICH9 SATA controller on-board (do not have the RAID controller). The system has 2 disks in a gmirror setup. Every now and then, probably under some load, one of

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Pieter de Boer
Adam Vande More wrote:
May 5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command
May 5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command
May 5 03:01:37 aberdeen kernel: GEOM_MIRROR: Request failed (error=5). ad4[WRITE(offset=200404975104, length=16384)]
May 5
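For readers who want to tally these errors across a longer log, here is a minimal Python sketch. It is not from the thread: the regular expressions are assumptions derived only from the three complete log lines quoted above, the 512-byte sector size is an assumption about the disks, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpt quoted in this thread.
LOG = """\
May 5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command
May 5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command
May 5 03:01:37 aberdeen kernel: GEOM_MIRROR: Request failed (error=5). ad4[WRITE(offset=200404975104, length=16384)]
"""

# Assumed message shapes, inferred from the sample lines only.
ATA_ERR = re.compile(r"kernel: (ad\d+): (timeout|error issuing)\b")
MIRROR_ERR = re.compile(
    r"GEOM_MIRROR: Request failed \(error=(\d+)\)\. "
    r"(\w+)\[(READ|WRITE)\(offset=(\d+), length=(\d+)\)\]"
)

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def summarize(log_text):
    """Count ATA-level errors per device and list failed gmirror requests."""
    ata_errors = Counter()
    failed = []
    for line in log_text.splitlines():
        m = ATA_ERR.search(line)
        if m:
            ata_errors[m.group(1)] += 1
            continue
        m = MIRROR_ERR.search(line)
        if m:
            _errno, dev, op, offset, length = m.groups()
            # Convert the byte offset and length to sector units.
            failed.append(
                (dev, op, int(offset) // SECTOR_SIZE, int(length) // SECTOR_SIZE)
            )
    return ata_errors, failed


ata_errors, failed = summarize(LOG)
print(dict(ata_errors))
print(failed)
```

Grouping by device makes it easier to see whether the timeouts follow one disk or hit both members of the mirror, which is the distinction the thread keeps returning to.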

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Jeremy Chadwick
On Fri, May 14, 2010 at 07:42:33PM +0200, Pieter de Boer wrote: Hi list, I'm running FreeBSD 8.0-RELEASE-p1 on a Dell R300 which has an ICH9 SATA controller on-board (do not have the RAID controller). The system has 2 disks in a gmirror setup. Every now and then, probably under some load,

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Pieter de Boer
My question: does anyone have experience with FreeBSD on a Dell R300 or can anyone give me some help in trying to fix the timeouts?
Could you please do the following:
- Provide output from vmstat -i
- Provide output from dmesg | grep -i ata
- Install ports/sysutils/smartmontools (5.40 or

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Jeremy Chadwick
On Fri, May 14, 2010 at 11:09:28PM +0200, Pieter de Boer wrote: The ad4 SMART output is showing errors, as this disk is indeed broken now. It wasn't before and it is a replacement of another disk that wasn't broken either. Grmbl, I now see reallocated sectors on ad6 as well, in the smartctl

RE: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Terry Kennedy
On Fri May 14 22:42:38 UTC 2010, Jeremy Chadwick wrote: Finally, your vmstat -i output:
# vmstat -i
interrupt total rate
irq23: atapci0 371021299 10423
Good to know there's no IRQ sharing going on, but what does worry me is the
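To put the quoted vmstat -i row in perspective, the following Python sketch works out the uptime implied by the total and rate columns. It assumes the rate column is the average number of interrupts per second since boot (total divided by uptime); `parse_vmstat_row` is a hypothetical helper, not part of any tool mentioned in the thread.

```python
def parse_vmstat_row(row):
    """Split one `vmstat -i` data row into (name, total, rate).

    The interrupt name itself contains a space ("irq23: atapci0"),
    so the two numeric columns are split off from the right.
    """
    name, total, rate = row.rsplit(None, 2)
    return name, int(total), int(rate)


# Row copied from the vmstat -i output quoted in this message.
name, total, rate = parse_vmstat_row("irq23: atapci0 371021299 10423")

# Assumption: rate = total / uptime_in_seconds, so uptime = total / rate.
implied_uptime_s = total / rate
implied_uptime_h = implied_uptime_s / 3600

print(f"{name}: {total} interrupts at {rate}/s "
      f"implies about {implied_uptime_h:.1f} hours of uptime")
```

Under that assumption the row corresponds to a little under ten hours of uptime at a sustained average of roughly ten thousand interrupts per second.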