Re: 7.2-RELEASE-p4, IO errors RAID1 failure

2010-06-18 Thread Pieter de Boer

Hi Matthew,


I'm running 7.2-RELEASE-p4 on an i386 HP server (ML G5) in RAID1
configuration. Very recently, I've seen IO errors such as:

ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=20472527

reported and the RAID mirror is now offline.

ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=395032335
ad0: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=395032335
ar0: WARNING - mirror protection lost. RAID1 array in DEGRADED mode

I had more or less the same timeout issues on my 8.0-RELEASE box on a 
Dell R300 with SATA disks. What I did was raise the ata timeout from 5 
seconds to 20. I did this by patching the kernel code while running, but 
I'm not sure you'd like that approach ;)


In http://www.freebsd.org/cgi/query-pr.cgi?pr=111023 a patch is 
presented that raises the timeouts by patching a few ATA kernel source 
files. This has been committed to RELENG_7 as well, so by upgrading your 
7.2-install to the latest RELENG_7 (or RELENG_8), you'll have that 
timeout fix.


Why ATA commands can take longer than 5 seconds even though the disks 
appear to be fine, I wouldn't know.


--
Pieter



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-16 Thread Pieter de Boer

Hi Jeremy,

SNIP: both old disks were fine

Anyway, if heavy disk/controller load appears to be causing these
problems, you could have power-related issues.  Possibly the combination
of two disks + heavy I/O causes enough power draw that the ICH9 starts
to behave oddly.  Voltages which deviate too much can cause odd things
to happen to hardware.  If you have the time/money, you might try
replacing the PSU in your system to see if there's any improvement; your
BIOS should be able to provide you Hardware Monitoring statistics
(voltages).  Write these down before and after the PSU swap.  You don't
need to go crazy and buy a 1000W PSU or anything, but 450-750W is pretty
normal these days.
As this is a 19" 1U box, I'd need to buy a replacement PSU from Dell or 
a reseller. Not too expensive, but I'd like to avoid that.


While looking through the CVSweb of RELENG_8, I found that ATA timeouts 
have been raised in 8 recently. On 
http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting 
and other URLs, like 
http://linux-bsd-sharing.blogspot.com/2009/03/howto-fix-sata-dma-timeout-issues-on.html, 
I found that increasing the timeout might help. So that's what I'll try 
next time it happens again. If that still doesn't work, I can take a 
better look at the voltage levels.


--
Pieter



Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer

Hi Jeremy,


Lots to say about all of this.


Thanks for your elaborate reply, it was very useful to see smartctl 
output explained a bit :) I still think there's something else in play 
beside disk failure. I've checked one of the drives I replaced earlier, 
but that one doesn't have any of the errors in its SMART output you 
described, although it did drop out of the mirror multiple times during 
its lifetime.



The WD Caviar Black drives have a useful feature called TLER -- it's
disabled by default, for reasons which I don't want to get into here --
which can force the drive to internally give up after X seconds (it's
user-selectable) when dealing with such remapping/errors.  The idea is
to keep the drive from being deemed dead from the OS/controller's point
of view.  I believe Seagate, Hitachi, or Samsung (I forget which) have
this feature as well, but it's not called TLER.
I've read about this feature, but didn't have the time to try to get it 
turned on (iirc you'd need a specific Western Digital DOS-based util or 
something).



If you want to find out the exact LBA that has the problem (there may be
more than one), I can step you through performing a selective LBA scan
using SMART, since this model of disk does support such.  It's easy to
do, easy to understand the results, and can be done while the drive is
in operation (though I would recommend trying to keep disk I/O to a
minimum during this test).  Let me know.
At a certain point in time I had read errors from specific LBA's on ad4. 
Using dd I was able to pinpoint those to single sectors. Overwriting 
those sectors with what was on ad6 made them readable again. What is odd 
is that the 'remapped sector' count of ad4 is 0.


Still, I'd like to know how to perform such a scan.
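For reference, smartctl starts a selective self-test with `-t select,START-END` and reports the result with `-l selective`. Below is a minimal sketch for picking the range around a suspect LBA. The helper name `select_range`, the margin and the example LBA are illustrative assumptions only, and the end of the range is not clamped to the size of the disk:

```sh
# select_range LBA MARGIN
# Print "START-END": MARGIN sectors on either side of LBA,
# with START clamped so it never drops below sector 0.
select_range() {
    lba=$1
    margin=$2
    start=$((lba - margin))
    if [ "$start" -lt 0 ]; then
        start=0
    fi
    echo "${start}-$((lba + margin))"
}

# Intended use (as root, on a drive that supports selective self-tests):
#   smartctl -t select,"$(select_range 975513887 1000)" /dev/ad6
#   smartctl -l selective /dev/ad6    # progress and result of the scan
```

The test runs inside the drive, so it can be left running while the disk is in use, although other I/O will slow it down.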

  Finally, your vmstat -i output:



# vmstat -i
interrupt  total   rate
irq23: atapci0 371021299  10423


Good to know there's no IRQ sharing going on, but what does worry me is
the interrupt rate (10K interrupts/second).  That seems *extremely*
high, but it also depends on what kind of disk I/O is happening on this
system -- especially since you have 2 disks attached to the same
controller.
The rate is higher than 1 also at idle. During a gmirror sync from 
ad6 to ad4, it's about 10670.



iostat 1, iostat -x 1, or gstat might come in handy to tell you
what kind of disk I/O is going on.  If actual I/O is very little, then
something weird is going on with regards to the number of interrupts
being seen on IRQ 23.  mav@ might have some ideas, otherwise I'd
recommend rebooting the machine and seeing if the number drops.  If so,
it may be that the OS has some sort of bug where a disk timing out or
falling off the bus causes interrupt problems.  (It's too bad you don't
have AHCI on this system.  It handles stuff like this much more
elegantly...)
If neither mav@ nor anyone else has further insight into the interrupt 
rate, I guess a reboot will at least show whether it's persistent or 
related to the errors. I'll try to reboot when convenient (probably 
Sunday morning or something).
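Since the rate column of vmstat -i is an average since boot, it says little about what the controller is doing right now. A rough sketch for measuring the current rate from two samples follows. The helper names `irq_total` and `irq_rate` are made up for this example, and the field positions assume the two-word "irq23: atapci0" naming shown above:

```sh
# irq_total DEVICE
# Read `vmstat -i` output on stdin and print the total count for DEVICE.
irq_total() {
    awk -v dev="$1" '$2 == dev { print $3 }'
}

# irq_rate TOTAL_BEFORE TOTAL_AFTER SECONDS
# Integer interrupts per second between the two samples.
irq_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# Intended use:
#   a=$(vmstat -i | irq_total atapci0); sleep 10
#   b=$(vmstat -i | irq_total atapci0); irq_rate "$a" "$b" 10
```

Comparing that figure at idle and during a gmirror sync should show whether the interrupts track actual disk I/O.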


Thanks,
Pieter






Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer

Hi Terry,


I have a bunch of R300's here. From one that is using the on-board SATA
and 2 drives in a gmirror setup (very similar to the OP) after 18 hours
of uptime:

[0:2] speedtest:~ vmstat -i
interrupt  total   rate
irq23: atapci0 254116  3
Interesting. Which version of FreeBSD is this system running? I guess 
you didn't experience any of the timeouts I'm seeing?



  I also have another R300 with Dell's SAS 6/iR card (a re-branded LSI
1068-something, seen as mpt by FreeBSD). While Dell only sells that as
part of a package deal with the hot-swap backplane and redundant power
supplies, there's no reason you couldn't pick one up on eBay and add it
yourself. You'll need some sort of breakout cable to get from the big
connector on the SAS 6 to individual SATA ports.
Yeah, this R300 was bought second-hand and unfortunately the owner 
pulled the RAID card out. It's something to consider, getting one of 
those cards. Do you use the RAID-features of the drive and if so, does 
that work well? I'm a bit hesitant to use hardware raid; it would be a 
big plus if the RAID disks could also be used stand-alone if need be 
(which is easy with gmirror because of its metadata being stored in the 
drive's last sector).
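To make the last-sector remark concrete, here is a small sketch that works out which LBA that is for a given disk size. It assumes 512-byte sectors, and the helper name `last_sector` is hypothetical:

```sh
# last_sector BYTES [SECTOR_SIZE]
# Print the LBA of the final sector of a provider of BYTES bytes.
# SECTOR_SIZE defaults to 512.
last_sector() {
    echo $(( $1 / ${2:-512} - 1 ))
}

# Intended use, with the capacity smartctl reports for a 500 GB disk:
#   last_sector 500107862016
# The metadata itself can be shown with:
#   gmirror dump /dev/ad4
```

Everything before that sector is plain data, which is why a gmirror member can still be read as a stand-alone disk.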


Thanks,
Pieter



Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer

Hi there,


what kind of disk I/O is going on.  If actual I/O is very little, then
something weird is going on with regards to the number of interrupts
being seen on IRQ 23.  mav@ might have some ideas, otherwise I'd
recommend rebooting the machine and seeing if the number drops.  If so,
it may be that the OS has some sort of bug where a disk timing out or
falling off the bus causes interrupt problems.  (It's too bad you don't
have AHCI on this system.  It handles stuff like this much more
elegantly...)
Well, due to a UFS snapshot panic the box was rebooted, and now I only 
see around 1500 interrupts per second, while syncing the mirror.


--
Pieter


Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer

Hi,

SNIP: disk without errors timing out

That could be caused by a multitude of other known things.  For
example, some Western Digital Green drives (including the
Enterprise class ones) are known to perform head parking/offloading
excessively, which could result in the drive spending more time doing
that than actually serving overall I/O requests.  There are some
other reports of Samsung Spinpoint drives experiencing other issues
(I've since forgotten and would have to dig up the threads).



If you could provide full SMART stats for that drive, it might help.

Attached the SMART output of both disks I replaced about a month ago. It
appears I replaced perfectly fine drives with the current disks with
errors ;(  One of the old disks is in a USB-enclosure now, so 'da0'.

SNIP: enabling TLER

Yes, it's a DOS-based utility (like most firmware upgraders these
days). I can provide it if you'd like.  I've been meaning to spend
some time trying to reverse-engineer the binary to figure out what
ATA commands it sends to the disk to toggle/adjust the feature (so
that one could do it in real-time rather than have to boot into DOS).


I'd like to try that tool. Since the old WD disks are now lying around
at home, I have some time to get a DOS boot working to try it out. A
FreeBSD-implementation of the WD tool and possibly other brands would be
really useful indeed.


At a certain point in time I had read errors from specific LBA's on
 ad4. Using dd I was able to pinpoint those to single sectors.


This isn't very effective (dd will read large chunks/amounts of data
(read: multiple LBAs) from the underlying disk at once, rather than
the disk itself performing a per-LBA test).  My opinion is that the
dd method should only be used on drives which don't support
selective LBA scanning via SMART.

Will dd read multiple LBAs even when using 'bs=512'? The process I used
was reading using bs=8192, then zooming in on the LBA's mentioned in
the errors in dmesg with bs=512 to find the actual LBA.
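The zooming-in step can be sketched as below. With bs=512 and count=1, dd asks for a single 512-byte block; what the drive reads internally is a separate matter that this does not settle. The helper name `probe_cmd` and the example sector are placeholders:

```sh
# probe_cmd DEVICE LBA [COUNT]
# Print a dd command that reads COUNT (default 1) 512-byte sectors
# starting at LBA and discards the data.
probe_cmd() {
    echo "dd if=$1 of=/dev/null bs=512 skip=$2 count=${3:-1}"
}

# Intended use: print the command first, run it once it looks right.
#   probe_cmd /dev/ad6 975513887
#   probe_cmd /dev/ad6 975513887 | sh
```

A sector that cannot be read makes dd fail with an I/O error, and the kernel log should show the matching LBA.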

A selective scan on ad4 did not reveal any errors today: it 'completed 
without error'. On ad6 it's a whole lot slower; at the time of writing 
it's at 2/3.



All HD vendors have their own quirks/ordeals right now.  You
basically just have to go with one who works well for you, then if
things start going downhill, switch to another.  None of them are
perfect.
I figured as much. What irritates though is that I've had consistent 
problems with 4 disks in this specific system, but not (such) issues 
with any other disk in other systems I've had. I generally replace disks 
when I grow out of them, not because they break down.



What this indicates to me is that if a disk falls off the bus on an
ICH9 controller in Enhanced (non-AHCI) mode, FreeBSD starts seeing an
absurd number of interrupts generated from the ICH9.  My guess is
FreeBSD isn't doing something correctly with the controller when this
happens; maybe certain commands aren't being sent back to the
controller or handling of certain events is being done improperly
when it comes to ICH9 (or possibly earlier ICH revisions too).  This
should be *very* easy to reproduce.


Unfortunately I'm not really in a position to help reproducing this or 
testing possible fixes; downtime is currently very unwelcome. Although 
one of the previous disks indeed fell off the bus entirely (couldn't get 
it back with atacontrol either), that hasn't happened again so far. I 
only see timeouts (and a few days ago read errors on ad4) which gmirror 
doesn't like. I guess those aren't that simple to reproduce (apart from 
on my system ;).



If you see any of your disks on the ICH9 controller fall off the bus
or report ATA errors (doesn't matter what kind), please make note of
the timestamp (should be in the kernel log), and ASAP run smartctl
-a on the disk.  You should compare attributes before and after the
event.
You might also want to consider using smartd, which can log SMART 
attribute changes on its own.  Note that you might have to tune the 
arguments in smartd.conf to ignore some attributes which fluctuate 
naturally (such as drive temperature and seek error rate).


I've configured smartd to poll both disks every 5 minutes. I -think- the 
issues happen specifically under load: the periodic scripts of the host 
and its 4 jails appear to trigger it sometimes. At that time I'm 
normally trying to get some sleep, so smartd will have to do for now. 
Although I'll run a smartctl -a asap anyway.
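For the manual before/after comparison, something along these lines could work on two saved copies of the attribute table (for example from smartctl -A). It is only a sketch: the helper name `smart_changed` is made up, and it assumes the usual ten-column layout with the raw value in column 10:

```sh
# smart_changed BEFORE_FILE AFTER_FILE
# Print every attribute whose raw value differs between two saved
# smartctl attribute tables, as "NAME: old -> new".
smart_changed() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
             if (FNR == NR)
                 before[$2] = $10          # first file: remember raw value
             else if (($2 in before) && before[$2] != $10)
                 print $2 ": " before[$2] " -> " $10
         }' "$1" "$2"
}

# Intended use:
#   smartctl -A /dev/ad4 > /var/tmp/ad4.before
#   ... after the next timeout ...
#   smartctl -A /dev/ad4 > /var/tmp/ad4.after
#   smart_changed /var/tmp/ad4.before /var/tmp/ad4.after
```

Attributes that move all the time, such as temperature, will show up as well, so the output still needs a human eye.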


--
Pieter






Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-15 Thread Pieter de Boer

Attached the SMART output of both disks I replaced about a month ago. It
appears I replaced perfectly fine drives with the current disks with
errors ;(  One of the old disks is in a USB-enclosure now, so 'da0'.


Let's send those attachments, then.

--
Pieter
smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE i386] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE3 Serial ATA family
Device Model:     WDC WD5002ABYS-18B1B0
Serial Number:    WD-WMASY5474089
Firmware Version: 02.03B03
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat May 15 21:53:04 2010 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (9480) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off
                                        support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 112) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   179   179   021    Pre-fail  Always       -       4033
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       89
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5536
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       74
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       71
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       89
194 Temperature_Celsius     0x0022   100   094   000    Old_age   Always       -       47
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      5487         -
# 2  Extended offline    Completed without error       00%

Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Pieter de Boer

Hi list,

I'm running FreeBSD 8.0-RELEASE-p1 on a Dell R300 which has an ICH9 SATA 
controller on-board (I do not have the RAID controller).


The system has 2 disks in a gmirror setup. Every now and then, probably 
under some load, one of the disks gets read or write timeouts like:

May  5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command
May  5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command
May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Request failed (error=5). 
ad4[WRITE(offset=200404975104, length=16384)]
May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Device gm0: provider ad4 
disconnected.


or:

May 13 14:41:26 aberdeen kernel: ad6: TIMEOUT - READ_DMA48 retrying (1 
retry left) LBA=975513887


Sometimes the read/write succeeds after a few retries, but sometimes it 
does not, so geom_mirror throws the disk out of the mirror.
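Note that the GEOM_MIRROR message reports a byte offset, while the ata(4) TIMEOUT line reports an LBA. To compare the two, the offset can be converted as in the sketch below, which assumes 512-byte sectors and uses a made-up helper name, `offset_to_lba`:

```sh
# offset_to_lba BYTE_OFFSET [SECTOR_SIZE]
# Convert a byte offset (as printed by GEOM_MIRROR) to a sector number.
# SECTOR_SIZE defaults to 512.
offset_to_lba() {
    echo $(( $1 / ${2:-512} ))
}

# Intended use, with the numbers from the ad4 WRITE above:
#   offset_to_lba 200404975104   # first sector of the failed request
#   offset_to_lba 16384          # length of the request in sectors
```

For the failed write above this gives sector 391415967 and a length of 32 sectors.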


Tonight ad6 was thrown out of the mirror and ad4 then gave actual read 
errors, resulting in a big mess :(


My question: does anyone have experience with FreeBSD on a Dell R300 or 
can anyone give me some help in trying to fix the timeouts?


I was told using AHCI could be better for SATA disks, but apparently 
(http://permalink.gmane.org/gmane.linux.kernel.pci/8267) the BIOS does 
not support turning that on, so that does not appear to be an option.


Thanks,
Pieter


Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Pieter de Boer

Adam Vande More wrote:


May  5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command
May  5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command
May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Request failed (error=5).
ad4[WRITE(offset=200404975104, length=16384)]
May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Device gm0: provider ad4
disconnected.



Have you tried replacing/checking the cables?  Does it always happen to ad4?
 Your drive could be dying, try swapping it out and see if the errors
persist.

It happens to both drives, and it also happened to both of the drives I 
replaced with these a month ago. I didn't replace the cables back then, 
but they were correctly attached then and still are. It would also be 
odd for both cables to be broken at the same time.
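To back that up with numbers, the timeouts per disk can be counted from the syslog file. This is a rough sketch: the helper name `count_timeouts` is made up, and it assumes the standard syslog layout shown in the quoted lines above (month, day, time, host, "kernel:", device):

```sh
# count_timeouts
# Read syslog lines on stdin and print "DEVICE COUNT" for every device
# with kernel messages that mention a timeout, sorted by device name.
count_timeouts() {
    awk '$5 == "kernel:" && tolower($0) ~ /timeout/ {
             dev = $6
             sub(/:$/, "", dev)        # "ad4:" -> "ad4"
             n[dev]++
         }
         END { for (d in n) print d, n[d] }' | sort
}

# Intended use:
#   count_timeouts < /var/log/messages
```

If the counts are roughly even across ad4 and ad6, a single bad cable or a single bad disk becomes a less likely explanation.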


--
Pieter


Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Pieter de Boer



My question: does anyone have experience with FreeBSD on a Dell R300
or can anyone give me some help in trying to fix the timeouts?


Could you please do the following:

- Provide output from vmstat -i

- Provide output from dmesg | grep -i ata

- Install ports/sysutils/smartmontools (5.40 or later) and provide
  full output from commands smartctl -a /dev/ad4 and smartctl -a
  /dev/ad6


The ad4 SMART output is showing errors, as this disk is indeed broken 
now. It wasn't before and it is a replacement of another disk that 
wasn't broken either. Grmbl, I now see reallocated sectors on ad6 as 
well, in the smartctl output. So both disks look wonky; although afaik 
that's not the main issue here.


I've attached the smartctl output as separate files. smartmontools 5.40 
does not appear to exist; I used 5.39.1, the latest port version.


Attached also the vmstat -i and dmesg output.

--
Pieter
smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 8.0-RELEASE-p1 i386] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Black family
Device Model:     WDC WD5001AALS-00L3B2
Serial Number:    WD-WCASYA964063
Firmware Version: 01.03B01
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri May 14 23:01:49 2010 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command
                                        from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 241) Self-test routine in progress...
                                        10% of test remaining.
Total time to complete Offline
data collection:                (11160) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off
                                        support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 131) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3037) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       78
  3 Spin_Up_Time            0x0027   184   168   021    Pre-fail  Always       -       3791
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       992
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       827
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       990
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       989
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       992
194 Temperature_Celsius     0x0022   125   109   000    Old_age   Always       -       22
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   198   000    Old_age   Always       -       0
198