So I figured out, after a couple of scrubs and fmadm faulty, that drive
c9t15d0 was bad.
I then replaced the drive using:
-bash-3.2$ pfexec /usr/sbin/zpool offline vdipool c9t15d0
-bash-3.2$ pfexec /usr/sbin/zpool replace vdipool c9t15d0 c9t19d0
The drive resilvered and I rebooted the server, just to make sure
everything was clean.
After the reboot, ZFS resilvered the same drive again (which took 7 hours).
My pool now looks like this:
        NAME          STATE     READ WRITE CKSUM
        vdipool       DEGRADED     0     0     2
          raidz1      DEGRADED     0     0     4
            c9t14d0   ONLINE       0     0     1  512 resilvered
            spare     DEGRADED     0     0     0
              c9t15d0 OFFLINE      0     0     0
              c9t19d0 ONLINE       0     0     0  16.1G resilvered
            c9t16d0   ONLINE       0     0     1  512 resilvered
            c9t17d0   ONLINE       0     0     5  2.50K resilvered
            c9t18d0   ONLINE       0     0     1  512 resilvered
        spares
          c9t19d0     INUSE     currently in use
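A quick way to pick out just the devices reporting checksum errors from status output like the above is a short awk filter. A minimal sketch, with the status above inlined as sample data so it runs standalone; on the live system you would pipe `zpool status vdipool` in instead:

```shell
# List vdevs whose CKSUM column is nonzero.
# The here-string reproduces the zpool status output above; on a live
# system, replace it with:  zpool status vdipool | awk '...'
zpool_status='        NAME          STATE     READ WRITE CKSUM
        vdipool       DEGRADED     0     0     2
          raidz1      DEGRADED     0     0     4
            c9t14d0   ONLINE       0     0     1  512 resilvered
            spare     DEGRADED     0     0     0
              c9t15d0 OFFLINE      0     0     0
              c9t19d0 ONLINE       0     0     0  16.1G resilvered
            c9t16d0   ONLINE       0     0     1  512 resilvered
            c9t17d0   ONLINE       0     0     5  2.50K resilvered
            c9t18d0   ONLINE       0     0     1  512 resilvered'

# Device names start with a controller number like c9, so match ^c[0-9].
bad=$(printf '%s\n' "$zpool_status" |
  awk '$1 ~ /^c[0-9]/ && $5+0 > 0 { print $1 }')
printf '%s\n' "$bad"
```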
I'm going to replace c9t15d0 with a new drive.
I find it odd that zfs needed to resilver the drive after the reboot.
Shouldn't the resilvered information be kept across reboots?
Thanks
Karl
On 04/15/2011 03:55 PM, Cindy Swearingen wrote:
Yes, if you have ruled out any hardware problems, the Solaris 10 9/10
release has the fix for the RAIDZ checksum errors.
cs
On 04/15/11 14:47, Karl Rossing wrote:
Would moving the pool to a Solaris 10U9 server fix the random RAIDZ
errors?
On 04/15/2011 02:23 PM, Cindy Swearingen wrote:
D'oh. One more thing.
We had a problem in b120-123 that caused random checksum errors on
RAIDZ configs. This info is still in the ZFS troubleshooting guide.
See if a zpool clear resolves these errors. If that works, then I would
upgrade to a more recent build and see if the problem is resolved
completely.
If not, then see the recommendation below.
Thanks,
Cindy
On 04/15/11 13:18, Cindy Swearingen wrote:
Hi Karl...
I just saw this same condition on another list. I think the poster
resolved it by replacing the HBA.
Drives go bad but they generally don't all go bad at once, so I would
suspect some common denominator like the HBA/controller, cables, and
so on.
See what FMA thinks by running fmdump like this:
# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Apr 11 16:02:38.2262 ed0bdffe-3cf9-6f46-f20c-99e2b9a6f1cb ZFS-8000-D3
Apr 11 16:22:23.8401 d4157e2f-c46d-c1e9-c05b-f2d3e57f3893 ZFS-8000-D3
Apr 14 15:55:26.1918 71bd0b08-60c2-e114-e1bc-daa03d7b163f ZFS-8000-D3
This output will tell you when the problem started.
Depending on what fmdump says, which probably indicates multiple drive
problems, I would run diagnostics on the HBA or get it replaced.
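To see when the faults cluster, it can help to count fmdump events per day. A minimal sketch, with the sample fmdump output above inlined so it runs standalone; on the live system you would pipe `fmdump` output in directly:

```shell
# Count FMA events per day ($1 = month, $2 = day); skip the header line.
fmdump_out='TIME                 UUID                                 SUNW-MSG-ID
Apr 11 16:02:38.2262 ed0bdffe-3cf9-6f46-f20c-99e2b9a6f1cb ZFS-8000-D3
Apr 11 16:22:23.8401 d4157e2f-c46d-c1e9-c05b-f2d3e57f3893 ZFS-8000-D3
Apr 14 15:55:26.1918 71bd0b08-60c2-e114-e1bc-daa03d7b163f ZFS-8000-D3'

summary=$(printf '%s\n' "$fmdump_out" |
  awk 'NR > 1 { n[$1 " " $2]++ } END { for (d in n) print d, n[d] }' |
  sort)
printf '%s\n' "$summary"
```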
Always have good backups.
Thanks,
Cindy
On 04/15/11 12:52, Karl Rossing wrote:
Hi,
One of our zfs volumes seems to be having some errors. So I ran
zpool scrub and it's currently showing the following.
-bash-3.2$ pfexec /usr/sbin/zpool status -x
  pool: vdipool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 3h10m, 13.53% done, 20h16m to go
config:
        NAME        STATE     READ WRITE CKSUM
        vdipool     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c9t14d0 ONLINE       0     0    12  6K repaired
            c9t15d0 ONLINE       0     0    13  167K repaired
            c9t16d0 ONLINE       0     0    11  5.50K repaired
            c9t17d0 ONLINE       0     0    20  10K repaired
            c9t18d0 ONLINE       0     0    15  7.50K repaired
        spares
          c9t19d0   AVAIL
errors: No known data errors
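One detail worth noticing: every drive in the raidz1 shows checksum errors, not just one. A quick tally, sketched with the status output above inlined as sample data; on the live system you would pipe `zpool status vdipool` in instead:

```shell
# Total the CKSUM column across the leaf drives. Errors on every drive
# at once usually point at a shared component (HBA, cabling, backplane)
# rather than one bad disk.
status='        NAME        STATE     READ WRITE CKSUM
        vdipool     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c9t14d0 ONLINE       0     0    12  6K repaired
            c9t15d0 ONLINE       0     0    13  167K repaired
            c9t16d0 ONLINE       0     0    11  5.50K repaired
            c9t17d0 ONLINE       0     0    20  10K repaired
            c9t18d0 ONLINE       0     0    15  7.50K repaired'

total=$(printf '%s\n' "$status" |
  awk '$1 ~ /^c[0-9]/ { sum += $5 } END { print sum }')
echo "Total checksum errors across drives: $total"
```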
I have another server connected to the same JBOD using drives
c8t1d0 to c8t13d0, and it doesn't seem to have any errors.
I'm wondering how it could have gotten so screwed up?
Karl
CONFIDENTIALITY NOTICE: This communication (including all attachments) is
confidential and is intended for the use of the named addressee(s) only and
may contain information that is private, confidential, privileged, and
exempt from disclosure under law. All rights to privilege are expressly
claimed and reserved and are not waived. Any use, dissemination,
distribution, copying or disclosure of this message and any attachments, in
whole or in part, by anyone other than the intended recipient(s) is strictly
prohibited. If you have received this communication in error, please notify
the sender immediately, delete this communication from all data storage
devices and destroy all hard copies.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss