Re: [zfs-discuss] Mysterious corruption with raidz2 vdev (1 checksum err on disk, 2 on vdev?)
On 28 July, 2007 - Marc Bevand sent me these 0,7K bytes:

> Matthew Ahrens Matthew.Ahrens at sun.com writes:
> > So the errors on the raidz2 vdev indeed indicate that at least 3 disks
> > below it gave the wrong data for those 2 blocks; we just couldn't tell
> > which 3+ disks they were.
>
> Something must be seriously wrong with this server. This is the first
> time I have seen an uncorrectable checksum error in a raidz2 vdev. I
> would suggest Kevin run memtest86 or similar. It is more likely that bad
> data was written to the disks in the first place (due to flaky
> RAM/CPU/mobo/cables) than that 3+ disks corrupted data in the same stripe!

They are all connected to the same controller.. which might have had a bad
day.. but memory corruption sounds like a plausible problem too.. My
workstation suddenly started having trouble compiling hello world.. memtest
to the rescue, the next day I found 340 errors..

/Tomas
--
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Re: [zfs-discuss] Mysterious corruption with raidz2 vdev (1 checksum err on disk, 2 on vdev?)
Kevin wrote:
> After a scrub of a pool with 3 raidz2 vdevs (each with 5 disks in them) I
> see the following status output. Notice that the raidz2 vdev has 2
> checksum errors, but only one disk inside the raidz2 vdev has a checksum
> error. How is this possible? I thought that you would have to have 3
> errors in the same 'stripe' within a raidz2 vdev in order for the error
> to become unrecoverable.

A checksum error on a disk indicates that we know for sure that this disk
gave us wrong data. With raidz[2], if we are unable to reconstruct the block
successfully but no disk admitted that it failed, then we have no way of
knowing which disk(s) are actually incorrect.

So the errors on the raidz2 vdev indeed indicate that at least 3 disks below
it gave the wrong data for those 2 blocks; we just couldn't tell which 3+
disks they were. It's as if I know that A+B==3, but A is 1 and B is 3. I
can't tell if A is wrong or B is wrong (or both!).

The checksum errors on the cXtXdX vdevs didn't result in data loss, because
we reconstructed the data from the other disks in the raidz group.

--matt
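To make the accounting concrete, here is a minimal Python sketch (not ZFS
code) of the idea Matthew describes: raidz2 can rebuild any 2 pieces of a
stripe, so one can try assuming each set of up to 2 disks returned garbage,
"rebuild" those pieces, and test the candidate against the block checksum
kept in the parent block pointer. If some set works, those disks get the
CKSUM error; if no set works, at least 3 pieces must be bad and the error
can only be charged to the raidz2 vdev as a whole. The piece names and the
substitution of known-good values below are illustrative stand-ins for the
real parity rebuild.

    from hashlib import sha256
    from itertools import combinations

    def block_cksum(pieces):
        # Stand-in for the checksum ZFS stores in the parent block pointer.
        return sha256(b"".join(pieces)).hexdigest()

    def try_reconstruct(read_pieces, rebuilt_pieces, cksum):
        # Assume 0, 1 or 2 pieces are bad (raidz2's limit), "rebuild" them
        # and check the candidate against the block checksum.  Substituting
        # the known-good value stands in for rebuilding from parity, which
        # only gives the right answer when the remaining pieces are good.
        for k in range(3):
            for bad in combinations(range(len(read_pieces)), k):
                candidate = list(read_pieces)
                for i in bad:
                    candidate[i] = rebuilt_pieces[i]
                if block_cksum(candidate) == cksum:
                    return bad      # these disks get the CKSUM error
        return None                 # 3+ bad pieces: the error lands on the vdev

    good = [b"A", b"B", b"C", b"P", b"Q"]   # toy 5-disk stripe: 3 data + 2 parity
    cksum = block_cksum(good)

    print(try_reconstruct([b"A", b"X", b"C", b"P", b"Q"], good, cksum))  # (1,)
    print(try_reconstruct([b"X", b"Y", b"Z", b"P", b"Q"], good, cksum))  # None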
Re: [zfs-discuss] Mysterious corruption with raidz2 vdev (1 checksum err on disk, 2 on vdev?)
Matthew Ahrens Matthew.Ahrens at sun.com writes:
> So the errors on the raidz2 vdev indeed indicate that at least 3 disks
> below it gave the wrong data for those 2 blocks; we just couldn't tell
> which 3+ disks they were.

Something must be seriously wrong with this server. This is the first time I
have seen an uncorrectable checksum error in a raidz2 vdev. I would suggest
Kevin run memtest86 or similar. It is more likely that bad data was written
to the disks in the first place (due to flaky RAM/CPU/mobo/cables) than that
3+ disks corrupted data in the same stripe!

-marc
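One way flaky memory could produce exactly this signature is sketched below
in Python, under an assumption not spelled out in the thread: the corruption
lands after the block checksum has been computed but before the parity is
calculated and the stripe is written out. Every disk then faithfully stores
and returns what it was given, the parity is self-consistent with the
corrupt data, so no individual disk can be blamed, yet the block can never
match its checksum.

    from hashlib import sha256

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    good = b"a zfs data block, 32 bytes long."
    block_cksum = sha256(good).hexdigest()   # computed from the good buffer

    buf = bytearray(good)
    buf[5] ^= 0x04                           # one bit flipped by bad RAM
    buf = bytes(buf)

    d0, d1 = buf[:16], buf[16:]              # toy 2-data-disk stripe
    p = xor(d0, d1)                          # parity computed from corrupt buffer

    # On read, every disk returns exactly what it wrote:
    print(sha256(d0 + d1).hexdigest() == block_cksum)  # False -> checksum error
    print(xor(d0, d1) == p)                            # True  -> no disk looks wrong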
[zfs-discuss] Mysterious corruption with raidz2 vdev (1 checksum err on disk, 2 on vdev?)
After a scrub of a pool with 3 raidz2 vdevs (each with 5 disks in them) I see
the following status output. Notice that the raidz2 vdev has 2 checksum
errors, but only one disk inside the raidz2 vdev has a checksum error. How is
this possible? I thought that you would have to have 3 errors in the same
'stripe' within a raidz2 vdev in order for the error to become unrecoverable.
And I have not reset any errors with zpool clear ... Comments will be
appreciated. Thanks.

$ zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed with 1 errors on Mon Jul 23 19:59:07 2007
config:

        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     2
          raidz2     ONLINE       0     0     2
            c2t0d0   ONLINE       0     0     1
            c2t1d0   ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     1
            c2t13d0  ONLINE       0     0     0
            c2t14d0  ONLINE       0     0     0
        spares
          c2t15d0    AVAIL

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          55fe9784  lvl=0 blkid=40299