Another update to my post: it took a week to get around to it,
but I tried running some ZDB walks on my pool, and they
core-dumped after a while.
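
For the record, the walk in question was a full block-pointer
traversal with checksum verification; from memory, the invocation
was roughly this (exact flags may differ between zdb builds):

  # walk all block pointers in the pool and verify their checksums;
  # this is the run that kept core-dumping partway through
  zdb -bbcc pool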

However, I've also noticed some clues in my FMADM outputs
dating from my 'zpool scrub' attempts. There are several sets
of similar error reports (one set per scrub), differing only
in the timestamps and the "ena" and "__tod" fields, as far
as I could see.
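
(For reference, these verbose nvlist dumps come out of the FMA
error log; if I recall correctly, the full listing was produced
simply with:

  # print every ereport accumulated in the FMA error log, verbosely
  fmdump -eV

and then trimmed down to the one representative set below.)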

I hope someone can glance over them and point me in the right
direction - i.e. which on-disk block(s) I might want to extract
for analysis and/or forge-and-replace, so as to bring the pool
back to a state where no errors are reported...

As a reminder, this is a 6-disk raidz2 pool with ashift=12.
During a recent scrub there were zero errors on the individual
disks, but one pool-level and two vdev-level checksum errors.
The pool and vdevs are considered online, and no errors are
noticed during pool usage (though I haven't intentionally
written anything to it since - it was only mounted read-write
across a few bootups), yet a metadata error is being reported:

NAME        STATE     READ WRITE CKSUM
pool        ONLINE       0     0     1
  raidz2-0  ONLINE       0     0     2
    c7t0d0  ONLINE       0     0     0
    c7t1d0  ONLINE       0     0     0
    c7t2d0  ONLINE       0     0     0
    c7t3d0  ONLINE       0     0     0
    c7t4d0  ONLINE       0     0     0
    c7t5d0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>

Below is one set of such FMADM reports. There are two distinct
zio_offset values: four reports carry 0x6ecb164000 and eight
carry 0x6ecb163000, all with zio_size 0x8000. Note also that
every report has zio_objset=0x0 and zio_object=0x0, which as
far as I understand points at the MOS (the pool-wide meta
objset) - consistent with the <metadata>:<0x0> complaint above.
I believe these offsets may be the addresses of the mismatching
block(s); but how do I locate them on-disk in order to
match/analyze/forge them?

Are the offsets relative to the pool, or to the individual
disks? Any ideas? ;)
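
In the meantime, here is the approach I had in mind, in case
someone can confirm or correct it. Assuming the zio_offset of a
leaf-level checksum ereport is relative to the start of that
disk's s0 slice - I am not sure whether it already accounts for
the 4 MB of front labels (VDEV_LABEL_START_SIZE = 0x400000), so
it may be worth trying both - a raw dump of the suspect region
from one disk would look roughly like this (the OFF variable and
output file name are just for illustration):

  # Dump the 0x8000-byte suspect region from one leaf disk, in
  # 4 KB blocks to match ashift=12. If zio_offset turns out to
  # exclude the front labels, re-run with OFF raised by 0x400000.
  OFF=0x6ecb163000
  dd if=/dev/rdsk/c7t5d0s0 of=/tmp/c7t5d0.blk bs=4096 \
     iseek=$(( OFF / 4096 )) count=$(( 0x8000 / 4096 ))

If I read the zdb man page right, 'zdb -R' can fetch a region
through the pool's own vdev code as well, with a vdev:offset:size
specifier in hex and an ':r' flag for raw output (syntax from
memory, so treat this as a sketch too):

  # let zdb itself read the region from leaf 5 of raidz2-0
  zdb -R pool 0.5:6ecb163000:8000:r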

Dec 01 2011 08:53:29.437777177 ereport.fs.zfs.data
nvlist version: 0
        class = ereport.fs.zfs.data
        ena = 0x7a43652cd2c00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x1638b976389d447c
        (end detector)

        pool = pool
        pool_guid = 0x1638b976389d447c
        pool_context = 0
        pool_failmode = continue
        zio_err = 50
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4ed70849 0x1a17f319

Dec 01 2011 08:53:29.437774722 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x7a43652cd2c00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x1638b976389d447c
                vdev = 0x6abe377b60fc48e5
        (end detector)

        pool = pool
        pool_guid = 0x1638b976389d447c
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0x6abe377b60fc48e5
        vdev_type = disk
        vdev_path = /dev/dsk/c7t1d0s0
        vdev_devid = id1,sd@SATA_____ST2000DL003-9VT1____________5YD1XWWB/a
        parent_guid = 0x53d15735fa4c6d21
        parent_type = raidz
        zio_err = 50
        zio_offset = 0x6ecb164000
        zio_size = 0x8000
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4ed70849 0x1a17e982

Dec 01 2011 08:53:29.437774091 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x7a43652cd2c00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x1638b976389d447c
                vdev = 0xa5d72d5c3c698a85
        (end detector)

        pool = pool
        pool_guid = 0x1638b976389d447c
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0xa5d72d5c3c698a85
        vdev_type = disk
        vdev_path = /dev/dsk/c7t0d0s0
        vdev_devid = id1,sd@SATA_____ST2000DL003-9VT1____________5YD217ZL/a
        parent_guid = 0x53d15735fa4c6d21
        parent_type = raidz
        zio_err = 50
        zio_offset = 0x6ecb164000
        zio_size = 0x8000
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4ed70849 0x1a17e70b

Dec 01 2011 08:53:29.437772910 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x7a43652cd2c00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x1638b976389d447c
                vdev = 0x395cf609609d8846
        (end detector)

        pool = pool
        pool_guid = 0x1638b976389d447c
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0x395cf609609d8846
        vdev_type = disk
        vdev_path = /dev/dsk/c7t5d0s0
        vdev_devid = id1,sd@SATA_____ST2000DL003-9VT1____________5YD24GDG/a
        parent_guid = 0x53d15735fa4c6d21
        parent_type = raidz
        zio_err = 50
        zio_offset = 0x6ecb163000
        zio_size = 0x8000
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4ed70849 0x1a17e26e

Dec 01 2011 08:53:29.437771730 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x7a43652cd2c00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x1638b976389d447c
                vdev = 0xea097673c89f08b7
        (end detector)

        pool = pool
        pool_guid = 0x1638b976389d447c
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0xea097673c89f08b7
        vdev_type = disk
        vdev_path = /dev/dsk/c7t4d0s0
        vdev_devid = id1,sd@SATA_____ST2000DL003-9VT1____________5YD24GCA/a
        parent_guid = 0x53d15735fa4c6d21
        parent_type = raidz
        zio_err = 50
        zio_offset = 0x6ecb163000
        zio_size = 0x8000
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4ed70849 0x1a17ddd2

Dec 01 2011 08:53:29.437771536 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x7a43652cd2c00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x1638b976389d447c
                vdev = 0xcfabcd4285b36454
        (end detector)

        pool = pool
        pool_guid = 0x1638b976389d447c
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0xcfabcd4285b36454
        vdev_type = disk
        vdev_path = /dev/dsk/c7t3d0s0
        vdev_devid = id1,sd@SATA_____ST2000DL003-9VT1____________5YD21QZL/a
        parent_guid = 0x53d15735fa4c6d21
        parent_type = raidz
        zio_err = 50
        zio_offset = 0x6ecb163000
        zio_size = 0x8000
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4ed70849 0x1a17dd10

Dec 01 2011 08:53:29.437771711 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x7a43652cd2c00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x1638b976389d447c
                vdev = 0xbb193606ede4faca
        (end detector)

        pool = pool
        pool_guid = 0x1638b976389d447c
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0xbb193606ede4faca
        vdev_type = disk
        vdev_path = /dev/dsk/c7t2d0s0
        vdev_devid = id1,sd@SATA_____ST2000DL003-9VT1____________5YD1VLKC/a
        parent_guid = 0x53d15735fa4c6d21
        parent_type = raidz
        zio_err = 50
        zio_offset = 0x6ecb163000
        zio_size = 0x8000
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4ed70849 0x1a17ddbf

Dec 01 2011 08:53:29.437770735 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x7a43652cd2c00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x1638b976389d447c
                vdev = 0x6abe377b60fc48e5
        (end detector)

        pool = pool
        pool_guid = 0x1638b976389d447c
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0x6abe377b60fc48e5
        vdev_type = disk
        vdev_path = /dev/dsk/c7t1d0s0
        vdev_devid = id1,sd@SATA_____ST2000DL003-9VT1____________5YD1XWWB/a
        parent_guid = 0x53d15735fa4c6d21
        parent_type = raidz
        zio_err = 50
        zio_offset = 0x6ecb164000
        zio_size = 0x8000
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4ed70849 0x1a17d9ef

Dec 01 2011 08:53:29.437770452 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x7a43652cd2c00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x1638b976389d447c
                vdev = 0xa5d72d5c3c698a85
        (end detector)

        pool = pool
        pool_guid = 0x1638b976389d447c
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0xa5d72d5c3c698a85
        vdev_type = disk
        vdev_path = /dev/dsk/c7t0d0s0
        vdev_devid = id1,sd@SATA_____ST2000DL003-9VT1____________5YD217ZL/a
        parent_guid = 0x53d15735fa4c6d21
        parent_type = raidz
        zio_err = 50
        zio_offset = 0x6ecb164000
        zio_size = 0x8000
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4ed70849 0x1a17d8d4

Dec 01 2011 08:53:29.437769241 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x7a43652cd2c00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x1638b976389d447c
                vdev = 0x395cf609609d8846
        (end detector)

        pool = pool
        pool_guid = 0x1638b976389d447c
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0x395cf609609d8846
        vdev_type = disk
        vdev_path = /dev/dsk/c7t5d0s0
        vdev_devid = id1,sd@SATA_____ST2000DL003-9VT1____________5YD24GDG/a
        parent_guid = 0x53d15735fa4c6d21
        parent_type = raidz
        zio_err = 50
        zio_offset = 0x6ecb163000
        zio_size = 0x8000
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4ed70849 0x1a17d419

Dec 01 2011 08:53:29.437768657 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x7a43652cd2c00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x1638b976389d447c
                vdev = 0xea097673c89f08b7
        (end detector)

        pool = pool
        pool_guid = 0x1638b976389d447c
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0xea097673c89f08b7
        vdev_type = disk
        vdev_path = /dev/dsk/c7t4d0s0
        vdev_devid = id1,sd@SATA_____ST2000DL003-9VT1____________5YD24GCA/a
        parent_guid = 0x53d15735fa4c6d21
        parent_type = raidz
        zio_err = 50
        zio_offset = 0x6ecb163000
        zio_size = 0x8000
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4ed70849 0x1a17d1d1

Dec 01 2011 08:53:29.437769152 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x7a43652cd2c00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x1638b976389d447c
                vdev = 0xcfabcd4285b36454
        (end detector)

        pool = pool
        pool_guid = 0x1638b976389d447c
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0xcfabcd4285b36454
        vdev_type = disk
        vdev_path = /dev/dsk/c7t3d0s0
        vdev_devid = id1,sd@SATA_____ST2000DL003-9VT1____________5YD21QZL/a
        parent_guid = 0x53d15735fa4c6d21
        parent_type = raidz
        zio_err = 50
        zio_offset = 0x6ecb163000
        zio_size = 0x8000
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4ed70849 0x1a17d3c0

Dec 01 2011 08:53:29.437768480 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x7a43652cd2c00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x1638b976389d447c
                vdev = 0xbb193606ede4faca
        (end detector)

        pool = pool
        pool_guid = 0x1638b976389d447c
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0xbb193606ede4faca
        vdev_type = disk
        vdev_path = /dev/dsk/c7t2d0s0
        vdev_devid = id1,sd@SATA_____ST2000DL003-9VT1____________5YD1VLKC/a
        parent_guid = 0x53d15735fa4c6d21
        parent_type = raidz
        zio_err = 50
        zio_offset = 0x6ecb163000
        zio_size = 0x8000
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0
        zio_blkid = 0x0
        __ttl = 0x1
        __tod = 0x4ed70849 0x1a17d120



2011-12-02 13:58, Jim Klimov wrote:
An intermediate update to my recent post:

2011-11-30 21:01, Jim Klimov wrote:
Hello experts,

I finally upgraded my troublesome oi-148a home storage box to
oi-151a about a week ago (using the pkg update method from the
wiki page - I'm not certain whether that repository is fixed at
the release version or is a sliding "current" one).

After the OS upgrade I scrubbed my main pool - a 6-disk raidz2 -
and some checksum errors were discovered on individual disks,
with one non-correctable error at the raidz level. It named a
file which was indeed not readable (I/O errors), so I deleted it.
The dataset pool/media has no snapshots, and dedup was disabled
on it, so I hoped the error would be gone.
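
For the curious, those two facts can be double-checked with the
usual commands:

  # confirm no snapshots hold the deleted file's blocks
  zfs list -t snapshot -r pool/media
  # confirm dedup is off for this dataset
  zfs get dedup pool/media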

I cleared the errors (this only zeroed the counters; the status
still complained about a metadata error in pool/media:<0x4>) and
reran the scrub. While that scrub was running, zpool status
reported this error plus <metadata>:<0x0>. The computer hung and
was reset during the scrub, but the scrub apparently resumed from
the same spot. When the operation completed, however, it showed
zero checksum errors at both the disk and raidz levels, and the
pool/media error was gone - but the <metadata>:<0x0> error
is still in place.
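
The clear-and-rescrub cycle, for reference, was just the
standard one:

  # zero the error counters, then start another scrub
  zpool clear pool
  zpool scrub pool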

Searching the list archives I found a similar post relating to
snv_134 and snv_135, and at that time Victor Latushkin suggested
that the pool must be recreated. I have some unique data on the
pool, so I'm reluctant to recreate it (besides, it's problematic
to back up 10TB of data at home, and it could take weeks to
upload it to my workplace - even if there were that much free
space there, which there is not).

So far I have cleared the errors and started a new scrub. I kind
of hope that if the box doesn't hang, the scrub might find that
there are no actual errors after all. I'll know in about 100
hours. The pool is currently imported and automounted, and I have
not yet tried to export and reimport it.


The scrub is running slower this time - a couple of days in and
only nearing 25% completion (previous runs took 89 and 101
hours). However, it seems to have confirmed some raidz- and
pool-level checksum errors (with no known individual disk
errors); what puzzles me more is that there are two raidz-level
errors for the single pool-level error:

# zpool status -v
  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub in progress since Wed Nov 30 19:38:47 2011
        1.97T scanned out of 8.34T at 13.6M/s, 135h54m to go
        0 repaired, 23.68% done
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     1
          raidz2-0  ONLINE       0     0     2
            c7t0d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
        cache
          c4t1d0p7  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>



My question still stands: is it possible to recover
from this error, or to somehow safely ignore it? ;)
I mean, without backing up the data and recreating
the pool?

If the problem is in metadata, yet the pool presumably
still works, then this particular metadata is either
non-critical or redundant, and could perhaps be forged
and replaced with valid metadata. Is this a valid line
of thought?

Are there any tools to remake such a metadata
block?

Again, I have not yet tried to export/reimport the pool,
except for that occasion 3 days ago when the machine
hung and was reset, after which it imported the pool
and resumed the scrub automatically...

I think it is too late by now for an export and
a rollback import, anyway...
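
(By "rollback import" I mean the recovery-mode import that
rewinds the pool to an earlier transaction group; if memory
serves, something like:

  # dry run: check whether the pool could be rewound at all
  zpool export pool
  zpool import -nF pool
  # and, if that looks sane, the real thing
  zpool import -F pool

By now the pool has gone through too many transactions since
the errors appeared for such a rewind to reach back that far.)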

Still, I'd like to estimate my chances of living on without
recreating the pool or losing data. Perhaps there are ways to
actually check, fix or forge the needed metadata? Also, a zdb
walk previously found some inconsistencies (allocated !=
referred); can that be diagnosed or repaired in more detail?
Can this discrepancy, a few sectors' worth in size, be a cause
of that reported metadata error, or be caused by it?
Thanks,
// Jim Klimov

sent from a mobile, pardon any typos ;)
