On Fri, Dec 2, 2011 at 02:58, Jim Klimov <jimkli...@cos.ru> wrote:
> My question still stands: is it possible to recover
> from this error or somehow safely ignore it? ;)
> I mean, without backing up data and recreating the
> pool?
> If the problem is in metadata but presumably the
> pool still works, then this particular metadata
> is either not critical or redundant, and somehow
> can be forged and replaced by valid metadata.
> Is this a rightful path of thought?
> Are there any tools to remake such a metadata
> block?
> Again, I did not try to export/reimport the pool
> yet, except for that time 3 days ago when the
> machine hung, was reset and imported the pool
> and continued the scrub automatically...
> I think it is now too late to do an export and
> a rollback import, too...

Unfortunately I cannot provide you with a direct answer as I have only
been a user of ZFS for about a year and in that time only encountered
this once.

Anecdotally, at work I had something similar happen to a Nexenta Core
3.0 (b134) box three days ago (seemingly caused by a hang and then an
eventual panic after attempting to add a drive with read failures to
the pool).  When the box came back up, zfs reported an error in
metadata:0x0.  We scrubbed the tank (~400GB used) and, as in your case,
the checksum error didn't clear.  We ran a scrub again, and the second
scrub did seem to clear the metadata error.

I don't know whether it will work that way for everyone, every
time.  But considering that the pool and the data on it appear to be
fine (just without any replicas until we get the bad disk replaced)
and that all metadata is supposed to have <copies>+1 copies (with an
apparent max of 3 copies[1]) on the pool at all times, I can't see why
this error shouldn't be cleared by a scrub.
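
To make the copies arithmetic concrete, here is a minimal Python
sketch.  It only encodes the rule as I understand it from [1]
(metadata gets <copies>+1 copies, capped at 3); the function name
and the constant are made up for illustration and are not part of
any ZFS tool or API.

```python
# Hypothetical helper: models the "<copies>+1, max 3" rule described
# above. This is an illustration, not actual ZFS code.
MAX_COPIES_PER_BLOCK = 3  # assumed cap, per the "apparent max of 3"


def metadata_copies(copies: int) -> int:
    """Return how many copies of a metadata block the pool is
    expected to hold for a dataset with the given copies setting."""
    if copies not in (1, 2, 3):
        raise ValueError("copies must be 1, 2 or 3")
    return min(copies + 1, MAX_COPIES_PER_BLOCK)


if __name__ == "__main__":
    for c in (1, 2, 3):
        print(f"copies={c}: metadata copies={metadata_copies(c)}")
```

So even with the default copies=1 there should be a second copy of
the metadata for a scrub to repair from, if the rule holds.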

[1] http://blogs.oracle.com/relling/entry/zfs_copies_and_data_protection
zfs-discuss mailing list
