Re: [zfs-discuss] Split responsibility for data with ZFS

Ross Smith Mon, 15 Dec 2008 02:59:47 -0800

Forgive me for not understanding the details, but couldn't you also
work backwards through the blocks with ZFS and attempt to recreate the
uberblock?

So if you lost the uberblock, could you (memory and time allowing)
start scanning the disk, looking for orphan blocks that aren't
referenced anywhere else and piece together the top of the tree?

Or roll back to a previous uberblock (or a snapshot uberblock), and
then look to see what blocks are on the disk but not referenced
anywhere.  Is there any way to intelligently work out where those
blocks would be linked by looking at how they interact with the known
data?

Of course, rolling back to a previous uberblock would still be a
massive step forward, and something I think would do much to improve
the perception of ZFS as a tool to reliably store data.
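
To make the rollback idea concrete, here is a minimal sketch in C of
choosing a fallback uberblock. It is a toy model, not ZFS code: the
struct ub_slot layout, the checksum, and the names ub_checksum, ub_valid
and ub_pick are all hypothetical. The only assumption taken from ZFS is
that a pool keeps several uberblock copies, each tagged with a
transaction group number (txg) and protected by a checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an on-disk uberblock slot.
 * Real uberblocks carry more fields and a much stronger checksum. */
struct ub_slot {
    uint64_t txg;       /* transaction group that wrote this slot */
    uint64_t payload;   /* stand-in for the pointer to the tree root */
    uint64_t checksum;  /* toy checksum over txg and payload */
};

/* Toy checksum, for illustration only. */
uint64_t ub_checksum(uint64_t txg, uint64_t payload)
{
    return (txg * 0x9E3779B97F4A7C15ULL) ^ payload;
}

/* A slot is usable only if its stored checksum matches its contents. */
int ub_valid(const struct ub_slot *s)
{
    return s->checksum == ub_checksum(s->txg, s->payload);
}

/* Return the index of the valid slot with the highest txg that is
 * <= max_txg, or -1 if no slot qualifies. Passing UINT64_MAX asks for
 * the newest valid slot; passing a smaller txg rolls back further. */
int ub_pick(const struct ub_slot *ring, size_t n, uint64_t max_txg)
{
    int best = -1;

    assert(ring != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!ub_valid(&ring[i]) || ring[i].txg > max_txg)
            continue;   /* damaged, or newer than we want */
        if (best < 0 || ring[i].txg > ring[best].txg)
            best = (int)i;
    }
    return best;
}
```

A return value of -1 is the "all the backups are borked" case; any
other result identifies a slot the user could be offered as a rollback
target, along with how many transaction groups would be lost.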

You cannot overstate the difference to the end user between a file
system that on boot says:
"Sorry, can't read your data pool."

and one that says:
"Whoops, the uberblock, and all the backups are borked.  Would you
like to roll back to a backup uberblock, or leave the filesystem
offline to repair manually?"

As much as anything else, a simple statement explaining *why* a pool
is inaccessible, and saying just how badly things have gone wrong
helps tons.  Being able to recover anything after that is just the
icing on the cake, especially if it can be done automatically.

Ross

PS.  Sorry for the duplicate Casper, I forgot to cc the list.

On Mon, Dec 15, 2008 at 10:30 AM,  <casper....@sun.com> wrote:
>
>>I think the problem for me is not that there's a risk of data loss if
>>a pool becomes corrupt, but that there are no recovery tools
>>available.  With UFS, people expect that if the worst happens, fsck
>>will be able to recover their data in most cases.
>
> Except, of course, that fsck lies.  It "fixes" the metadata and the
> quality of the rest is unknown.
>
> Anyone using UFS knows that UFS file corruption is common; specifically,
> when using a "UFS root" and the system panics when trying to
> install a device driver, there's a good chance that some files in
> /etc are corrupt. Some were application problems (some code used
> fsync(fileno(fp)); fclose(fp); it doesn't guarantee anything).
>
>
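
The likely reason the quoted sequence guarantees nothing: fsync() only
pushes what the kernel already holds for that descriptor, while data
still sitting in the stdio buffer does not reach the kernel until
fclose(), which runs after the sync. A minimal sketch of the safer
ordering follows; write_durably is a hypothetical helper name, and the
sketch assumes a POSIX system. It does not sync the containing
directory, which a newly created file may also need.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Write len bytes from buf to path and ask for them to be on stable
 * storage before returning. Returns 0 on success, -1 on any failure. */
int write_durably(const char *path, const char *buf, size_t len)
{
    FILE *fp;

    assert(path != NULL && buf != NULL);
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    if (fwrite(buf, 1, len, fp) != len ||
        fflush(fp) != 0 ||              /* 1. stdio buffer -> kernel */
        fsync(fileno(fp)) != 0) {       /* 2. kernel -> stable storage */
        fclose(fp);
        return -1;
    }
    /* 3. The stdio buffer is already empty, so fclose() has no
     * unsynced data left to write. */
    return fclose(fp) == 0 ? 0 : -1;
}
```

Checking every return value matters here: a failed fflush() or fsync()
is often the only notice an application gets that its data did not make
it to disk.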
>>With ZFS you have no such tools, yet Victor has on at least two occasions
>>shown that it's quite possible to recover pools that were completely unusable
>>(I believe by making use of old / backup copies of the uberblock).
>
> True; and certainly ZFS should be able to backtrack.  But it's
> much more likely to happen "automatically" than by using a recovery
> tool.
>
> See, fsck could only be written because specific corruptions and
> their patterns are known.  With ZFS, you can only back up to
> a certain uberblock and the pattern will be a surprise.
>
> Casper
>
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
