Re: [zfs-discuss] Data loss by memory corruption?

2012-01-18 Jim Klimov

2012-01-18 1:20, Stefan Ring wrote:

  The issue is definitely not specific to ZFS.  For example, the whole OS
  depends on reliable memory content in order to function.  Likewise, no one
  likes it if characters mysteriously change in their word processing
  documents.


 I don’t care too much if a single document gets corrupted – there’ll
 always be a good copy in a snapshot. I do care however if a whole
 directory branch or old snapshots were to disappear.


Well, as far as this problem relies on random memory corruptions,
you don't get to choose whether your document gets broken or some
low-level part of metadata tree ;)

Besides, what if that document you don't care about is your account's
entry in a banking system (as if they had no other redundancy and
double-checks)? And suddenly you don't exist because of some EIOIO,
or your balance is zeroed (or worse, highly negative)? ;)

//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data loss by memory corruption?

2012-01-18 Nico Williams
On Wed, Jan 18, 2012 at 4:53 AM, Jim Klimov jimkli...@cos.ru wrote:
 2012-01-18 1:20, Stefan Ring wrote:
 I don’t care too much if a single document gets corrupted – there’ll
 always be a good copy in a snapshot. I do care however if a whole
 directory branch or old snapshots were to disappear.

 Well, as far as this problem relies on random memory corruptions,
 you don't get to choose whether your document gets broken or some
 low-level part of metadata tree ;)

Other filesystems tend to be much more tolerant of bit rot of all
types precisely because they have no block checksums.

But I'd rather have ZFS -- *with* redundancy, of course, and with ECC.
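To make the point about block checksums concrete, here is a minimal Python sketch (not ZFS code). It assumes a simplified checksum modeled on ZFS's fletcher4 (four running 64-bit sums over 32-bit words) and a hypothetical `read_block` helper: with verification on, a flipped bit surfaces as an EIO error; with verification off, the damaged bytes come back silently, which is the "tolerance" of filesystems without block checksums.

```python
import errno
import struct

MASK64 = (1 << 64) - 1


def fletcher4(data: bytes) -> tuple[int, int, int, int]:
    """Simplified fletcher4-style checksum: four 64-bit running sums over
    little-endian 32-bit words (len(data) must be a multiple of 4)."""
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)


def read_block(data: bytes, stored_cksum, verify: bool) -> bytes:
    """Hypothetical read path. With verify=True a mismatch raises EIO, as a
    checksumming filesystem would; with verify=False whatever bytes are
    there are returned, corrupted or not."""
    if verify and fletcher4(data) != stored_cksum:
        raise OSError(errno.EIO, "checksum mismatch")
    return data


block = bytes(range(64))
cksum = fletcher4(block)  # computed when the block was written

# Simulate bit rot: one bit flips after the checksum was recorded.
corrupted = bytearray(block)
corrupted[10] ^= 0x04
corrupted = bytes(corrupted)

print(read_block(corrupted, cksum, verify=False) == block)  # False: bad data, no error
try:
    read_block(corrupted, cksum, verify=True)
except OSError as e:
    print(e.errno == errno.EIO)  # True: the flip is detected
```

A single flipped bit always changes the first running sum, so this toy checksum is guaranteed to catch it; whether the flip is then repaired depends on having redundancy to read a good copy from.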

It might be useful to have a way to recover from checksum mismatches
by involving a human.  I'm imagining a tool that tests whether
accepting a block's actual contents results in making data available
that the human thinks checks out, and if so, then rewriting that
block.  Some bit errors might simply result in meaningless metadata,
but in some cases this can be corrected (e.g., ridiculous block
addresses).  But if ECC takes care of the problem then why waste the
effort?  (Partial answer: because it'd be a very neat GSoC-type
project!)
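One way the human-in-the-loop recovery idea might look, as a rough Python sketch under loud assumptions: a block is a byte string plus the checksum stored in its parent pointer, SHA-256 stands in for whatever checksum the pool uses, the human's verdict is a callback, and `Block`, `try_recover`, `looks_ok` and `DEVICE_SIZE` are all hypothetical names, not a ZFS interface. Obviously bogus metadata (an address past the end of the device) is rejected outright; otherwise the human vets the actual contents, and an accepted block gets a fresh checksum that a real tool would then write back.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional

DEVICE_SIZE = 1 << 30  # hypothetical device size in bytes (1 GiB)


@dataclass
class Block:
    addr: int            # hypothetical on-disk offset from the block pointer
    data: bytes          # the block's actual contents as read
    stored_cksum: bytes  # checksum recorded in the parent block pointer


def cksum(data: bytes) -> bytes:
    # SHA-256 stands in for the pool's checksum algorithm.
    return hashlib.sha256(data).digest()


def try_recover(blk: Block, looks_ok: Callable[[bytes], bool]) -> Optional[Block]:
    """Hypothetical human-in-the-loop repair.

    Returns the block with a checksum matching its actual contents if a
    human accepts those contents, or None if it must stay unreadable.
    """
    if cksum(blk.data) == blk.stored_cksum:
        return blk  # checksum already matches: nothing to repair
    if not 0 <= blk.addr < DEVICE_SIZE:
        return None  # "ridiculous" address: treat as meaningless metadata
    if not looks_ok(blk.data):
        return None  # the human rejected the actual contents
    # Accept the contents as-is; a real tool would write this back to disk.
    return Block(blk.addr, blk.data, cksum(blk.data))


# Usage: one byte of a text block was corrupted after its checksum was stored.
good = b"hello, world\n"
damaged = Block(addr=4096, data=b"hellp, world\n", stored_cksum=cksum(good))
fixed = try_recover(damaged, looks_ok=lambda d: d.startswith(b"hell"))
print(fixed is not None and cksum(fixed.data) == fixed.stored_cksum)  # True
```

Rejecting a bogus address keeps the sketch small; correcting such an address, as suggested above, would need a smarter step at that point.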

 Besides, what if that document you don't care about is your account's
 entry in a banking system (as if they had no other redundancy and
 double-checks)? And suddenly you don't exist because of some EIOIO,
 or your balance is zeroed (or worse, highly negative)? ;)

This is why we have paper trails, logs, backups, redundancy at various
levels, ...

Nico