Re: [zfs-discuss] Data loss by memory corruption?

2012-01-19 Thread Jim Klimov

2012-01-18 20:36, Nico Williams wrote:

On Wed, Jan 18, 2012 at 4:53 AM, Jim Klimov jimkli...@cos.ru wrote:

2012-01-18 1:20, Stefan Ring wrote:

I don’t care too much if a single document gets corrupted – there’ll
always be a good copy in a snapshot. I do care however if a whole
directory branch or old snapshots were to disappear.


Well, since this problem stems from random memory corruption,
you don't get to choose whether it's your document that gets broken
or some low-level part of the metadata tree ;)


Other filesystems tend to be much more tolerant of bit rot of all
types precisely because they have no block checksums.

But I'd rather have ZFS -- *with* redundancy, of course, and with ECC.

It might be useful to have a way to recover from checksum mismatches
by involving a human.  I'm imagining a tool that tests whether
accepting a block's actual contents results in making data available
that the human thinks checks out, and if so, then rewriting that
block.  Some bit errors might simply result in meaningless metadata,
but in some cases this can be corrected (e.g., ridiculous block
addresses).  But if ECC takes care of the problem then why waste the
effort?


Because RAM ECC only decreases the probability of one type of
corruption?

You still have CPUs (e.g. overclocked and overheated ones, as is
likely in enthusiast systems or in laptops with blocked vents,
which can sometimes generate random garbage).

Many other parts are not SPOFs in a good design: noise on the
wire and bugs in HBA and HDD firmware can be mitigated by some
hardware redundancy (multipathing, mixed vendors) in higher-end
systems, and by ZFS-level approaches in other systems - such as
ditto copies for metadata and vdev redundancy; but these can
still corrupt copies=1 data (e.g. on single-disk laptops without
explicit copies=2).


 (Partial answer: because it'd be a very neat GSoC type project!)

Good point - that's at least one motivator ;)

I don't care how it is done - but it should be!
This time you may even use sorcery, I'll not ask questions! ;)




Besides, what if that document you don't care about is your account's
entry in a banking system (as if they had no other redundancy and
double-checks)? And suddenly you don't exist because of some EIOIO,
or your balance is zeroed (or worse, highly negative)? ;)


This is why we have paper trails, logs, backups, redundancy at various
levels, ...


As if any of those were 100% good, reliable and readily
accessible ;)

//Jim


Re: [zfs-discuss] Data loss by memory corruption?

2012-01-18 Thread Jim Klimov

2012-01-18 1:20, Stefan Ring wrote:

The issue is definitely not specific to ZFS.  For example, the whole OS
depends on reliable memory content in order to function. Likewise, no one
likes it if characters mysteriously change in their word processing
documents.


I don’t care too much if a single document gets corrupted – there’ll
always be a good copy in a snapshot. I do care however if a whole
directory branch or old snapshots were to disappear.


Well, since this problem stems from random memory corruption,
you don't get to choose whether it's your document that gets broken
or some low-level part of the metadata tree ;)

Besides, what if that document you don't care about is your account's
entry in a banking system (as if they had no other redundancy and
double-checks)? And suddenly you don't exist because of some EIOIO,
or your balance is zeroed (or worse, highly negative)? ;)

//Jim


Re: [zfs-discuss] Data loss by memory corruption?

2012-01-18 Thread Nico Williams
On Wed, Jan 18, 2012 at 4:53 AM, Jim Klimov jimkli...@cos.ru wrote:
 2012-01-18 1:20, Stefan Ring wrote:
 I don’t care too much if a single document gets corrupted – there’ll
 always be a good copy in a snapshot. I do care however if a whole
 directory branch or old snapshots were to disappear.

 Well, since this problem stems from random memory corruption,
 you don't get to choose whether it's your document that gets broken
 or some low-level part of the metadata tree ;)

Other filesystems tend to be much more tolerant of bit rot of all
types precisely because they have no block checksums.

But I'd rather have ZFS -- *with* redundancy, of course, and with ECC.

It might be useful to have a way to recover from checksum mismatches
by involving a human.  I'm imagining a tool that tests whether
accepting a block's actual contents results in making data available
that the human thinks checks out, and if so, then rewriting that
block.  Some bit errors might simply result in meaningless metadata,
but in some cases this can be corrected (e.g., ridiculous block
addresses).  But if ECC takes care of the problem then why waste the
effort?  (Partial answer: because it'd be a very neat GSoC type
project!)
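
To make that a bit more concrete, here is a rough sketch of what such a tool
might look like.  It's Python and entirely hypothetical - the SuspectBlock
layout, the helpers and the sanity checks are invented stand-ins, not any
real ZFS or zdb interface:

    from dataclasses import dataclass

    @dataclass
    class SuspectBlock:
        offset: int          # byte offset on the vdev (assumed layout)
        data: bytes          # what is actually on disk
        expected_cksum: int  # checksum recorded in the parent block pointer
        is_metadata: bool

    def cheap_sanity_checks(blk, vdev_size):
        """Flag obviously impossible contents, e.g. child block addresses
        pointing beyond the end of the device."""
        problems = []
        if blk.is_metadata:
            # Pretend metadata encodes 8-byte little-endian child addresses.
            for i in range(0, len(blk.data) - 8, 8):
                addr = int.from_bytes(blk.data[i:i + 8], "little")
                if addr > vdev_size:
                    problems.append("child address %#x beyond device end" % addr)
        return problems

    def propose_acceptance(blk, vdev_size):
        """Show the operator what accepting the on-disk bytes would mean and
        let a human make the final call; a real tool would then rewrite the
        block so that its checksum matches again."""
        print("block @ %#x: stored checksum %#x" % (blk.offset, blk.expected_cksum))
        for problem in cheap_sanity_checks(blk, vdev_size):
            print("  suspicious:", problem)
        answer = input("accept current contents and re-checksum? [y/N] ")
        return answer.strip().lower() == "y"

The interesting part would of course be the heuristics; the human only gets
asked when those cannot decide on their own.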

 Besides, what if that document you don't care about is your account's
 entry in a banking system (as if they had no other redundancy and
 double-checks)? And suddenly you don't exist because of some EIOIO,
 or your balance is zeroed (or worse, highly negative)? ;)

This is why we have paper trails, logs, backups, redundancy at various
levels, ...

Nico
--


Re: [zfs-discuss] Data loss by memory corruption?

2012-01-17 Thread Bayard G. Bell
On Sun, 2012-01-15 at 16:28 +0400, Jim Klimov wrote:
 2012-01-14 18:36, Stefan Ring wrote:
  Inspired by the paper "End-to-end Data Integrity for File Systems: A
  ZFS Case Study" [1], I've been thinking if it is possible to devise a way,
  in which a minimal in-memory data corruption would cause massive data
  loss. I could imagine a scenario where an entire directory branch
  drops off the tree structure, for example. Since I know too little
  about ZFS's structure, I'm also asking myself if it is possible to
  make old snapshots disappear via memory corruption or lose data blocks
  to leakage (not containing data, but not marked as available).

I've never understood why these conclusions are considered so
interesting -- it's as though ZFS were analyzed as a system but the
conclusions weren't drawn systematically.

If you don't protect buffer integrity elsewhere on the system, what
would it be worth for ZFS to provide in-core integrity for its kernel
pages? The vast preponderance of consumers of ZFS data have to use
buffers outside of the ZFS kernel subsystem, leaving you with a trivial
added assurance in protecting against in-core corruption. Compare the
effort of doing that to the cost of using ECC, and there doesn't seem to
be anything like a compelling case for putting all that work into ZFS or
accepting the overhead that would result.

Put into a more reasonable context, there may still be something there,
but it looks very different than how the authors seemed to pitch it. Or
have I missed something?

 Alas,
 even though ECC chips and chipsets are cheap nowadays, not all
 architectures use them (e.g. desktops, laptops, etc.),
 and the tagline of running ZFS for reliable storage on
 consumer-grade hardware is poisoned by this fact.

Yes, you can get reliable and probably performant ZFS storage without
having to buy enterprise-class components. But you still have to treat
midrange or consumer components as differentiated on reliability and
performance if you want to achieve those things meaningfully. ZFS is good,
but it's not magic.



Re: [zfs-discuss] Data loss by memory corruption?

2012-01-17 Thread Richard Elling
On Jan 16, 2012, at 8:08 AM, David Magda wrote:

 On Mon, January 16, 2012 01:19, Richard Elling wrote:
 
 [1] http://www.usenix.org/event/fast10/tech/full_papers/zhang.pdf
 
 Yes. NetApp has funded those researchers in the past. Looks like a FUD
 piece to me.
 Look out everyone, the memory system you bought from Intel might suck!
 
 From the paper:
 
 This material is based upon work supported by the National Science
 Foundation under the following grants: CCF-0621487, CNS-0509474,
 CNS-0834392, CCF-0811697, CCF-0811697, CCF-0937959, as well as by generous
 donations from NetApp, Inc, Sun Microsystems, and Google.
 
 So Sun paid to FUD themselves?

wouldn't be the first time...

 The conclusions are hardly unreasonable:
 
 While the reliability mechanisms in ZFS are able to provide reasonable
 robustness against disk corruptions, memory corruptions still remain a
 serious problem to data integrity.
 
 I've heard the same thing said (use ECC!) on this list many times over
 the years.

Agree with the ECC comment :-)

If we can classify this as encouragement to use ECC, then you don't need to
drag ZFS into the conversation. Interestingly, the only market that doesn't
use ECC is the PeeCee market. Embedded and enterprise markets use ECC.
 -- richard



Re: [zfs-discuss] Data loss by memory corruption?

2012-01-17 Thread Bob Friesenhahn

On Tue, 17 Jan 2012, Richard Elling wrote:

Agree with the ECC comment :-)

If we can classify this as encouragement to use ECC, then you don't need to
drag ZFS into the conversation. Interestingly, the only market that doesn't
use ECC is the PeeCee market. Embedded and enterprise markets use ECC.


The issue is definitely not specific to ZFS.  For example, the whole 
OS depends on reliable memory content in order to function.  Likewise, 
no one likes it if characters mysteriously change in their word 
processing documents.


Most of the blame seems to focus on Intel, with its objective to spew 
CPUs with the highest-clocking performance at the lowest possible 
price point for the desktop market.  AMD CPUs seem to usually be 
slower but include ECC as standard in the CPU or AMD-supplied chipset.


If it can be believed (and even if some may doubt it), Intel sells 
Xeon-branded CPUs which lack ECC support.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Data loss by memory corruption?

2012-01-17 Thread Stefan Ring
 The issue is definitely not specific to ZFS.  For example, the whole OS
 depends on reliable memory content in order to function.  Likewise, no one
 likes it if characters mysteriously change in their word processing
 documents.

I don’t care too much if a single document gets corrupted – there’ll
always be a good copy in a snapshot. I do care however if a whole
directory branch or old snapshots were to disappear.

 Most of the blame seems to focus on Intel, with its objective to spew CPUs
 with the highest-clocking performance at the lowest possible price point for
 the desktop market.  AMD CPUs seem to usually be slower but include ECC as
 standard in the CPU or AMD-supplied chipset.

Agreed. I originally bought an AMD-based system for that reason alone,
with the intention of running OpenSolaris on it. Alas, it performed
abysmally, so it was quickly swapped for an Intel-based one (without
ECC).

Additionally, consider that Joyent’s port of KVM supports only Intel
systems, AFAIK.


Re: [zfs-discuss] Data loss by memory corruption?

2012-01-17 Thread Bob Friesenhahn

On Tue, 17 Jan 2012, Stefan Ring wrote:


Additionally, consider that Joyent’s port of KVM supports only Intel
systems, AFAIK.


Hopefully that will be a short-term issue.  64-core AMD Opteron 
systems are affordable now.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Data loss by memory corruption?

2012-01-16 Thread David Magda
On Mon, January 16, 2012 01:19, Richard Elling wrote:

 [1] http://www.usenix.org/event/fast10/tech/full_papers/zhang.pdf

  Yes. NetApp has funded those researchers in the past. Looks like a FUD
  piece to me.
  Look out everyone, the memory system you bought from Intel might suck!

From the paper:

 This material is based upon work supported by the National Science
 Foundation under the following grants: CCF-0621487, CNS-0509474,
 CNS-0834392, CCF-0811697, CCF-0811697, CCF-0937959, as well as by generous
 donations from NetApp, Inc, Sun Microsystems, and Google.

So Sun paid to FUD themselves?

The conclusions are hardly unreasonable:

 While the reliability mechanisms in ZFS are able to provide reasonable
 robustness against disk corruptions, memory corruptions still remain a
 serious problem to data integrity.

I've heard the same thing said (use ECC!) on this list many times over
the years.




Re: [zfs-discuss] Data loss by memory corruption?

2012-01-16 Thread John Martin

On 01/16/12 11:08, David Magda wrote:


The conclusions are hardly unreasonable:


While the reliability mechanisms in ZFS are able to provide reasonable
robustness against disk corruptions, memory corruptions still remain a
serious problem to data integrity.


I've heard the same thing said (use ECC!) on this list many times over
the years.


I believe the whole paragraph quoted from the USENIX paper above is
important:

   While the reliability mechanisms in ZFS are able to
   provide reasonable robustness against disk corruptions,
   memory corruptions still remain a serious problem to
   data integrity. Our results for memory corruptions
   indicate cases where bad data is returned to the user,
   operations silently fail, and the whole system crashes.
   Our probability analysis shows that one single bit flip
   has small but non-negligible chances to cause failures
   such as reading/writing corrupt data and system crashing.

The authors provide probability calculations in section 6.3
for single-bit flips.  ECC provides detection and correction of
single-bit flips (and standard SECDED ECC additionally detects,
but cannot correct, double-bit errors).
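
For anyone wondering what that correction looks like mechanically, here is a
toy round trip through a SECDED code (Hamming(7,4) plus an overall parity
bit).  Real ECC DIMMs implement a wider code over 64-bit words in the memory
controller; this Python sketch only shows the principle:

    def encode(nibble):
        """Encode 4 data bits into 8 code bits: positions 1..7 form a
        Hamming(7,4) word, position 0 holds the overall parity."""
        d = [(nibble >> i) & 1 for i in range(4)]
        code = [0] * 8
        code[3], code[5], code[6], code[7] = d          # data positions
        for p in (1, 2, 4):                             # Hamming parity bits
            code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
        code[0] = sum(code) % 2                         # overall parity
        return code

    def decode(code):
        syndrome = 0
        for p in (1, 2, 4):
            if sum(code[i] for i in range(1, 8) if i & p) % 2:
                syndrome |= p
        overall = sum(code) % 2
        word = list(code)
        if syndrome and overall:         # one bit flipped: correctable
            word[syndrome] ^= 1
            status = "corrected single-bit error at position %d" % syndrome
        elif syndrome:                   # two bits flipped: detect only
            status = "uncorrectable double-bit error detected"
        elif overall:                    # the overall parity bit itself flipped
            status = "corrected flip of the parity bit"
        else:
            status = "no error"
        data = word[3] | word[5] << 1 | word[6] << 2 | word[7] << 3
        return data, status

    word = encode(0b1011)
    word[6] ^= 1                         # flip one bit "in memory"
    print(decode(word))                  # (11, 'corrected single-bit error at position 6')

That distinction between a correctable and an uncorrectable syndrome is also
why ECC systems can report corrected and uncorrectable errors separately.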


Re: [zfs-discuss] Data loss by memory corruption?

2012-01-15 Thread Jim Klimov

2012-01-14 18:36, Stefan Ring wrote:

Inspired by the paper "End-to-end Data Integrity for File Systems: A
ZFS Case Study" [1], I've been thinking if it is possible to devise a way,
in which a minimal in-memory data corruption would cause massive data
loss. I could imagine a scenario where an entire directory branch
drops off the tree structure, for example. Since I know too little
about ZFS's structure, I'm also asking myself if it is possible to
make old snapshots disappear via memory corruption or lose data blocks
to leakage (not containing data, but not marked as available).

I'd appreciate it if someone with a good understanding of ZFS's
internals and principles could comment on the possibility of such
scenarios.

[1] http://www.usenix.org/event/fast10/tech/full_papers/zhang.pdf


I'm by no means an expert like the ones you seek, but I'm asking similar
questions, and more keep popping up ;)

I do have some reported corruptions on my non-ECC system despite
raidz2 on disk, so I have a keen interest in how stuff works
and why it sometimes doesn't ;)

As for block leakage, judging by the error messages I'm seeing
now, leaked blocks are at least expected and checked for:
"allocating allocated segment" and "freeing free segment".
How my system got here - that's the puzzle...

It does seem possible that in-memory corruption of a block's data payload
and/or checksum before writing it to disk would render it invalid on read
(data doesn't match checksum, ZFS returns EIO). Maybe even worse, if the
in-memory block is corrupted before the checksumming, seemingly valid
garbage gets stored on disk, read back afterwards, and used with blind trust.
If it is a leaf block (userdata), you just get a corrupted file.
If it is a metadata block, and the corruption happened before it was
ditto-written to several disk locations, you're in trouble.
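
A tiny illustration of those two windows, with Python's zlib.crc32 standing
in for the block checksum (ZFS really uses fletcher or sha256, but the
principle is the same):

    import zlib

    def flip_bit(buf, bit):
        b = bytearray(buf)
        b[bit // 8] ^= 1 << (bit % 8)
        return bytes(b)

    payload = b"important user data" * 100

    # Corruption AFTER the checksum is computed: caught on read (EIO).
    cksum = zlib.crc32(payload)
    on_disk = flip_bit(payload, 4242)        # rot between RAM and platter
    print(zlib.crc32(on_disk) == cksum)      # False -> ZFS would return EIO

    # Corruption BEFORE the checksum is computed: "valid garbage".
    corrupted = flip_bit(payload, 4242)      # flipped while still in RAM
    cksum = zlib.crc32(corrupted)            # checksum covers the bad data
    print(zlib.crc32(corrupted) == cksum)    # True -> read back with blind trust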

It is likewise possible that data in RAM gets corrupted after
being read from disk and checksum-checked, but before it is used
as a metadata block or whatever.

If you're unlucky enough to have corruption in a metadata block
near the root of a tree that even the ditto blocks cannot repair,
you can find yourself in serious trouble.

In all these cases RAM is the SPOF (single point of failure),
so all ZFS recommendations involve using ECC systems. Alas,
even though ECC chips and chipsets are cheap nowadays, not all
architectures use them (e.g. desktops, laptops, etc.),
and the tagline of running ZFS for reliable storage on
consumer-grade hardware is poisoned by this fact. Other filesystems
obviously suffer just as much from bad components, but ZFS
reports the errors it detects, and unlike other systems that let
you dismiss the errors (e.g. free all blocks and files touched
by a corrupt block, leaving you with a smaller but consistent
tree of data blocks), or don't even notice them, ZFS tends to
get really upset about many of them and ask for recovery from
backups (as if those were 100% reliable).

I do wonder, however, if it is possible to implement a software
ECC to detect and/or repair small memory corruptions on
consumer-grade systems. And where would such a part fit - in ZFS (e.g.
some ECC bits appended to every zfs_*_t structure) or in the
{Solaris} kernel for general VM management? And even then
there's a question of whether this would solve more problems
than it creates - presenting the appearance of a solution while
hiding problems that still exist (because there would be
some non-ECC parts of the data path, and the GIGO principle can
apply at any point). In the bad case, you ECC an already-invalid
piece of memory, and afterwards trust it because it matches the
checksum. On the good side, there is a smaller window during which
data is exposed unprotected, so statistically this solution
should help.
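
Just to make the trade-off concrete, the crudest conceivable "software ECC"
for one specific buffer would be plain triplication with a majority vote on
access - a purely illustrative sketch (3x the memory, and it only narrows
the unprotected window, exactly as noted above):

    class VotedBuffer:
        """Keep three copies of a buffer and majority-vote on every read."""
        def __init__(self, data):
            self.copies = [bytearray(data) for _ in range(3)]

        def read(self):
            out = bytearray(len(self.copies[0]))
            for i in range(len(out)):
                a, b, c = (cp[i] for cp in self.copies)
                out[i] = (a & b) | (a & c) | (b & c)   # bitwise majority vote
                for cp in self.copies:                 # scrub the losing copy
                    cp[i] = out[i]
            return bytes(out)

    buf = VotedBuffer(b"critical metadata")
    buf.copies[1][0] ^= 0x40                 # simulate a bit flip in one copy
    print(buf.read())                        # b'critical metadata' -- repaired

A real implementation would more likely use a compact code (Hamming or CRC
per structure) than triplication, but the window problem stays the same.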


HTH,
//Jim Klimov


Re: [zfs-discuss] Data loss by memory corruption?

2012-01-15 Thread Bob Friesenhahn

On Sun, 15 Jan 2012, Jim Klimov wrote:


It does seem possible that in-memory corruption of a block's data payload
and/or checksum before writing it to disk would render it invalid on read
(data doesn't match checksum, ZFS returns EIO). Maybe even worse, if the
in-memory block is corrupted before the checksumming, seemingly valid
garbage gets stored on disk, read back afterwards, and used with blind trust.


Please don't understate the actual issue.  ZFS assumes that RAM is 
100% reliable.  ZFS uses an in-memory cache called the ARC which can 
span many tens of gigabytes on busy large memory systems.  User data 
is stored in this ARC and the cached data becomes the reference copy 
of the data until it is evicted.  This means that user data can be 
silently and undetectably corrupted due to memory corruption.  The 
effects that zfs's checksums can detect are just a small subset of the 
problems which may occur if memory returns wrong values.
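
As a toy model of that trust relationship (not how the real ARC is
implemented - just the principle that data is verified on its way into the
cache and afterwards served from RAM unverified):

    import zlib

    class TinyCache:
        """Verify the checksum once on fill, then trust whatever is in RAM."""
        def __init__(self):
            self.cache = {}

        def read(self, key, fetch_from_disk, cksum):
            if key not in self.cache:
                data = fetch_from_disk(key)
                assert zlib.crc32(data) == cksum   # checked only on the way in
                self.cache[key] = bytearray(data)
            return bytes(self.cache[key])          # later reads: no re-check

    cache = TinyCache()
    disk = {42: b"user data block"}
    cksum = zlib.crc32(disk[42])
    cache.read(42, disk.get, cksum)          # fill: checksum verified
    cache.cache[42][0] ^= 0x01               # bit flip in RAM after caching
    print(cache.read(42, disk.get, cksum))   # b'tser data block' -- silently wrong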



In all these cases RAM is the SPOF (single point of failure),
so all ZFS recommendations involve using ECC systems. Alas,
even though ECC chips and chipsets are cheap nowadays, not all
architectures use them (e.g. desktops, laptops, etc.),
and the tagline of running ZFS for reliable storage on
consumer-grade hardware is poisoned by this fact. Other filesystems


Feel free to blame Intel for this since they seem to be primarily 
responsible for delivering CPUs and chipsets which don't support ECC. 
AMD has not been such a perpetrator, although it is possible to buy 
AMD-based systems which don't provide ECC.



I do wonder, however, if it is possible to implement a software
ECC to detect and/or repair small memory corruptions on
consumer-grade systems. And where would such a part fit - in ZFS (e.g.


This could be done for part of the memory but it would obviously 
result in huge performance loss.  I/O to memory would have to become 
block-oriented rather than random access.  It is still necessary for 
random access to be used in a large part of the memory since it is a 
requirement in order to run programs and there would be no way to defend 
that part of the memory.



some ECC bits appended to every zfs_*_t structure) or in the
{Solaris} kernel for general VM management? And even then
there's a question of whether this would solve more problems
than it creates - presenting the appearance of a solution while
hiding problems that still exist (because there would be
some non-ECC parts of the data path, and the GIGO principle can
apply at any point). In the bad case, you ECC an already-invalid
piece of memory, and afterwards trust it because it matches the
checksum. On the good side, there is a smaller window during which
data is exposed unprotected, so statistically this solution
should help.


The problem is that with unreliable memory, the software-based ECC 
would not be able to correct the content of the memory since the ECC 
itself might have been computed incorrectly (due to unreliable 
memory).  You are then faced with notifications of problems that the 
user can't fix.


The proper solution (regardless of filesystem used) is to assure that 
ECC is included in any computer that you buy.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Data loss by memory corruption?

2012-01-15 Thread Richard Elling
On Jan 14, 2012, at 6:36 AM, Stefan Ring wrote:

 Inspired by the paper "End-to-end Data Integrity for File Systems: A
 ZFS Case Study" [1], I've been thinking if it is possible to devise a way,
 in which a minimal in-memory data corruption would cause massive data
 loss.

For enterprise-class systems, you will find hardware protection such as ECC
and other mechanisms all the way up and down the datapath. For example,
if you build an ALU, you can add a few transistors to also detect the various
failure modes that afflict data flowing through an ALU. This is one of the
things that differentiates a mainframe or SPARC64 processor from a
run-of-the-mill PeeCee processor.
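
As a software caricature of one such mechanism: a classical datapath check
is residue checking, where a small modulo residue is carried alongside the
operands and compared against the result - a handful of extra gates in
hardware, sketched here in Python purely to show the principle:

    def checked_add(a, b, fault=0):
        result = (a + b) ^ fault                 # 'fault' models a flipped output bit
        if result % 3 != (a % 3 + b % 3) % 3:    # residue predicted from inputs
            raise RuntimeError("ALU residue check failed")
        return result

    print(checked_add(12345, 67890))                # 80235, check passes
    print(checked_add(12345, 67890, fault=1 << 7))  # raises: corruption detected

A mod-3 residue happens to catch any single flipped bit in the result, since
no power of two is divisible by three.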

 I could imagine a scenario where an entire directory branch
 drops off the tree structure, for example. Since I know too little
 about ZFS's structure, I'm also asking myself if it is possible to
 make old snapshots disappear via memory corruption or lose data blocks
 to leakage (not containing data, but not marked as available).

Sure. If you'd like a fright, read the errata sheet for a modern microprocessor :-)

 I'd appreciate it if someone with a good understanding of ZFS's
 internals and principles could comment on the possibility of such
 scenarios.

ZFS does expect that the processor, memory, and I/O systems work to some 
degree. The only way to get beyond this sort of dependency is to implement a
system like we do for avionics.

 
 [1] http://www.usenix.org/event/fast10/tech/full_papers/zhang.pdf

Yes. NetApp has funded those researchers in the past. Looks like a FUD piece to me.
Look out everyone, the memory system you bought from Intel might suck!
 -- richard
