On Mar 2, 2014, at 12:45 AM, Philip Robar <philip.ro...@gmail.com> wrote:

> But if you insist: from "Oracle Solaris 11.1 Administration: ZFS File 
> Systems", "Consider using ECC memory to protect against memory corruption. 
> Silent memory corruption can potentially damage your data." [1]

That is in no way specific to ZFS, though; silent memory corruption can cause 
corruption in any number of ways for basically any filesystem. If you value 
your data, you'll want to use ECC, regardless of whether you use ZFS or not.

> The actual error rate found was several orders of magnitude higher than 
> previous small-scale or laboratory studies, with 25,000 to 70,000 errors per 
> billion device hours per megabit (about 2.5–7 × 10^-11 error/bit·h) (i.e. about 
> 5 single bit errors in 8 Gigabytes of RAM per hour using the top-end error 
> rate), and more than 8% of DIMM memory modules affected by errors per year.
>
> So, since you've agreed that ZFS is more vulnerable than other file systems 
> to memory errors, and Google says that these errors are a lot more frequent 
> than most people think they are, the question becomes just how much 
> more vulnerable is ZFS and is the extent of the corruption likely to be wider 
> or more catastrophic than on other file systems?

It's somewhat misleading to just look at the averages in this case, though, as 
the paper specifically points out that the errors are in fact highly clustered, 
not evenly distributed across devices and/or time. I.e., there are some DIMMs 
that produce a very large number of errors, but the vast majority of DIMMs 
(just under 92%, going by the "more than 8%" figure in the paragraph you quoted 
above) actually produce no (detectable) bit errors at all per year.
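
For reference, the "about 5 errors per hour" figure in the quoted passage is just the top-end average rate scaled up to 8 gigabytes. A minimal sketch of that arithmetic follows; it assumes 8 gigabytes means 8 GiB counted as 8 × 1024 × 8 megabits, and the function name is only illustrative. It says nothing about how the errors are distributed across DIMMs, which is exactly the point above.

```python
# Rates from the quoted passage, in errors per billion device hours per megabit.
LOW_RATE = 25_000
HIGH_RATE = 70_000

# Assumption: 8 gigabytes of RAM counted as 8 * 1024 megabytes * 8 bits per byte.
MEGABITS_IN_8_GB = 8 * 1024 * 8


def average_errors_per_hour(rate_per_1e9_hours_per_mbit, megabits):
    """Scale a per-megabit rate to a whole machine (a fleet-wide average only)."""
    return rate_per_1e9_hours_per_mbit / 1e9 * megabits


low = average_errors_per_hour(LOW_RATE, MEGABITS_IN_8_GB)
high = average_errors_per_hour(HIGH_RATE, MEGABITS_IN_8_GB)
print(f"average errors per hour in 8 GB: {low:.2f} to {high:.2f}")
```

The top-end result rounds to the quoted "about 5", but it is an average over all DIMMs in the study, not a prediction for any single machine.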

> It seems to me that if using ZFS without ECC memory puts someone's data at an 
> increased risk over other file systems, then they ought to be told that so that 
> they can make an informed decision. Am I really being unreasonable about this?

You keep claiming this, but I still haven't seen any conclusive evidence that 
lack of ECC poses a higher overall risk for your data when using ZFS than with 
other file systems. Note that even if you could find a scenario where ZFS will 
do worse than others (and I maintain that the specific scenario Cyberjock 
describes is not actually plausible), there are other scenarios where ZFS will 
actually catch memory corruption but other file systems will not (e.g., bit 
flip occurs after checksum has been computed but before data is written to 
disk, or bit flip occurs after data has been read from disk but before checksum 
is compared, or bit flip causes stray write of bogus data to disk). Without 
knowing the likelihood of each of these scenarios and their respective damage 
potential, it is impossible to say which side is more at risk.
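
To make the first two scenarios concrete, here is a toy model of a checksummed block store. It is only a sketch and not ZFS code: SHA-256 from Python's hashlib stands in for whatever checksum the file system uses, and the helper names (checksum, flip_bit, store, verify) are made up for illustration. It also shows the case a checksum cannot catch, where the flip happens before the checksum is computed.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """Stand-in block checksum (assumption: SHA-256 used purely for illustration)."""
    return hashlib.sha256(data).digest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating a memory error."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def store(data: bytes, corrupt_after_checksum: bool = False):
    """Checksum the buffer, optionally corrupt it afterwards, then 'write' it."""
    stored_sum = checksum(data)
    if corrupt_after_checksum:
        data = flip_bit(data, 3)
    return data, stored_sum


def verify(block: bytes, stored_sum: bytes) -> bool:
    """Compare the block against the checksum recorded at write time."""
    return checksum(block) == stored_sum


original = b"important data"

# Scenario 1: flip after the checksum is computed but before the write.
block, stored_sum = store(original, corrupt_after_checksum=True)
detected_on_write_path = not verify(block, stored_sum)

# Scenario 2: flip after the block is read back but before the comparison.
block, stored_sum = store(original)
in_memory = flip_bit(block, 5)
detected_on_read_path = not verify(in_memory, stored_sum)

# Counter-case: flip before the checksum is computed; the checksum then
# faithfully covers the already-corrupted data and verification passes.
corrupted = flip_bit(original, 3)
block, stored_sum = store(corrupted)
undetected = verify(block, stored_sum) and block != original

print(detected_on_write_path, detected_on_read_path, undetected)
```

A file system without block checksums has no equivalent of verify() at all, so it would pass along the corrupted data silently in all three cases.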
