On Mar 2, 2014, at 12:45 AM, Philip Robar <philip.ro...@gmail.com> wrote:
> But if you insist: from "Oracle Solaris 11.1 Administration: ZFS File
> Systems", "Consider using ECC memory to protect against memory corruption.
> Silent memory corruption can potentially damage your data."

That is in no way specific to ZFS, though; silent memory corruption can cause corruption in any number of ways for basically any filesystem. If you value your data, you'll want to use ECC, regardless of whether you use ZFS or not.

> The actual error rate found was several orders of magnitude higher than
> previous small-scale or laboratory studies, with 25,000 to 70,000 errors per
> billion device hours per megabit (about 2.5–7 × 10^-11 error/bit·h) (i.e. about
> 5 single bit errors in 8 Gigabytes of RAM per hour using the top-end error
> rate), and more than 8% of DIMM memory modules affected by errors per year.
>
> So, since you've agreed that ZFS is more vulnerable than other file systems
> to memory errors, and Google says that these errors are a lot more frequent
> than most people think that they are then the question becomes just how much
> more vulnerable is ZFS and is the extent of the corruption likely to be wider
> or more catastrophic than on other file systems?

It's somewhat misleading to just look at the averages in this case, though, as the paper specifically points out that the errors are in fact highly clustered, not evenly distributed across devices and/or time. I.e., there are some DIMMs that produce a very large number of errors, but the vast majority of DIMMs (92% as per the paragraph you quoted above) actually produce no (detectable) bit errors at all per year.

> It seems to me that if using ZFS without ECC memory puts someone's data at an
> increased risk over other file system then they ought to be told that so that
> they can make an informed decision. Am I really being unreasonable about this?

You keep claiming this, but I still haven't seen any conclusive evidence that the lack of ECC poses a higher overall risk to your data with ZFS than with other file systems. Note that even if you could find a scenario where ZFS does worse than others (and I maintain that the specific scenario Cyberjock describes is not actually plausible), there are other scenarios where ZFS will catch memory corruption but other file systems will not, e.g.:

- a bit flip occurs after the checksum has been computed but before the data is written to disk;
- a bit flip occurs after the data has been read from disk but before the checksum is compared;
- a bit flip causes a stray write of bogus data to disk.

Without knowing the likelihood of each of these scenarios and their respective damage potential, it is impossible to say which side is more at risk.
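
To make the first two scenarios concrete, here is a minimal Python sketch. It is only a toy model, not ZFS code: SHA-256 from hashlib stands in for the block checksum, the "disk" is just a returned tuple, and all names (write_block, read_block, flip_bit, ChecksumError) are hypothetical and chosen for illustration.

```python
import hashlib


class ChecksumError(Exception):
    """Raised when a block does not match its stored checksum."""


def checksum(block: bytes) -> bytes:
    # Stand-in checksum; assumed here only for illustration.
    return hashlib.sha256(block).digest()


def flip_bit(block: bytes, bit_index: int) -> bytes:
    # Simulate a single-bit memory error in an in-RAM buffer.
    buf = bytearray(block)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def write_block(data: bytes, corrupt_after_checksum: bool = False):
    # The checksum is computed first; the buffer may then be damaged
    # in memory before it reaches the (simulated) disk.
    cksum = checksum(data)
    if corrupt_after_checksum:
        data = flip_bit(data, 3)
    return data, cksum  # (what landed on disk, checksum stored with it)


def read_block(stored: bytes, cksum: bytes, corrupt_after_read: bool = False) -> bytes:
    # The buffer may be damaged in memory after the read but before
    # the checksum comparison.
    data = stored
    if corrupt_after_read:
        data = flip_bit(data, 3)
    if checksum(data) != cksum:
        raise ChecksumError("block does not match stored checksum")
    return data
```

In this model, a flip in either window makes read_block raise instead of silently returning bad data, whereas a file system without data checksums would hand the damaged buffer back to the application. The sketch also shows the limit of the approach: a flip that happens before checksum() is called gets checksummed along with the rest of the block and goes undetected, which is the same outcome you would get on any other file system.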