On Sun, 15 Jan 2012, Jim Klimov wrote:
> It does seem possible that in-memory corruption of a block's data payload and/or checksum before it is written to disk would render the block invalid on read (the data doesn't match the checksum, so ZFS returns EIO). It may be even worse if the in-memory block is corrupted before checksumming: seemingly valid garbage gets stored on disk, read back later, and used with blind trust.
Please don't understate the actual issue. ZFS assumes that RAM is 100% reliable. ZFS uses an in-memory cache called the ARC, which can span many tens of gigabytes on busy large-memory systems. User data is stored in the ARC, and the cached copy becomes the reference copy of the data until it is evicted. This means that user data can be silently and undetectably corrupted by memory errors. The effects that ZFS's checksums can detect are just a small subset of the problems which may occur if memory returns wrong values.
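To make concrete why the timing of the corruption matters relative to the checksum, here is a minimal Python sketch. The helper names and the use of SHA-256 are illustrative only, not ZFS's actual code path: a bit flip after checksumming is caught on read, while a flip before checksumming produces a block of garbage that verifies perfectly.

```python
import hashlib

def store_block(data: bytes) -> tuple[bytes, bytes]:
    """Checksum the block at write time (SHA-256 stands in for ZFS's checksum)."""
    return data, hashlib.sha256(data).digest()

def read_block(data: bytes, checksum: bytes) -> bytes:
    """Verify on read; a mismatch surfaces as an I/O error (EIO in ZFS)."""
    if hashlib.sha256(data).digest() != checksum:
        raise IOError("checksum mismatch (EIO)")
    return data

payload = b"important user data"

# Case 1: RAM corrupts the block AFTER it was checksummed -> detected on read.
data, cksum = store_block(payload)
corrupted = b"imp0rtant user data"
try:
    read_block(corrupted, cksum)
except IOError:
    print("detected")                    # prints "detected"

# Case 2: RAM corrupts the block BEFORE it was checksummed -> the checksum
# faithfully covers the garbage, so the read succeeds and the garbage is trusted.
garbage = b"imp0rtant user data"
data, cksum = store_block(garbage)
print(read_block(data, cksum) == payload)    # prints "False"
```

The second case is the one no amount of on-disk checksumming can help with: the checksum is computed over data that is already wrong.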
> In all these cases RAM is the SPOF (single point of failure), so all ZFS recommendations involve using ECC systems. Alas, even though ECC chips and chipsets are cheap nowadays, not all architectures use them (desktops, laptops, etc.), and the tagline of running ZFS for "reliable storage on consumer-grade hardware" is poisoned by this fact. Other filesystems
Feel free to blame Intel for this since they seem to be primarily responsible for delivering CPUs and chipsets which don't support ECC. AMD has not been such a perpetrator, although it is possible to buy AMD-based systems which don't provide ECC.
> I do wonder, however, if it is possible to implement a software ECC to detect and/or repair small memory corruptions on consumer-grade systems. And where would such a component fit - in ZFS (i.e.
This could be done for part of the memory, but it would obviously result in a huge performance loss. I/O to that memory would have to become block-oriented rather than random-access. Random access is still necessary in a large part of memory, since it is required in order to run programs, and there would be no way to defend that part of the memory.
> some ECC bits appended to every zfs_*_t structure) or in the {Solaris} kernel for general VM management? And even then there's a question whether this would solve more problems than it creates - it could give the appearance of a solution while hiding problems that actually exist (because there would still be non-ECC parts of the data path, and the GIGO principle can apply at any point). In the bad case, you ECC an invalid piece of memory and afterwards trust it because it matches the checksum. On the good side, the window during which data is exposed unprotected would be smaller, so statistically this solution should help.
The problem is that with unreliable memory, software-based ECC would not be able to reliably correct the contents of memory, since the ECC itself might have been computed incorrectly (due to that same unreliable memory). You are then faced with notifications of problems that the user can't fix.
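As a concrete illustration of the kind of code such a scheme would need - purely a sketch, not anything ZFS or Solaris actually contains - here is a classic Hamming(7,4) single-error-correcting code in Python. It repairs any one flipped bit per 7-bit codeword, but note the costs discussed above: it nearly doubles the storage, adds CPU work on every access, and the correction logic itself runs out of the same unreliable RAM.

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]          # parity over codeword positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]          # parity over codeword positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]          # parity over codeword positions 5, 6, 7
    # codeword layout: position 1=p1, 2=p2, 3=d0, 4=p4, 5=d1, 6=d2, 7=d3
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    word = 0
    for i, b in enumerate(bits):
        word |= b << i
    return word

def hamming74_decode(word: int) -> int:
    """Correct any single-bit error in the codeword, then return the 4 data bits."""
    bits = [(word >> i) & 1 for i in range(7)]       # bits[0] is position 1
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]       # checks positions 1, 3, 5, 7
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]       # checks positions 2, 3, 6, 7
    s4 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]       # checks positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4                  # = position of the bad bit
    if syndrome:
        bits[syndrome - 1] ^= 1                      # repair it
    return bits[2] | bits[4] << 1 | bits[5] << 2 | bits[6] << 3

# Any single flipped bit in the stored word is repaired on decode:
word = hamming74_encode(0b1011)
assert all(hamming74_decode(word ^ (1 << i)) == 0b1011 for i in range(7))
```

Real ECC DIMMs do essentially this in hardware (SECDED over 64-bit words) at full memory bandwidth, which is why doing it in software over general-purpose RAM is so unattractive, and why the encode/decode step forces block-oriented access.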
The proper solution (regardless of the filesystem used) is to ensure that ECC is included in any computer that you buy.
Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss