On 9/14/06, Robert Banz <[EMAIL PROTECTED]> wrote:
Unless you're somehow just "making the bits go faster", performance increases typically go hand in hand with some sort of risk that your transactions *might* not make it to disk in a "power off" situation*
Well, a properly designed system incorporating hierarchical checksumming, metadata journalling, and the ability to handle hundreds of in-flight transactions without needing hundreds of threads has the potential to do both. ZFS is a good example of this technique. Also, not to be too pedantic, but no matter how well-designed your system is, there will always be transactions which don't make it to disk. The key is making sure that no part of the distributed system is fooled into thinking a transaction completed, unless it's possible for the downed node to recover the transaction and commit it later.
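To make the "don't let anyone think it completed" point concrete, here is a minimal sketch of the idea, assuming a single append-only journal file on a local disk. The `Journal` class, `recover`, and `demo` are hypothetical names made up for illustration; none of this is OpenAFS or ZFS code. A transaction is acknowledged only after fsync() returns, and a half-written record left by a crash is dropped on replay because nobody was ever told it succeeded.

```python
import json
import os
import tempfile


class Journal:
    """Append-only journal; a hypothetical stand-in, not OpenAFS or ZFS code."""

    def __init__(self, path):
        self.path = path
        # O_APPEND so every record lands at the end of the file.
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)

    def commit(self, txn_id, payload):
        """Return True (the acknowledgement) only after the record is flushed."""
        record = json.dumps({"id": txn_id, "data": payload}) + "\n"
        os.write(self.fd, record.encode("utf-8"))
        os.fsync(self.fd)  # block until the OS reports the data reached the device
        return True

    def close(self):
        os.close(self.fd)


def recover(path):
    """Replay the journal after a restart, stopping at a torn final record."""
    committed = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            try:
                committed.append(json.loads(line))
            except json.JSONDecodeError:
                break  # partial write from a crash: never acknowledged, so drop it
    return committed


def demo():
    path = os.path.join(tempfile.mkdtemp(), "journal.log")
    journal = Journal(path)
    acked = [journal.commit(i, f"write-{i}") for i in range(3)]
    journal.close()
    # Simulate a crash mid-record: some bytes hit the file, but no ack was sent.
    with open(path, "a", encoding="utf-8") as f:
        f.write('{"id": 3, "da')
    return acked, recover(path)
```

Calling `demo()` returns the three acknowledgements plus the records that survive replay. The torn fourth record is discarded, which is harmless precisely because its writer never received an acknowledgement. A real server would batch several transactions per fsync() to amortize the latency; this sketch issues one per commit to keep the ordering obvious.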
* disk gets unplugged, machine panics, blahblah

...which is a "risk" almost any filesystem or application takes into consideration, and allows the filesystem user to determine when it's "really necessary" to wait to go forward until data is committed to firm storage, or not. Good or bad, the fileserver is assuming that's what you want to do all of the time in the CopyOnWrite and StoreData_RXStyle (not to mention the volume structure management code in namei_ops, etc.). I guess it's that since we don't have a "channel" to forward along real fsync() messages, we assume that it's what you want to do all the time, or at the time the code was written it was assumed horrible things were going to happen all of the time... cleaning lady unplugs the direct-attached SCSI disk, cosmic ray causes a kernel panic, fsck can't reconstruct the filesystem to save its life... so making sure every transaction was written to disk was probably a good idea. Nowadays with the cleaning lady banned from the datacenter unless escorted, multipathing fibre links to disk storage, filesystems that go beyond even metadata logging to preserve structure (like ZFS), the cost/benefit of
Speaking of ZFS, fsync on ZFS is a serious performance issue. With ZFS, an fsync results in an update to the entire fs checksum tree up to, and including, the root checksum. Needless to say, this is a very high-latency operation with potential for quite a few disk seeks. Generally, ZFS tries to do hierarchical checksum updates on the order of every ten seconds. Clone operations are just plain ugly on ZFS.

-Tom
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
