On 9/14/06, Robert Banz <[EMAIL PROTECTED]> wrote:

> Unless you're somehow just "making the bits go faster", performance
> increases typically go hand in hand with some sort of risk that your
> transactions *might* not make it to disk in a "power off" situation*


Well, a properly designed system incorporating hierarchical
checksumming, metadata journalling, and the ability to handle hundreds
of in-flight transactions without needing hundreds of threads has the
potential to deliver both speed and safety.  ZFS is a good example of
this approach.
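To make "hierarchical checksumming" concrete, here is a minimal sketch of a checksum (Merkle) tree over data blocks. It assumes SHA-256 and a plain binary tree that pairs the last node with a copy of itself on odd-sized levels. The function names are hypothetical, and this is not ZFS's on-disk format. It only shows how a single root checksum ends up covering every block, so that corrupting any block changes the root.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # SHA-256 is an assumption for illustration; ZFS offers several checksum algorithms.
    return hashlib.sha256(data).digest()


def build_tree(blocks: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree, leaves first and root last.

    Level 0 holds one checksum per block; each higher level holds the checksum
    of each pair of children below it, until a single root checksum remains.
    """
    if not blocks:
        raise ValueError("need at least one block")
    level = [checksum(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            # Odd number of nodes: pair the last node with a copy of itself.
            level = level + [level[-1]]
        level = [checksum(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def root_checksum(blocks: list[bytes]) -> bytes:
    return build_tree(blocks)[-1][0]


def verify(blocks: list[bytes], expected_root: bytes) -> bool:
    # Recompute from the data; a change in any block propagates up to the root.
    return root_checksum(blocks) == expected_root
```

For example, `verify(blocks, root_checksum(blocks))` is true, while changing a single byte in any one block makes it false.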

Also, not to be too pedantic, but no matter how well-designed your
system is, there will always be transactions which don't make it to
disk.  The key is making sure that no part of the distributed system
is fooled into thinking a transaction completed, unless it's possible
for the downed node to recover the transaction and commit it later.
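The rule above, never tell anyone a transaction completed unless it can be recovered, can be sketched as a write-ahead log that acknowledges only after `os.fsync` returns. This is a minimal illustration assuming a single writer and a length-prefixed record format. `commit_then_ack` and `replay` are hypothetical names, not OpenAFS code. On replay, a record cut short by a crash is dropped, which is safe precisely because it was never acknowledged.

```python
import os

HEADER = 4  # bytes of big-endian length prefix per record (assumed format)


def commit_then_ack(log_path: str, record: bytes) -> str:
    """Append one record and return an acknowledgement only once fsync succeeds.

    If the write or the fsync raises, no acknowledgement is returned, so no
    peer is told the transaction completed.
    """
    data = len(record).to_bytes(HEADER, "big") + record
    fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(data)
        while view:                      # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # force the record to stable storage
    finally:
        os.close(fd)
    # Caveat: for a newly created log file, the containing directory also needs
    # an fsync (on Linux, for instance) before the file is sure to survive.
    return "ACK"


def replay(log_path: str) -> list[bytes]:
    """Return every complete record; a torn record at the tail is ignored."""
    with open(log_path, "rb") as f:
        data = f.read()
    records, pos = [], 0
    while pos + HEADER <= len(data):
        length = int.from_bytes(data[pos:pos + HEADER], "big")
        end = pos + HEADER + length
        if end > len(data):
            break                        # crash mid-write: never acknowledged
        records.append(data[pos + HEADER:end])
        pos = end
    return records
```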

> * disk gets unplugged, machine panics, blahblah

> ...which is a "risk" almost any filesystem or application takes into
> consideration, and allows the filesystem user to determine when it's
> "really necessary" to wait until data is committed to firm storage
> before going forward, or not.  Good or bad, the fileserver assumes
> that's what you want to do all of the time in CopyOnWrite and
> StoreData_RXStyle (not to mention the volume structure management
> code in namei_ops, etc.).  I guess that since we don't have a
> "channel" to forward real fsync() requests, we assume it's what you
> want all the time, or at the time the code was written it was
> assumed horrible things were going to happen all of the time...
> cleaning lady unplugs the direct attached SCSI disk, cosmic ray
> causes a kernel panic, fsck can't reconstruct the filesystem to
> save its life...  so making sure every transaction was written to
> disk was probably a good idea.  Nowadays with the cleaning lady
> banned from the datacenter unless escorted, multipathing fibre
> links to disk storage, filesystems that go beyond even metadata
> logging to preserve structure (like ZFS), the cost/benefit of
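
The trade-off in the quoted paragraph, syncing on every write versus letting the caller decide when durability is "really necessary", can be sketched as a write path with a sync policy. Everything here is hypothetical: `store_chunks` and its `policy` values are illustrative names, not the fileserver's CopyOnWrite or StoreData_RXStyle code. The function returns how many fsync calls it made, which is the cost being traded against how much unsynced data a crash could lose.

```python
import os


def store_chunks(path: str, chunks: list[bytes], policy: str = "each") -> int:
    """Write chunks to `path` and return the number of fsync calls made.

    policy="each": fsync after every chunk (always-durable behaviour).
    policy="end":  fsync once, after the last chunk (caller-requested sync).
    policy="none": never fsync; the OS flushes dirty pages in its own time.
    """
    if policy not in ("each", "end", "none"):
        raise ValueError(f"unknown policy: {policy}")
    syncs = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            if policy == "each":
                f.flush()                # move Python's buffer into the kernel
                os.fsync(f.fileno())     # then force the kernel's copy to disk
                syncs += 1
        if policy == "end":
            f.flush()
            os.fsync(f.fileno())
            syncs += 1
    return syncs
```

All three policies leave the same bytes in the file once it is closed; they differ only in how many of those bytes are guaranteed to be on disk if the machine dies partway through.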

Speaking of ZFS, fsync on ZFS is a serious performance issue.  With
ZFS, an fsync results in an update to the entire filesystem checksum
tree up to, and including, the root checksum.  Needless to say, this
is a very high-latency operation with the potential for quite a few
disk seeks.  Generally, ZFS tries to perform hierarchical checksum
updates only about once every ten seconds.  Clone operations are just
plain ugly on ZFS.
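To see why an fsync that rewrites the checksum path up to the root is expensive, and why batching updates every few seconds helps, here is a toy cost model. It assumes a complete binary tree of depth `depth` and counts only the interior checksums that must be recomputed. It is not ZFS's actual block-pointer layout or commit path, and the function names are hypothetical. Leaves that share ancestors pay for those ancestors only once when their updates are batched.

```python
def ancestors(leaf: int, depth: int) -> list[tuple[int, int]]:
    """(level, index) of every interior node above `leaf`, up to and including the root."""
    return [(level, leaf >> level) for level in range(1, depth + 1)]


def hashes_per_sync(dirty: list[int], depth: int) -> int:
    """Sync after each write: every write re-hashes its whole path to the root."""
    return sum(len(ancestors(leaf, depth)) for leaf in dirty)


def hashes_batched(dirty: list[int], depth: int) -> int:
    """One periodic commit: each shared ancestor is re-hashed only once."""
    return len({node for leaf in dirty for node in ancestors(leaf, depth)})
```

With a tree of depth 20 (about a million leaves) and eight adjacent dirty blocks, syncing after each write recomputes 8 × 20 = 160 interior checksums, while one batched commit recomputes 24: four at level 1, two at level 2, and a single shared ancestor at each of levels 3 through 20.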

-Tom
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
