On Oct 25, 2011, at 3:09 PM, Mark Hahn wrote:

>> CouchDB does a full sync, because this is the only way to be proof against
>> disasters like power failures
>
> Isn't couchdb crash-proof due to append-only writing?  What do you gain
> other than possible loss of latest writes, which you can lose anyway with a
> fsync.

You can get corrupted files even with append-only writing. In the worst case, let’s say everything gets written to disk before the power failure _except_ for one disk block in the middle of the update*. After rebooting, you have what appears to be a valid file (it’s got the magic trailer), except that 4096 (or 512 or whatever) bytes in one of the last updated documents are garbage.

I don’t know the details of how CouchDB finds the trailer in the file, but it would have to be doing something like a checksum of every single write to guard against that, which seems too expensive to me. Instead, to be safe, what you do is write the payload, wait for a full fsync, then write the trailer only after you know that the entire payload is safely on the platters.

--Jens

* Disk controllers don’t write sectors in the order they receive the write requests. They shove them in the cache, then write them out grouped by tracks to minimize seek time.
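
For what it's worth, the payload / fsync / trailer ordering described above can be sketched roughly like this in Python. This is only an illustration under my own assumptions, not CouchDB's actual code: `TRAILER`, `full_sync`, `write_all`, and `append_update` are hypothetical names, and the trailer bytes are made up. Note that a plain `os.fsync` does not necessarily force the drive's own cache to the platters; on macOS that takes the `F_FULLFSYNC` fcntl, which the sketch uses when the platform exposes it.

```python
import os

try:
    import fcntl  # not available on Windows
except ImportError:
    fcntl = None

# Hypothetical magic marker; a real database would use its own trailer format.
TRAILER = b"\x00HYPOTHETICAL-TRAILER\x00"


def full_sync(fd: int) -> None:
    """Flush the file to stable storage as thoroughly as the platform allows."""
    if fcntl is not None and hasattr(fcntl, "F_FULLFSYNC"):
        # macOS: ask the drive to flush its write cache as well.
        fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
    else:
        os.fsync(fd)


def write_all(fd: int, data: bytes) -> None:
    """os.write may write fewer bytes than requested, so loop until done."""
    while data:
        written = os.write(fd, data)
        data = data[written:]


def append_update(path: str, payload: bytes) -> None:
    """Append payload, sync, and only then append the trailer and sync again."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        write_all(fd, payload)
        full_sync(fd)           # payload must be durable before the trailer exists
        write_all(fd, TRAILER)
        full_sync(fd)           # now the update counts as committed
    finally:
        os.close(fd)
```

With this ordering, a power failure before the first sync completes leaves a file whose tail has no trailer, so a reader can ignore the incomplete update instead of trusting a payload that may contain a garbage block. The cost is two syncs per update.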
