On Sep 30, 2008, at 1:48 PM, Heikki Linnakangas wrote:
This has been suggested before, and the usual objection is precisely that it only protects from errors in the storage layer, giving a false sense of security.
If you can come up with a mechanism for detecting non-storage errors as well, I'm all ears. :)
In the meantime, you're way, way more likely to experience corruption at the storage layer than anywhere else. We've had several corruption events, only one of which was memory related... and we *know* it was memory related because we actually got logs saying so. But with a SAN environment there are a lot of moving parts, all waiting to screw up your data:
    filesystem
    SAN device driver
    SAN network
    SAN BIOS
    drive BIOS
    drive

That's on top of the things that could hose your data outside of storage:

    kernel
    CPU
    memory
    motherboard
Don't some filesystems include a per-block CRC, which would achieve the same thing? ZFS?
Sure, some do. We're on Linux and can't run ZFS. And I'll argue that no Linux FS is anywhere near as well tested as ext3, so moving to some other FS that offers CRCs exposes you to possible issues with the FS itself. Not to mention that changing filesystems on a large production system is very painful.
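
For illustration only, the per-block CRC idea being debated could be sketched roughly as below. This is a minimal sketch, not how PostgreSQL or any filesystem actually does it: the 8192-byte block size, the choice to keep a CRC-32 in the last four bytes of the block, and the names `seal_block` and `verify_block` are all assumptions made up for the example.

```python
import struct
import zlib

BLOCK_SIZE = 8192                     # assumed block size for this sketch
CRC_SIZE = 4                          # CRC-32 kept in the last 4 bytes (assumed layout)
PAYLOAD_SIZE = BLOCK_SIZE - CRC_SIZE


def seal_block(payload: bytes) -> bytes:
    """Hypothetical write-side helper: append a CRC-32 of the payload."""
    if len(payload) != PAYLOAD_SIZE:
        raise ValueError("payload must be exactly PAYLOAD_SIZE bytes")
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + struct.pack("<I", crc)


def verify_block(block: bytes) -> bool:
    """Hypothetical read-side helper: recompute the CRC and compare it
    with the value stored at the end of the block."""
    if len(block) != BLOCK_SIZE:
        return False
    payload, stored = block[:PAYLOAD_SIZE], block[PAYLOAD_SIZE:]
    (expected,) = struct.unpack("<I", stored)
    return (zlib.crc32(payload) & 0xFFFFFFFF) == expected
```

A block that gets damaged anywhere between `seal_block` and `verify_block` (driver, network, drive) fails verification. A payload that was already wrong in memory before `seal_block` ran gets a perfectly valid CRC, which is exactly the objection quoted above: the check covers the storage path and nothing else.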
-- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828