>>>>> "jb" == Jeff Bonwick <jeff.bonw...@sun.com> writes:
jb> We simulated power failure; we did not simulate disks that
jb> simply blow off write ordering.  Any disk that you'd ever
jb> deploy in an enterprise or storage appliance context gets
jb> this right.

Did you simulate power failure of iSCSI/FC target shelves without power
failure of the head node?  How about power failure of iscsitadm-style
iSCSI targets?  How about rebooting the master domain in sun4v---what is
it called?  I've not had any sun4v, but I've heard the I/O domain, the
kernel that contains all the disk drivers, can be rebooted without
rebooting the guest-domain kernels that hold the virtual disk drivers,
and that sounds like a great opportunity to lose a batch of writes.  Do
you consider sun4v virtual I/O or iscsitadm well-fitted to an
``enterprise'' context, or are they not ready for deploying in the
Enterprise yet? :)

IMHO it'd really be fantastic if almost all the lost ZFS pools turned
out to be just this one write-cache problem, and ZFS the canary---not a
checksum canary this time, but one that shits itself when write barriers
are violated.  Then it'd almost be a blessing that ZFS is so vulnerable
to it, because maybe there will be enough awareness and pressure that
it'll finally become practical to build an end-to-end system without
this problem.  Suddenly having a database-friendly filesystem
everywhere, including trees mounted over NFS/cifs/lustre/whatever's-next,
might change some of our assumptions about which MUAs have fragile
message stores and which programs need to store things on ``a local
drive''.

I'm ordering a big batch of crappy peecee hardware tomorrow so I can
finally start testing and quit ranting.  I'll see if this old post can
serve as the qualification tool I keep wanting:

  http://code.sixapart.com/svn/tools/trunk/diskchecker.pl

He used the tool on Linux, I think, and he used it end-to-end, to check
fsync() from user level.  Which is odd, because I thought I remembered
reading that Linux does _not_ propagate fsync() all the way to the disk,
and that they're trying to fix it.  In its internal storage stack Linux
has separate ideas of ``cache flush'' and ``write barrier'', while my
impression is that physical disks have only the latter, so Linux sort of
relies on things happening ``soon''---yet this guy is saying that
whether fsync() works or not on Linux ext3 is determined almost entirely
by the disk.

Possibly the tool can be improved---someone on this list had the
interesting idea of writing backwards, to provoke the drive into
reordering writes across a barrier, since even the dumbest drive will
want to write in the direction the platter's spinning.  I'm not sure
backwards-writing will provoke misbehavior inside iSCSI stacks, though.
In the end the obvious mtd/ubi-style test---writing to a zpool and
trying to destroy it by yanking cords---might be the best test.
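
In the meantime, here is roughly the shape of the end-to-end probe I
have in mind, as a Python sketch rather than the real diskchecker.pl
(the record size, port number, and wire format below are made up, not
the actual tool's protocol).  The writer runs on the victim box,
fsync()s every record, and only then acknowledges it over TCP to an
observer on a machine that keeps its power; after you yank the cord,
verify mode checks that every acknowledged record actually reached the
platter.

  #!/usr/bin/env python3
  # Sketch of an end-to-end fsync() probe in the spirit of diskchecker.pl.
  # NOT the real tool or its wire protocol; names and sizes are made up.
  import os, socket, struct, sys, zlib

  REC = 512    # one record per sector-sized slot (assumption)

  def observer(port=9091):
      # Runs on the machine that keeps power; remembers the highest record
      # the victim claimed was durable.  Kill it with ^C after the crash;
      # the last number printed is what you feed to verify().
      srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
      srv.bind(("", port))
      srv.listen(1)
      conn, _peer = srv.accept()
      stream = conn.makefile("rb")
      last = -1
      while True:
          data = stream.read(4)          # shorter than 4 bytes only at EOF
          if len(data) < 4:
              break
          last = struct.unpack("<I", data)[0]
          print("\rlast fsync()ed and acknowledged record: %d" % last,
                end="", flush=True)
      print("\nconnection dropped; highest acknowledged record was %d" % last)

  def writer(path, observer_host, observer_port=9091):
      # Runs on the box whose cord you are going to pull.
      sock = socket.create_connection((observer_host, observer_port))
      fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
      seq = 0
      while True:
          payload = struct.pack("<I", seq) + os.urandom(REC - 8)
          rec = payload + struct.pack("<I", zlib.crc32(payload))
          os.pwrite(fd, rec, seq * REC)
          os.fsync(fd)                              # claim of durability
          sock.sendall(struct.pack("<I", seq))      # ack only AFTER fsync returns
          seq += 1

  def verify(path, last_acked_seq):
      # After the crash: every record acknowledged before the cord was
      # pulled must be intact, or fsync() lied somewhere in the stack.
      fd = os.open(path, os.O_RDONLY)
      lost = 0
      for seq in range(last_acked_seq + 1):
          rec = os.pread(fd, REC, seq * REC)
          ok = (len(rec) == REC
                and struct.unpack("<I", rec[-4:])[0] == zlib.crc32(rec[:-4])
                and struct.unpack("<I", rec[:4])[0] == seq)
          if not ok:
              lost += 1
              print("record %d was acknowledged but is not intact on disk" % seq)
      print("%d of %d acknowledged records lost" % (lost, last_acked_seq + 1))

  if __name__ == "__main__":
      mode = sys.argv[1]
      if mode == "observe":
          observer()
      elif mode == "write":
          writer(sys.argv[2], sys.argv[3])          # write <file> <observer-host>
      else:
          verify(sys.argv[2], int(sys.argv[3]))     # verify <file> <last-acked-seq>

Any record the observer saw acknowledged but verify() can't find is, by
definition, a lie told somewhere between fsync() and the platter---disk
write cache, iSCSI target, virtual I/O domain, whatever.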
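
The backwards-writing idea is then a small variant of the same sketch:
preallocate the file and put record N at a descending offset, so a drive
that only wants to write in platter order has every incentive to reorder
across the barrier.  (Also just a sketch---whether descending file
offsets actually map to descending LBAs depends on the filesystem's
allocator, which is a guess on my part.)

  # Hypothetical backwards-writing variant of the writer above.
  NRECS = 1 << 20                         # 512 MB of 512-byte records, say

  def writer_backwards(path, observer_host, observer_port=9091):
      sock = socket.create_connection((observer_host, observer_port))
      fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
      os.ftruncate(fd, NRECS * REC)       # reserve the slots up front
      for seq in range(NRECS):
          slot = NRECS - 1 - seq          # walk from the end toward offset 0
          payload = struct.pack("<I", seq) + os.urandom(REC - 8)
          rec = payload + struct.pack("<I", zlib.crc32(payload))
          os.pwrite(fd, rec, slot * REC)
          os.fsync(fd)
          sock.sendall(struct.pack("<I", seq))
      # verify() would need the same seq -> slot mapping, of course.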