>>>>> "jb" == Jeff Bonwick <jeff.bonw...@sun.com> writes:

    jb> We simulated power failure; we did not simulate disks that
    jb> simply blow off write ordering.  Any disk that you'd ever
    jb> deploy in an enterprise or storage appliance context gets this
    jb> right.

Did you simulate power failure of iSCSI/FC target shelves without
power failure of the head node?  

How about power failure of iscsitadm-style iSCSI targets?

How about rebooting the master domain in sun4v---what is it called?
I've never had any sun4v hardware, but I've heard the I/O domain, the
kernel that contains all the physical disk drivers, can be rebooted
without rebooting the guest-domain kernels that use virtual disk
drivers, and that sounds like a great opportunity to lose a batch of
writes.

Do you consider sun4v virtual I/O or iscsitadm well suited to an
``enterprise'' context, or are they not ready for deployment in the
Enterprise yet?  :)


IMHO it'd really be fantastic if almost all the lost ZFS pools turned
out to be just this one write-cache problem, with ZFS as the canary:
not a checksum canary this time, but a canary in the sense of shitting
itself when write barriers are violated.  Then it'd be almost a
blessing that ZFS is so vulnerable to it, because maybe there will be
enough awareness and pressure that it finally becomes practical to
build an end-to-end system without this problem.  Suddenly having a
database-friendly filesystem everywhere, including trees mounted over
NFS/CIFS/Lustre/whatever's next, might change some of our assumptions
about which MUAs have fragile message stores and which programs need
to store things on ``a local drive''.

I'm ordering a big batch of crappy peecee hardware tomorrow so I can
finally start testing and quit ranting.  I'll see whether this old
tool can serve as the qualification test I keep wanting:

 http://code.sixapart.com/svn/tools/trunk/diskchecker.pl

He used the tool on Linux, I think, and he used it end-to-end, to
check fsync() from user level.  That's odd, because I thought I
remembered reading that Linux does _not_ propagate fsync() all the way
to the disk, and that they're trying to fix it.  In its internal
storage stack, Linux has separate ideas of ``cache flush'' and ``write
barrier'', while my impression is that physical disks have only the
latter, so the stack sometimes just relies on things happening
``soon''; but this guy is saying that whether fsync() works or not, on
Linux ext3, is determined almost entirely by the disk.
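
For concreteness, the user-level end of such a check is basically this
loop: write a sequence number, fsync(), and only report the number as
durable after fsync() returns, sending the report somewhere that will
survive the power cut.  Here's a rough C sketch of just that core loop
(this is not diskchecker.pl; the path is a placeholder, and you'd pipe
stdout to another box with nc or the like):

    /* Rough sketch of the user-level half of an fsync() durability
     * check -- NOT diskchecker.pl, just the core idea: a sequence
     * number is only reported as durable *after* fsync() returns.
     * The path is a placeholder; pipe stdout to another machine
     * so the report survives the power cut.
     */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/tank/test/fsync-check.dat";  /* placeholder */
        char block[4096];
        int fd = open(path, O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        for (uint64_t seq = 1; ; seq++) {
            memset(block, 0, sizeof block);
            memcpy(block, &seq, sizeof seq);

            /* overwrite the same block every time */
            if (pwrite(fd, block, sizeof block, 0) != sizeof block) {
                perror("pwrite"); return 1;
            }
            if (fsync(fd) != 0) {          /* the call under test */
                perror("fsync"); return 1;
            }
            /* only now is seq claimed to be on stable storage */
            printf("durable %llu\n", (unsigned long long)seq);
            fflush(stdout);
        }
    }

After the cut, read the sequence number back off the disk and compare
it against the last ``durable'' line the surviving machine saw; if
what's on disk is older, something in the stack lied about fsync().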

Possibly the tool can be improved.  Someone on this list had the
interesting idea of writing backwards, to provoke the drive into
reordering writes across a barrier, since even the dumbest drive will
want to write in the direction the platter's spinning.  I'm not sure
backwards writing will provoke misbehavior inside iSCSI stacks,
though.  In the end the obvious mtd/ubi-style test of writing to a
zpool and trying to destroy it by yanking cords might be the best
test.
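
If I try the backwards-write idea, the pattern would be something like
the sketch below: sweep a test region from the highest offset down to
the lowest, bumping a generation number and flushing between every
pair of blocks.  This is only my guess at what was meant; the device
path is made up, and whether that fsync() actually reaches the disk's
write cache is exactly the thing being probed:

    /* Sketch of the backwards-write barrier test.  Assumption: a
     * descending-offset pattern tempts the drive to reorder across
     * the flush, since it would rather write in the direction the
     * platter is moving.  Point it at a scratch disk or a big
     * preallocated file you can afford to destroy (path is made up).
     */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define BLOCK   4096
    #define NBLOCKS 4096               /* 16 MB test region */

    int main(void)
    {
        const char *dev = "/dev/rdsk/c2t0d0s0";   /* placeholder scratch device */
        char buf[BLOCK];
        int fd = open(dev, O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        for (uint64_t gen = 1; ; gen++) {
            /* walk from the HIGHEST offset down to the lowest */
            for (long i = NBLOCKS - 1; i >= 0; i--) {
                memset(buf, 0, sizeof buf);
                memcpy(buf, &gen, sizeof gen);

                if (pwrite(fd, buf, BLOCK, (off_t)i * BLOCK) != BLOCK) {
                    perror("pwrite"); return 1;
                }
                /* barrier between every pair of blocks: block i must
                 * not be stable at generation `gen' before block i+1
                 * is */
                if (fsync(fd) != 0) { perror("fsync"); return 1; }
            }
        }
    }

After yanking the cord, scan the region: generations should be
non-decreasing as the offset goes up, with a spread of at most one
(ignoring the single block that was in flight).  A block that sits
behind its lower-offset neighbour means a later write hit stable
storage while an earlier, already-``flushed'' one did not.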
