Haudy Kazemi wrote:

I think a better question would be: what kind of tests would be most
promising for turning some subclass of these lost pools reported on
the mailing list into an actionable bug?

My first bet would be to write tools that test for ignored sync-cache
commands leading to lost writes, and to apply them to the case where an
iSCSI target is rebooted but the initiator isn't.

I think that in the process of writing the tool you'll immediately bump
into a defect, because you'll realize there is no equivalent of a
'hard' iSCSI mount like there is in NFS.  And there cannot be a strict
equivalent of 'hard' mounts in iSCSI, because we want zpool redundancy
to preserve availability when an iSCSI target goes away.  I think the
whole model is wrong somehow.
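For what it's worth, a minimal sketch of the kind of flush-honesty tool suggested above (similar in spirit to diskchecker.pl) could look like this; the record format, file path, and invocation are illustrative assumptions, not an existing utility:

#!/usr/bin/env python3
"""Hypothetical cache-flush honesty test for a pool built on iSCSI targets.

Phase 1 (write): append fixed-size, sequence-numbered records to a test file
on the pool, fsync() after each one, and print the last sequence number whose
fsync() returned.  While this runs, reboot an iSCSI target without touching
the initiator.

Phase 2 (verify): once the pool is back, confirm every record up to the last
acknowledged sequence number is intact.  A missing or corrupt acknowledged
record means a cache flush was acknowledged but not made durable somewhere
in the stack.
"""
import os
import struct
import sys
import zlib

RECORD = struct.Struct("<QI")   # sequence number, CRC32 of that number

def write_phase(path):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    seq = 0
    try:
        while True:
            rec = RECORD.pack(seq, zlib.crc32(str(seq).encode()))
            os.write(fd, rec)
            os.fsync(fd)                        # must not return until durable
            print(f"acked {seq}", flush=True)   # only printed after the flush returned
            seq += 1
    finally:
        os.close(fd)

def verify_phase(path, last_acked):
    with open(path, "rb") as f:
        data = f.read()
    for seq in range(last_acked + 1):
        off = seq * RECORD.size
        chunk = data[off:off + RECORD.size]
        if len(chunk) < RECORD.size:
            sys.exit(f"FAIL: record {seq} missing (file truncated after ack)")
        got_seq, got_crc = RECORD.unpack(chunk)
        if got_seq != seq or got_crc != zlib.crc32(str(seq).encode()):
            sys.exit(f"FAIL: record {seq} corrupt despite successful fsync")
    print(f"OK: all {last_acked + 1} acknowledged records are intact")

if __name__ == "__main__":
    if sys.argv[1] == "write":
        write_phase(sys.argv[2])          # e.g. write /tank/flushtest.dat
    else:
        verify_phase(sys.argv[2], int(sys.argv[3]))   # e.g. verify /tank/flushtest.dat 12345

Run the write phase, note the last "acked" number it printed before the target reboot, and feed that number to the verify phase afterwards.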
I'd surely hope that a ZFS pool with redundancy built on iSCSI targets could survive the loss of some targets, whether due to actual failures or to necessary upgrades of the iSCSI targets (think OS upgrades and reboots on the systems that are offering iSCSI devices to the network).


I've had a mirrored zpool created from Solaris iSCSI target servers in production since April 2008. I've had disks die and reboots of the target servers, and ZFS has handled them very well. My biggest wish is to be able to tune the iSCSI timeout value so ZFS can fail reads/writes over to the other half of the mirror more quickly than it does now (about 180 seconds on my configuration). A minor gripe considering the features that ZFS provides.
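A small probe for measuring that failover window might look like the sketch below; the mount point /tank is an assumption about the local configuration, and the ~180 s figure is simply what I see here:

#!/usr/bin/env python3
"""Hypothetical failover-latency probe for a mirrored pool on iSCSI targets.

Appends a small record with fsync() once per second to a file on the pool and
logs how long each write+flush takes.  Rebooting one iSCSI target while this
runs shows the stall as a single long sample -- roughly the time ZFS waits
before failing the device and continuing on the other half of the mirror.
"""
import os
import time

TEST_FILE = "/tank/failover-probe.log"   # assumption: pool mounted at /tank

def probe():
    fd = os.open(TEST_FILE, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        while True:
            start = time.monotonic()
            os.write(fd, f"{time.time():.3f}\n".encode())
            os.fsync(fd)
            elapsed = time.monotonic() - start
            # Normal samples are milliseconds; a ~180 s sample marks the failover window.
            print(f"write+fsync took {elapsed:8.3f} s", flush=True)
            time.sleep(1)
    finally:
        os.close(fd)

if __name__ == "__main__":
    probe()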

I've also had the ZFS server (the initiator aggregating the mirrored disks) unintentionally power-cycled with the iSCSI zpool imported. The pool re-imported and scrubbed fine.

ZFS is definitely my FS of choice - by far.
