Haudy Kazemi wrote:
I think a better question would be: what kind of tests would be most
promising for turning some subclass of these lost pools reported on
the mailing list into an actionable bug?
My first bet would be writing tools that test for ignored sync-cache
commands leading to lost writes, and applying them to the case where
iSCSI targets are rebooted but the initiator isn't.
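A minimal sketch of what such a tool could look like (this is my own hypothetical illustration, not an existing utility): phase one writes a known pattern to the iSCSI-backed device and fsync()s it before you reboot the target; phase two checks the pattern after the target comes back. The device path, block size, and checksum file location are all assumptions -- point it at a scratch LUN, never a live pool member.

```python
#!/usr/bin/env python3
"""Hypothetical lost-write detector for sync-cache testing (sketch only)."""
import hashlib
import os
import sys

BLOCK = 4096
COUNT = 256  # 1 MiB of test data

def write_phase(device, sumfile):
    """Write random data to the device and force it through the cache."""
    data = os.urandom(BLOCK * COUNT)
    fd = os.open(device, os.O_WRONLY | os.O_CREAT)
    try:
        os.write(fd, data)
        os.fsync(fd)  # this is the sync-cache path under test
    finally:
        os.close(fd)
    # Record the checksum on durable *local* storage, not on the target.
    with open(sumfile, "w") as f:
        f.write(hashlib.sha256(data).hexdigest())

def verify_phase(device, sumfile):
    """After the target returns: True if the synced data survived."""
    with open(device, "rb") as f:
        data = f.read(BLOCK * COUNT)
    with open(sumfile) as f:
        expected = f.read().strip()
    return hashlib.sha256(data).hexdigest() == expected

if __name__ == "__main__" and len(sys.argv) >= 4:
    dev, sums, phase = sys.argv[1], sys.argv[2], sys.argv[3]
    if phase == "write":
        write_phase(dev, sums)
    else:
        print("OK" if verify_phase(dev, sums) else "LOST WRITES")
```

If the target acknowledged the SYNCHRONIZE CACHE but threw the data away across its reboot, the verify phase reports lost writes -- exactly the actionable evidence the bug report needs.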
I think that in the process of writing the tool you'll immediately bump
into a defect, because you'll realize there is no equivalent of a
'hard' iSCSI mount like there is in NFS. And there cannot be a strict
equivalent of 'hard' mounts in iSCSI, because we want zpool redundancy
to preserve availability when an iSCSI target goes away. I think the
whole model is wrong somehow.
I'd surely hope that a ZFS pool with redundancy built on iSCSI targets
could survive the loss of some targets, whether due to actual failures
or to necessary upgrades of the iSCSI targets (think OS upgrades and
reboots on the systems that are offering iSCSI devices to the network).
I've had a mirrored zpool built from Solaris iSCSI target servers in
production since April 2008. I've had disks die and reboots of the
target servers, and ZFS has handled them very well. My biggest wish is
to be able to tune the iSCSI timeout value so ZFS can fail reads/writes
over to the other half of the mirror more quickly than it does now
(about 180 seconds on my configuration). A minor gripe considering the
features that ZFS provides.
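For what it's worth, a ~180-second delay is consistent with the Solaris sd driver's default 60-second per-command SCSI timeout multiplied by a few retries. The usual knob is the sd_io_time tunable in /etc/system; the fragment below is a hedged sketch with a hypothetical 10-second value -- verify the tunable applies to your Solaris release and test on a non-production box before relying on it.

```
* /etc/system fragment (illustrative, untested values)
* Lower the per-command SCSI disk timeout from the default 0x3c (60 s)
* to 0xa (10 s), so retries exhaust -- and ZFS can fail over to the
* other mirror half -- in tens of seconds rather than ~180 s.
set sd:sd_io_time = 0xa
```

A reboot is required for /etc/system changes to take effect, and an overly aggressive value risks offlining devices during transient network blips, so there is a real tradeoff here.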
I've also had the ZFS server (the initiator aggregating the mirrored
disks) unintentionally power-cycled with the iSCSI zpool imported. The
pool re-imported and scrubbed fine.
ZFS is definitely my FS of choice - by far.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss