>>>>> "re" == Richard Elling <richard.ell...@gmail.com> writes: >>>>> "dc" == Daniel Carosone <d...@geek.com.au> writes:
    re> In general, I agree. How would you propose handling nested
    re> mounts?

Force-unmount them.  (So that they can be manually mounted elsewhere if
desired, or even in the same place with the middle filesystem missing
and empty directories in between.  In the latter case the NFS fsid
should stay the same, so that hard-mounted clients can continue once a
sysadmin forces the remount.  Remember, hard-mounted NFS clients will do
this even hours or days later, and that behavior can be extremely useful
to a batch cluster that's hard to start, or even just to someone who
doesn't want to lose his last edit.)  And make force-unmounting actually
work, like it does on Mac OS.

    dc> Please look at the pool property "failmode".

It doesn't work, though.  We've been over this.  failmode applies only
after the drive has been decided to be failed, but it can take an
arbitrary time---minutes, hours, or forever---for an underlying driver
to report a failed drive up to ZFS.  Until then, (a) you get ``wait'' no
matter what you picked, and (b) commands like 'zpool status' hang for
all pools, where in a resiliently-designed system they would hang for no
pools, especially not the pool affected by the unresponsive device.

One might reasonably want a device state like HUNG or SLOW or >30SEC in
'zpool status', along with the ability to 'zpool offline' any device at
any time and, when doing so, cancel all outstanding commands to that
device from ZFS's point of view, as if they had returned failures from
the driver even though they're still waiting on it.  That device state
doesn't exist partly because 'zpool status' isn't meant to work well
enough to ever return such a state.

'failmode' is not a real or complete answer so long as we agree it's
reasonable to expect maintenance commands to work all the time and not
freeze up for intervals of 180 sec - <several hours> - <forever>.  I
understand most Unixes act this way, not just Solaris, but it's really
not good enough.

    dc> The other part of the issue, when failmode is set to the
    dc> default "wait", relates to lower-level drivers and subsystems
    dc> recovering reliably to things like removable disks reappearing
    dc> after removal.  There's surely room for improvement in some
    dc> cases there, and perhaps your specific chipsets

How do you handle the case where a hotplug SATA drive is powered off
unexpectedly with data in its write cache?  Do you replay the writes, or
do they go down the ZFS hotplug write hole?  I don't think this side of
the issue depends on ``specific chipsets''.
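For reference, the knobs under discussion (a sketch only; 'tank' and
c1t2d0 are placeholder names):

    zpool get failmode tank            # default is 'wait'
    zpool set failmode=continue tank   # or 'panic'; takes effect only once ZFS decides the device has failed
    zpool status                       # per the above, can hang for every pool while the driver is still retrying
    zpool offline tank c1t2d0          # the wish: have this work at any time and cancel the device's outstanding I/O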