On Thu, 2010-06-17 at 16:16 -0400, Eric Schrock wrote:
> On Jun 17, 2010, at 3:52 PM, Garrett D'Amore wrote:
> >
> > Anyway, I'm happy to share the code, and even go through the
> > request-sponsor process to push this upstream. I would like the
> > opinions of the ZFS and FMA teams though... is the approach I'm using
> > sane, or have I missed some important design principle? Certainly it
> > *seems* to work well on the systems I've tested, and we (Nexenta) think
> > that it fixes what appears to us to be a critical deficiency in the ZFS
> > error detection and handling. But I'd like to hear other thoughts.
>
> I don't think this is the right approach. You'll end up faulting drives
> that should be marked removed, among other things. The correct answer is
> for drivers to use the new LDI events (LDI_EV_DEVICE_REMOVE) to indicate
> device removal. This is already in ON but not hooked up to ZFS. It's easy
> to do, but just hasn't been pushed yet.
>
> Note that for legacy drivers, the DKIOCGETSTATE ioctl() is supposed to
> handle this for you. However, there is a race condition where if the vdev
> probe happens before the driver has transitioned to a state where it
> returns DEV_GONE, then we can miss this event (because it is only probed
> in reaction to I/O failure and we won't try again). We spent some time
> looking at ways to eliminate this window, but it ultimately got quite
> ugly and doesn't support hot spares, so the better answer was to just
> properly support the LDI events.
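If I understand the mechanism you're describing, the consumer-side hook
would look roughly like the sketch below. This is only my reading of the
sunldi interfaces, not anything that exists in ON today; the xx_* names
are placeholders:

    #include <sys/types.h>
    #include <sys/sunldi.h>

    /*
     * Sketch only: register for LDI_EV_DEVICE_REMOVE on an already-open
     * layered handle.  The xx_* names are placeholders, not existing ZFS
     * or ON functions.
     */
    static int
    xx_remove_notify(ldi_handle_t lh, ldi_ev_cookie_t cookie, void *arg,
        void *ev_data)
    {
    	/* No reason to veto the removal; just acknowledge it. */
    	return (LDI_EV_SUCCESS);
    }

    static void
    xx_remove_finalize(ldi_handle_t lh, ldi_ev_cookie_t cookie,
        int ldi_result, void *arg, void *ev_data)
    {
    	if (ldi_result == LDI_EV_SUCCESS) {
    		/*
    		 * The device really is gone: this is where the consumer
    		 * (e.g. the vdev layer) would mark itself REMOVED rather
    		 * than FAULTED.
    		 */
    	}
    }

    static ldi_ev_callback_t xx_remove_callb = {
    	.cb_vers = LDI_EV_CB_VERS,
    	.cb_notify = xx_remove_notify,
    	.cb_finalize = xx_remove_finalize
    };

    static int
    xx_register_remove(ldi_handle_t lh, void *arg, ldi_callback_id_t *idp)
    {
    	ldi_ev_cookie_t cookie;

    	if (ldi_ev_get_cookie(lh, LDI_EV_DEVICE_REMOVE, &cookie) !=
    	    LDI_EV_SUCCESS)
    		return (-1);	/* event not offered on this dev path */

    	if (ldi_ev_register_callbacks(lh, cookie, &xx_remove_callb, arg,
    	    idp) != LDI_EV_SUCCESS)
    		return (-1);

    	return (0);
    }

Presumably registration would happen when the consumer opens the device,
and the finalize callback is where a vdev could be marked REMOVED instead
of FAULTED.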
I actually started with DKIOCGSTATE as my first approach, modifying sd.c.
But I ran into problems: nothing was issuing this ioctl properly except for
removable/hotpluggable media (and the SAS/SATA controllers/frameworks are
not flagging these devices as such). I tried overriding that in sd.c, but
then found another bug: the HAL module that does the monitoring does not
monitor devices that are already present and in use (mounted filesystems)
at boot. I think HAL was designed for removable media that would not be
automatically mounted by ZFS during boot. I didn't analyze this further.
Furthermore, for the devices that did work, the report was "device
administratively removed", which is *far* different from a device that has
gone offline unexpectedly. There was no way to distinguish a device removed
via cfgadm from a device that went away unexpectedly.

> If you wanted to expand support for legacy drivers, you should expand use
> of the DKIOCGETSTATE ioctl(), perhaps with an async task that probes
> spares, as well as a delayed timer (within the bounds of the
> zfs-diagnosis resource.removed horizon) to close the associated window
> for normal vdevs. However, a better solution would be to update the
> drivers that matter to use LDI_EV_DEVICE_REMOVE, which provides much
> crisper semantics and will be used in the future to hook into other
> subsystems.

Is "sd.c" considered a legacy driver? It's what is responsible for the vast
majority of disks. That said, perhaps the problem is the HBA drivers?

> In order for anything to be accepted upstream, it's key that it be able
> to distinguish between REMOVED and FAULTED devices. Mis-diagnosing a
> removed drive as faulted is very bad (fault = broken hardware = service
> call = $$$).

So how do we distinguish "removed on purpose" from "removed by accident,
faulted cable, or other non-administrative issue"? I presume that a removal
initiated via cfgadm or some other tool could put the ZFS vdev into an
offline state, and this would prevent the logic from accidentally marking
the device FAULTED. (Ideally it would also mark the device REMOVED.) Put
another way, if a libzfs command is used to offline or remove the device
from the pool, then none of the code I've written is engaged. What I've
done is supply code to handle "surprise" removal or disconnect.

> - Eric
>
> P.S. the bug in the ZFS scheme module is legit, we just haven't fixed it
> yet

I can send the diffs for that fix... they're small and obvious enough.

	- Garrett

> --
> Eric Schrock, Fishworks			http://blogs.sun.com/eschrock
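P.S. For completeness, the legacy-path check being discussed is, as I
understand it, roughly the following. This is only a sketch:
xx_device_gone() is a placeholder name, and the plumbing that would
actually schedule removal handling after it returns true is omitted:

    #include <sys/types.h>
    #include <sys/file.h>
    #include <sys/cred.h>
    #include <sys/dkio.h>
    #include <sys/sunldi.h>

    /*
     * Sketch only: after an I/O error, ask the driver for its media state
     * with DKIOCSTATE.  Passing DKIO_NONE returns the current state
     * without blocking; anything other than DKIO_INSERTED (for example
     * DKIO_DEV_GONE) suggests the device has disappeared, so the caller
     * should treat it as removed rather than faulted.
     */
    static boolean_t
    xx_device_gone(ldi_handle_t lh)
    {
    	int state = DKIO_NONE;

    	if (ldi_ioctl(lh, DKIOCSTATE, (intptr_t)&state, FKIOCTL,
    	    kcred, NULL) == 0 && state != DKIO_INSERTED)
    		return (B_TRUE);

    	return (B_FALSE);
    }

The race described above is visible here: if this runs before the driver
has transitioned, the state still comes back DKIO_INSERTED, and since the
check is only made in reaction to an I/O failure, we never look again.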