This is now CR 6970210.
I've been experimenting with a two-system setup in snv_134 where
each system exports a zvol via COMSTAR iSCSI. One system imports
both its own zvol and the one from the other system and puts them
together in a ZFS mirror.
I manually faulted the zvol on one system by physically removing
some drives. What I expect to happen is that ZFS will fault the zvol
pool and the iSCSI stack will detect this and fault the target. Then
ZFS for the mirrored pool will detect a failed device and report it.
Throughout all this the system should operate normally, perhaps with
small delays as it waits on failed devices.
That isn't what happens.
The removed drives were detected and the zvol zpool was faulted.
This eventually resulted in iSCSI "device is busy too long" errors,
and that sounds about right so far.
But the top-level mirror, which is shared over NFS, suddenly
vanished from its NFS client! That is, the failure of a zvol tied to
iSCSI seems to poison other parts of the OS, causing NFS to fail.
Isn't that odd?
At the same time, zpool status on the mirrored pool reported nothing
wrong. Eventually, it did report errors on the failed device in the
mirror, but oddly it didn't offline it as the logs claimed it would.
Instead, it seems that I/O stopped altogether. Also, the iSCSI
timeouts appear to take far longer than the values I have set, and
even after they have expired, ZFS ignores that and keeps retrying.
Somehow, I eventually got the pool to unmount and export, but when I
tried to import it, the same thing happened. First, the iSCSI
errors seem to ignore the timeout parameters and instead take an
arbitrarily long time, even longer than the defaults.
Second, ZFS won't give up on trying to import the pool even though
iSCSI is reporting to it that a device has failed. That is, ZFS
hangs when trying to import a pool that contains a failed device. The
pool is set to continue on failure, however. And technically, with
just one device in the mirror failed, the pool isn't really failed,
just degraded.
These are my iSCSI parameters:
recv-login-rsp-timeout=6
conn-login-max=3
polling-login-delay=2
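
For anyone trying to reproduce this, here is a minimal sketch of how
the three settings above could be turned back into initiator commands.
It only builds and prints the command strings; it does not run them.
The helper names (parse_tunables, render_commands) are hypothetical,
and the use of "iscsiadm modify initiator-node -T name=value" is my
assumption about how these tunables are applied, so check iscsiadm(1M)
on your build first.

```python
# Hypothetical helper: turns "name=value" tunables into iscsiadm command
# strings. ASSUMPTION: the tunables are set on the initiator node with
# `iscsiadm modify initiator-node -T name=value`; verify against iscsiadm(1M).

PARAMS_TEXT = """\
recv-login-rsp-timeout=6
conn-login-max=3
polling-login-delay=2
"""


def parse_tunables(text):
    """Parse name=value lines into a dict of integer values (order kept)."""
    tunables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a name=value line: {line!r}")
        tunables[name.strip()] = int(value)
    return tunables


def render_commands(tunables):
    """Build one command string per tunable; nothing is executed."""
    return [
        f"iscsiadm modify initiator-node -T {name}={value}"
        for name, value in tunables.items()
    ]


if __name__ == "__main__":
    for command in render_commands(parse_tunables(PARAMS_TEXT)):
        print(command)
```

Printing rather than executing keeps the sketch safe to run on any
machine, and the output can be compared by hand against what
"iscsiadm list initiator-node" reports as configured.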
--
Maurice Volaski, maurice.vola...@einstein.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss