On 03/08/2010 05:40 AM, Or Gerlitz wrote:
Mike,
I am sure this used to work, and I am quite sure you made some comments on the
subject, but now it isn't working and I can't find your comments. Anyway:
Using the same (or similar) NOP timeout settings suggested in the multipathing
section of the README, taking a port down causes the NOP watchdog to catch the
situation:
14:51:33 cto-1 kernel: sd 4:0:0:1: [sdd] Attached SCSI disk
14:51:33 cto-1 kernel: sd 5:0:0:1: [sde] Attached SCSI disk
14:51:34 cto-1 iscsid: connection2:0 is operational now
14:51:34 cto-1 iscsid: connection3:0 is operational now
14:52:19 cto-1 kernel: connection3:0: ping timeout of 10 secs expired, recv timeout 5 ...
14:52:19 cto-1 kernel: connection3:0: detected conn error (1011)
14:52:20 cto-1 iscsid: Kernel reported iSCSI connection 3:0 error (1011) state (3)
14:52:35 cto-1 kernel: session3: session recovery timed out after 15 secs
and changes the session state such that running iscsiadm -m session -P 3 (or
-m host -P 3) yields, for the failed session/host:
iSCSI Connection State: TRANSPORT WAIT
iSCSI Session State: FREE
Internal iscsid Session State: REOPEN
whereas the session associated with the other initiator port, which is still
up, gives:
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
BUT, for some reason the SCSI host state is still "running" and not "blocked",
even after 30 minutes. I remember this wasn't the case in the not-so-distant
past... am I doing something wrong? Don't tell me it's RTFM...
I get this behaviour on both RHEL 5.4 and 2.6.33, with both iscsi/tcp and iSER.
I just ran this without multipathing. When multipathing was used (RHEL 5.4), the
multipath driver noticed that the device was down once the path checker timeout
expired. See more details on my config in the attachment.
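The timeouts in the log above can be tied back to iscsid.conf settings. As far as I know, the kernel's "recv timeout" is node.conn[0].timeo.noop_out_interval, its "ping timeout" is node.conn[0].timeo.noop_out_timeout, and the "session recovery timed out after 15 secs" value is node.session.timeo.replacement_timeout. The 5/10/15 values below are read off the log, not taken from the attached config. A rough sketch, assuming the path goes silent right after the last received PDU, of how long it takes before IO is failed:

```python
"""Rough worst-case estimate of how long an iSCSI path failure takes to surface.

Hedged sketch: the mapping of kernel log wording to iscsid.conf keys is my
understanding, not something stated in the thread.
"""
from typing import Dict

NOOP_INTERVAL = "node.conn[0].timeo.noop_out_interval"  # "recv timeout" in the kernel log
NOOP_TIMEOUT = "node.conn[0].timeo.noop_out_timeout"    # "ping timeout" in the kernel log
REPLACEMENT = "node.session.timeo.replacement_timeout"  # "session recovery timed out after N secs"


def parse_conf(text: str) -> Dict[str, int]:
    """Parse iscsid.conf-style 'key = value' lines, skipping blanks and '#' comments."""
    settings: Dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = int(value)
    return settings


def failure_timeline(settings: Dict[str, int]) -> Dict[str, int]:
    """Worst-case seconds, measured from the last PDU received on the dead path."""
    # No traffic for noop_out_interval -> a NOP-Out ping is sent; no reply within
    # noop_out_timeout -> connection error (1011 in the log) and the session blocks.
    detect = settings[NOOP_INTERVAL] + settings[NOOP_TIMEOUT]
    # The session then stays blocked for replacement_timeout before IO is failed.
    return {"conn_error": detect, "io_failed": detect + settings[REPLACEMENT]}


if __name__ == "__main__":
    conf = """
    node.conn[0].timeo.noop_out_interval = 5
    node.conn[0].timeo.noop_out_timeout = 10
    node.session.timeo.replacement_timeout = 15
    """
    print(failure_timeline(parse_conf(conf)))  # {'conn_error': 15, 'io_failed': 30}
```

This ignores timer granularity, which is why the log shows 16 seconds (14:52:19 to 14:52:35) rather than exactly 15 between the connection error and the recovery timeout.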
The host state is never going to change for this. The SCSI host state and the
session state are completely different things. It is just due to the weirdness
of iscsi_tcp and ib_iser that we allocate a host per session. Normally drivers
allocate a host per PCI resource, so if one session goes down, the state of the
entire host is not going to change just because that one session is down: the
host can recover the session while staying in a normal state.
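A minimal sketch for looking at the host, session and device states side by side, rather than relying on the host state alone. The sysfs paths (class/scsi_host/hostN/state, class/iscsi_session/sessionN/state, class/scsi_device/H:C:T:L/device/state) are assumptions about what a 2.6.3x kernel exposes; check that they exist on yours. The root argument is only there so the same code can run against a copied tree.

```python
"""Compare SCSI host, iSCSI session and SCSI device states via sysfs (hedged sketch)."""
from pathlib import Path
from typing import Dict


def read_attr(path: Path) -> str:
    """Return a sysfs attribute's value, or '?' if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return "?"


def snapshot(root: str = "/sys") -> Dict[str, Dict[str, str]]:
    """Collect the three kinds of state; missing directories yield empty maps."""
    base = Path(root) / "class"
    return {
        "hosts": {p.name: read_attr(p / "state")
                  for p in sorted((base / "scsi_host").glob("host*"))},
        "sessions": {p.name: read_attr(p / "state")
                     for p in sorted((base / "iscsi_session").glob("session*"))},
        # The device state is the one that goes to "blocked" during session recovery.
        "devices": {p.name: read_attr(p / "device" / "state")
                    for p in sorted((base / "scsi_device").glob("*"))},
    }


if __name__ == "__main__":
    for kind, states in snapshot().items():
        for name, state in states.items():
            print(f"{kind:9} {name:12} {state}")
```

On a box in the situation described above, one would expect the host line to keep reading "running" while the session line changes.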
That said, the scsi_error.c eh (error handler) is overly careful and will bring
down the entire host when a command times out; that is when you would see the
host state go into recovery. As we have discussed before, the SCSI eh has to
support lots of simple drivers that cannot handle the eh running while normal
IO is running, so it stops the entire host. This is why, for session-level
recovery, we try to prevent the SCSI eh from running.
When the session initially goes down and is cleaned up (when
iscsi_conn_stop is called), it blocks the session. That puts the devices
in the blocked state. Then, when the recovery/replacement timeout fires
and you see
session3: session recovery timed out after 15 secs
the iscsi layer unblocks the devices (puts them back in the running state)
and fails all IO to them.
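The sequence just described can be modelled as a tiny state machine. This is a toy illustration with hypothetical names (ToySession, conn_stop, submit, tick), not the kernel code: it only shows that the host state never moves, while the devices go running, then blocked, then running again, and that IO held during the block is failed once the replacement timeout expires. The "FREE" session state after the timeout is taken from the iscsiadm output quoted earlier.

```python
"""Toy model of iSCSI session recovery as described in the thread (not kernel code)."""
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToySession:
    replacement_timeout: int                 # seconds (15 in the log above)
    host_state: str = "running"              # never touched by session-level recovery
    device_state: str = "running"
    session_state: str = "LOGGED_IN"
    blocked_at: Optional[int] = None
    queued: List[str] = field(default_factory=list)
    sent: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)

    def conn_stop(self, now: int) -> None:
        """Connection error: clean up the session and block its devices."""
        self.session_state = "FAILED"
        self.device_state = "blocked"
        self.blocked_at = now

    def submit(self, cmd: str) -> None:
        """Route a command according to the current device and session state."""
        if self.device_state == "blocked":
            self.queued.append(cmd)          # held while waiting for a relogin
        elif self.session_state == "LOGGED_IN":
            self.sent.append(cmd)            # normal IO path
        else:
            self.failed.append(cmd)          # recovery timed out: failed immediately

    def tick(self, now: int) -> None:
        """Replacement timer: unblock the devices and fail everything that was held."""
        if (self.device_state == "blocked" and self.blocked_at is not None
                and now - self.blocked_at >= self.replacement_timeout):
            self.session_state = "FREE"
            self.device_state = "running"
            self.failed.extend(self.queued)
            self.queued.clear()


if __name__ == "__main__":
    s = ToySession(replacement_timeout=15)
    s.conn_stop(now=0)
    s.submit("read-1")
    s.tick(now=15)
    print(s.host_state, s.device_state, s.failed)  # running running ['read-1']
```

The point of the model is the last line: after the timeout the device is "running" again even though every command sent to it fails, which is why a separate device state has been proposed.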
On linux-scsi we have discussed that we need a new device state for this,
something like the offline state. I have to reread the thread on why
people did not like offline for that state (I think because offline means
the device was offlined due to the SCSI eh failing).
--
You received this message because you are subscribed to the Google Groups
"open-iscsi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/open-iscsi?hl=en.