On 03/08/2010 05:40 AM, Or Gerlitz wrote:
Mike,

I am sure this used to work, and I am quite sure you made some comments on the
subject. Now it isn't working and I can't find your comments. Anyway:

Using settings similar to those suggested for the nop timeouts in the README
multipathing section, taking a port down causes the nop watchdog to catch the
situation:

14:51:33 cto-1 kernel: sd 4:0:0:1: [sdd] Attached SCSI disk
14:51:33 cto-1 kernel: sd 5:0:0:1: [sde] Attached SCSI disk
14:51:34 cto-1 iscsid: connection2:0 is operational now
14:51:34 cto-1 iscsid: connection3:0 is operational now
14:52:19 cto-1 kernel:  connection3:0: ping timeout of 10 secs expired, recv timeout 5 ...
14:52:19 cto-1 kernel:  connection3:0: detected conn error (1011)
14:52:20 cto-1 iscsid: Kernel reported iSCSI connection 3:0 error (1011) state (3)
14:52:35 cto-1 kernel:  session3: session recovery timed out after 15 secs


and changes the session state such that $ iscsiadm -m session/host -P 3 yields,
for the failed session/host:

                         iSCSI Connection State: TRANSPORT WAIT
                         iSCSI Session State: FREE
                         Internal iscsid Session State: REPOEN

whereas the session associated with the other initiator port, which is UP,
gives:

                         iSCSI Connection State: LOGGED IN
                         iSCSI Session State: LOGGED_IN
                         Internal iscsid Session State: NO CHANGE

BUT, for some reason the SCSI host state is still "running" and not "blocked"
even after 30 minutes. I remember this wasn't the case in the not-too-distant
past... Am I doing something wrong? Don't tell me it's RTFM...

I got this behaviour on both RHEL 5.4 and 2.6.33 with iscsi/tcp and iser. I just
ran this without multipathing. When multipathing was used (RHEL 5.4), the
multipath driver was aware that the device was down as of the path checker
timeout. See more details on my config in the attachment.


The host state is never going to change for this. The scsi host state and the session state are completely different things. It is only due to the weirdness of iscsi_tcp and ib_iser that we do a host per session. Normally drivers do a host per pci resource, so the host state is not going to change just because one session is down: the host can recover the session while staying in a normal state.
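To see the difference between the two states side by side, something like the following can dump the session, host and device state attributes from sysfs. This is only a rough sketch: the helper names are made up for illustration, and the sysfs locations (class/iscsi_session/*/state, class/scsi_host/*/state, class/scsi_device/*/device/state) are what I would expect on a typical kernel, so treat them as assumptions and adjust for your setup.

```python
from pathlib import Path

# Assumed sysfs layout: (class directory, state file relative to each entry).
STATE_FILES = {
    "session": ("class/iscsi_session", "state"),
    "host": ("class/scsi_host", "state"),
    "device": ("class/scsi_device", "device/state"),
}


def read_state(path):
    """Return the stripped contents of a state file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def collect_states(sysfs_root="/sys"):
    """Map (kind, name) -> state for every session, host and device found."""
    root = Path(sysfs_root)
    found = {}
    for kind, (subdir, leaf) in STATE_FILES.items():
        base = root / subdir
        if not base.is_dir():
            continue  # e.g. no iSCSI transport class loaded
        for entry in sorted(base.iterdir()):
            state = read_state(entry / leaf)
            if state is not None:
                found[(kind, entry.name)] = state
    return found


if __name__ == "__main__":
    for (kind, name), state in collect_states().items():
        print(f"{kind:8} {name:12} {state}")
```

With a failed session you would expect the session line to show a non-logged-in state while the host line still says running, which is the behaviour described above.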

That said, the scsi_error.c eh is overly careful and will bring down the entire host when a cmd times out, and that is when you would see the host state go into recovery. As we have discussed before, the scsi eh has to support lots of simple drivers that cannot handle the eh running while normal IO is running, so it stops the entire host. This is why, for session-level recovery, we try to prevent the scsi eh from running.

When the session initially goes down and is cleaned up (when iscsi_conn_stop is called), the iscsi layer blocks the session. That puts the devices in the blocked state. Then when the recovery/replacement timeout fires and you see

session3: session recovery timed out after 15 secs

The iscsi layer unblocks the devices (puts them in running state) and fails all IO to the devices.
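The sequence above (block on conn stop, then unblock and fail IO once the recovery timeout expires) can be written down as a toy model. This is not the kernel code, just a sketch of the transitions as described here; the class and method names are hypothetical, and the 15 second timeout is taken from the log in your mail.

```python
class ToySession:
    """Toy model of the device state transitions described above."""

    def __init__(self, recovery_tmo=15):
        self.recovery_tmo = recovery_tmo  # seconds, as in the log message
        self.device_state = "running"
        self.fail_io = False              # True once recovery has timed out
        self.down_since = None            # time the session was blocked

    def conn_stop(self, now):
        # Session goes down and is cleaned up: devices become blocked.
        self.device_state = "blocked"
        self.down_since = now

    def tick(self, now):
        # Recovery timer: once it expires, unblock the devices and fail IO.
        if self.down_since is not None and now - self.down_since >= self.recovery_tmo:
            self.device_state = "running"
            self.fail_io = True
            self.down_since = None

    def submit_io(self):
        if self.device_state == "blocked":
            return "queued"   # held back while the session may still recover
        return "failed" if self.fail_io else "sent"


if __name__ == "__main__":
    s = ToySession(recovery_tmo=15)
    s.conn_stop(now=0)
    print(s.device_state, s.submit_io())   # while recovery is pending
    s.tick(now=15)
    print(s.device_state, s.submit_io())   # after the timeout fired
```

Note how the device ends up back in "running" after the timeout even though the session never came back, which is why the state you are looking at does not tell you the session is dead.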

On linux-scsi we have discussed that we need a new state for this, something like the offline state. I have to reread the thread on why people do not like offline for that state (I think because offline means offlined due to the scsi eh failing).
