Anthony C wrote:
> I am running a heavy load test against an array connected to a SLES10 SP2
> system (using the open-iscsi and device-mapper multipath
> that come with the distro).
> The test will also initiate a lun reset from time to time.  For
> whatever reason, after a couple of hours I/O would just hang on a path.
> And iscsiadm -m session -P3 would show that a path is in the "recovery"
> state (see below) while the other paths are in the "running" state
>         Current Portal:,1
>         Persistent Portal:,1
>                 **********
>                 Interface:
>                 **********
>                 Iface Name: default
>                 Iface Transport: tcp
>                 Iface Initiatorname:
> 01:7284eb499690
>                 Iface IPaddress:
>                 Iface HWaddress: default
>                 Iface Netdev: default
>                 SID: 1
>                 iSCSI Connection State: LOGGED IN
>                 iSCSI Session State: Unknown
>                 Internal iscsid Session State: NO CHANGE
>                 ************************
>                 Negotiated iSCSI params:
>                 ************************
>                 HeaderDigest: None
>                 DataDigest: None
>                 MaxRecvDataSegmentLength: 131072
>                 MaxXmitDataSegmentLength: 524288
>                 FirstBurstLength: 262144
>                 MaxBurstLength: 2097152
>                 ImmediateData: No
>                 InitialR2T: Yes
>                 MaxOutstandingR2T: 1
>                 ************************
>                 Attached SCSI devices:
>                 ************************
>                 Host Number: 4  State: recovery

What is the lun state that is output for each device/path right after this?

> So while the other paths are deemed good by iscsi, and multipathd agrees
> according to the output in /var/log/messages, no test I/O nor multipathd
> check I/O is going out to the wire on the so-called "bad" path.  It
> seems they're being held back and never completed.

When you see the Host state as recovery it means that a scsi command has 
timed out at the scsi layer, and that the scsi layer has started its 
error handler. If you do

cat /sys/class/scsi_host/host4/host_busy

you can see how many commands are stuck in recovery.
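
If you want the host state and the busy count together, something like 
this rough sketch should do it. It assumes the usual sysfs layout under 
/sys/class/scsi_host; the host_summary name and the SYSFS_ROOT override 
are made up here so the function can be pointed at a test tree.

```shell
# Print the SCSI host state and host_busy count for one host number.
# SYSFS_ROOT is a hypothetical override (not a kernel or open-iscsi
# setting); it defaults to /sys.
host_summary() {
    host="$1"
    root="${SYSFS_ROOT:-/sys}"
    dir="$root/class/scsi_host/host$host"
    if [ ! -d "$dir" ]; then
        echo "host$host: not found" >&2
        return 1
    fi
    # Fall back to "unknown" if an attribute is missing on this kernel.
    state=$(cat "$dir/state" 2>/dev/null || echo unknown)
    busy=$(cat "$dir/host_busy" 2>/dev/null || echo unknown)
    echo "host$host state=$state host_busy=$busy"
}
```

For the host in your output you would run "host_summary 4".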

At this time the scsi layer will have the driver try to abort and retry 
each command that is outstanding. If that fails the scsi layer will have 
the driver do a lun reset. And if that fails the scsi layer will have us 
do a host reset, in which the driver drops the session and then tries to 
relogin. If we try to drop and relogin, then these values

 >                 iSCSI Connection State: LOGGED IN
 >                 iSCSI Session State: Unknown
 >                 Internal iscsid Session State: NO CHANGE

would indicate that we are trying to log in and the session is not 
logged in. The device states (the output you did not include) would be 
blocked.
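
The per-device states can also be read straight from sysfs. This is 
only a sketch, assuming the devices show up as host:channel:target:lun 
entries under /sys/class/scsi_device; device_states and SYSFS_ROOT are 
names invented for the example.

```shell
# List the state of every SCSI device that belongs to one host number.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
device_states() {
    host="$1"
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$root"/class/scsi_device/"$host":*; do
        # Skip the unexpanded glob and entries without a state file.
        [ -e "$dev/device/state" ] || continue
        echo "$(basename "$dev") $(cat "$dev/device/state")"
    done
}
```

Running "device_states 4" prints one line per lun on host 4, for 
example a device name followed by running, blocked or offline.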

If the recovery/replacement timeout eventually fires while we are trying 
to log back in, this will signal to the driver to give up. In this case 
the Host state will be online, but the devices will show offline, and 
the iscsi conn/session states above will indicate that we are in a 
failed state (the internal iscsid state will actually show that it is 
still trying to log back in, because it keeps trying in case the 
connection does come back).
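
To see what the timeout is currently set to for each session, a sketch 
along these lines may help. It assumes your kernel exposes state and 
recovery_tmo attributes under /sys/class/iscsi_session, which I have 
not checked on SLES10 SP2; session_timeouts and SYSFS_ROOT are 
hypothetical names.

```shell
# Print the kernel-side session state and recovery timeout (seconds)
# for every iSCSI session that exposes a recovery_tmo attribute.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
session_timeouts() {
    root="${SYSFS_ROOT:-/sys}"
    for s in "$root"/class/iscsi_session/session*; do
        # Skip the unexpanded glob and sessions without the attribute.
        [ -r "$s/recovery_tmo" ] || continue
        state=$(cat "$s/state" 2>/dev/null || echo unknown)
        tmo=$(cat "$s/recovery_tmo")
        echo "$(basename "$s") state=$state recovery_tmo=$tmo"
    done
}
```

The value normally comes from node.session.timeo.replacement_timeout in 
iscsid.conf or the node record.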

> On the trace, on the "bad" path the only I/O is iSCSI nop .  So who is
> holding back all the I/O?  scsi mid-layer or iscsi or both?

So it is both. The scsi layer initially blocks up io, but if the aborts 
and lun resets fail, then the iscsi layer will block things up until 
the replacement/recovery timeout has fired.
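
Putting the cases together, the host and device states can be mapped 
to whichever layer is holding the I/O. This is only a reading aid for 
the explanation in this mail, not something the tools provide; 
classify_path is a made-up name and the state strings are assumed to be 
the ones printed by sysfs or iscsiadm.

```shell
# Map a (host state, device state) pair to the layer holding the I/O.
# Illustration only: the messages paraphrase the explanation in this mail.
classify_path() {
    host_state="$1"
    dev_state="$2"
    case "$host_state:$dev_state" in
        recovery:*) echo "scsi error handler running (abort, lun reset or host reset)" ;;
        *:blocked)  echo "iscsi layer blocking I/O while it tries to log back in" ;;
        *:offline)  echo "replacement/recovery timeout fired, driver gave up" ;;
        *:running)  echo "path looks healthy" ;;
        *)          echo "unrecognized state combination" ;;
    esac
}
```

For example "classify_path recovery running" reports the scsi error 
handler case, and "classify_path running offline" reports that the 
timeout has fired.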
