On 03/01/2011 12:18 PM, Székelyi Szabolcs wrote:
Hi all,

I'm facing a somewhat strange situation. As a part of testing a multipath
solution, I've written a script that simulates the failure and recovery of one
path which happens to be an iSCSI connection. It's running on the target side,
periodically stopping and starting it. After some successful failovers and
failbacks, the SCSI device on the initiator side seems to block all operations
on the low-level (i.e. non-multipathed) device. However, the iSCSI session seems
to reestablish properly. I'm seeking your advice about how to get this device
to work again. I don't even know what can cause such a problem. Is it the SCSI
layer, the initiator or the target? I'm quite sure that a relogin could solve
this, but I'd like to avoid that if possible.

I'm running Open-iSCSI 2.0.871.3, kernel 2.6.32 as distributed in Debian
Squeeze on x86-64 hardware on both sides, with IET 1.4.20.2 as the target.

Regarding the SCSI device in question, I see log entries like this:


Do you have the time stamps for those errors?

Is everything going through 1 session?

kernel: [538607.929428]  connection3:0: detected conn error (1020)
iscsid: Kernel reported iSCSI connection 3:0 error (1020) state (3)

Here we get an indication that the target dropped the TCP/IP connection. The iSCSI layer then blocks the session, which basically tells the SCSI/block layer to queue IO until the iSCSI layer decides what to do next.

kernel: [538728.180083]  session3: session recovery timed out after 120 secs

At this point the session has been down for 2 minutes. During this time the iSCSI layer has been trying to reconnect but could not. So the iSCSI layer tells the SCSI/block layer to fail the IO it has queued and to fail any new IO.
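
As a rough cross-check of that timing, the sketch below subtracts the bracketed kernel timestamps of the two kernel lines quoted above. It assumes those brackets hold seconds since boot (the usual printk format). The helper names ktime and elapsed are made up for this illustration.

```shell
# Extract the bracketed kernel timestamp from a log line.
# Assumption: the value in brackets is seconds since boot.
ktime() {
    printf '%s\n' "$1" | sed -n 's/.*\[ *\([0-9][0-9]*\.[0-9][0-9]*\)].*/\1/p'
}

# Print the seconds elapsed between two kernel log lines, to two decimals.
elapsed() {
    awk -v a="$(ktime "$1")" -v b="$(ktime "$2")" 'BEGIN { printf "%.2f\n", b - a }'
}

err='kernel: [538607.929428]  connection3:0: detected conn error (1020)'
tmo='kernel: [538728.180083]  session3: session recovery timed out after 120 secs'
elapsed "$err" "$tmo"
```

For the two lines above this prints 120.25, which matches the 120 secs in the timeout message. That value normally comes from the node.session.timeo.replacement_timeout setting.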

iscsid: connect to [ipaddress]:3260 failed (Connection refused)
multipathd: sdc: directio checker reports path is down
iscsid: connect to 193.225.36.16:3260 failed (Connection refused)
iscsid: connection3:0 is operational after recovery (34 attempts)

At this point the iSCSI layer should be up again. If you run

iscsiadm -m session -P 3

does the session state indicate logged in, and does the device state indicate "running"?
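
The same two states can also be read directly from sysfs, which may be handier inside a test script. This is only a sketch: it assumes the session is session3 and the disk is sdc, as the log lines suggest, and SYSFS_ROOT is a hypothetical override that exists only so the function can be tried without real hardware.

```shell
# Print the iSCSI session state and the SCSI device state from sysfs.
# SYSFS_ROOT is a hypothetical override for trying this without hardware;
# on a real initiator it defaults to /sys.
show_states() {
    root="${SYSFS_ROOT:-/sys}"
    session="$1"   # assumed name, e.g. session3
    disk="$2"      # assumed name, e.g. sdc
    printf 'session: %s\n' "$(cat "$root/class/iscsi_session/$session/state")"
    printf 'device: %s\n' "$(cat "$root/block/$disk/device/state")"
}
```

On the initiator this would be called as show_states session3 sdc. A healthy path should report LOGGED_IN for the session and running for the device. A device stuck in blocked or offline after the session is back would point at the SCSI layer rather than the iSCSI session.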

multipathd: sdc: directio checker reports path is down

If this happened around the same time as the recovery message before it, it could have been a race.

If at this point you send your own IO using SG passthrough (something like sg_turs /dev/sdc), does that succeed or fail?
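
Since the device is reported to block operations, it may help to bound the probe with a timeout so the test script does not hang with it. A minimal sketch, assuming coreutils timeout is available. PROBE_CMD is a hypothetical override used only to try the function without a SCSI device, and a process stuck in uninterruptible sleep may not respond to the timeout signal at all.

```shell
# Send a TEST UNIT READY to a device, bounded by a timeout.
# PROBE_CMD is a hypothetical override; by default sg_turs is used.
probe_path() {
    dev="$1"
    secs="${2:-30}"
    timeout "$secs" "${PROBE_CMD:-sg_turs}" "$dev"
    rc=$?
    case "$rc" in
        0)   echo "$dev: ready" ;;
        124) echo "$dev: no answer within ${secs}s" ;;   # timeout's exit status
        *)   echo "$dev: failed (exit status $rc)" ;;
    esac
    return "$rc"
}
```

Called as probe_path /dev/sdc 30, this separates three cases: the command succeeds, it fails outright, or it never returns. The last case would suggest that the IO is still being queued somewhere below.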
