Hi Mike,

On Wednesday 02 March 2011 03:59:39 Mike Christie wrote:
> On 03/01/2011 12:18 PM, Székelyi Szabolcs wrote:
> > I'm facing a somewhat strange situation. As a part of testing a multipath
> > solution, I've written a script that simulates the failure and recovery
> > of one path which happens to be an iSCSI connection. It's running on the
> > target side, periodically stopping and starting it. After some
> > successful failovers and failbacks the SCSI device on the initiator side
> > seems to block all operations on the low level (eg. non-multipathed)
> > device. However the iSCSI session seems to reestablish properly. I'm
> > seeking your advice about how to get this device to work again. I don't
> > even know what can cause such a problem. Is it the SCSI layer, the
> > initiator or the target? I'm quite sure that a relogin could solve this,
> > but I'd like to avoid that if possible.
> >
> > I'm running Open-iSCSI 2.0.871.3, kernel 2.6.32 as distributed in Debian
> > Squeeze on x86-64 hardware on both sides, with IET 1.4.20.2 as the
> > target.
> >
> > Regarding the SCSI device in question, I see log entries like this:
>
> Do you have the time stamps for those errors?

I've run the test again; here are the log entries with timestamps:

Mar 2 12:21:02 debian multipathd: sdc: directio checker reports path is down
Mar 2 12:21:07 debian multipathd: sdc: directio checker reports path is down
Mar 2 12:21:12 debian multipathd: sdc: directio checker reports path is down
Mar 2 12:21:17 debian multipathd: sdc: directio checker reports path is down
Mar 2 12:21:22 debian multipathd: sdc: directio checker reports path is down
Mar 2 12:21:25 debian kernel: [601122.678387] connection3:0: detected conn error (1020)
Mar 2 12:21:27 debian iscsid: Kernel reported iSCSI connection 3:0 error (1020) state (3)
Mar 2 12:21:27 debian multipathd: sdc: directio checker reports path is down
Mar 2 12:21:29 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar 2 12:21:32 debian multipathd: sdc: directio checker reports path is down
Mar 2 12:21:33 debian iscsid: connect to [ipaddress]:3260 failed (Connection refused)
Mar 2 12:21:37 debian iscsid: connect to [ipaddress]:3260 failed (Connection refused)
Mar 2 12:21:37 debian multipathd: sdc: directio checker reports path is down
[Stripped messages identical to the last two]
Mar 2 12:23:18 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar 2 12:23:22 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar 2 12:23:22 debian multipathd: sdc: directio checker reports path is down
Mar 2 12:23:26 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar 2 12:23:26 debian kernel: [601242.928075] session3: session recovery timed out after 120 secs
Mar 2 12:23:27 debian multipathd: sdc: directio checker reports path is down
Mar 2 12:23:29 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar 2 12:23:32 debian multipathd: sdc: directio checker reports path is down
[Stripped messages identical to the last two]
Mar 2 12:26:15 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar 2 12:26:17 debian multipathd: sdc: directio checker reports path is down
Mar 2 12:26:19 debian iscsid: connection3:0 is operational after recovery (78 attempts)
Mar 2 12:26:22 debian multipathd: sdc: directio checker reports path is down
Mar 2 12:26:27 debian multipathd: sdc: directio checker reports path is down
Mar 2 12:26:32 debian multipathd: sdc: directio checker reports path is down
Mar 2 12:26:37 debian multipathd: sdc: directio checker reports path is down
Mar 2 12:26:42 debian multipathd: sdc: directio checker reports path is down

You can see that I shut the target down at 12:21:25, and the recovery
timed out two minutes later, as expected (at 12:23:26). I restarted the
target at 12:26:19 and the session was reestablished, but the SCSI device
still fails to work, as reported by multipathd.

> Is everything going through 1 session?

No, it's two separate sessions to two separate machines. The backend
storage is the same. I'm trying to do multipath to a storage area that's on
shared storage, served by two iSCSI gateways.

> At this point the iscsi layer should now be up. If you run
>
> iscsiadm -m session -P 3
>
> does the session state indicate logged in, and does the device states
> indicated "running"?

Yes, the session state is LOGGED_IN and the device state is running.

> > multipathd: sdc: directio checker reports path is down
>
> If this happened around the same time as the recovery message before it,
> it could have been a race.

No, this keeps repeating indefinitely after the session is reestablished,
as you can see above.

> If at this point you send your own IO using SG/passthrough (something
> like sg_turs /dev/sdc) does that succeed or fail?

It blocks forever. I can't stop it even with SIGTERM. It succeeds on the
other path.

Cheers,
-- 
cc
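
The timeline read off the log above (connection error, recovery timeout, reconnect, and the checker still failing afterwards) can also be extracted mechanically. What follows is only a minimal Python sketch, assuming syslog lines in the format shown in this message. The event labels, the function names and the default year (taken from the date of this mail) are illustrative choices, not part of Open-iSCSI or multipath-tools.

```python
import re
from datetime import datetime

# Sample lines copied from the log in this message (syslog format, no year).
LOG = """\
Mar 2 12:21:25 debian kernel: [601122.678387] connection3:0: detected conn error (1020)
Mar 2 12:23:26 debian kernel: [601242.928075] session3: session recovery timed out after 120 secs
Mar 2 12:26:19 debian iscsid: connection3:0 is operational after recovery (78 attempts)
Mar 2 12:26:22 debian multipathd: sdc: directio checker reports path is down
"""

# Hypothetical labels mapped to substrings that appear in the log above.
EVENTS = {
    "conn_error": "detected conn error",
    "recovery_timeout": "session recovery timed out",
    "reconnected": "is operational after recovery",
    "path_down": "checker reports path is down",
}

# "<Mon> <day> <HH:MM:SS> <host> <rest of message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+(.*)$")


def parse_events(text, year=2011):
    """Return (timestamp, label) pairs for lines matching a known event.

    Syslog omits the year, so it is passed in as an assumption. Month
    names are parsed with %b, which expects an English/C locale.
    """
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        for label, needle in EVENTS.items():
            if needle in match.group(2):
                events.append((stamp, label))
                break
    return events


def first(events, wanted):
    """Timestamp of the first event with the given label, or None."""
    return next((ts for ts, label in events if label == wanted), None)


def down_after_reconnect(events):
    """Timestamps of 'path is down' reports logged after the reconnect."""
    reconnect = first(events, "reconnected")
    if reconnect is None:
        return []
    return [ts for ts, label in events if label == "path_down" and ts > reconnect]


if __name__ == "__main__":
    ev = parse_events(LOG)
    start = first(ev, "conn_error")
    for label in ("recovery_timeout", "reconnected"):
        print(label, "after", (first(ev, label) - start).total_seconds(), "s")
    print("path-down reports after reconnect:", len(down_after_reconnect(ev)))
```

Any path-down report that appears after the reconnect is the symptom described in this message: the session is back, but the device is still unusable.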
