Re: How to resurrect SCSI device

Székelyi Szabolcs Wed, 02 Mar 2011 04:23:08 -0800

Hi Mike,

On Wednesday 02 March 2011 03:59:39 Mike Christie wrote:
> On 03/01/2011 12:18 PM, Székelyi Szabolcs wrote:
> > I'm facing a somewhat strange situation. As a part of testing a multipath
> > solution, I've written a script that simulates the failure and recovery
> > of one path which happens to be an iSCSI connection. It's running on the
> > target side, periodically stopping and starting it. After some
> > successful failovers and failbacks the SCSI device on the initiator side
> > seems to block all operations on the low level (eg. non-multipathed)
> > device. However the iSCSI session seems to reestablish properly. I'm
> > seeking your advice about how to get this device to work again. I don't
> > even know what can cause such a problem. Is it the SCSI layer, the
> > initiator or the target? I'm quite sure that a relogin could solve this,
> > but I'd like to avoid that if possible.
> > 
> > I'm running Open-iSCSI 2.0.871.3, kernel 2.6.32 as distributed in Debian
> > Squeeze on x86-64 hardware on both sides, with IET 1.4.20.2 as the
> > target.
> 
> > Regarding the SCSI device in question, I see log entries like this:
> Do you have the time stamps for those errors?


I've run the test again, here are the log entries with timestamps.

Mar  2 12:21:02 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:07 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:12 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:17 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:22 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:25 debian kernel: [601122.678387]  connection3:0: detected conn error (1020)
Mar  2 12:21:27 debian iscsid: Kernel reported iSCSI connection 3:0 error (1020) state (3)
Mar  2 12:21:27 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:29 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar  2 12:21:32 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:33 debian iscsid: connect to [ipaddress]:3260 failed (Connection refused)
Mar  2 12:21:37 debian iscsid: connect to [ipaddress]:3260 failed (Connection refused)
Mar  2 12:21:37 debian multipathd: sdc: directio checker reports path is down

[Stripped messages identical to the last two]

Mar  2 12:23:18 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar  2 12:23:22 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar  2 12:23:22 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:23:26 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar  2 12:23:26 debian kernel: [601242.928075]  session3: session recovery timed out after 120 secs
Mar  2 12:23:27 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:23:29 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar  2 12:23:32 debian multipathd: sdc: directio checker reports path is down

[Stripped messages identical to the last two]

Mar  2 12:26:15 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar  2 12:26:17 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:26:19 debian iscsid: connection3:0 is operational after recovery (78 attempts)
Mar  2 12:26:22 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:26:27 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:26:32 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:26:37 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:26:42 debian multipathd: sdc: directio checker reports path is down

As you can see, I shut the target down at 12:21:25, and session recovery timed
out two minutes later, as expected (at 12:23:26). I restarted the target at
12:26:19 and the session was reestablished, but the SCSI device still fails to
work, as multipathd keeps reporting.
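
As a rough cross-check of that timeline, the two kernel lines from the excerpt
can be compared by their bracketed uptime stamps. This is only a sketch: the
helper name kernel_timestamps is made up for illustration, and the two log
lines are copied from the output above.

```python
import re

# Kernel log lines copied from the excerpt above (wrapped lines rejoined).
LOG = """\
Mar  2 12:21:25 debian kernel: [601122.678387]  connection3:0: detected conn error (1020)
Mar  2 12:23:26 debian kernel: [601242.928075]  session3: session recovery timed out after 120 secs
"""

def kernel_timestamps(text):
    """Return the bracketed kernel uptime stamps (in seconds) found in text."""
    return [float(m) for m in re.findall(r"kernel: \[\s*(\d+\.\d+)\]", text)]

# Hypothetical helper usage: first stamp is the conn error, second the timeout.
error_at, timeout_at = kernel_timestamps(LOG)
elapsed = timeout_at - error_at
print(f"{elapsed:.2f} s between conn error and recovery timeout")
```

The difference is a little over 120 seconds, which matches the "timed out
after 120 secs" message.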

> Is everything going through 1 session?

No, there are two separate sessions to two separate machines. The backend
storage is the same: I'm trying to set up multipath to a volume on shared
storage that is served by two iSCSI gateways.

> At this point the iscsi layer should now be up. If you run
> 
> iscsiadm -m session -P 3
> 
> does the session state indicate logged in, and does the device states
> indicated "running"?

Yes, the session state is LOGGED_IN and the device state is running.
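
For repeated checks, the SCSI device state can also be read directly from
sysfs. A minimal sketch, assuming the usual Linux layout
/sys/block/<dev>/device/state; the function name scsi_device_state and the
sysfs_root parameter are illustrative, not part of any tool mentioned here.

```python
from pathlib import Path

def scsi_device_state(dev, sysfs_root="/sys"):
    """Return the SCSI device state string for a block device, e.g. 'running'."""
    # Assumed layout: <sysfs_root>/block/<dev>/device/state
    state_file = Path(sysfs_root) / "block" / dev / "device" / "state"
    return state_file.read_text().strip()

if __name__ == "__main__":
    try:
        print("sdc:", scsi_device_state("sdc"))
    except OSError as exc:
        # The device may not exist on the machine running this sketch.
        print("could not read state:", exc)
```

Note that this only shows what the SCSI layer believes about the device; as
this thread shows, "running" does not guarantee that I/O completes.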

> > multipathd: sdc: directio checker reports path is down
> 
> If this happened around the same time as the recovery message before it,
> it could have been a race.

No, it keeps repeating indefinitely after the session is reestablished, as
you can see above.

> If at this point you send your own IO using SG/passthrough (something
> like sg_turs /dev/sdc) does that succeed or fail?

It blocks forever. I can't stop it even with SIGTERM. It succeeds on the other
path.
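
One possible explanation for the unkillable process (an assumption, not
something confirmed in this thread) is that it is stuck in uninterruptible
sleep, shown as state "D" in /proc/<pid>/stat, where signals are not acted on
until the pending I/O returns. A small sketch to check this; the function
names are made up for illustration.

```python
def proc_state(stat_text):
    """Extract the one-letter state field from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take everything after the last ')'.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]

def read_proc_state(pid):
    """Read and parse the state of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return proc_state(f.read())

# Illustrative sample line, not taken from the affected machine.
sample = "4242 (sg_turs) D 4100 4242 4100 34816 4242 4194304"
print(proc_state(sample))
```

If read_proc_state() returns "D" for the hung sg_turs process, the command is
waiting inside the kernel rather than ignoring the signal.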

Cheers,
-- 
cc
