Re: How to resurrect SCSI device

2011-03-02 Thread Székelyi Szabolcs
Hi Mike,

On Wednesday 02 March 2011 03:59:39 Mike Christie wrote:
 On 03/01/2011 12:18 PM, Székelyi Szabolcs wrote:
  I'm facing a somewhat strange situation. As a part of testing a multipath
  solution, I've written a script that simulates the failure and recovery
  of one path which happens to be an iSCSI connection. It's running on the
  target side, periodically stopping and starting it. After some
  successful failovers and failbacks the SCSI device on the initiator side
  seems to block all operations on the low-level (i.e. non-multipathed)
  device. However, the iSCSI session seems to reestablish properly. I'm
  seeking your advice about how to get this device to work again. I don't
  even know what can cause such a problem. Is it the SCSI layer, the
  initiator or the target? I'm quite sure that a relogin could solve this,
  but I'd like to avoid that if possible.
  
  I'm running Open-iSCSI 2.0.871.3, kernel 2.6.32 as distributed in Debian
  Squeeze on x86-64 hardware on both sides, with IET 1.4.20.2 as the
  target.
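The periodic stop/start test described above might look roughly like the following sketch. The original script was not posted, so everything here is an assumption: the Debian init script path /etc/init.d/iscsitarget for IET, the hypothetical function name `flap_path`, and the timings. The command runner and the sleep function are injectable so the loop can be dry-run without touching a real target.

```python
import subprocess
import time
from typing import Callable, Sequence

# Hypothetical commands: assumes IET is controlled through the Debian
# init script /etc/init.d/iscsitarget. Adjust for the actual setup.
STOP_CMD = ["/etc/init.d/iscsitarget", "stop"]
START_CMD = ["/etc/init.d/iscsitarget", "start"]


def run(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def flap_path(cycles: int, down_secs: float, up_secs: float,
              runner: Callable[[Sequence[str]], None] = run,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Stop and restart the target `cycles` times to simulate path failures."""
    for _ in range(cycles):
        runner(STOP_CMD)   # path goes down; the initiator enters recovery
        sleep(down_secs)   # keep the path down for the chosen outage length
        runner(START_CMD)  # path comes back; the initiator should log in again
        sleep(up_secs)     # give multipathd time to fail back before the next cycle
```

For example, `flap_path(10, down_secs=300, up_secs=120)` would keep each outage longer than the 120-second session recovery timeout that shows up in the log further down.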
 
  Regarding the SCSI device in question, I see log entries like this:
 Do you have the time stamps for those errors?

I've run the test again, here are the log entries with timestamps.

Mar  2 12:21:02 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:07 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:12 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:17 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:22 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:25 debian kernel: [601122.678387]  connection3:0: detected conn error (1020)
Mar  2 12:21:27 debian iscsid: Kernel reported iSCSI connection 3:0 error (1020) state (3)
Mar  2 12:21:27 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:29 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar  2 12:21:32 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:33 debian iscsid: connect to [ipaddress]:3260 failed (Connection refused)
Mar  2 12:21:37 debian iscsid: connect to [ipaddress]:3260 failed (Connection refused)
Mar  2 12:21:37 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:21:37 debian multipathd: sdc: directio checker reports path is down

[Stripped messages identical to the last two]

Mar  2 12:23:18 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar  2 12:23:22 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar  2 12:23:22 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:23:26 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar  2 12:23:26 debian kernel: [601242.928075]  session3: session recovery timed out after 120 secs
Mar  2 12:23:27 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:23:29 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar  2 12:23:32 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:23:32 debian multipathd: sdc: directio checker reports path is down

[Stripped messages identical to the last two]

Mar  2 12:26:15 debian iscsid: connect to 193.225.36.17:3260 failed (Connection refused)
Mar  2 12:26:17 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:26:19 debian iscsid: connection3:0 is operational after recovery (78 attempts)
Mar  2 12:26:22 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:26:27 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:26:32 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:26:37 debian multipathd: sdc: directio checker reports path is down
Mar  2 12:26:42 debian multipathd: sdc: directio checker reports path is down

You can see that I shut the target down at 12:21:25, and the recovery timed out 
two minutes later, as expected (at 12:23:26). I restarted the target at 
12:26:19 and the session was reestablished, but the SCSI device still fails to 
work, as reported by multipathd.
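The timeline can be double-checked with a little arithmetic on the excerpt. The syslog stamps only have one-second resolution, but the two kernel messages also carry uptime stamps, which put the gap between the connection error and the recovery timeout at about 120.25 s, consistent with the 120-second timeout the kernel reports. All values below are copied from the log; the variable names are just illustrative.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS syslog time stamp to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

# Syslog wall-clock times (1-second resolution), all on Mar 2.
TARGET_DOWN = "12:21:25"       # kernel: connection3:0: detected conn error (1020)
RECOVERY_TIMEOUT = "12:23:26"  # kernel: session3: session recovery timed out
RELOGIN = "12:26:19"           # iscsid: connection3:0 is operational after recovery

# Kernel uptime stamps from the same two kernel messages (microsecond resolution).
KERNEL_CONN_ERROR = 601122.678387
KERNEL_RECOVERY_TIMEOUT = 601242.928075

down_to_timeout = to_seconds(RECOVERY_TIMEOUT) - to_seconds(TARGET_DOWN)
down_to_relogin = to_seconds(RELOGIN) - to_seconds(TARGET_DOWN)
kernel_timeout = KERNEL_RECOVERY_TIMEOUT - KERNEL_CONN_ERROR

print(down_to_timeout)           # 121 (rounded by the 1 s syslog resolution)
print(down_to_relogin)           # 294
print(round(kernel_timeout, 2))  # 120.25
```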

 Is everything going through 1 session?

No, it's two separate sessions to two separate machines. The backend storage 
is the same: I'm trying to set up multipath to a storage area on shared 
storage, served by two iSCSI gateways.

 At this point the iscsi layer should now be up. If you run
 
 iscsiadm -m session -P 3
 
 does the session state indicate logged in, and do the device states
 indicate running?

Yes, the session state is LOGGED_IN and the device state is running.
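The same two states can also be read straight from sysfs instead of parsing iscsiadm output. This sketch assumes the kernel exposes /sys/class/iscsi_session/sessionN/state and /sys/block/sdX/device/state (present on recent kernels; worth verifying on 2.6.32). `path_states` is a hypothetical helper, and the sysfs root is a parameter so it can be pointed at a fake tree.

```python
from pathlib import Path


def read_attr(path: Path) -> str:
    """Return a sysfs attribute's contents without the trailing newline."""
    return path.read_text().strip()


def path_states(disk: str, session: str, sysfs: Path = Path("/sys")) -> dict:
    """Collect the iSCSI session state and SCSI device state for one path."""
    return {
        # e.g. LOGGED_IN, FAILED or FREE (assumed attribute, see lead-in)
        "session": read_attr(sysfs / "class" / "iscsi_session" / session / "state"),
        # e.g. running, blocked or offline
        "device": read_attr(sysfs / "block" / disk / "device" / "state"),
    }
```

In the situation described here, `path_states("sdc", "session3")` would presumably return `{'session': 'LOGGED_IN', 'device': 'running'}`, matching the iscsiadm output even though I/O on sdc still blocks.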

  multipathd: sdc: directio checker reports path is down
 
 If this happened around the same time as the recovery message before it,
 it could have been a race.

No, this keeps repeating indefinitely after the session is reestablished, as 
you can see above.

 If at this point you send your own IO using SG/passthrough (something
 like sg_turs /dev/sdc) does that succeed or fail?

It blocks forever, and I can't stop it even with SIGTERM. It succeeds on the 
other path.
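A command that ignores SIGTERM like this is likely stuck in uninterruptible sleep (state D) waiting on the blocked device; in that state even SIGKILL is not acted on until the I/O completes. When scripting such checks, it may help to probe with a deadline rather than wait on the command. The sketch below (hypothetical helper `probe`) polls the child and reports "hung" instead of calling wait() on it, so the checking script itself never blocks.

```python
import subprocess
import time
from typing import Sequence


def probe(cmd: Sequence[str], timeout: float, poll_interval: float = 0.1) -> str:
    """Run cmd and return 'ok', 'failed' or 'hung' without blocking forever."""
    proc = subprocess.Popen(list(cmd), stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    deadline = time.monotonic() + timeout
    while True:
        rc = proc.poll()  # non-blocking check for exit
        if rc is not None:
            return "ok" if rc == 0 else "failed"
        if time.monotonic() >= deadline:
            # Best effort only: a process in D state will not die until its
            # I/O completes, so deliberately do not wait() for it here.
            proc.kill()
            return "hung"
        time.sleep(poll_interval)
```

Going by the behavior reported above, `probe(["sg_turs", "/dev/sdc"], timeout=10)` would return "hung" on the bad path and "ok" on the other one.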

Cheers,
-- 
cc


Re: How to resurrect SCSI device

2011-03-02 Thread Mark Lehrer
I'm facing a somewhat strange situation. As a part of testing a 
multipath solution, I've written a script that simulates the failure and


This is an excellent thread, thanks.

Does this happen even without multipathd? I have been trying to work on a 
procedure to fix a similar problem: the device is read-only even though the 
session was re-established, and there is no way either to unmount it or to get 
it back into a usable state.


Mark

--
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.



On vacation until Tuesday

2011-03-02 Thread Mike Christie

Hey List,

I am going to be on vacation until Tuesday. I will not have access to 
email, so if I do not respond as usual, you know why.

