Hi again,

You said: "I then see something is trying to delete the session (the upstream 
iscsi tools would normally not do this). [...]It then looks like we hit a bug 
in the scsi layer."

We DO run iscsiadm commands such as "logout" and "delete" (these commands are 
issued automatically).
Do you think that stopping them (or changing how we use them) would solve the 
problem?
Changing how we use open-iscsi is definitely an option for us as a solution to 
this issue.

-----Original Message-----
From: Cale, Yonatan 
Sent: Tuesday, April 22, 2014 12:11 PM
To: 'Mike Christie'
Cc: [email protected]; [email protected]
Subject: RE: Target reboot -> iscsiadm rescan Stuck

Hi Mike,
My answers are inline in your message below:

-----Original Message-----
From: Mike Christie [mailto:[email protected]]
Sent: Tuesday, April 22, 2014 12:38 AM
To: Cale, Yonatan
Cc: [email protected]; [email protected]
Subject: Re: Target reboot -> iscsiadm rescan Stuck

>Do you have some module that is hooking into the scsi layer or iscsi modules? 
>Just wondering what the "sim_try_to_abort_cmd" call is. Where are you hooking 
>in? 
"sim" is our module that handles the iscsi data path. We hook in for 
notifications so that we know when we should cancel a command. (This is a 
little off-topic, but we didn't find this in open-iscsi: does open-iscsi know 
how to abort commands by itself?) I think this should have nothing to do with 
the iscsiadm control path. I'll verify this with our sim guys (they are on 
vacation).

>Have you also modified the iscsi tools?
No. I'll verify this too, but it's highly unlikely.

>During this test, is the target able to respond to iscsi level IO, but just 
>not scsi commands? I see iscsi nops are successful, but the scsi scan related 
>commands like REPORT_LUNS are never replied to by the target. The scsi error 
>handler then runs. Some aborts work, but eventually we do not get a response 
>to one, and that results in the device getting offlined. I then see something 
>is trying to delete the session (the upstream iscsi tools would normally not 
>do this).
The target is a VNX being (partially) rebooted. I don't know for sure what it 
can/can't do during that reboot, but I think it can't answer NOP commands 
either:
I did some investigation of the tcpdump I sent you, and I can see that from 
some point on (packet #1682, time~=38sec) there are no more packets FROM 
10.76.18.23, which is the IP of the VNX-SP that I am rebooting. This includes 
iSCSI NOPs, which the VNX also stops sending.

>It then looks like we hit a bug in the scsi layer. The scsi layer keeps trying 
>to send a inquiry, but because we are deleting the session, the iscsi layer 
>fails the command with DID_TRANSPORT_FAILFAST. This then goes on for minutes 
>until you stop taking the trace.

>To debug this some more, we will need to get some scsi layer tracing.
>Did you run that scsi_logging_level command? 
I ran "scsi_logging_level --scan 7 --error 7 -s" (unless I made a mistake? Do 
you think this command wasn't run?)
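
One way I could double-check that the setting took effect is to read the 
value back and decode it. The sketch below is only that, a sketch: it assumes 
the kernel exposes the mask at /proc/sys/dev/scsi/logging_level (which needs 
CONFIG_SCSI_LOGGING) and that each facility occupies 3 bits at the shifts 
listed in drivers/scsi/scsi_logging.h, as far as I understand that header. The 
function names are mine, not part of any tool. With --scan 7 --error 7 and 
nothing else set, I would expect the raw value to be 455.

```python
# Assumed bit layout (from drivers/scsi/scsi_logging.h): 3 bits per facility.
SHIFTS = {
    "error": 0,
    "timeout": 3,
    "scan": 6,
    "mlqueue": 9,
    "mlcomplete": 12,
    "llqueue": 15,
    "llcomplete": 18,
    "hlqueue": 21,
    "hlcomplete": 24,
    "ioctl": 27,
}


def decode_logging_level(value):
    """Split the raw logging_level mask into a per-facility level (0-7)."""
    return {name: (value >> shift) & 0x7 for name, shift in SHIFTS.items()}


def read_logging_level(path="/proc/sys/dev/scsi/logging_level"):
    """Read the current mask; the path is an assumption, see above."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    # Print only the facilities that are actually enabled.
    levels = decode_logging_level(read_logging_level())
    for name, level in levels.items():
        if level:
            print(f"{name}: {level}")
```

If "scan" and "error" both come back as 7, the command was applied and the 
missing scsi traces must have another cause.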

>Could you also check the current kernel?
I don't understand what you are asking for. The kernel version? 3.0.56
