Hi again,

You said: "I then see something is trying to delete the session (the upstream iscsi tools would normally not do this). [...] It then looks like we hit a bug in the scsi layer."
We DO run iscsiadm commands like "logout" and "delete" (they are issued automatically). Do you think that if we stop running them (or change how we use them), this will solve the problem? Changing how we use open-iscsi is definitely an option for us as a solution to this issue.

-----Original Message-----
From: Cale, Yonatan
Sent: Tuesday, April 22, 2014 12:11 PM
To: 'Mike Christie'
Cc: [email protected]; [email protected]
Subject: RE: Target reboot -> iscsiadm rescan Stuck

Hi Mike,

Answers are in your message below:

-----Original Message-----
From: Mike Christie [mailto:[email protected]]
Sent: Tuesday, April 22, 2014 12:38 AM
To: Cale, Yonatan
Cc: [email protected]; [email protected]
Subject: Re: Target reboot -> iscsiadm rescan Stuck

>Do you have some module that is hooking into the scsi layer or iscsi modules?
>Just wondering what the "sim_try_to_abort_cmd" call is. Where are you hooking
>in?

"sim" is our module that handles the iSCSI data path. We hook in for notifications so that we know when we should cancel a command. (We didn't find this in open-iscsi. This is a little off-topic, but does open-iscsi know how to abort commands by itself?) I think this should have nothing to do with the iscsiadm control path. I'll verify this with our sim guys (they are on vacation).

>Have you also modified the iscsi tools?

No. I'll verify this too, but it's highly unlikely.

>During this test, is the target able to respond to iscsi level IO, but just
>not scsi commands? I see iscsi nops are successful, but the scsi scan related
>commands like REPORT_LUNS are never replied to by the target. The scsi error
>handler then runs. Some aborts work, but eventually we do not get a response
>to one, and that results in the device getting offlined. I then see something
>is trying to delete the session (the upstream iscsi tools would normally not
>do this).

The target is a VNX being (partially) rebooted.
I don't know for sure what it can/can't do during that reboot, but I think it can't answer NOP commands either. I did some investigation of the tcpdump I sent you, and I can see that at some point (packet #1682, time ~38 sec) there are no more packets FROM 10.76.18.23, which is the IP of the VNX SP that I am rebooting. This includes iSCSI NOP commands, which are also no longer sent.

>It then looks like we hit a bug in the scsi layer. The scsi layer keeps trying
>to send a inquiry, but because we are deleting the session, the iscsi layer
>fails the command with DID_TRANSPORT_FAILFAST. This then goes on for minutes
>until you stop taking the trace.
>To debug this some more, we will need to get some scsi layer tracing.
>Did you run that scsi_logging_level command?

I ran "scsi_logging_level --scan 7 --error 7 -s" (unless I made a mistake? Do you think this command wasn't run?)

>Could you also check the current kernel?

I don't understand what you are asking for. The kernel version? It is 3.0.56.
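For reference, the tcpdump check described above (finding the point after which 10.76.18.23 stops sending packets) could be repeated with something like the sketch below. It assumes the capture has already been exported to (timestamp in seconds, source IP) pairs. The function name and the input format are my own assumptions for illustration, not part of tcpdump or open-iscsi.

```python
from typing import Iterable, Optional, Tuple


def last_seen(packets: Iterable[Tuple[float, str]], src_ip: str) -> Optional[float]:
    """Return the latest timestamp at which src_ip sent a packet.

    packets: (timestamp_seconds, source_ip) pairs, assumed to be exported
             from the capture beforehand (hypothetical input format).
    Returns None if src_ip never appears as a source.
    """
    last = None
    for ts, src in packets:
        # Track the maximum timestamp, so the input need not be sorted.
        if src == src_ip and (last is None or ts > last):
            last = ts
    return last
```

Calling `last_seen(packets, "10.76.18.23")` on the exported capture would give the time of the last packet from the rebooting SP, which can then be compared against the time of the last iSCSI NOP in the trace.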
