Hi,

Moving to a newer kernel is a problem for us, and we'd rather avoid it. Also, the fact that you didn't reproduce the bug doesn't necessarily mean the kernel version is the difference. The bug is not easy to reproduce: it happens only some of the time (about 1 run in 3), and so far only with "VNX5300 - 05.32.000.5.208" (not with the other VNX we have). I can't pin down or prove the specific version/behavior needed to reproduce it, so I understand if it doesn't happen in your labs.
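To put the "about 1 in 3" figure in perspective, here is a rough sketch of how likely it is to miss the bug over several test runs. It assumes each run is independent and that the hit rate really is 1/3 on an affected setup; both are my assumptions, and the function names are just for illustration.

```python
import math


def miss_probability(runs: int, hit_rate: float = 1 / 3) -> float:
    """Chance that `runs` independent attempts all fail to trigger the bug.

    Assumes a fixed per-run hit rate (1/3 is only a rough estimate).
    """
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return (1.0 - hit_rate) ** runs


def runs_needed(confidence: float, hit_rate: float = 1 / 3) -> int:
    """Smallest number of runs so the bug shows up at least once
    with probability >= `confidence`, under the same assumptions."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hit_rate))


if __name__ == "__main__":
    for n in (1, 3, 5, 8):
        print(f"{n} runs: miss probability {miss_probability(n):.3f}")
    print("runs for 95% confidence:", runs_needed(0.95))
```

So a handful of clean runs does not say much either way, even on a setup that is affected.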
I will run the test again with "scsi_logging_level --scan 7 --error 7 -s". Are the logs supposed to show up in dmesg?

Btw, I talked to our "sim" module guys; nothing changes from what I said before.

-----Original Message-----
From: Mike Christie [mailto:[email protected]]
Sent: Thursday, April 24, 2014 9:11 AM
To: Cale, Yonatan
Cc: [email protected]; [email protected]
Subject: Re: Target reboot -> iscsiadm rescan Stuck

On 04/23/2014 02:35 AM, Cale, Yonatan wrote:
> Hi again,
>
> You said: "I then see something is trying to delete the session (the upstream
> iscsi tools would normally not do this). [...]It then looks like we hit a bug
> in the scsi layer."
>
> We DO perform iscsiadm commands like "logout" and "delete" (these commands
> are done automatically).
> Do you think that if we stop doing them (or change our usage somehow) then
> this will solve the problem?

If it fixes the problem then it is just working around some issue in the scsi layer. I am not sure, since we need some more scsi layer logging, as it does not look like an iscsi layer issue.

Have you tried more recent kernels btw? I have tested the offline/eh running type of case you tested with upstream and 3.8 and it works ok for me. I am not able to replicate the exact failure where a scsi scan related command is the one that times out.

> Changing how we use open-iscsi is definitely an option for us as a solution
> for this issue.
>
> -----Original Message-----
> From: Cale, Yonatan
> Sent: Tuesday, April 22, 2014 12:11 PM
> To: 'Mike Christie'
> Cc: [email protected]; [email protected]
> Subject: RE: Target reboot -> iscsiadm rescan Stuck
>
> Hi Mike,
> Answers are in your message below:
>
> -----Original Message-----
> From: Mike Christie [mailto:[email protected]]
> Sent: Tuesday, April 22, 2014 12:38 AM
> To: Cale, Yonatan
> Cc: [email protected]; [email protected]
> Subject: Re: Target reboot -> iscsiadm rescan Stuck
>
>> Do you have some module that is hooking into the scsi layer or iscsi
>> modules? Just wondering what the "sim_try_to_abort_cmd" call is. Where are
>> you hooking in?
> "sim" is our module that handles iscsi data-path. We hook for notifications
> in order to know if we should cancel a command (we didn't find this in
> open-iscsi, this is a little off-topic, but does open-iscsi know how to abort
> commands by itself?) I think this should have nothing to do with the iscsiadm
> control path. I'll verify this with our sim guys (they are on vacation).
>
>> Have you also modified the iscsi tools?
> No. I'll verify this too, but it's highly unlikely.
>
>> During this test, is the target able to respond to iscsi level IO, but just
>> not scsi commands? I see iscsi nops are successful, but the scsi scan
>> related commands like REPORT_LUNS are never replied to by the target. The
>> scsi error handler then runs. Some aborts work, but eventually we do not get
>> a response to one, and that results in the device getting offlined. I then
>> see something is trying to delete the session (the upstream iscsi tools
>> would normally not do this).
> The target is a VNX being (partially) rebooted. I don't know for sure what it
> can/can't do during that reboot, but I think it can't answer NOP commands
> either:
> I did some investigation of the tcpdump I sent you, and I can see that at
> some point (packet #1682, time~=38sec), there are no more packets FROM
> 10.76.18.23, which is the IP of the VNX-SP that I am rebooting. This includes
> NOP iSCSI commands which are also not sent anymore.
>
>> It then looks like we hit a bug in the scsi layer. The scsi layer keeps
>> trying to send an inquiry, but because we are deleting the session, the iscsi
>> layer fails the command with DID_TRANSPORT_FAILFAST. This then goes on for
>> minutes until you stop taking the trace.
>
>> To debug this some more, we will need to get some scsi layer tracing.
>> Did you run that scsi_logging_level command?
> I did "scsi_logging_level --scan 7 --error 7 -s" (Unless I made a
> mistake? Do you think this command wasn't run?)
>
>> Could you also check the current kernel?
> I don't understand what you are asking for. The kernel version? 3.0.56
>
