RE: Target reboot -> iscsiadm rescan Stuck

Cale, Yonatan Fri, 25 Apr 2014 21:20:24 -0700

Hi,

About moving to a newer kernel: this is difficult for us and we'd rather avoid 
it. Note that if you didn't reproduce the bug, it's not necessarily 
because you have a different kernel. This bug is not easy to reproduce. It 
happens only sometimes (about 1/3 of the time), and it seems to happen with 
"VNX5300 - 05.32.000.5.208" (and not with another VNX we have). I can't pin down 
the specific version/behavior that is needed to reproduce it, but I completely 
understand if it doesn't happen in your labs.


I will run the test again with "scsi_logging_level --scan 7 --error 7 -s". 
Are the logs supposed to come out in dmesg?
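
For reference, here is a minimal sketch of the value I believe that command 
sets. It assumes the 3-bit field layout from the kernel's 
drivers/scsi/scsi_logging.h (error at bit 0, scan at bit 6, and so on) and 
that the kernel was built with CONFIG_SCSI_LOGGING; the helper name 
logging_mask is made up for illustration. If these assumptions hold, the 
result should match what /proc/sys/dev/scsi/logging_level reports after the 
command runs, and the messages themselves should go to the kernel log (dmesg).

```python
# Sketch only: bit offsets of the 3-bit SCSI logging fields, as I understand
# them from drivers/scsi/scsi_logging.h. Verify against your kernel tree.
SHIFTS = {
    "error": 0,
    "timeout": 3,
    "scan": 6,
    "mlqueue": 9,
    "mlcomplete": 12,
    "llqueue": 15,
    "llcomplete": 18,
    "hlqueue": 21,
    "hlcomplete": 24,
    "ioctl": 27,
}


def logging_mask(**levels):
    """Combine per-facility levels (0-7) into one logging_level value.

    logging_mask is a hypothetical helper, not part of any tool.
    """
    mask = 0
    for name, level in levels.items():
        if name not in SHIFTS:
            raise ValueError("unknown facility: %s" % name)
        if not 0 <= level <= 7:
            raise ValueError("level must be in 0..7: %r" % level)
        mask |= level << SHIFTS[name]
    return mask


if __name__ == "__main__":
    # Equivalent of "--scan 7 --error 7": (7 << 6) | 7
    print(logging_mask(scan=7, error=7))  # prints 455
```

So reading /proc/sys/dev/scsi/logging_level after running the command would 
be one way to confirm that the setting actually took effect.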

Btw, I talked to our "sim" module guys, no changes from what I said before.

-----Original Message-----
From: Mike Christie [mailto:[email protected]] 
Sent: Thursday, April 24, 2014 9:11 AM
To: Cale, Yonatan
Cc: [email protected]; [email protected]
Subject: Re: Target reboot -> iscsiadm rescan Stuck

On 04/23/2014 02:35 AM, Cale, Yonatan wrote:
> Hi again,
> 
> You said: "I then see something is trying to delete the session (the upstream 
> iscsi tools would normally not do this). [...]It then looks like we hit a bug 
> in the scsi layer."
> 
> We DO perform iscsiadm commands like "logout" and "delete" (these commands 
> are done automatically).
> Do you think that if we stop doing them (or change our usage somehow) then 
> this will solve the problem?

If it fixes the problem then it is just working around some issue in the scsi 
layer. I am not sure since we need some more scsi layer logging as it does not 
look like an iscsi layer issue.

Have you tried more recent kernels btw? I have tested the offline/eh running 
type of case you tested with upstream and 3.8 and it works ok for me. I am not 
able to replicate the exact failure where a scsi scan related command is the 
one that times out.

> Changing how we use open-iscsi is definitely an option for us as a solution 
> for this issue.
> 
> -----Original Message-----
> From: Cale, Yonatan
> Sent: Tuesday, April 22, 2014 12:11 PM
> To: 'Mike Christie'
> Cc: [email protected]; [email protected]
> Subject: RE: Target reboot -> iscsiadm rescan Stuck
> 
> Hi Mike,
> Answers are in your message below:
> 
> -----Original Message-----
> From: Mike Christie [mailto:[email protected]]
> Sent: Tuesday, April 22, 2014 12:38 AM
> To: Cale, Yonatan
> Cc: [email protected]; [email protected]
> Subject: Re: Target reboot -> iscsiadm rescan Stuck
> 
>> Do you have some module that is hooking into the scsi layer or iscsi 
>> modules? Just wondering what the "sim_try_to_abort_cmd" call is. Where are 
>> you hooking in? 
> "sim" is our module that handles iscsi data-path. We hook for notifications 
> in order to know if we should cancel a command (we didn't find this in 
> open-iscsi, this is a little off-topic, but does open-iscsi know how to abort 
> commands by itself?) I think this should have nothing to do with the iscsiadm 
> control path. I'll verify this with our sim guys (they are on vacation).
> 
>> Have you also modified the iscsi tools?
> No. I'll verify this too, but it's highly unlikely.
> 
>> During this test, is the target able to respond to iscsi level IO, but just 
>> not scsi commands? I see iscsi nops are successful, but the scsi scan 
>> related commands like REPORT_LUNS are never replied to by the target. The 
>> scsi error handler then runs. Some aborts work, but eventually we do not get 
>> a response to one, and that results in the device getting offlined. I then 
>> see something is trying to delete the session (the upstream iscsi tools 
>> would normally not do this).
> The target is a VNX being (partially) rebooted. I don't know for sure what it 
> can/can't do during that reboot, but I think it can't answer NOP commands 
> either:
> I did some investigation of the tcpdump I sent you, and I can see that at 
> some point (packet #1682, time~=38sec), there are no more packets FROM 
> 10.76.18.23, which is the IP of the VNX-SP that I am rebooting. This includes 
> NOP iSCSI commands which are also not sent anymore.
> 
>> It then looks like we hit a bug in the scsi layer. The scsi layer keeps 
>> trying to send an inquiry, but because we are deleting the session, the iscsi 
>> layer fails the command with DID_TRANSPORT_FAILFAST. This then goes on for 
>> minutes until you stop taking the trace.
> 
>> To debug this some more, we will need to get some scsi layer tracing.
>> Did you run that scsi_logging_level command? 
> I did "scsi_logging_level --scan 7 --error 7 -s" (Unless I made a 
> mistake? Do you think this command wasn't run?)
> 
>> Could you also check the current kernel?
> I don't understand what you are asking for. The kernel version? 3.0.56
> 


