On Tue, Jun 23, 2009 at 8:29 PM, Mike Christie <micha...@cs.wisc.edu> wrote:
>
> Erez Zilber wrote:
>> Mike,
>>
>> I'm trying to debug a problem that we have with iscsiadm: I'm running
>> open-iscsi against multiple targets. At some point, I'm closing the
>> connection from one of the targets (i.e. on the target side). Then, I
>> try to logout from the initiator side, but something goes wrong. The
>> last thing that iscsiadm does is call recv from iscsid_response, and it
>> doesn't return (at least not after 10 minutes). I also see that in the
>> kernel, __iscsi_unbind_session calls scsi_remove_target and doesn't
>> return. I guess that this causes iscsiadm to wait on the recv call.
>
> Yeah, iscsiadm will wait for the iscsid operations like the unbind to
> complete, and that can take a while.
>
> If you stop the target and then we start the session shutdown process
> while we still think the session is up (we have not gotten a tcp
> connection error or rst or any other indication that something is bad,
> like a nop timing out), then we are going to end up firing the iscsi or
> scsi error handler (eh).
>
> If you have IO running, or if your LU requires a cache sync to be sent
> when shutting it down, then the worst case is that you have nops turned
> off and, for some reason, the network layer does not return an error
> when we try to do sendpage/sendmsg (it just returns something we think
> is retryable, like EAGAIN). This will result in the scsi commands timing
> out. Then the aborts and other tmfs will time out, and then we will wait
> replacement_timeout seconds to try and reconnect.
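>
> For reference, here is a minimal sketch of the knobs involved in that
> worst case, as they appear in a stock /etc/iscsi/iscsid.conf (the
> values below are only examples, not recommendations):
>
>     # 0 disables nops entirely ("nops turned off")
>     node.conn[0].timeo.noop_out_interval = 5
>     # how long to wait for a nop reply before declaring the conn bad
>     node.conn[0].timeo.noop_out_timeout = 5
>     # how long to try to reconnect before failing outstanding commands
>     node.session.timeo.replacement_timeout = 120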
>
> If you have nops on, or the net layer returns an error, it would be a
> little faster because you do not have to wait for the scsi commands to
> time out. The nop will time out after noop_timeout seconds, then we
> will wait replacement_timeout seconds to reconnect. After that time we
> will fail everything.
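>
> If you want to watch this happen, the session state is exported in
> sysfs (a rough sketch; the session number will vary per login):
>
>     # LOGGED_IN -> FAILED while recovery runs, then the session goes away
>     watch -n1 'cat /sys/class/iscsi_session/session*/state'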
>
> If you do not have IO running and your device does not require cache
> syncs, then it should be a lot shorter, but it may still take a minute.
> The __iscsi_unbind_session/scsi_remove_target calls should complete
> quickly, since they do not have to wait on IO and cache syncs to
> complete. We would just wait for the iscsi logout pdu to time out.
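>
> That logout wait is bounded by the logout timeout in iscsid.conf
> (example value shown, not necessarily your default):
>
>     node.conn[0].timeo.logout_timeout = 15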
>
>
> There is also a bug where we retry sending data even though we know the
> connection is bad. This patch helps
> http://git.kernel.org/?p=linux/kernel/git/mnc/linux-2.6-iscsi.git;a=commit;h=b138adb2df49967bf0a035143f734d33c4263963
> but what we want is to be able to break out of the sendpage/sendmsg
> wait. I am working on a patch, but have hit some problems (for some
> reason, if I send a signal it does not break out of the wait). This
> problem only adds maybe 30 extra seconds to the logout of a session, so
> I am not sure that is what you are hitting.
>
> So first, check if your device needs a cache sync. You can check that
> by looking at /var/log/messages when the device is discovered. You will
> see something like:
>
> kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled,
> doesn't support DPO or FUA
>
> If the write cache is enabled, then the scsi layer will send cache syncs.
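>
> A couple of ways to check (sketches only; the device name is an
> example, and sdparm may not be installed on your box):
>
>     # find the discovery-time message
>     grep "Write cache" /var/log/messages
>     # or query the write cache enable (WCE) bit directly
>     sdparm --get=WCE /dev/sda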
>
> Then check your replacement_timeout. If that is really long, then we
> might be hitting that.
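>
> For example (a sketch; substitute your own target iqn and portal):
>
>     # value used for future logins, from the node db
>     iscsiadm -m node -T <target-iqn> -p <portal> | grep replacement_timeout
>     # live value for a running session is in sysfs
>     cat /sys/class/iscsi_session/session*/recovery_tmo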
>
>>
>> BTW - I'm not running with the latest code. My HEAD is commit
>> ef0357c4728ebba1a4b91a7f6d69c729a5f9e6e3. I don't know if any relevant
>> bug fixes were made lately.
>
> Just so you know, I normally work on linux-2.6-iscsi, which tracks
> upstream, and then port to open-iscsi/kernel, so the newest kernel
> patches will be in there.

It turned out to be caused by an internal bug on our side. After fixing
it, things look OK. Thanks for your help.

Erez
