On Sat, Aug 1, 2009 at 6:34 AM, Mike Christie<[email protected]> wrote:
> Mike Christie wrote:
>> On 07/31/2009 04:03 AM, Hannes Reinecke wrote:
>>> Mike Christie wrote:
>>>> tcp_sendpages/tcp_sendmsg can wait sndtmo seconds
>>>> if a connection goes bad. This then delays session
>>>> recovery, because that code must wait for the xmit
>>>> thread to flush. OTOH, if we did not wait at all
>>>> we are less efficient in the lock management
>>>> because we must reacquire the session lock every
>>>> time the network layer returns ENOBUFS and runs
>>>> iscsi_tcp's write_space callout.
>>>>
>>>> This patch tries to balance the two by reducing the
>>>> wait from 15 seconds to 3. If we have already waited 3 secs
>>>> to send a pdu then perf is taking a hit anyway, so
>>>> grabbing the session lock again is not going to make a
>>>> difference. And waiting up to 3 secs for the xmit thread
>>>> to flush and suspend is not that long (at least a lot better
>>>> than 15).
>>>>
>>> :-)
>>>
>>> Cool. I'm running with 1 sec here, but the principle is
>>> the same. Especially for a multipathed setup you really
>>> want this.
>>>
>>> Oh, what about making this setting dependent on the
>>> transport class timeout?
>>> Worst case, sendpages/sendmsg will now take up to 3 seconds
>>> before it even returns an error.
>>> So having a transport class timeout lower than that
>>> is pointless: we have no means of terminating
>>> a call stuck in sendpages/sendmsg, and the
>>> transport class will always terminate the command first.
>>>
>>> So we should either prevent the transport class timeout
>>> from being set lower than 3 seconds, or derive this
>>> send timeout from the transport class timeout.
>>>
>>
>> Good point! Let me back out my patch and do some more digging on why I
>> cannot just do
>>
>> signal(xmit thread)
>>
>> to wake it from sendpage/sendmsg right away.
>>
>> If I cannot get that to work, then I will send a patch to implement what
>> you describe.
>
> I got the signal stuff working. I am attaching the patch here. I put it
> in my iscsi branch, because it is built on top of some other patches I
> sent Erez in his "logout takes ~50 secs" thread.
>

I'm running with open-iscsi.git HEAD + the check-suspend-bit patch +
the wake-xmit-on-error patch. If I disconnect the cable on the
initiator side (even while not running IO), I see that after the
signal is sent, the iscsi_q_XX thread spins at 100% CPU. I ran this
over several 1 Gb/10 Gb drivers and got the same results.

If I remove the wake-xmit-on-error patch, I don't see this behavior.

Erez

