On Sat, Aug 1, 2009 at 6:34 AM, Mike Christie <[email protected]> wrote:
> Mike Christie wrote:
>> On 07/31/2009 04:03 AM, Hannes Reinecke wrote:
>>> Mike Christie wrote:
>>>> tcp_sendpages/tcp_sendmsg can wait sndtmo seconds
>>>> if a connection goes bad. This then delays session
>>>> recovery, because that code must wait for the xmit
>>>> thread to flush. OTOH, if we did not wait at all
>>>> we are less efficient in the lock management
>>>> because we must reacquire the session lock every
>>>> time the network layer returns ENOBUFS and runs
>>>> iscsi_tcp's write_space callout.
>>>>
>>>> This tries to balance the two by reducing the
>>>> wait to 3 seconds from 15. If we have waited 3 secs
>>>> to send a pdu then perf is already taking a hit so
>>>> grabbing the session lock again is not going to make a
>>>> difference. And waiting up to 3 secs for the xmit thread
>>>> to flush and suspend is not that long (at least a lot better
>>>> than 15).
>>>>
>>> :-)
>>>
>>> Cool. I'm running with 1 sec here, but the principle is
>>> the same. Especially for a multipathed setup you really
>>> want this.
>>>
>>> Oh, what about making this setting dependend on the
>>> transport class timeout?
>>> Worst case sendpages/sendmsg will take up to 3 seconds
>>> now before it even will return an error.
>>> So having a transport class timeout lower than that
>>> is pointless as we have no means of terminating
>>> a call being stuck in sendpages/sendmsg and the
>>> transport class will always terminate the command.
>>>
>>> So we should either limit the transport class timeout
>>> to not being able to be set lower than 3 seconds or
>>> make this timeout set by the transport class timeout.
>>>
>>
>> Good point! Let me backout my patch, and do some more digging on why I
>> cannot just do
>>
>> signal(xmit thread)
>>
>> to wake it from sendpage/sendmsg right away.
>>
>> If I cannot get that to work, then I will send a patch to implement what
>> you describe.
>
> I got the signal stuff working. I am attaching the patch here. I put it
> in my iscsi branch, because it is built over some other patches I sent
> Erez in his logout takes ~50 secs thread.
>
I'm running with open-iscsi.git HEAD + the check suspend bit patch +
the wake xmit on error patch.

If I disconnect the cable on the initiator side (even while not running
IO), I see that after the signal is sent, the iscsi_q_XX thread reaches
100% CPU. I ran it over several 1 GB/10 GB drivers and got the same
results. If I remove the wake xmit on error patch, I don't see this
behavior.

Erez
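
For anyone who wants to experiment with the send timeout discussed in the quoted patch description without touching the kernel, the closest userspace knob is probably the SO_SNDTIMEO socket option. The following is only a rough sketch, not the iscsi_tcp change itself: the helper names set_send_timeout and get_send_timeout are made up for illustration, and the "ll" timeval layout is an assumption that holds on typical Linux systems.

```python
import socket
import struct

# Assumption: struct timeval is two native longs (tv_sec, tv_usec),
# which is the usual layout on Linux.
TIMEVAL_FMT = "ll"


def set_send_timeout(sock, seconds):
    """Hypothetical helper: bound how long a blocking send may wait."""
    sec = int(seconds)
    usec = int(round((seconds - sec) * 1_000_000))
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO,
                    struct.pack(TIMEVAL_FMT, sec, usec))


def get_send_timeout(sock):
    """Hypothetical helper: read the send timeout back, in seconds."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO,
                          struct.calcsize(TIMEVAL_FMT))
    sec, usec = struct.unpack(TIMEVAL_FMT, raw)
    return sec + usec / 1_000_000


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # 3 seconds, the value proposed in the quoted patch description.
        set_send_timeout(s, 3)
        print(get_send_timeout(s))
```

A timeout of 0 means "wait indefinitely", so a freshly created socket reads back as 0 until the option is set. Once a non-zero value is in place, a blocking send that cannot make progress fails after roughly that many seconds instead of hanging, which is the same trade-off between lock re-acquisition and recovery latency that the thread is debating.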
