Re: [PATCH 4/4] libiscsi: Flush workqueue before suspend

Mike Christie Tue, 18 Aug 2009 10:19:09 -0700

On 08/17/2009 01:31 AM, Ulrich Windl wrote:
> On 14 Aug 2009 at 15:08, Hannes Reinecke wrote:
>
>> Ulrich Windl wrote:
>>> On 13 Aug 2009 at 11:55, Mike Christie wrote:
>>>
>>>> some additions:
>>>>
>>>>
>>>> Mike Christie wrote:
>>>>> wait_on_commands()
>>>>> {
>>>>    while (check_restrictions(conn->task)&&
>>>>            session->state == LOGGED_IN)
>>>>            wait for a while
>>> "wait for a while" always looks like a design flaw to me. Wait for what?
>>>
>> For the xmit thread to catch up. Due to the asynchronous nature the
>> xmit thread is independent on the error handler thread, so the
>> eh thread has to be synchronized here until the xmit thread has processed
>> all outstanding commands.
>
> But who makes sure "a while" is long enough? It seems you are just releasing a
> mutex lock, then re-aquire it. If the lock does fair queuing, you don't need 
> to
> wait; you'd just have to do something like "sched_yield()" (for kernel 
> tasks). If
> I understood it well, you want to enforce a "task switch" inside the kernel.
> Right?




The problem with Hanne's simple approach is that:

1. we can get transient xmit_task task errors while trying to send a 
pdu. His patches would allow the xmit thread to return from 
iscsi_data_xmit on these errors leaving a partial pdu and possibly other 
pdus behind it to be sent.

2. moving the bit setting and work queue flush is not right, because 
callers assume that when iscsi_suspend_tx finishes the xmit thread will 
not be touching any tasks or xmit resources. With the simple patch 
Hannes did, all tasks could get sent, flush_workqueue could complete, 
then right before the suspend_bit is set, another task could be queued 
and the work queue could start up and start sending the task. Then the 
suspend bit could get set and iscsi_suspend_tx would return and the 
caller could start breaking down the connection while the xmit thread is 
using it.


So my proposal is to just wait for the tasks that are queued and 
affected by the TMF to get fully sent. Note, at this time, we do not 
allow new commands that could be affected by the TMF to be started.

The eh thread sits around and every second would check if the tasks that 
were affected by the tmf but still needed to send data, have sent all 
their data. Either they will eventually send all their data, or the user 
will logout the session or there will be some connection error that 
kills the session.

The wait for a while can be a simple msleep, ssleep, etc. Check out 
drivers like qla2xxx or lpfc for examples if you want to steal a loop 
and wait.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---

Re: [PATCH 4/4] libiscsi: Flush workqueue before suspend

Reply via email to