On Wed, Aug 5, 2009 at 7:45 PM, Mike Christie<micha...@cs.wisc.edu> wrote:
> On 08/05/2009 11:33 AM, Mike Christie wrote:
>> On 08/05/2009 11:26 AM, Mike Christie wrote:
>>> On 08/05/2009 11:01 AM, Erez Zilber wrote:
>>>> On Wed, Aug 5, 2009 at 6:19 PM, Mike Christie<micha...@cs.wisc.edu> wrote:
>>>>> On 08/04/2009 01:12 PM, Erez Zilber wrote:
>>>>>> On Tue, Aug 4, 2009 at 8:17 PM, Mike Christie<micha...@cs.wisc.edu> wrote:
>>>>>>> Erez Zilber wrote:
>>>>>>>> I'm running with open-iscsi.git HEAD + the check suspend bit patch +
>>>>>>>> the wake xmit on error patch. If I disconnect the cable on the
>>>>>>>> initiator side (even while not running IO), I see that after sending
>>>>>>>> the signal, the iscsi_q_XX thread reaches 100% cpu. I ran it over
>>>>>>>> several 1 GB/10 GB drivers and got the same results.
>>>>>>>> If I remove the wake xmit on error patch, I don't see this behavior.
>>>>>>> Shoot, I have been running the xmit wakeup and suspend bit patch here
>>>>>>> fine. Let me do some more testing.
>>>>>>> Is this something you always hit? Could you send me the final patch you
>>>>>>> ended up using?
>>>>>> I see this every time. Note that I'm not running with
>>>>>> linux-2.6-iscsi.git. I'm using the open-iscsi.git tree + the 2 patches
>>>>>> that I took without any change (using git-show) from the
>>>>>> linux-2.6-iscsi.git tree. Which tree did you test it on?
>>>>>> I added some printks to the code and saw that the signal does get sent
>>>>>> from iscsi_sw_tcp_conn_stop, but I didn't see (rc == -EINTR || rc
>>>>>> == -EAGAIN) in iscsi_sw_tcp_xmit(), even when I ran IO on that
>>>>>> session.
>>>>> Does r in iscsi_sw_tcp_xmit_segment == 0?
>>>> No, it is never zero.
>>>>> If not, I think you need a different patch. In one of the patch versions
>>>>> iscsi_sw_tcp_xmit_segment could return -ENODATA (this is when I had a
>>>>> check for suspend_tx in there). iscsi_sw_tcp_xmit did not check this, and
>>>>> so I think we can loop.
>>>>> Could you try the attached patch? It was made over open-iscsi.git for
>>>>> you. I dropped the suspend bit check in iscsi_sw_tcp_xmit_segment,
>>>>> because it is not needed. If we end up blocking, the signal will wake us.
>>>> I ran it and got the same 100% cpu usage. Did you try to run it on
>>>> your machines with open-iscsi.git? Did you see a different behavior?
>>> I just ran it. Maybe I am looking for the wrong thing though.
>>> For your problem, when the signal is sent does the recovery go ok and we
>>> end up reconnecting? But the problem is just that the xmit thread takes
>>> up 100% of the cpu?
>> Ignore this. I see the problem now. I was thinking you did not
>> reconnect. I see the cpu usage. Let me do some digging.
> I found it. The problem is that we send the signal whether the xmit
> thread is running or not. If it is not running, the workqueue code will
> keep getting woken up to handle the signal, but because we have not
> called queue_work, the workqueue code will not let the thread run, so we
> never get to flush the signal until we reconnect and send down a login
> pdu (the login pdu finally does a queue_work).
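
A rough user-space model of the loop described above may help. It is only a sketch: struct worker_model, would_block() and run_worker() are hypothetical names made up for illustration and are not taken from the kernel workqueue or iSCSI code. It assumes that a pending signal makes an interruptible sleep return immediately, and that only the work function flushes the signal.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state of the iscsi_q_XX worker. */
struct worker_model {
    bool signal_pending; /* set when the stop path sends the signal */
    bool work_queued;    /* set by a queue_work-like call */
};

/* Models an interruptible sleep: the worker only really blocks when
 * there is neither a pending signal nor queued work. */
static bool would_block(const struct worker_model *w)
{
    return !w->signal_pending && !w->work_queued;
}

/* Runs the modelled worker loop for at most max_iters iterations.
 * Work is queued at iteration queue_at (for example, the login pdu).
 * Returns how many iterations woke up with nothing to run. */
static unsigned long run_worker(struct worker_model *w,
                                unsigned long max_iters,
                                unsigned long queue_at)
{
    unsigned long spins = 0;

    for (unsigned long i = 0; i < max_iters; i++) {
        if (i == queue_at)
            w->work_queued = true;

        if (would_block(w))
            break; /* thread sleeps and stops using cpu */

        if (w->work_queued) {
            /* The work function runs and flushes the signal. */
            w->signal_pending = false;
            w->work_queued = false;
            continue;
        }

        spins++; /* woken by the pending signal, but no work to run */
    }
    return spins;
}
```

In this model, a worker with signal_pending set and no work queued spins on every iteration until work is queued, while a worker with no pending signal blocks right away.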

When you say "the xmit thread is running", I guess that you mean that
the xmit thread is busy with IO, right? Note that I said that this
happens whether I'm running IO or everything's idle. Two more things that
I forgot to mention:

1. I didn't try to reconnect the cable (actually, I disabled the port
in the switch) to see if the problem goes away.
2. When I log out (while the port is still disconnected), everything
goes back to normal, but I guess that this is because the xmit thread
