On 08/05/2009 11:01 AM, Erez Zilber wrote:
> On Wed, Aug 5, 2009 at 6:19 PM, Mike Christie <micha...@cs.wisc.edu> wrote:
>> On 08/04/2009 01:12 PM, Erez Zilber wrote:
>>> On Tue, Aug 4, 2009 at 8:17 PM, Mike Christie <micha...@cs.wisc.edu> wrote:
>>>> Erez Zilber wrote:
>>>>> I'm running with open-iscsi.git HEAD + the check suspend bit patch +
>>>>> the wake xmit on error patch. If I disconnect the cable on the
>>>>> initiator side (even while not running IO), I see that after sending
>>>>> the signal, the iscsi_q_XX thread reaches 100% cpu. I ran it over
>>>>> several 1GB/10GB drivers and got the same results.
>>>>> If I remove the wake xmit on error patch, I don't see this behavior.
>>>> Shoot, I have been running the xmit wakeup and suspend bit patch here
>>>> fine. Let me do some more testing.
>>>> Is this something you always hit? Could you send me the final patch you
>>>> ended up using?
>>> I see this every time. Note that I'm not running with
>>> linux-2.6-iscsi.git. I'm using the open-iscsi.git tree + the 2 patches
>>> that I took without any change (using git-show) from the
>>> linux-2.6-iscsi.git tree. Which tree did you test it on?
>>> I added some printks to the code and saw that the signal does get sent
>>> from iscsi_sw_tcp_conn_stop, but I never saw (rc == -EINTR || rc
>>> == -EAGAIN) evaluate true in iscsi_sw_tcp_xmit(), even when I ran IO
>>> on that session.
>> Does r in iscsi_sw_tcp_xmit_segment == 0?
> No, it is never zero.
>> If not, I think you need a different patch. In one of the patch versions
>> iscsi_sw_tcp_xmit_segment could return -ENODATA (this is when I had a
>> check for suspend_tx in there). iscsi_sw_tcp_xmit did not check this and
>> so I think we can loop.
>> Could you try the attached patch? It was made over open-iscsi.git for
>> you. I dropped the suspend bit check in iscsi_sw_tcp_xmit_segment,
>> because it is not needed. If we end up blocking, the signal will wake us.
> I ran it and got the same 100% cpu usage. Did you try to run it on
> your machines with open-iscsi.git? Did you see a different behavior?

I just ran it. Maybe I am looking for the wrong thing though.

For your problem, when the signal is sent, does the recovery go OK and we
end up reconnecting, and the only problem is that the xmit thread takes
up 100% of the cpu?

Or, when the signal is sent, does the recovery stall and we do not
reconnect, because the xmit thread is just spinning and taking 100% of
the cpu?
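
For reference, the kind of loop suspected earlier in the thread (a helper returning -ENODATA to a caller that only checks for -EINTR/-EAGAIN) can be sketched in userspace C. This is a toy model, not the open-iscsi code: fake_xmit_segment, spin_unchecked, spin_checked and MAX_SPINS are made-up names, and the cap exists only so the demo terminates instead of spinning forever.

```c
#include <errno.h>

#define MAX_SPINS 1000 /* demo-only cap so the sketch terminates */

/* Hypothetical stand-in for a segment-send helper that keeps failing. */
int fake_xmit_segment(void)
{
    return -ENODATA;
}

/*
 * Caller that only treats -EINTR/-EAGAIN as a reason to stop.
 * Returns how many times it called the helper.
 */
int spin_unchecked(void)
{
    int calls = 0;

    while (calls < MAX_SPINS) {
        int rc = fake_xmit_segment();

        calls++;
        if (rc == -EINTR || rc == -EAGAIN)
            break; /* the only error exits this caller knows about */
        if (rc > 0)
            break; /* data was sent, nothing more to do in this toy */
        /* -ENODATA falls through and the loop retries immediately */
    }
    return calls;
}

/*
 * Same loop, but any negative return ends the attempt.
 * Returns how many times it called the helper.
 */
int spin_checked(void)
{
    int calls = 0;

    while (calls < MAX_SPINS) {
        int rc = fake_xmit_segment();

        calls++;
        if (rc < 0)
            break; /* every error, including -ENODATA, stops the loop */
        if (rc > 0)
            break; /* data was sent */
    }
    return calls;
}
```

In this model spin_unchecked() runs until it hits the cap, which is the userspace analogue of a thread pinned at 100% cpu, while spin_checked() gives up after a single call.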
