On Wed, Aug 5, 2009 at 7:45 PM, Mike Christie<micha...@cs.wisc.edu> wrote:
>
> On 08/05/2009 11:33 AM, Mike Christie wrote:
>> On 08/05/2009 11:26 AM, Mike Christie wrote:
>>> On 08/05/2009 11:01 AM, Erez Zilber wrote:
>>>> On Wed, Aug 5, 2009 at 6:19 PM, Mike Christie<micha...@cs.wisc.edu>
>>>> wrote:
>>>>> On 08/04/2009 01:12 PM, Erez Zilber wrote:
>>>>>> On Tue, Aug 4, 2009 at 8:17 PM, Mike Christie<micha...@cs.wisc.edu>
>>>>>> wrote:
>>>>>>> Erez Zilber wrote:
>>>>>>>> I'm running with open-iscsi.git HEAD + the check suspend bit patch +
>>>>>>>> the wake xmit on error patch. If I disconnect the cable on the
>>>>>>>> initiator side (even while not running IO), I see that after sending
>>>>>>>> the signal, the iscsi_q_XX thread reaches 100% cpu. I ran it over
>>>>>>>> several 1GB/ 10 GB drivers and got the same results.
>>>>>>>>
>>>>>>>> If I remove the wake xmit on error patch, I don't see this behavior.
>>>>>>>>
>>>>>>> Shoot, I have been running the xmit wakeup and suspend bit patch here
>>>>>>> fine. Let me do some more testing.
>>>>>>>
>>>>>>> Is this something you always hit? Could you send me the final patch you
>>>>>>> ended up using?
>>>>>> I see this every time. Note that I'm not running with
>>>>>> linux-2.6-iscsi.git. I'm using the open-iscsi.git tree + the 2 patches
>>>>>> that I took without any change (using git-show) from the
>>>>>> linux-2.6-iscsi.git tree. Which tree did you test it on?
>>>>>>
>>>>>> I added some printks to the code and saw that the signal does get sent
>>>>>> from iscsi_sw_tcp_conn_stop, but I didn't see that (rc == -EINTR || rc
>>>>>> == -EAGAIN) in iscsi_sw_tcp_xmit (), even when I ran IO on that
>>>>>> session.
>>>>>>
>>>>> Does r in iscsi_sw_tcp_xmit_segment == 0?
>>>>>
>>>> No, it is never zero.
>>>>
>>>>> If not I think you need a diffferent patch. In one of the patch versions
>>>>> iscsi_sw_tcp_xmit_segment could return -ENODATA (this is when I had a
>>>>> check for suspend_tx in there). iscsi_sw_tcp_xmit did not check this and
>>>>> so I think we can loop.
>>>>>
>>>>> Could you try the attached patch. It was made over open-iscsi.git for
>>>>> you. I dropped the suspend bit check in iscsi_sw_tcp_xmit_segment,
>>>>> because it is not needed. If we end up blocking the signal will wake us.
>>>> I ran it and got the same 100% cpu usage. Did you try to run it on
>>>> your machines with open-iscsi.git? Did you see a different behavior?
>>>>
>>> I just ran it. Maybe I am looking for the wrong thing though.
>>>
>>> For your problem, when the signal is sent does the recovery go ok and we
>>> end up reconnecting? But the problem is just that the xmit thread takes
>>> up 100% of the cpu?
>>>
>>
>>
>> Ignore this. I see the problem now. I was thinking you did not
>> reconnect. I see the cpu usage. Let me do some digging.
>>
>
> I found it. The problem is that we will send the signal if the xmit
> thread is running or not. If it is not running the workqueue code will
> keep getting woken up to handle the signal, but because we have not
> called queue_work the workqueue code will not let the thread run so we
> never get to flush the signal until we reconnect and send down a login
> pdu (the login pdu does a queue_work finally).
>

When you say "the xmit thread is running", I guess you mean that the
xmit thread is busy with IO, right? Note that I said this happens
whether I'm running IO or everything is idle.

Two more things that I forgot to mention:

1. I didn't try to reconnect the cable (actually, I disabled the port in
   the switch) to see whether the problem goes away.
2. When I log out (while the port is still disconnected), everything goes
   back to normal, but I guess that this is because the xmit thread dies.

Erez
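
For anyone following the thread, the behavior Mike describes (a signal sent to a thread that has no work queued, so it keeps waking up and never flushes the signal) can be illustrated with a toy userspace model. This is only a sketch under assumptions: it is not the kernel workqueue code or the open-iscsi code, and the names struct worker, can_sleep, run_work and simulate are made up for illustration. It assumes the signal is flushed only from inside the work function, as the quoted explanation suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy state for one worker thread. All names here are hypothetical. */
struct worker {
    bool signal_pending; /* set by the sender; cleared only by run_work() */
    int  queued_work;    /* queued items (stands in for queue_work calls) */
};

/* An interruptible sleep only blocks when no signal is pending. */
static bool can_sleep(const struct worker *w)
{
    return !w->signal_pending && w->queued_work == 0;
}

/* Assumption: the work function is the only place the signal is flushed. */
static void run_work(struct worker *w)
{
    w->queued_work--;
    w->signal_pending = false;
}

/*
 * Run the worker loop for at most max_iters passes. One work item is
 * queued at pass queue_at (use -1 for "never"). Returns the number of
 * passes that neither ran work nor went to sleep, i.e. pure spinning.
 */
long simulate(struct worker *w, long max_iters, long queue_at)
{
    long spins = 0;

    assert(max_iters >= 0);
    for (long i = 0; i < max_iters; i++) {
        if (i == queue_at)
            w->queued_work++;   /* e.g. a login PDU after reconnect */
        if (w->queued_work > 0) {
            run_work(w);
            continue;
        }
        if (can_sleep(w))
            break;              /* would block here and use no CPU */
        spins++;                /* woken by the signal, nothing to run */
    }
    return spins;
}

#ifdef SIM_DEMO_MAIN
int main(void)
{
    struct worker idle    = { true, 0 }; /* signal sent, no work queued */
    struct worker busy    = { true, 1 }; /* signal sent while work is queued */
    struct worker relogin = { true, 0 }; /* work queued later, at pass 10 */

    printf("idle, never queued: %ld spins\n", simulate(&idle, 1000, -1));
    printf("busy:               %ld spins\n", simulate(&busy, 1000, -1));
    printf("queued at pass 10:  %ld spins\n", simulate(&relogin, 1000, 10));
    return 0;
}
#endif
```

Built with -DSIM_DEMO_MAIN, the three cases report 1000, 0 and 10 spins. In this model the idle thread spins for the whole run, the thread with work queued flushes the signal immediately, and the third case stops spinning only once something is finally queued, which matches the description of the login PDU ending the loop.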