On 07/23/2009 12:24 PM, Mike Christie wrote:
> But you still might be hitting a problem where the target does not like
> data-outs after it has closed the window. Maybe they interpreted the RFC
> differently. You should ask the HP target guys for more info.
> Also, your patch might be working because I think it ends up throttling
> the connection, so IO does not time out due to backed-up pipes (the
> slowdown from the throttling is one of the problems we hit with the
> patch I did before, which was pretty much the same as the one you posted).

I think Hannes' patch is effectively doing exactly that: holding back one
command (an off-by-one). So maybe you are right and it is an HP bug where
the window check is off by one, or they do not like data-outs after the
window closes.

Could you try comparing against the window limit (MaxCmdSN) less one in
queuecommand and see if it helps the same way?
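To make the "less-one" experiment concrete, here is a rough sketch of the
window check. It is not the actual libiscsi code. It assumes the check
compares the next CmdSN against MaxCmdSN using RFC 1982 style serial number
arithmetic (so wrap-around is handled), and the function names are made up
for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* RFC 1982 "less than" for 32-bit serial numbers (SERIAL_BITS = 32).
 * The case where the numbers differ by exactly 2^31 is undefined in the
 * RFC and is treated as "not less than" here. */
static bool sna_lt(uint32_t a, uint32_t b)
{
    return (a < b && (b - a) < 0x80000000u) ||
           (a > b && (a - b) > 0x80000000u);
}

static bool sna_lte(uint32_t a, uint32_t b)
{
    return a == b || sna_lt(a, b);
}

/* Hypothetical helper: the usual check, a command may be sent while its
 * CmdSN is <= MaxCmdSN. The window is closed when MaxCmdSN is ExpCmdSN - 1. */
static bool cmdsn_window_open(uint32_t cmdsn, uint32_t max_cmdsn)
{
    return sna_lte(cmdsn, max_cmdsn);
}

/* Hypothetical "less-one" variant: stop one command early (CmdSN must be
 * <= MaxCmdSN - 1). If this alone makes the target behave, the target's
 * own window check is probably off by one. */
static bool cmdsn_window_open_less_one(uint32_t cmdsn, uint32_t max_cmdsn)
{
    return sna_lte(cmdsn, max_cmdsn - 1u);
}
```

With CmdSN == MaxCmdSN the normal check still sends the command while the
less-one variant holds it back. That last command is the one the HP target
may be miscounting.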

> I think I can replicate this problem now too. It was by accident. I am
> using an EQL target remotely (I am in the middle of the US and the target
> is on the west coast, so there is a good deal of distance between us and
> the connection is slow), and I am seeing the problem where the network
> layer just stops accepting data, so eventually something times out. If I
> turn off nops, the SCSI command timer fires; if I also increase that to
> 10 minutes, the EQL target will actually send me a nop, and I cannot send
> the response because the network layer just keeps returning -EAGAIN.
> Are you still seeing that problem? Basically sendpage/sendmsg just keeps
> returning -EAGAIN. We even get woken up by iscsi_sw_tcp_write_space, but
> the sk_wmem_queued and sk_sndbuf values are basically stuck, so no
> space ever opens up for some reason (I attached the debug patch I am using).

In the past I've used tgt with open-iscsi over an Internet connection from
Israel to the US and it did work: a simple ext3 mount and some reads and
writes. But I've never put it under heavy load. Do you see this problem
only under heavy load, or will a single long dd cause it?

> I tried your patch hoping it might help, but it does not help with this
> problem here. Maybe these are different issues.


