Dave,
> The linux server says
>
> Proto Recv-Q Send-Q Local Address Foreign Address State
> tcp 35230 40686 expbuild.research.:8664 dynamic.ih.lucent:36352 ESTABLISHED
>
> and the solaris client says
>
> Local Address Remote Address Swind Send-Q Rwind Recv-Q State
> dynamic.ih.lucent.com.36352 expbuild.research.bell-labs.com.8664 0 1459 8760 0 ESTABLISHED
ok, if the above condition is not temporary (i.e. not just packet loss)
and the cable between the two boxes is OK then this is _definitely_ an
OS bug. The job of TCP is to get data from the sendq on one side to
the recvq on the other. The only reason that data would not be sent on
an ESTABLISHED connection is if the window was zero, and you don't get
that with a zero sized recvq.
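To spell that reasoning out, here is a rough sketch in Python. The numbers are copied from the two netstat outputs quoted above; the function name is made up for illustration and is not part of rsync or any tool.

```python
def sender_is_stuck(send_q, peer_recv_q, peer_rwind):
    """Return True when the sender still has queued data even though the
    peer's receive queue is empty and the peer offers a non-zero window.
    TCP should be moving data in that state, so a persistent True here
    points at the stack rather than the application."""
    return send_q > 0 and peer_recv_q == 0 and peer_rwind > 0


# Linux server -> Solaris client direction, values from the netstat output
linux_send_q = 40686     # Send-Q on the Linux side
solaris_recv_q = 0       # Recv-Q on the Solaris side
solaris_rwind = 8760     # Rwind on the Solaris side

print(sender_is_stuck(linux_send_q, solaris_recv_q, solaris_rwind))
```

A single snapshot only shows the state at one instant, so the check means something only if the same numbers keep showing up across repeated netstat runs.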
It is quite impossible for rsync to cause the above condition. The
rsync server has written some data to a socket in the expectation that
it will get to the other end (that's what reliable transports are all
about), but the data hasn't got there.
The next thing you have to do is run a sniffer to determine whether it
is a Solaris or Linux bug. My bet is this will be the same Linux bug
we have observed here. You'll see the Linux box sending data outside
the window that the Solaris box is offering, the Solaris box will
reject that data by sending an ack with the current window and the
Linux box will ignore the hint.
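When reading the trace, the test to apply to each data segment looks roughly like the sketch below (Python). The function name and the sequence numbers in the example are hypothetical; only the 8760 window comes from the netstat output above. It takes the ack number and window from the receiver's most recent ack and reports whether the segment runs past the right edge of that window.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def beyond_right_edge(seq, length, ack, window):
    """True if the segment [seq, seq+length) extends past ack+window,
    where ack and window come from the receiver's latest ack."""
    if length == 0:
        return False
    # Signed distance from ack to seq, allowing for 32-bit wraparound.
    offset = (seq - ack) % MOD
    if offset >= 2 ** 31:
        offset -= MOD  # seq is behind ack (old data being resent)
    return offset + length > window


# Hypothetical sequence numbers; the receiver offers a window of 8760.
print(beyond_right_edge(seq=1000, length=1460, ack=1000, window=8760))
print(beyond_right_edge(seq=9000, length=1460, ack=1000, window=8760))
```

Note that a one-byte probe sent against a zero window would also be flagged by this check, so segments like that need to be judged separately.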
> I very highly doubt a bad network card, however.
When I mentioned bad network cards I meant that bad cards can trigger
the bug, but the bug could be triggered by lots of other
conditions. It's just that with one particularly bad card we have here
we can reproduce it every time with a particular kernel.
> This morning I observed that while one client process was working hard for
> a long time, the other one was indeed idle a lot of the time so I am again
> leaning toward the necessity of Neil Schellenberger's timeout fix. The
> above test was run with --timeout 0.
Neil's analysis is quite plausible and worth looking into but that is
most definitely not what is causing the hang you see here.
Cheers, Tridge