On Tue, Jan 09, 2001 at 04:05:44AM -0500, Hal Haygood wrote:
> I'm experiencing a hang with rsync 2.4.6 on Solaris. Initiating and target
> hosts are both Solaris 2.6.
...
> This is for an rsync push of a large directory tree. The command is:
>
> /usr/local/bin/rsync \
> -avzHlW \
> --rsync-path=/usr/local/bin/rsync \
> --timeout=900 \
> --delete \
> --exclude (some excludes here) \
> /local/directory/name/* \
> remotehost:/remote/directory/name
I note that you're using -W, which disables the rsync delta-transfer
algorithm. That reduces the strain caused by back-and-forth traffic but
probably increases the amount of data sent from the local host to the
remote host.
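To make that trade-off concrete, here is a rough, hypothetical traffic model for a single file. It is a sketch, not rsync's actual accounting: BLOCK_SIZE and SIG_BYTES are assumed values chosen for illustration (real rsync block and checksum sizes vary by version and file size), the receiver's old copy is assumed to be the same size as the new file, and compression (-z), protocol overhead and the file list are ignored.

```python
from typing import Tuple

# Hypothetical parameters for illustration only; not taken from rsync.
BLOCK_SIZE = 1024  # bytes per block the receiver checksums
SIG_BYTES = 20     # bytes per block signature (weak + strong checksum)


def whole_file_bytes(file_size: int) -> int:
    """With -W the sender ships the entire file; no signatures flow back."""
    return file_size


def delta_bytes(file_size: int, changed_blocks: int) -> Tuple[int, int]:
    """Return (receiver->sender, sender->receiver) bytes in delta mode.

    The receiver first sends one signature per block of its old copy,
    then the sender sends literal data only for the changed blocks
    (a trailing partial block is counted as a full block for simplicity).
    """
    n_blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    signatures = n_blocks * SIG_BYTES
    literal = min(changed_blocks, n_blocks) * BLOCK_SIZE
    return signatures, literal


if __name__ == "__main__":
    size = 10_000_000
    print("-W, sender -> receiver:", whole_file_bytes(size))
    print("delta, (receiver -> sender, sender -> receiver):", delta_bytes(size, 5))
```

For a 10,000,000-byte file with 5 changed blocks, this model gives 10,000,000 bytes pushed with -W versus (195320, 5120) in delta mode: far less data from the sender, but only after the receiver has sent its signatures, which is the back-and-forth that -W avoids.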
...
> The TCP queue on the sending host looks like this:
>
> Local Address Remote Address Swind Send-Q Rwind Recv-Q State
> -------------------- -------------------- ----- ------ ----- ------ -------
> thishost.1018 remotehost.shell 8760 0 0 0 ESTABLISHED
> thishost.1017 remotehost.1022 8760 0 8760 0 ESTABLISHED
>
> The TCP queue on the receiving host looks like this:
>
> Local Address Remote Address Swind Send-Q Rwind Recv-Q State
> -------------------- -------------------- ----- ------ ----- ------ -------
> remotehost.shell thishost.1018 1 0 8760 0 ESTABLISHED
> remotehost.1022 thishost.1017 8760 0 8760 0 ESTABLISHED
Those look good; the send and receive queues are both empty.
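That check amounts to reading the Send-Q and Recv-Q columns of every row. A minimal sketch that parses the four rows copied from the quoted netstat output and confirms both queues are zero; the dictionary keys are hypothetical labels chosen for the sketch, mapped in order onto the Solaris column headers shown above.

```python
# The four connection rows quoted above (sending host first, then receiving host).
ROWS = """\
thishost.1018 remotehost.shell 8760 0 0 0 ESTABLISHED
thishost.1017 remotehost.1022 8760 0 8760 0 ESTABLISHED
remotehost.shell thishost.1018 1 0 8760 0 ESTABLISHED
remotehost.1022 thishost.1017 8760 0 8760 0 ESTABLISHED
"""

# Labels for: Local Address, Remote Address, Swind, Send-Q, Rwind, Recv-Q, State
COLUMNS = ["local", "remote", "swind", "send_q", "rwind", "recv_q", "state"]
NUMERIC = ("swind", "send_q", "rwind", "recv_q")


def parse_rows(text):
    """Turn whitespace-separated netstat rows into dicts, skipping other lines."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # headers, dashes, blank lines
        entry = dict(zip(COLUMNS, fields))
        for key in NUMERIC:
            entry[key] = int(entry[key])
        entries.append(entry)
    return entries


def queues_empty(entries):
    """True when no connection has bytes waiting in its send or receive queue."""
    return all(e["send_q"] == 0 and e["recv_q"] == 0 for e in entries)


if __name__ == "__main__":
    print("all queues empty:", queues_empty(parse_rows(ROWS)))
```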
...
> The parent rsh process on the sending host is stuck in:
> write(1, " p a r t o f a f i l e n a m e".., 285) (sleeping...)
>
> The child rsh process on the sending host is stuck in:
> read(0, 0xEFFFF410, 1024) (sleeping...)
>
> The "rsync --server" process on the receiving host is stuck in:
> poll(0xEFFFC110, 1, 60000) (sleeping...)
>
> The "csh --c /usr/local/bin/rsync" process on the receiving host is stuck in:
> sigsuspend(0xEFFFF938) (sleeping...)
>
> The "in.rshd" process on the receiving host is stuck in:
> poll(0xEFFFD7F8, 2, -1) (sleeping...)
>
> So, any ideas? Like I said, it looks like write() is blocking for no
> particular reason, and that's causing us to sit and spin.
If the rsh process is stuck in a write and there's nothing in the send
queue, it must be some kind of bug inside the Solaris 2.6 kernel on the
sending host. Can you check whether any relevant-looking patches are
available from Sun? Sun customer technical support might even be able
to debug the live system if you give them a call, especially if you can
reproduce the hang.
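A write() sleeps when the kernel will not accept more data on that descriptor, so one way to test whether a descriptor would block, without risking a hang, is to poll() it for POLLOUT. The sketch below uses a pipe whose read end is never drained as a stand-in for a stalled consumer; that setup is an assumption for illustration (on the live system fd 1 of rsh may be a pipe or a socket), and select.poll is only available on Unix-like systems.

```python
import os
import select


def is_writable(fd, timeout_ms=0):
    """Ask poll() whether a write to fd could proceed without blocking."""
    poller = select.poll()
    poller.register(fd, select.POLLOUT)
    events = poller.poll(timeout_ms)
    return any(ev & select.POLLOUT for _, ev in events)


def fill_pipe_until_full(write_fd, chunk=4096):
    """Write to a non-blocking pipe until the kernel refuses more data.

    Returns how many bytes were accepted before a blocking write()
    would have gone to sleep.
    """
    os.set_blocking(write_fd, False)
    data = b"x" * chunk
    total = 0
    while True:
        try:
            total += os.write(write_fd, data)
        except BlockingIOError:  # EAGAIN: a blocking write would sleep here
            return total


if __name__ == "__main__":
    r, w = os.pipe()  # nobody ever reads from r
    print("writable before:", is_writable(w))
    print("pipe accepted", fill_pipe_until_full(w), "bytes")
    print("writable after:", is_writable(w))
    os.close(r)
    os.close(w)
```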
If you find any patches from Sun, please let the list know what they are.
- Dave Dykstra