Eric,

This is quite a different condition from the one Dave reported. The
sendq and recvq are zero at _both_ ends, which means that rsync should
exit. It is the code that handles exit that is failing in this case.

> ps  report
> -----------
> root 17408 17407  0 22:00:22 ?        0:19 /usr/local/bin/ssh soho rsync --server -vvogDtprz --timeout=3600 --delete --par .......
> root 17407 17406  0 22:00:21 ?        1:55 /usr/local/bin/rsync -rptgoD --partial --delete-after -vv --delete -e 
> 
> 
> # truss -p 17407
> poll(0xFFBEFAD0, 0, 1)                          = 0
> waitid(P_PID, 17408, 0xFFBEFB00, WEXITED|WTRAPPED|WNOHANG) = 0
> poll(0xFFBEFAD0, 0, 20)                         = 0
> poll(0xFFBEFAD0, 0, 1)                          = 0
> waitid(P_PID, 17408, 0xFFBEFB00, WEXITED|WTRAPPED|WNOHANG) = 0
> poll(0xFFBEFAD0, 0, 20)                         = 0
> waitid(P_PID, 17408, 0xFFBEFB00, WEXITED|WTRAPPED|WNOHANG) = 0
> poll(0xFFBEFAD0, 0, 20)                         = 0

OK, in this case process 17407 (the rsync process) is looping, waiting
for 17408 (the ssh process) to exit. The question is why the ssh
process doesn't exit. It will be waiting for the process at the other
end of the link to exit (there is a precise order in which all the
processes need to exit).
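
For anyone reading the trace, the waitid/poll pattern above is roughly
what the following sketch does. This is Python, not rsync's actual C
code, and the function name, the max_polls parameter and the 20 ms
default (taken from the poll timeouts in the trace) are only
illustrative assumptions.

```python
import os
import time


def wait_for_child(pid, poll_interval=0.02, max_polls=None):
    """Poll for a child process to exit without ever blocking in wait.

    Returns the child's exit status (or minus the signal number if it was
    killed), or None if max_polls is reached while it is still running.
    Hypothetical helper, not taken from the rsync source.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        # With WNOHANG, waitpid returns (0, 0) at once if the child is alive.
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:
            if os.WIFEXITED(status):
                return os.WEXITSTATUS(status)
            return -os.WTERMSIG(status)
        # Sleep briefly before asking again, like the poll(..., 20) calls.
        time.sleep(poll_interval)
        polls += 1
    return None
```

With max_polls left at None the loop never gives up, so if the child
never exits the parent spins forever, which matches what truss shows
for 17407.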

> RECEIVING SIDE
> ==============
> Solaris7 Rsync 2.4.6
> 
>    Local Address        Remote Address    Swind Send-Q Rwind Recv-Q State
> -------------------- -------------------- ----- ------ ----- ------ -------
> soho.22              herc.798              8760      0  8760      0 ESTABLISHED
> 
> soho% ps -aef |grep rsync
> (nothing returned)
> 
> NOTE: I see a ssh connection, but no rsync process. 

Then this looks like an sshd bug. If the child process of an sshd has
exited then sshd _must_ exit, unless it has unsent data to send, but
we know it doesn't have unsent data as the sendq is zero on that
end. So it must be an sshd bug. It isn't an sshd bug that I've seen
before, so it is important that you try to track it down and report
it. btw, please make absolutely sure that the rsync process really has
exited on the server. If there is an rsync child of sshd still there
then this would completely change the conclusion.
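
A grep for "rsync" can miss things, so it is safer to look at the
children of the sshd pid directly. Here is a minimal sketch of that
check, assuming `ps -ef` output with the column order UID PID PPID C
STIME TTY TIME CMD as in the ps lines quoted above. The function name
is made up for illustration, and a STIME printed as a date with a
space in it (as ps can do for older processes) would shift the columns
and confuse it.

```python
def children_of(ps_output, parent_pid):
    """Return (pid, command) pairs from `ps -ef` style text whose PPID matches.

    Hypothetical helper. Assumes columns: UID PID PPID C STIME TTY TIME CMD.
    """
    matches = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the command keeps its arguments.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue  # header line or something we cannot parse
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        if ppid == parent_pid:
            matches.append((pid, cmd))
    return matches
```

Feed it the ps output from the server and the pid of the sshd that
owns the connection. An empty list supports the conclusion above;
anything in the list, rsync or a shell wrapping it, changes it.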

What you need to do is truss sshd and work out why it is hanging
around.

I know I seem to be copping out and redirecting blame away from rsync
all the time, but I'm afraid that is just how it is. rsync does put
unusual stresses on lots of things (ssh, tcp etc) and exposes bugs
quite often. There isn't really anything we can do about that.

Cheers, Tridge
