I have an rsync job that fails consistently, and I have been unable to
diagnose the problem. I've checked the FAQ, Google, the docs, etc., with
no luck.

It looks like the rsync process invoked on the far end exits, and the
local process then waits until the timeout expires before it exits too.

Both systems are Sun boxes (Ultra 10 or better, 256+ MB of memory).
The rsync version is 2.5.0 on the local end and 2.5.5 on the remote end.
The network link between the two is a 768KB VPN WAN. On the local side,
here's what I see:

Begin job 02-tomove-hpx at Tue Jun 18 10:13:36 2002
Executing /somepath/rsync -z -v --exclude=.snapshot 
--exclude=lost+found --archive --delete --force 
--rsync-path=/usr/local/bin/rsync  /some/path/ 
[EMAIL PROTECTED]:/another/path/
         building file list ... done

On the remote end, attaching to the rsync process with truss -vpoll -p shows:

lstat64("toolbox/shaperouter.mgc_shaperouter.attr", 0xFFBEFAE0) = 0
lstat64("toolbox/shaperouter/shaperouter.qual", 0xFFBEFAE0) = 0
lstat64("toolbox/spicenet2G6", 0xFFBEFAE0)      = 0
lstat64("toolbox/spicenet2G6", 0xFFBEF1D8)      = 0
lstat64("toolbox/spicenet2G6.SpiceNet2G6.attr", 0xFFBEFAE0) = 0
lstat64("toolbox/spicenet2G6/spicenet2G6.qual", 0xFFBEFAE0) = 0
lstat64("toolbox/srp", 0xFFBEFAE0)              = 0
lstat64("toolbox/srp", 0xFFBEF1D8)              = 0
lstat64("toolbox/srp.mgc_srp_tool.attr", 0xFFBEFAE0) = 0
lstat64("toolbox/srp/srp.qual", 0xFFBEFAE0)     = 0
lstat64("toolbox/test_fablink", 0xFFBEFAE0)     = 0
lstat64("toolbox/test_fablink", 0xFFBEF1D8)     = 0
lstat64("toolbox/test_fablink.mgc_test_fablink.attr", 0xFFBEFAE0) = 0
lstat64("toolbox/test_fablink/test_fablink.qual", 0xFFBEFAE0) = 0
lstat64("toolbox/test_layout", 0xFFBEFAE0)      = 0
lstat64("toolbox/test_layout", 0xFFBEF1D8)      = 0
lstat64("toolbox/test_layout.mgc_test_layout.attr", 0xFFBEFAE0) = 0
lstat64("toolbox/test_layout/test_layout.qual", 0xFFBEFAE0) = 0
lstat64("toolbox/to_layout", 0xFFBEFAE0)        = 0
lstat64("toolbox/to_layout", 0xFFBEF1D8)        = 0
lstat64("toolbox/to_layout.to_layout_tvpt.attr", 0xFFBEFAE0) = 0
lstat64("toolbox/to_layout/to_layout.qual", 0xFFBEFAE0) = 0
lstat64("toolbox/vnet", 0xFFBEFAE0)             = 0
lstat64("toolbox/vnet", 0xFFBEF1D8)             = 0
lstat64("toolbox/vnet.VNet.attr", 0xFFBEFAE0)   = 0
lstat64("toolbox/vnet/vnet.qual", 0xFFBEFAE0)   = 0
poll(0xFFBEE7E0, 2, 60000)                      = 1
        fd=1  ev=POLLOUT rev=POLLOUT
        fd=8  ev=POLLRDNORM rev=0
write(1, "04\0\007FFFFFFFF", 8)                 = 8
poll(0xFFBEF4D0, 2, 60000)                      = 1
        fd=6  ev=POLLRDNORM rev=POLLRDNORM
        fd=8  ev=POLLRDNORM rev=0
read(6, "FFFFFFFF", 4)                          = 4
poll(0xFFBEE850, 2, 60000)                      = 1
        fd=1  ev=POLLOUT rev=POLLOUT
        fd=8  ev=POLLRDNORM rev=0
write(1, "04\0\007FFFFFFFF", 8)                 = 8
poll(0xFFBEF540, 2, 60000)                      = 1
        fd=6  ev=POLLRDNORM rev=POLLRDNORM
        fd=8  ev=POLLRDNORM rev=0
read(6, "01\0\0\0", 4)                          = 4
close(6)                                        = 0
poll(0xFFBEE938, 2, 60000)                      = 1
        fd=1  ev=POLLOUT rev=POLLOUT
        fd=8  ev=POLLRDNORM rev=0
write(1, "04\0\007FFFFFFFF", 8)                 = 8
kill(18231, SIGUSR2)                            = 0
waitid(P_PID, 18231, 0xFFBEFB08, WEXITED|WTRAPPED|WNOHANG) = 0
     Received signal #18, SIGCLD, in poll() [caught]
       siginfo: SIGCLD CLD_EXITED pid=18231 status=0x0000
poll(0xFFBEFAE8, 0, 20)                         Err#4 EINTR
waitid(P_ALL, 0, 0xFFBEF620, WEXITED|WTRAPPED|WNOHANG) = 0
waitid(P_ALL, 0, 0xFFBEF620, WEXITED|WTRAPPED|WNOHANG) Err#10 ECHILD
setcontext(0xFFBEF7D0)
poll(0xFFBEFAE8, 0, 16)                         = 0
waitid(P_PID, 18231, 0xFFBEFB08, WEXITED|WTRAPPED|WNOHANG) Err#10 ECHILD
sigaction(SIGUSR1, 0xFFBEFB48, 0xFFBEFBC8)      = 0
sigaction(SIGUSR2, 0xFFBEFB48, 0xFFBEFBC8)      = 0
llseek(0, 0, SEEK_CUR)                          Err#9 EBADF
_exit(0)
bash-2.03$ 
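
The 8-byte write(1, ...) calls near the end of the trace appear to be
rsync's multiplexed-output framing. The sketch below decodes one of them,
assuming the layout used by rsync 2.5.x's io.c (a 4-byte little-endian
header holding ((MPLEX_BASE + code) << 24) + length, with MPLEX_BASE = 7,
followed by little-endian 32-bit ints). The raw byte values are also an
assumption, since the truss strings as quoted in this message seem to
have lost some escape characters. The names decode_mplex and frame are
hypothetical, for illustration only.

```python
import struct

# Assumption: rsync 2.5.x multiplexing constant (from rsync's io.c,
# not from this post). Plain data frames use code 0, i.e. tag 7.
MPLEX_BASE = 7

def decode_mplex(frame: bytes):
    """Split one multiplexed frame into (code, length, list of ints)."""
    # 4-byte header, little-endian: ((MPLEX_BASE + code) << 24) + length
    (header,) = struct.unpack("<I", frame[:4])
    tag = header >> 24
    length = header & 0xFFFFFF
    payload = frame[4:4 + length]
    # Payload decoded as little-endian signed 32-bit ints (write_int style)
    values = list(struct.unpack("<%di" % (length // 4), payload))
    return tag - MPLEX_BASE, length, values

# Assumed bytes behind: write(1, "04\0\007FFFFFFFF", 8) = 8
frame = bytes([0x04, 0x00, 0x00, 0x07, 0xFF, 0xFF, 0xFF, 0xFF])
print(decode_mplex(frame))  # (0, 4, [-1])

# Assumed bytes behind: read(6, "01\0\0\0", 4) = 4 (a plain int, no header)
print(struct.unpack("<i", bytes([0x01, 0x00, 0x00, 0x00]))[0])  # 1
```

Read that way, the remote end is sending -1 (end-of-list) markers and
then shutting down cleanly: its child exits with status 0x0000 and the
process itself calls _exit(0), which fits the far end believing the
transfer finished while the local side keeps waiting.
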
The destination filesystem has free space. Another job between the same
two hosts (different paths) runs successfully just before this one.
This job fails every time, but not always after the same lstat call.
I have tried removing -z, adding --bwlimit, removing -v, and using
-vvvvv, all to no avail. I also tried running the local end of the
rsync from a different system. I still need to try a different far end,
but I see a similar problem with a completely different rsync job to a
different host (same source).

I can provide additional details if needed. Any help is greatly
appreciated.

        - Lee
        [EMAIL PROTECTED]

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
