A few weeks ago one of my five rsync backup processes that runs from a
daily cron 03:00 am has started failing with this exact issue.

I can only manually reproduce it when executing my rsync command with
tripe verbosity:

rsync -avvv --delete /source_dir /target_dir

Manually calling rsync with less than triple verbosity runs the whole
process as expected. However the hanging command used in cron is

$ rsync -aq --delete /source_dir /target_dir > /dev/null

$ strace -p 16857
16857 16:38:37 lstat("Saved/SavedArks/retracted.profilebak", 
{st_mode=S_IFREG|0600, st_size=22986, ...}) = 0
16857 16:38:37 write(1, "[sender] 
make_file(Saved/SavedArks/retracted.profilebak,*,2)\n", 69) = 69
16857 16:38:37 lstat("Saved/SavedArks/retracted.arkprofile", 
{st_mode=S_IFREG|0600, st_size=1810, ...}) = 0
16857 16:38:37 write(1, "[sender] 
make_file(Saved/SavedArks/retracted.arkprofile,*,2)\n", 69) = 69
16857 16:38:37 select(5, [], [4], [], {60, 0} <unfinished ...>
16858 16:39:36 <... select resumed> )   = 0 (Timeout)
16858 16:39:36 select(2, [], [1], [], {60, 0} <unfinished ...>
16859 16:39:37 <... select resumed> )   = 0 (Timeout)
16859 16:39:37 select(5, [], [4], [], {60, 0} <unfinished ...>
16857 16:39:37 <... select resumed> )   = 0 (Timeout)
16857 16:39:37 select(5, [], [4], [], {60, 0}

$ lsof -p 15678 -ad 4,5 # note: I used multiple processes over multiple days, 
the PID will not be constant
COMMAND   PID  USER   FD   TYPE             DEVICE SIZE/OFF  NODE NAME
rsync   15678 pamel    4u  unix 0xffff8f88f9fec400      0t0 44294 type=STREAM
rsync   15678 pamel    5u  unix 0xffff8f88f9614800      0t0 44295 type=STREAM

$ ls -l /proc/15678/fd/4
lrwx------ 1 pamel pamel 64 Mar  6 13:35 /proc/15678/fd/4 -> socket:[44294]
$ ls -l /proc/15678/fd/5
lrwx------ 1 pamel pamel 64 Mar  6 13:35 /proc/15678/fd/5 -> socket:[44295]

$ ss -nxp | grep -E '44294|44295'
u_str  ESTAB      0      59136     * 44296                 * 44295              
 users:(("rsync",pid=15679,fd=1))
u_str  ESTAB      0      97024     * 44294                 * 44293              
 users:(("rsync",pid=15678,fd=4))
u_str  ESTAB      82063  0         * 44293                 * 44294              
 users:(("rsync",pid=15680,fd=0))
u_str  ESTAB      53885  0         * 44295                 * 44296              
 users:(("rsync",pid=15678,fd=5))

I have this issue on Ubuntu 17.04 with rsync 3.1.2, Debian 9.3 with
rsync 3.1.2 and 3.1.3 (all as VM on ESXi 6.5). The only constants are
the files that are being transferred: 6.6GB over 11'437 files. I am also
certain that it does not hang on a specific file as it's sometimes a log
file and sometimes a profile. I would say it happens 0.5s into the
process after transferring about 44 files.

$ rsync -avvv --delete /srv/ark/redacted/ShooterGame/Saved 
/srv/ark/backup/redacted/
[...]
[sender] make_file(Saved/SavedArks/redacted.arkprofile,*,2)
[sender] make_file(Saved/SavedArks/redacted.arktribe,*,2)
[sender] make_file(Saved/SavedArks/redacted.arkprofile,*,2)
send_files(29, /srv/ark/redacted/ShooterGame/Saved/Logs)
Saved/Logs/
send_files(30, 
/srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log)
send_files mapped 
/srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
 of size 65472
calling match_sums 
/srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
^C
[...]
recv_file_name(Saved/SavedArks/redacted.profilebak)
recv_file_name(Saved/SavedArks/redacted.arkprofile)
recv_file_name(Saved/SavedArks/redacted.arkprofile)
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at io.c(504) 
[generator=3.1.3]
[generator] _exit_cleanup(code=20, file=io.c, line=504): about to call exit(20)
rsync: [generator] write error: Broken pipe (32)

$ md5sum 
/srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
52596de045ba1759c68dfb8c1cd2d970  
/srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log

$ id
uid=1001(pamel) gid=1001(pamel) groups=1001(pamel),60(games)
$ ls -lA 
/srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
-rw------- 1 pamel pamel 65472 Apr  1  2017 
/srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
$ ls -lA /srv/ark/backup/
total 4
[...]
drwxr-xr-x 3 pamel pamel 4096 Mar  6 13:25 redacted

I've found other occurences of this issue but mostly people say "this
sounds like a network issue". I'm only rsyncing between two local
directories so I really hope it has nothing to do with network.

Is there anything else I can do to further debug this issue? I can't
really send 6.6GB of confidential data to replicate this issue and it's
a really big issue that now has started to occur daily and requires me
to manually abort the process and restart the services every morning.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1528921

Title:
  rsync hangs on select(5, [], [4], [], {60, 0}

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rsync/+bug/1528921/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to