A few weeks ago one of my five rsync backup processes that runs from a
daily cron 03:00 am has started failing with this exact issue.

I can only manually reproduce it when executing my rsync command with
tripe verbosity:

rsync -avvv --delete /source_dir /target_dir

Manually calling rsync with less than triple verbosity runs the whole
process as expected. However the hanging command used in cron is

$ rsync -aq --delete /source_dir /target_dir > /dev/null

$ strace -p 16857
16857 16:38:37 lstat("Saved/SavedArks/retracted.profilebak", 
{st_mode=S_IFREG|0600, st_size=22986, ...}) = 0
16857 16:38:37 write(1, "[sender] 
make_file(Saved/SavedArks/retracted.profilebak,*,2)\n", 69) = 69
16857 16:38:37 lstat("Saved/SavedArks/retracted.arkprofile", 
{st_mode=S_IFREG|0600, st_size=1810, ...}) = 0
16857 16:38:37 write(1, "[sender] 
make_file(Saved/SavedArks/retracted.arkprofile,*,2)\n", 69) = 69
16857 16:38:37 select(5, [], [4], [], {60, 0} <unfinished ...>
16858 16:39:36 <... select resumed> )   = 0 (Timeout)
16858 16:39:36 select(2, [], [1], [], {60, 0} <unfinished ...>
16859 16:39:37 <... select resumed> )   = 0 (Timeout)
16859 16:39:37 select(5, [], [4], [], {60, 0} <unfinished ...>
16857 16:39:37 <... select resumed> )   = 0 (Timeout)
16857 16:39:37 select(5, [], [4], [], {60, 0}

$ lsof -p 15678 -ad 4,5 # note: I used multiple processes over multiple days, 
the PID will not be constant
COMMAND   PID  USER   FD   TYPE             DEVICE SIZE/OFF  NODE NAME
rsync   15678 pamel    4u  unix 0xffff8f88f9fec400      0t0 44294 type=STREAM
rsync   15678 pamel    5u  unix 0xffff8f88f9614800      0t0 44295 type=STREAM

$ ls -l /proc/15678/fd/4
lrwx------ 1 pamel pamel 64 Mar  6 13:35 /proc/15678/fd/4 -> socket:[44294]
$ ls -l /proc/15678/fd/5
lrwx------ 1 pamel pamel 64 Mar  6 13:35 /proc/15678/fd/5 -> socket:[44295]

$ ss -nxp | grep -E '44294|44295'
u_str  ESTAB      0      59136     * 44296                 * 44295              
 users:(("rsync",pid=15679,fd=1))
u_str  ESTAB      0      97024     * 44294                 * 44293              
 users:(("rsync",pid=15678,fd=4))
u_str  ESTAB      82063  0         * 44293                 * 44294              
 users:(("rsync",pid=15680,fd=0))
u_str  ESTAB      53885  0         * 44295                 * 44296              
 users:(("rsync",pid=15678,fd=5))

I have this issue on Ubuntu 17.04 with rsync 3.1.2, Debian 9.3 with
rsync 3.1.2 and 3.1.3 (all as VM on ESXi 6.5). The only constants are
the files that are being transferred: 6.6GB over 11'437 files. I am also
certain that it does not hang on a specific file as it's sometimes a log
file and sometimes a profile. I would say it happens 0.5s into the
process after transferring about 44 files.

$ rsync -avvv --delete /srv/ark/redacted/ShooterGame/Saved 
/srv/ark/backup/redacted/
[...]
[sender] make_file(Saved/SavedArks/redacted.arkprofile,*,2)
[sender] make_file(Saved/SavedArks/redacted.arktribe,*,2)
[sender] make_file(Saved/SavedArks/redacted.arkprofile,*,2)
send_files(29, /srv/ark/redacted/ShooterGame/Saved/Logs)
Saved/Logs/
send_files(30, 
/srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log)
send_files mapped 
/srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
 of size 65472
calling match_sums 
/srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
^C
[...]
recv_file_name(Saved/SavedArks/redacted.profilebak)
recv_file_name(Saved/SavedArks/redacted.arkprofile)
recv_file_name(Saved/SavedArks/redacted.arkprofile)
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at io.c(504) 
[generator=3.1.3]
[generator] _exit_cleanup(code=20, file=io.c, line=504): about to call exit(20)
rsync: [generator] write error: Broken pipe (32)

$ md5sum 
/srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
52596de045ba1759c68dfb8c1cd2d970  
/srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log

$ id
uid=1001(pamel) gid=1001(pamel) groups=1001(pamel),60(games)
$ ls -lA 
/srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
-rw------- 1 pamel pamel 65472 Apr  1  2017 
/srv/ark/redacted/ShooterGame/Saved/Logs/ServerGame.10035.2016.10.27_07.23.01.log
$ ls -lA /srv/ark/backup/
total 4
[...]
drwxr-xr-x 3 pamel pamel 4096 Mar  6 13:25 redacted

I've found other occurences of this issue but mostly people say "this
sounds like a network issue". I'm only rsyncing between two local
directories so I really hope it has nothing to do with network.

Is there anything else I can do to further debug this issue? I can't
really send 6.6GB of confidential data to replicate this issue and it's
a really big issue that now has started to occur daily and requires me
to manually abort the process and restart the services every morning.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsync in Ubuntu.
https://bugs.launchpad.net/bugs/1528921

Title:
  rsync hangs on select(5, [], [4], [], {60, 0}

Status in rsync package in Ubuntu:
  Confirmed

Bug description:
  In the last few months my home directory backup stopped completing.
  I've been able to reproduce the problem on a single subdirectory
  although I had to add the --debug=all flag to reproduce it on that
  smaller directory.  Specifically, this command never completes:

  rsync --debug=all -avz /tmp/html2 /tmp/rsynctest/

  The html2 directory is a copy of
  gnuradio-3.7.8.1/build/docs/doxygen/html .

  When I strace the command, I see this:
  write(1, "sender finished /tmp/html2/atsc_"..., 58sender finished 
/tmp/html2/atsc__interleaver_8h__incl.md5
  ) = 58
  write(1, "send_files(338, /tmp/html2/atsc_"..., 59send_files(338, 
/tmp/html2/atsc__interleaver_8h__incl.png)
  ) = 59
  open("html2/atsc__interleaver_8h__incl.png", O_RDONLY|O_LARGEFILE) = 3
  fstat64(3, {st_mode=S_IFREG|0664, st_size=264657, ...}) = 0
  write(1, "html2/atsc__interleaver_8h__incl"..., 
37html2/atsc__interleaver_8h__incl.png
  ) = 37
  read(3, 
"\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\n\253\0\0\2\233\10\6\0\0\0h\242\""..., 
262144) = 262144
  select(6, [5], [4], [5], {60, 0})       = 2 (in [5], out [4], left {59, 
999996})
  read(5, 
"\0\0\0\0\0\0\0\1\0\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\240\0\0\0"..., 95) 
= 95
  write(4, 
"r\311\0\7\177\377\232\237\264\272e\300\300\240\316\264&\314\301\252*\37\256y\225g\373^\315j\370\350"...,
 51574) = 51574
  select(5, [], [4], [], {60, 0})         = 1 (out [4], left {59, 999997})
  write(4, 
"\7\320\0\7\177\377\234|\7X\223Y\273\255c\27\25f\306\212\202\214#E\272\212t\1\225A\fU"...,
 53259) = 53259
  select(5, [], [4], [], {60, 0}

  The select command times out over and over.  I get the same behavior
  when trying to back up my entire home directory but I don't need the
  --debug=all flag in that case.


  lsb_release -rd
  Description:    Ubuntu 14.04.3 LTS
  Release:        14.04

  apt-cache policy rsync
  rsync:
    Installed: 3.1.0-2ubuntu0.1
    Candidate: 3.1.0-2ubuntu0.1
    Version table:
   *** 3.1.0-2ubuntu0.1 0
          500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main i386 
Packages
          500 http://security.ubuntu.com/ubuntu/ trusty-security/main i386 
Packages
          100 /var/lib/dpkg/status
       3.1.0-2 0
          500 http://us.archive.ubuntu.com/ubuntu/ trusty/main i386 Packages

  ProblemType: Bug
  DistroRelease: Ubuntu 14.04
  Package: rsync 3.1.0-2ubuntu0.1
  ProcVersionSignature: Ubuntu 3.13.0-74.118-generic 3.13.11-ckt30
  Uname: Linux 3.13.0-74-generic i686
  NonfreeKernelModules: nvidia
  ApportVersion: 2.14.1-0ubuntu3.19
  Architecture: i386
  CurrentDesktop: KDE
  Date: Wed Dec 23 09:44:17 2015
  EcryptfsInUse: Yes
  InstallationDate: Installed on 2010-09-18 (1922 days ago)
  InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Beta i386 (20100901.1)
  SourcePackage: rsync
  UpgradeStatus: Upgraded to trusty on 2014-12-27 (361 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rsync/+bug/1528921/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to