I think I've tracked down another hang in rsync 2.4.6.  This one appears
to be caused by the sender process finishing all its work and entering
its pid-wait loop before it has read all of the error stream coming in
from the generator process.  If that data is large enough, the
generator hangs waiting for the sender to read it.  A simple way to
reproduce the hang is to rsync a local copy of the Linux source (e.g.
2.4.3) to another local directory with the -av options.
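
Why the size of the error stream matters: a pipe holds only a limited amount of unread data. Once it is full, a blocking write() stalls until the other end reads. The sketch below is generic POSIX C, not rsync code, and `pipe_capacity` is a hypothetical helper. It assumes the generator-to-sender channel behaves like a pipe in this respect, and it measures that limit by filling a pipe that nobody reads. The write end is non-blocking, so the probe gets EAGAIN where a normal writer would hang. The exact figure is platform-dependent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Safety cap so the loop always terminates, even on a system with a huge pipe. */
#define PROBE_LIMIT (16L * 1024 * 1024)

/* Hypothetical helper: how many bytes can a writer push into a pipe that
 * nobody is reading?  The write end is made non-blocking, so a full pipe
 * yields EAGAIN here; an ordinary blocking writer would stall at that point.
 * Returns the byte count, or -1 on error. */
long pipe_capacity(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    long total = 0;
    int flags = fcntl(fds[1], F_GETFL);
    if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0)
        total = -1;

    while (total >= 0 && total < PROBE_LIMIT) {
        ssize_t n = write(fds[1], "x", 1);  /* 1 byte at a time: exact count */
        if (n == 1) {
            total++;
        } else if (n < 0 && errno == EINTR) {
            continue;                        /* interrupted: just retry */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;                           /* full: a blocking write() would hang here */
        } else {
            total = -1;                      /* unexpected error */
        }
    }

    close(fds[0]);
    close(fds[1]);
    return total;
}
```

Any output larger than the returned figure can only be delivered if the reader keeps reading. That is exactly what stops happening once the sender sits in its pid-wait loop.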

The fix appears to be to put the read of the final goodbye message
before the sender begins its pid-wait loop.  This allows the error
stream to flush, and things don't hang.  Here's what I changed:

Index: main.c
--- main.c      29 May 2001 14:37:54 -0000      1.127
+++ main.c      5 Jun 2001 09:30:44 -0000
@@ -504,15 +504,15 @@
                        rprintf(FINFO,"file list sent\n");

                send_files(flist,f_out,f_in);
+               if (remote_version >= 24) {
+                       /* final goodbye message */
+                       read_int(f_in);
+               }
                if (pid != -1) {
                        if (verbose > 3)
                                rprintf(FINFO,"client_run waiting on %d\n",pid);
                        io_flush();
                        wait_process(pid, &status);
-               }
-               if (remote_version >= 24) {
-                       /* final goodbye message */
-                       read_int(f_in);
                }
                report(-1);
                exit_cleanup(status);

I'd appreciate it if someone more familiar with this code would take a
look at this to see if there might be any unforeseen problems with this
change.
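
The patch follows a general rule for any process that both waits on a child and reads data coming from it: drain the input first, then wait. Below is a minimal POSIX sketch of that ordering. It is an illustration, not rsync code. `drain_then_wait` is a hypothetical helper, and a plain pipe stands in for the error stream and the final goodbye message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes nbytes to a pipe and exits, then read the pipe
 * to EOF *before* waiting on the child.  Returns the number of bytes read,
 * or -1 on error; the child's wait status is stored in *status. */
long drain_then_wait(size_t nbytes, int *status)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: the "chatty" writer */
        close(fds[0]);
        char chunk[4096] = {0};
        size_t left = nbytes;
        while (left > 0) {
            size_t want = left < sizeof chunk ? left : sizeof chunk;
            ssize_t n = write(fds[1], chunk, want);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                _exit(1);
            }
            left -= (size_t)n;
        }
        _exit(0);
    }

    close(fds[1]);                      /* parent: lets read() see EOF */
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(fds[0], buf, sizeof buf);
        if (n > 0) { total += n; continue; }
        if (n < 0 && errno == EINTR) continue;
        if (n < 0) total = -1;
        break;                          /* n == 0: child closed its end */
    }
    close(fds[0]);                      /* a still-writing child now gets EPIPE/SIGPIPE */

    /* Only now block on the child.  Calling waitpid() before the read loop
     * deadlocks once nbytes exceeds the pipe's capacity: the child sits in
     * write() while the parent sits in waitpid(). */
    while (waitpid(pid, status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return total;
}
```

Swapping the read loop and the waitpid() loop reproduces the same kind of hang described above, provided nbytes exceeds the pipe's capacity.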

I've also refined my previous anti-hang patch.  I noticed that, in a
very rare circumstance, it could cause the buffered redo bytes to be
read in the wrong order.  For this to happen, the input buffer had to be
empty, some error output had to arrive at the same time as some redo
bytes, and we had to be in the read function reading the raw
redo-channel fd.  When all of that came together, bytes would get
written down the pipe to the sender, causing redo bytes to get buffered,
and the following read of the redo fd would then read some bytes out of
order.  While I was at it, I also made my input-buffer code more
efficient in certain boundary cases: it could do too much memcpy-ing
when the buffer was nearly full and the read and write calls started to
alternate.
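
The ordering rule behind the redo-byte fix can be sketched as follows. Bytes already sitting in an input buffer arrived before anything still unread in the fd, so a read for that channel must drain the buffer before touching the fd. This is a generic illustration, not the code in rsync-nohang.patch. `struct ring`, `ring_put`, `ring_get`, `read_in_order` and `RING_SIZE` are hypothetical names. The circular buffer is one possible way to avoid repeated compaction when reads and writes alternate on a nearly full buffer, and is not necessarily what the patch does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define RING_SIZE 8  /* hypothetical; tiny so wrap-around is easy to exercise */

/* Hypothetical circular buffer holding bytes that arrived early for a channel. */
struct ring {
    char data[RING_SIZE];
    size_t head;  /* index of the oldest unread byte */
    size_t len;   /* number of unread bytes */
};

/* Append n bytes at the tail; returns 0, or -1 if they do not fit.
 * No compaction: a wrapped copy is split into at most two memcpy calls. */
int ring_put(struct ring *r, const char *src, size_t n)
{
    if (n > RING_SIZE - r->len)
        return -1;
    size_t tail = (r->head + r->len) % RING_SIZE;
    size_t first = RING_SIZE - tail;   /* room before the physical end */
    if (first > n)
        first = n;
    memcpy(r->data + tail, src, first);
    if (n > first)
        memcpy(r->data, src + first, n - first);
    r->len += n;
    return 0;
}

/* Remove up to n of the oldest bytes into dst; returns the count copied. */
size_t ring_get(struct ring *r, char *dst, size_t n)
{
    if (n > r->len)
        n = r->len;
    size_t first = RING_SIZE - r->head;
    if (first > n)
        first = n;
    memcpy(dst, r->data + r->head, first);
    if (n > first)
        memcpy(dst + first, r->data, n - first);
    r->head = (r->head + n) % RING_SIZE;
    r->len -= n;
    return n;
}

/* Read exactly n bytes for one channel, preserving arrival order.  The
 * buffer is checked on every pass, before any read() of the fd.  In a full
 * implementation, code servicing other channels could append here between
 * passes; in this sketch nothing does.  Returns 0, or -1 on error/EOF. */
int read_in_order(struct ring *r, int fd, char *dst, size_t n)
{
    size_t got = 0;
    while (got < n) {
        if (r->len > 0) {
            got += ring_get(r, dst + got, n - got);
            continue;
        }
        ssize_t k = read(fd, dst + got, n - got);
        if (k < 0 && errno == EINTR)
            continue;
        if (k <= 0)
            return -1;
        got += (size_t)k;
    }
    return 0;
}
```

For example, if "AB" was buffered earlier and "CD" is still waiting in the fd, a 4-byte read_in_order() returns "ABCD". Reading the fd first would return the bytes out of order.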

My latest anti-hang changes (including the change above) can be grabbed
from here:

    http://www.clari.net/~wayne/rsync-nohang.patch

This is relative to the CVS source, and replaces my previous patches.

..wayne..

