Re: Anti hang comments - unexpected EOF in read_timeout

2001-07-25 Thread Irmund Thum


 Some feedback. I'd been having trouble with rsync hanging or early EOF
 timeouts.
...
 versions over many days and believe that both the kernel upgrade and the
 latest rsync cvs were necessary.
Nothing hangs in the case below and the job completes, but what produces
the messages "Aborted by user!" and "unexpected EOF in read_timeout"?
Example of a cron job email:
***
Date: Wed, 25 Jul 2001 01:00:01 +0200
From: [EMAIL PROTECTED] (Cron Daemon)
To: [EMAIL PROTECTED]
Subject: Cron root@it97  test -x /usr/lib/cron/run-crons  
/usr/lib/cron/run-crons
X-Cron-Env: SHELL=/bin/sh
X-Cron-Env: PATH=/usr/bin:/usr/sbin:/sbin:/bin:/usr/lib/news/bin
X-Cron-Env: MAILTO=root
X-Cron-Env: HOME=/root
X-Cron-Env: LOGNAME=root
 
Aborted by user!
unexpected EOF in read_timeout
Willkommen auf dem rsync-Server %SERVER
 
receiving file list ... done
ithum/yakumo/.kde/share/apps/kmail/
ithum/yakumo/.kde/share/config/
ithum/yakumo/.kde/share/apps/kmail/ithum:@192.168.111.113:110
ithum/yakumo/.kde/share/config/kdesktoprc
ithum/yakumo/.netscape/cache/index.db
ithum/yakumo/.netscape/history.dat
ithum/yakumo/.kde/share/apps/kmail/
ithum/yakumo/.kde/share/config/
ithum/yakumo/.netscape/
ithum/yakumo/.netscape/cache/
wrote 4564 bytes  read 172792 bytes  6692.68 bytes/sec
total size is 178910233  speedup is 1008.76
***
-- 
--
 Irmund  Thum
+491796998564





Anti hang comments

2001-07-24 Thread John Leach

Some feedback. I'd been having trouble with rsync hanging or early EOF
timeouts.
Problem occurred on 2 identical receiving machines.
I upgraded from Linux kernel 2.4.2 (RH 7.1) to 2.4.7 on the receiving
machines. Sender machines are standard Linux RH 6.1 and RH 6.2 kernels.
Receiving machines are Duron 900 MHz, 256 MB, software RAID1, ext2, 2x40 GB.
This improved the situation but still gave problems.
Then I downloaded the latest rsync cvs and compiled on both machines and
it's now working perfectly.
Around 10 GB of data to sync.
I tried many permutations of raid/non-raid, kernel 2.4.2/2.4.7 and rsync
versions over many days and believe that both the kernel upgrade and the
latest rsync cvs were necessary.
John Leach
http://osware.net
Melbourne







Re: Anti-hang comments?

2001-07-19 Thread Ville Herva

On Thu, Jul 05, 2001 at 10:58:22AM -0700, you [Jos Backus] claimed:
 On Thu, Jul 05, 2001 at 12:48:06PM -0500, Dave Dykstra wrote:
  If you really want it to stay in the foreground, edit become_daemon in
  socket.c.
 
 It would be nice to have this available as an option so rsyncd can be run
 under djb's daemontools.

I also needed that option to run it as a service under cygwin. I think I
have the patch somewhere, although it is of course trivial to reimplement.


-- v --

[EMAIL PROTECTED]




Re: Anti-hang comments?

2001-07-05 Thread Dave Dykstra

On Thu, Jul 05, 2001 at 12:38:00AM -0500, Phil Howard wrote:
 Wayne Davison wrote:
 
  We certainly do need to be careful here, since the interaction between
  the various read and write functions can be pretty complex.  However, I
  think that the data flow of my move-files patch stress-tests this code
  fairly well, so once we've done some more testing I feel that we will
  not leave rsync worse off than it was before the patch.
  
  Along those lines, I've been testing the new code plus I ported a
  version of my move-files patch on top of it.  The result has a couple
  fewer bugs and seems to be working well so far.
  
  The latest non-expanding-buffer-nohang patch is in the same place:
  
  http://www.clari.net/~wayne/rsync-nohang2.patch
  
  and the new move-files patch that works with nohang2 is here:
  
  http://www.clari.net/~wayne/rsync-move-files2.patch
  
  I'll keep banging on it.  Let me know what you think.
 
 So far it is working for me.  Now I can kill my client side and know
 that my daemon side will properly close down and exit and not leave
 a dangling lock.
 
 But the problem I still have (not quite as bad as before because of
 no more hangs) is that the locks to control the number of daemons are
 still working wrong.  It's still locking the whole lock file instead
 of the first lockable 4 byte record.  I still don't know if it is
 rsync or Linux causing the problem.  The code in both looks right to
 me.  But lslk shows:
 
 SRC       PID  DEV  INUM  SZ  TY  M  ST  WH  END  LEN  NAME
 rsyncd  24401  3,5    44   0   w  0   0   0    0    0  /tmp/rsyncd.lock
 
 (note, I've been moving the lock file around to see if it might be
 sensitive to filesystem mounting options I'm using, etc).
 
 I'd like to find a way to start rsync in daemon mode AND leave it
 in the foreground so I can run it via strace and maybe see if the
 syscall is being done right.


You shouldn't have to have it be in the foreground in order for strace -f
to work.  I just wrote a test program that verified it:
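#include <stdio.h>
#include <unistd.h>

int main(void)
{
    if (fork() == 0) {
        /* child: start a new session, as a daemon would */
        printf("child\n");
        setsid();
        sleep(10);
        printf("bye bye\n");
    }
    return 0;
}
strace on that waits until the child process has exited.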

If you really want it to stay in the foreground, edit become_daemon in
socket.c.

- Dave Dykstra




Re: Anti-hang comments?

2001-07-05 Thread Jos Backus

On Thu, Jul 05, 2001 at 12:48:06PM -0500, Dave Dykstra wrote:
 If you really want it to stay in the foreground, edit become_daemon in
 socket.c.

It would be nice to have this available as an option so rsyncd can be run
under djb's daemontools.
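
A rough sketch of what such an option could look like, for anyone who wants
to try it. This is a generic fork/setsid routine, not the actual
become_daemon from socket.c; the function name become_daemon_opt and the
no_detach parameter are made up for illustration.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical variant of a daemonizing routine (not rsync's code).
 * When no_detach is non-zero the process stays in the foreground, which
 * is what a supervisor such as daemontools expects.
 * Returns 0 on success, -1 if fork() fails.
 */
int become_daemon_opt(int no_detach)
{
    pid_t pid;

    if (no_detach)
        return 0;           /* keep the current process and session */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(0);           /* parent exits, child carries on */

    setsid();               /* child detaches from the controlling tty */
    return 0;
}
```

With no_detach set, the caller keeps its pid, so a supervisor can track
and signal the process directly.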

-- 
Jos Backus _/  _/_/_/Santa Clara, CA
  _/  _/   _/
 _/  _/_/_/ 
_/  _/  _/_/
[EMAIL PROTECTED] _/_/   _/_/_/use Std::Disclaimer;




Re: Anti-hang comments?

2001-07-05 Thread Phil Howard

Dave Dykstra wrote:

 You shouldn't have to have it be in the foreground in order for strace -f

You're right, I was not aware of that option.  And I thought I
knew my way around strace.

Here's what strace shows me:

[pid 14576] open("/tmp/rsyncd.lock", O_RDWR|O_CREAT|0x8000, 0600) = 4
[pid 14576] fcntl(4, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0

But the source looks just right:

connection.c[39,42]:
/* find a free spot */
for (i=0;i<max_connections;i++) {
        if (lock_range(fd, i*4, 4)) return 1;
}

util.c[494,506]:
/* lock a byte range in a open file */
int lock_range(int fd, int offset, int len)
{
        struct flock lock;

        lock.l_type = F_WRLCK;
        lock.l_whence = SEEK_SET;
        lock.l_start = offset;
        lock.l_len = len;
        lock.l_pid = 0;

        return fcntl(fd,F_SETLK,&lock) == 0;
}

I guess maybe there's a library issue involved.  But why it would
stomp on a structure element is unclear.  I'm putting together a
couple new systems with Slackware 8.0 which has glibc 2.2.3 so
I'll probably just first try it on there and see if the problem
persists or not.
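
One way to check which byte range really ends up locked, independent of
lslk or strace, might be to probe the lock file from a second process with
F_GETLK. The sketch below is not rsync's code: lock_range mirrors the
util.c function quoted above, while claim_slot and is_locked_by_other are
hypothetical helper names introduced only for this test.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to write-lock len bytes at offset. Returns 1 on success, 0 otherwise. */
int lock_range(int fd, off_t offset, off_t len)
{
    struct flock lock;

    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;
    lock.l_start = offset;
    lock.l_len = len;
    lock.l_pid = 0;

    return fcntl(fd, F_SETLK, &lock) == 0;
}

/* Claim the first free 4-byte record. Returns its index, or -1 if none. */
int claim_slot(int fd, int max_connections)
{
    int i;

    for (i = 0; i < max_connections; i++) {
        if (lock_range(fd, (off_t)i * 4, 4))
            return i;
    }
    return -1;
}

/*
 * Hypothetical checker: fork a child that asks the kernel whether the
 * single byte at offset is locked by another process.  A child is needed
 * because F_GETLK never reports locks held by the calling process.
 * Returns 1 if locked, 0 if free, -1 on error.
 */
int is_locked_by_other(const char *path, off_t offset)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock probe;
        int fd = open(path, O_RDWR);

        if (fd < 0)
            _exit(2);
        probe.l_type = F_WRLCK;
        probe.l_whence = SEEK_SET;
        probe.l_start = offset;
        probe.l_len = 1;
        probe.l_pid = 0;
        if (fcntl(fd, F_GETLK, &probe) != 0)
            _exit(2);
        _exit(probe.l_type == F_UNLCK ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status);
}
```

After claim_slot(fd, n) returns 0, probes at offsets 0 and 3 should report
locked and a probe at offset 4 should report free if only the first record
is held. A probe at offset 4 that reports locked would match the whole-file
lock that lslk shows.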

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




Re: Anti-hang comments?

2001-07-04 Thread Wayne Davison

On Wed, 27 Jun 2001, Martin Pool wrote:
 This is getting disturbingly complex.  I realize the problem is
 complex too, so this is no slur on Wayne's coding.  My gut reaction is
 that if we start adding this then the program's behaviour will become
 even more baroque.

We certainly do need to be careful here, since the interaction between
the various read and write functions can be pretty complex.  However, I
think that the data flow of my move-files patch stress-tests this code
fairly well, so once we've done some more testing I feel that we will
not leave rsync worse off than it was before the patch.

Along those lines, I've been testing the new code plus I ported a
version of my move-files patch on top of it.  The result has a couple
fewer bugs and seems to be working well so far.

The latest non-expanding-buffer-nohang patch is in the same place:

http://www.clari.net/~wayne/rsync-nohang2.patch

and the new move-files patch that works with nohang2 is here:

http://www.clari.net/~wayne/rsync-move-files2.patch

I'll keep banging on it.  Let me know what you think.

..wayne..





Re: Anti-hang comments?

2001-07-04 Thread Phil Howard

Wayne Davison wrote:

 We certainly do need to be careful here, since the interaction between
 the various read and write functions can be pretty complex.  However, I
 think that the data flow of my move-files patch stress-tests this code
 fairly well, so once we've done some more testing I feel that we will
 not leave rsync worse off than it was before the patch.
 
 Along those lines, I've been testing the new code plus I ported a
 version of my move-files patch on top of it.  The result has a couple
 fewer bugs and seems to be working well so far.
 
 The latest non-expanding-buffer-nohang patch is in the same place:
 
 http://www.clari.net/~wayne/rsync-nohang2.patch
 
 and the new move-files patch that works with nohang2 is here:
 
 http://www.clari.net/~wayne/rsync-move-files2.patch
 
 I'll keep banging on it.  Let me know what you think.

So far it is working for me.  Now I can kill my client side and know
that my daemon side will properly close down and exit and not leave
a dangling lock.

But the problem I still have (not quite as bad as before because of
no more hangs) is that the locks to control the number of daemons are
still working wrong.  It's still locking the whole lock file instead
of the first lockable 4 byte record.  I still don't know if it is
rsync or Linux causing the problem.  The code in both looks right to
me.  But lslk shows:

SRC       PID  DEV  INUM  SZ  TY  M  ST  WH  END  LEN  NAME
rsyncd  24401  3,5    44   0   w  0   0   0    0    0  /tmp/rsyncd.lock

(note, I've been moving the lock file around to see if it might be
sensitive to filesystem mounting options I'm using, etc).

I'd like to find a way to start rsync in daemon mode AND leave it
in the foreground so I can run it via strace and maybe see if the
syscall is being done right.

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




Re: Anti-hang comments?

2001-06-27 Thread Martin Pool

On 26 Jun 2001, Wayne Davison [EMAIL PROTECTED] wrote:

 Here's a solution with a non-growing buffer. 

This is getting disturbingly complex.  I realize the problem is
complex too, so this is no slur on Wayne's coding.  My gut reaction is
that if we start adding this then the program's behaviour will become
even more baroque.  I'll read it and see how it goes.

-- 
Martin 




Re: Anti-hang comments?

2001-06-27 Thread Wayne Davison

On Tue, 26 Jun 2001, Wayne Davison wrote:
 Since read_int() is a fairly high-level call, I had to manually ensure
 that a flush doesn't happen and to ensure that reading the redo_fd
 doesn't try to read the io_error_fd (both to avoid nested read
 attempts on the redo_fd).

In case you're wondering where this extra read_int() call is in my
patch, I changed it so that it avoids higher-level calls when handling
the lower-level read functionality.  The patch was updated yesterday
before the first person grabbed a copy, so there's no need for anyone to
re-grab the patch.

..wayne..





Re: Anti-hang comments?

2001-06-26 Thread Wayne Davison

On Mon, 25 Jun 2001, Andrew Tridgell wrote:
 I've applied your simple nohang patch.

Cool.  That's the one that affects the most people.

 Instead we need a way of reproducing the bug and see if we can find a
 solution without a buffer.

You can minimize the buffer usage by applying my move-files patch.  It
constantly reads the redo pipe during the generator's main loop and
marks the redo items with a flag in the existing files struct (and
also forwards the delete indicators on to the sender).  This ensures
that this buffer doesn't expand much at all.  (With both patches applied
I haven't seen it reallocate except when I tested the buffer code with a
16-byte realloc size.)

Alternately, it might not be too hard to remove the buffer and have the
low-level code take a more direct role in interpreting the data, but I'd
have to look at this more closely to see for sure.

One way to reproduce this hang is to modify the receiver code to redo
every file that is processed in the first phase.  Also, my move-files
patch puts enough extra data down the sender-to-generator pipe that it
should hang up without difficulty if you disable the buffer and use the
--move-files option.

..wayne..





Re: Anti-hang comments?

2001-06-26 Thread Wayne Davison

On Mon, 25 Jun 2001, Andrew Tridgell wrote:
 see if we can find a solution without a buffer.

Here's a solution with a non-growing buffer.  This code keeps the
receiver-generator pipe clear by reading the ints and setting redo
flags in a character array (of flist-count elements).  I'm avoiding
setting flags in the actual flist structure since it is shared memory
between 2 forked processes, and this might cause a lot of memory to
become unshared (if the OS supports copy on write for fork).  Since
read_int() is a fairly high-level call, I had to manually ensure that a
flush doesn't happen and to ensure that reading the redo_fd doesn't try
to read the io_error_fd (both to avoid nested read attempts on the
redo_fd).  I have done some simple testing of this with my usual
redo-all-files testing tweak and it is working fine, but the code is still
pretty young.

If you want to test this, be sure to unapply my previous no-hang patch
or start fresh from the CVS version.  The new patch is here:

http://www.clari.net/~wayne/rsync-nohang2.patch

I think it will also work to start from 2.4.6, but you should also apply
the other no-hang fix I made (that was recently committed to CVS):

http://www.clari.net/~wayne/rsync-nohang1.patch

You'll need to use patch -p1 to apply the new patches (unlike the
previous one, which used -p0) since I had a request for the top-level
directory to be included in the file names.

[FYI, I have not yet ported my move-files patch to use this code.]

..wayne..





Re: Anti-hang comments?

2001-06-25 Thread Andrew Tridgell

Wayne,

I've applied your simple nohang patch. The longer nohang patch I'm not
nearly as confident of. It goes back to a method used in early
versions of rsync where it uses a buffer that can grow indefinitely. 

Just some history on this. The earliest versions of rsync had no
buffer, then when I first saw hangs I added a growing buffer very
similar to what your patch adds. Several people found that it grew to
enormous sizes and brought the machine to its knees. I then added an
arbitrary limit on its size (about 4M I think) and then some people
found they got hangs when that filled up. Then I got rid of the
buffer, and we did the pipe/socketpair thing which reduced the hangs a
lot. Next it was discovered that the error pipe could still cause
hangs, and I have fixed that in the current CVS tree, but without an
infinitely growing buffer. Now perhaps you have discovered another
(much less common) way for it to hang, but I don't think the solution
is a buffer. Instead we need a way of reproducing the bug and see if
we can find a solution without a buffer.

The horrors of a badly designed protocol :(





Re: Anti-hang comments?

2001-06-24 Thread Martin Pool

On 22 Jun 2001, [EMAIL PROTECTED] wrote:

 I have been testing this patch in a duplicate of our production
 environment, for a week now.  With the patch, the runs complete.
 I'm handling 86756263K in 1816688 files (at last count), averaging 47K
 per file (ranging up to about .5G). 

 It seems to solve the problems.  I think it constitutes rsync 2.4.6
 (if you add in the other fixes - errors on module listing, etc.).

Do you mean 2.4.7?

I'm looking at the patch now.  I think I will try it out here locally,
and then make a tarball of 2.4.7pre1 for people to try out more
broadly.

-- 
Martin 
VA Linux Systems        GnuPG encrypted email preferred
