Re: problems encountered in 2.4.6

2001-05-31 Thread Remi Laporte

I've had rsync hangs when transferring huge filesystems (~80Gb) over the network,
but since I removed the -v option from my command line there are no more
hangs.
The -v option under 2.4.6 is buggy; try multiplying the v's and the hangs
will increase too.

( rsync -axWP --rsync-path=/usr/local/bin/rsync --stat --delete source target)

David Bolen wrote:

 [EMAIL PROTECTED] [[EMAIL PROTECTED]] writes:

  Actually, the lack of -W isn't helping me at all.  The reason is that
  even for the stuff I do over the network, 99% of it is compressed with
  gzip or bzip2.  If the files change, the originals were changed and a
  new compression is made, and usually most of the file is different.

 Just to clarify, when you say over the network you mean in true
 client/server rsync (or across an rsh/ssh stream) and not just using
 one rsync with references using network mount points, right?  In the
 latter case, not having -W is hurting you, never helping.

 But yes, any format (e.g., encryption, compression) that effectively
 distributes changes randomly over a file is going to be a killer for
 rsync.

 For the case of gzip'd files when a client and server rsync are in
 use, you may want to look back through the archives of this list -
 there was a reference to a patch for the gzip sources that created
 rsync-friendly gzip's.  Not as great as the non-gzip'd version, but
 far better than normal gzip.

 Ah yes - here was the URL:

 http://antarctica.penguincomputing.com/~netfilter/diary/gzip.rsync.patch2

 At the time when I tried it (1/2001), here were some test results:

 For comparison, here's a database file (delta between one day and the
 next), both uncompressed and gzip'd (normal and -9).  For the
 uncompressed I also transferred with a fixed 1K blocksize since I know
 that's the page size for the database - the others are default
 computations (I tried the 1K with the gzip'd version but it was
 worse, as expected).

              Normal    Normal+1K       gzip     gzip-9
 Size       54206464     54206464   21867539   21845091
 Wrote       2902182      1011490    3169864    3214740
 Read          60176       317648      60350      60290
 Total       2962358      1329138    3230214    3275030

 Speedup       18.30        40.78       6.77       6.67
 Compression    1.00         1.00      2.479      2.481
 Normalized    18.30        40.78      16.78      16.54

 And in terms of size:

 As Rusty's page comments, they are slightly larger, but not
 tremendously so.  In my one case:

 Normal gzip:         21627629
 gzip --rsyncable:    21867539
 gzip -9 --rsyncable: 21845091

 So about a 1-1.1% hit in compressed size.

 Personally, here we end up just leaving the major stuff we transfer
 uncompressed - as we're using slow analog lines, the cost recovery was
 easily worth the cost in disk space, particularly in cases like our
 databases where knowledge of the page size and method of change goes a
 long way.

  It definitely helped for transferring ISO images where the whole image
  would be changed if some files changed.  I set the chunk size to 2048
  for that.  Why it defaults to 700 seems odd to me.

 Not sure - perhaps some early empirical work.  When I'm moving files
 that I know something about I definitely control the block size
 myself, so for example, when moving databases with a 1K page size, I
 always use a multiple of that (since I know a priori that's how the
 database dirties the file), and then I scale that up a bit based on
 database size, to get a reasonable tradeoff between block overhead and
 extra transfer upon a change detection.

 -- David

 /---\
  \   David Bolen\   E-mail: [EMAIL PROTECTED]  /
   | FitLinxx, Inc.\  Phone: (203) 708-5192|
  /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150 \
 \---/

--
@  Remi LAPORTE  @
@ TEXAS INSTRUMENTS UNIX SUPPORT @
@[EMAIL PROTECTED]@







Re: problems encountered in 2.4.6

2001-05-30 Thread Dave Dykstra

On Tue, May 29, 2001 at 12:02:41PM -0500, Phil Howard wrote:
 Dave Dykstra wrote:
 
  On Fri, May 25, 2001 at 02:19:59PM -0500, Dave Dykstra wrote:
  ...
   Use the -W option to disable the rsync algorithm.  We really ought to make
   that the default when both the source and destination are local.
  
  I went ahead and submitted a change to the rsync CVS to automatically turn
  on -W when the source and destination are both on the local machine.
 
 So how do I revert that on the command line?
 
 I've been trying with -W doing my disk to disk backups, and I've had
 to go back to not using -W.  Will -c do that?

There's currently no way to revert it.  I thought it wouldn't be necessary,
and I'm not sure how to do it cleanly, because there's currently no precedent
in rsync for generally undoing options that have different defaults
depending on the situation.  Another one that comes to mind is --blocking-io.
The latest rsync in CVS is now using the popt package to process
options instead of getopt.  Does anybody know if that package has a standard
way to negate options, for example by prefixing a no (like --no-blocking-io) or
something like that?  I took a quick look through the man page and it
wasn't obvious.
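
For what it's worth, popt does let two long options store different values
into the same variable, so a hand-rolled --no-X form is possible even if
there is no built-in negation.  A rough sketch with hypothetical option
names, not rsync's actual option table:

#include <popt.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int whole_file = -1;   /* -1 = unset; the caller would pick a default later */

    /* Two entries writing different values into the same int give a
     * --whole-file / --no-whole-file pair without any special popt support. */
    struct poptOption opts[] = {
        { "whole-file",    'W', POPT_ARG_VAL, &whole_file, 1,
          "copy whole files, skip the rsync algorithm", NULL },
        { "no-whole-file",  0,  POPT_ARG_VAL, &whole_file, 0,
          "force the rsync algorithm even for local copies", NULL },
        POPT_AUTOHELP
        { NULL, 0, 0, NULL, 0, NULL, NULL }
    };

    poptContext pc = poptGetContext("example", argc, (const char **)argv, opts, 0);
    int rc;
    while ((rc = poptGetNextOpt(pc)) > 0)
        ;   /* POPT_ARG_VAL options are stored by popt itself */
    if (rc < -1) {
        fprintf(stderr, "%s: %s\n",
                poptBadOption(pc, POPT_BADOPTION_NOALIAS), poptStrerror(rc));
        return 1;
    }

    printf("whole_file = %d\n", whole_file);
    poptFreeContext(pc);
    return 0;
}

Whether that is clean enough for every option with a situation-dependent
default is another question, but it needs nothing special from popt.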


 The reason is the load
 on the machine gets so high, nothing else can run.  This is not CPU
 load, but rather, buffering/swapping load.  CPU load just slows other
 things down.  But buffering/swapping load brings other things to a
 grinding halt.  I suspect Linux's tendency to want to keep everything
 that anything writes in RAM, even if that means swapping out all other
 processes, is a big factor here.  So I'll need a way to undo the
 effect of -W if I'm to use rsync for disk to disk backups.

Wow.  Rsync is just going too fast for it, I guess.  Leaving out -W makes it
do a lot of unnecessary disk I/O, which must be enough to throttle its
progress.  Still, leaving out -W sure seems like the wrong solution.  Maybe
-W has to turn off more of rsync's pipelining as well, since it is no longer
performing the rsync algorithm.


 The fact that rsync loads so much into VM probably makes the problem
 a bit worse in this case.  I saw 1 process at 35M and 2 processes at
 70M (total 175M used by rsync, in addition to all the buffered writes).

Does -W have an impact on that?  I would think that if anything -W would
lessen that effect.


 I'm wondering if rsync is even a good choice for disk to disk backup
 duty.  Is there some option I missed that disables pre-loading all
 the file names into memory?

Maybe it isn't.  There is no such option.

 I also tried the --bwlimit option and it had no effect, not even on
 the usual download synchronizing over a dialup that I do.  I could
 not get it to pace the rate below the dialup speed no matter what
 I specified.

I haven't used the --bwlimit option and don't really know how it works.
I remember when somebody contributed it that I was skeptical about how
well it could work.  I'm especially not surprised that it has no impact
on local-to-local transfers.

- Dave Dykstra




Re: problems encountered in 2.4.6

2001-05-29 Thread Dave Dykstra

On Fri, May 25, 2001 at 02:19:59PM -0500, Dave Dykstra wrote:
...
 Use the -W option to disable the rsync algorithm.  We really ought to make
 that the default when both the source and destination are local.

I went ahead and submitted a change to the rsync CVS to automatically turn
on -W when the source and destination are both on the local machine.

- Dave Dykstra




rsync 3 (was Re: problems encountered in 2.4.6)

2001-05-28 Thread John N S Gill

 
 There is a feature I would like, and I notice that even with -c this
 does not happen, but I think it could based on the way rsync works.
 What I'd like to have is when a whole file is moved from one directory
 to another, rsync would detect a new file with the same checksum as an
 existing (potentially to be deleted) file, and copy, move, or link, as
 appropriate.  In theory this should apply to anything anywhere in the
 whole file tree being processed.

See the note I posted on May 17th, titled Storing updates; it includes
a tcl script I run on rsync -n output to spot obvious renames and
gzip'ings of files and take evasive action.

It would be excellent if rsync could do this sort of thing for me.  The
basic principle is that if you are using --delete then when a file is
missing a good place to look is in the list of deletions.
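
As a rough sketch of that principle (nothing rsync actually does today, and
the checksum below is just a toy stand-in for whatever digest the sender
would supply), the receiving side could try to satisfy a missing file from
the deletion list before transferring anything:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Size plus a toy checksum of a local file; a real implementation would
 * use the same strong checksum the sender reports for the new file. */
static int file_checksum(const char *path, off_t *size, uint32_t *sum)
{
    struct stat st;
    FILE *fp = fopen(path, "rb");
    if (!fp || fstat(fileno(fp), &st) != 0) {
        if (fp) fclose(fp);
        return -1;
    }
    uint32_t s = 0;
    int c;
    while ((c = getc(fp)) != EOF)
        s = s * 31 + (uint32_t)c;
    fclose(fp);
    *size = st.st_size;
    *sum = s;
    return 0;
}

/* Try to satisfy 'newpath' (size/checksum as reported by the sender) by
 * hard-linking one of the files --delete is about to remove. */
static int link_from_deletions(const char *newpath, off_t want_size,
                               uint32_t want_sum,
                               const char **doomed, int ndoomed)
{
    for (int i = 0; i < ndoomed; i++) {
        off_t size;
        uint32_t sum;
        if (file_checksum(doomed[i], &size, &sum) == 0 &&
            size == want_size && sum == want_sum &&
            link(doomed[i], newpath) == 0)
            return 1;   /* reused the doomed file; no data transferred */
    }
    return 0;           /* fall back to a normal transfer */
}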

I spoke to Rusty Russell last November when he was visiting Dublin and
he mentioned there had been some thinking about an rsync 3.  One
feature being considered was allowing users to supply arbitrary rules
for what to do when a file is missing, based on file suffix etc.  Did
anyone follow up these ideas?

John





Re: problems encountered in 2.4.6

2001-05-25 Thread Phil Howard

Dave Dykstra wrote:

  2 =
  When syncronizing a very large number of files, all files in a large
  partition, rsync frequently hangs.  It's about 50% of the time, but
  seems to be a function of how much work there was to be done.  That
  is, if I run it soon after it just ran, it tends to not hang, but if
   I run it after quite some time (and lots of stuff to synchronize) it
  tends to hang.  It appears to have completed all the files, but I
  don't get any stats.  There are 3 rsync processes sitting idle with
  no files open in the source or target trees.
  
  At last count there were 368827 files and 8083 symlinks in 21749
  directories.
  
  df shows:
  /dev/hda4 42188460  38303916   3884544  91% /home
  /dev/hdb4 42188460  38301972   3886488  91% /mnt/hdb/home
  
  df -i shows:
   /dev/hda4  2662400  398419 2263981   15% /home
   /dev/hdb4  2662400  398462 2263938   15% /mnt/hdb/home
  
  The df numbers are not exact because change is constantly happening
  on this active server.  Drives hda and hdb are identical and are
  partitioned alike.
  
  The command line is echoed from the script that runs it:
  
   rsync -axv --stats --delete /home/. /mnt/hdb/home/. 1>'/home/root/backup-hda-to-hdb/home.log' 2>&1
 
 
 Use the -W option to disable the rsync algorithm.  We really ought to make
 that the default when both the source and destination are local.

I don't want to copy everything every time.  That's why I am using
rsync to do this in the first place.  I don't understand why this
would be what's hanging.

  A deadly embrace?  It seems possible.
 
 
 No, the receiving side of an rsync transaction splits itself into two
 processes for the sake of pipelining: one to generate checksums and one to
 accept updates.  When you're sending and receiving to the same machine then
 you've got one sender and 2 receivers.

Right.  But what I was suggesting was a deadly embrace in that the
process killed was waiting for something, and the parent was waiting
for something.

I'm not using the -c option, so why would checksums be generated?

  I'm also curious why 26704 has no fd 1.
 
 I don't know.  When I tried it all 3 processes had an fd 1.

Were you looking at it after it hung?  Or is it not hanging for you?
I am curious if the lack of fd 1 is related to the hang.  It is being
started with 1 and 2 redirected to a log file _and_ the whole thing
is being run via the script command for a big picture logfile.
It was set up this way with the intent to run it from cron, although
I haven't actually added it to crontab, yet, due to the problems.


  3 =
  @ERROR: max connections (16) reached - try again later
  
  This occurs after just one connection is active.  It behaves as if
  I had specified max connections = 1.  On another server I set it
  to 40, and it showed:
  
  @ERROR: max connections (40) reached - try again later
  
   so it obviously is parsing and keeping the value I configure, but it
  isn't using it correctly.
  
  Also, if I ^C the client, then I get this error every time until I
  restart the daemon (running in standalone daemon mode, not inetd).
   So it seems like it counts clients wrong.  But I can't get more
   than 1 even right after restarting the server, so there's a little
   more to it than that somewhere.
 
 I don't know, I never used max connections.  Could indeed be a bug.
 The code looks pretty tricky.  It's trying to lock pieces of the file
 /var/run/rsyncd.lock in order for independent processes to coordinate. 
 Are you running as root (the lsof above suggests you are)?  If not, you
 probably need to specify another file that your daemon has access to in the
 lock file option.  Otherwise it would probably help for you to run some
 straces.

I would have presumed since there was a daemon process running
(as opposed to running from inetd) that the daemon itself could
simply track the connection count.

One possibility here is that I do have /var/run symlinked to /ram/run
which is on a ramdisk.  So the lock file is there.  The file is there
but it is empty.  Should it have data in it?  BTW, it was in ramdisk
in 2.4.4 and this max connections problem did not exist, so if there
is a ramdisk sensitivity, it's new since 2.4.4.

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




Re: problems encountered in 2.4.6

2001-05-25 Thread Dave Dykstra

On Fri, May 25, 2001 at 04:33:28PM -0500, Phil Howard wrote:
 Dave Dykstra wrote:
   One possibility here is that I do have /var/run symlinked to /ram/run
   which is on a ramdisk.  So the lock file is there.  The file is there
   but it is empty.  Should it have data in it?  BTW, it was in ramdisk
   in 2.4.4 and this max connections problem did not exist, so if there
   is a ramdisk sensitivity, it's new since 2.4.4.
  
  I don't know if it will show up with data in it or not, I've never tried it.
  You'll probably need to do some straces.
 
 Where is the count of number of current connections supposed to be kept?
 It's obviously not actually being kept in this file, at least not when on
 a ramdisk.  But if it's supposed to be, that's the problem.  OTOH, it is
 easy to get the count out of sync this way, too.  If a process is killed
 or otherwise just dies, the count is higher than real.  When I do multi-
 process servers with controlled process counts, I like to have the parent
 track the number of children running.  Of course that precludes using inetd.

It locks different ranges of bytes of the file rather than keeping a count in
it.  I guess the idea with that is if a process dies the operating system
will automatically remove the lock.
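
For anyone curious, the idea can be sketched in a few lines; the 4-byte slot
size and the rest of the details here are illustrative, not lifted from
rsync's source:

#include <fcntl.h>
#include <unistd.h>

/* Each daemon process that accepts a connection tries to take a write
 * lock on one slot of the shared lock file.  If every slot is already
 * locked, the connection is refused. */
static int claim_connection_slot(const char *lockfile, int max_connections)
{
    int fd = open(lockfile, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    for (int slot = 0; slot < max_connections; slot++) {
        struct flock fl = {0};
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = (off_t)slot * 4;   /* one 4-byte record per connection */
        fl.l_len = 4;
        if (fcntl(fd, F_SETLK, &fl) == 0)
            return fd;                  /* keep fd open for the whole session */
    }

    close(fd);
    return -1;                          /* all slots locked: "max connections reached" */
}

Keeping the descriptor open for the life of the connection is what makes the
count self-correcting: the kernel drops the lock when the process exits,
whether it exits cleanly or not.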

- Dave Dykstra




RE: problems encountered in 2.4.6

2001-05-25 Thread David Bolen

[EMAIL PROTECTED] [[EMAIL PROTECTED]] writes:

Dave Dykstra wrote:

 That's two different kinds of checksums.  The -c option runs a whole-file
 checksum on both sides, but if you don't use -W the rsync rolling checksum
 will be applied.

So the chunk-by-chunk checksum is always used without -W?  I guess the docs are
more confusing than I originally thought.

It might help if you think of it as two phases - discovery of what
files need to be transferred, and then the transfer itself.

The discovery phase will by default just check timestamps and sizes.
You can adjust that with command line options, including the use of -c
to include a full file checksum as part of the comparison, if for
example, files might change without affecting timestamp or size.
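
In rough pseudo-C, the per-file decision in that discovery phase amounts to
something like the following (whole_file_digest() is a hypothetical
placeholder, and this ignores the permissions, ownership and other
attributes rsync also compares):

#include <string.h>
#include <sys/stat.h>

/* Hypothetical digest helper standing in for rsync's file checksum. */
extern int whole_file_digest(const char *path, unsigned char digest[16]);

static int needs_transfer(const char *src, const char *dst, int always_checksum)
{
    struct stat ss, ds;
    if (stat(src, &ss) != 0) return 0;          /* nothing to send */
    if (stat(dst, &ds) != 0) return 1;          /* missing on the destination */

    if (always_checksum) {                      /* -c: let the contents decide */
        unsigned char a[16], b[16];
        if (whole_file_digest(src, a) != 0 || whole_file_digest(dst, b) != 0)
            return 1;
        return memcmp(a, b, sizeof a) != 0;
    }

    /* Default quick check: size and modification time only. */
    return ss.st_size != ds.st_size || ss.st_mtime != ds.st_mtime;
}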

Once rsync knows what it needs to transfer, then it works its way
through the file list, and for each file it performs a transfer.  By
default, that transfer is the rsync protocol - which involves the full
process of dividing the file into chunks with both a strong and
rolling checksum, and doing the computations to figure out what parts
to send and so on.
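
To make the rolling part concrete, here is a minimal sketch of a weak
rolling checksum in the spirit of the one rsync uses (two running sums
folded into 32 bits); the constants and layout are illustrative rather than
copied from rsync's sources:

#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t s1, s2; size_t len; } rollsum;

/* Checksum of an initial window buf[0..len-1]. */
static void rollsum_init(rollsum *r, const unsigned char *buf, size_t len)
{
    r->s1 = r->s2 = 0;
    r->len = len;
    for (size_t i = 0; i < len; i++) {
        r->s1 += buf[i];
        r->s2 += (uint32_t)(len - i) * buf[i];
    }
}

/* Slide the window one byte: drop 'out' from the front, add 'in' at the end. */
static void rollsum_rotate(rollsum *r, unsigned char out, unsigned char in)
{
    r->s1 += in - out;
    r->s2 += r->s1 - (uint32_t)r->len * out;
}

static uint32_t rollsum_digest(const rollsum *r)
{
    return (r->s1 & 0xffff) | (r->s2 << 16);
}

The whole point of rollsum_rotate() is that the window slides one byte at a
time without re-reading the block, which is what lets the sender cheaply test
for a block match at every offset in its file (with the strong checksum used
to confirm a candidate match).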

Now, normally this process is divided so that the copy of rsync that
does the I/O is local to the file - e.g., for discovery both client
and server rsync identify file timestamp/sizes independently (and
optionally compute the checksums locally) and then exchange that
information.  For transfer both rsyncs build up the rolling and chunk
checksums and exchange them and then decide what file data to send.

But when you are copying with a single rsync (and in particular when
one of the files is on the network), then that rsync has to do all the
work.  That means that during discovery it either 'stat's all files or
optionally computes checksums.  To do the checksum it has to read the
file, so both source and destination get read fully - if either are
on the network you will have already spent the network traffic to pull
the complete files back to the local machine.

Likewise for the transfer - under the rsync protocol, rsync has to
compute the checksums for both source and destination files.  Now,
it'll only do this for those that it wants to transfer, but in those
cases it effectively pulls back complete files from the network just
to compute the checksums, only to then start transferring them.  Even
if the rsync protocol finds only a tiny difference, by that point the
network has already carried more than the full file's worth of data.

That's why the -W option is really the only logical thing to use with
a single rsync and local (on-system or network share/mount) copies.
Under such circumstances, the rsync protocol isn't going to help at
all, and will probably slow things down and take more memory instead.
With -W rsync becomes an intelligent copier (in terms of figuring out
what changed), but that's about it.

-- David

/---\
 \   David Bolen\   E-mail: [EMAIL PROTECTED]  /
  | FitLinxx, Inc.\  Phone: (203) 708-5192|
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150 \
\---/




Re: problems encountered in 2.4.6

2001-05-25 Thread Phil Howard

David Bolen wrote:

 The discovery phase will by default just check timestamps and sizes.
 You can adjust that with command line options, including the use of -c
 to include a full file checksum as part of the comparison, if for
 example, files might change without affecting timestamp or size.
 
 Once rsync knows what it needs to transfer, then it works its way
 through the file list, and for each file it performs a transfer.  By
 default, that transfer is the rsync protocol - which involves the full
 process of dividing the file into chunks with both a strong and
 rolling checksum, and doing the computations to figure out what parts
 to send and so on.

This is where the docs were a bit confusing.  There was no clear
distinction between the checksum types related to the -c option.  That implied
to me that without -c there would be no checksums at all; what I thought
the default behaviour was turns out to be what -W actually does.


 That's why the -W option is really the only logical thing to use with
 a single rsync and local (on-system or network share/mount) copies.
 Under such circumstances, the rsync protocol isn't going to help at
 all, and will probably slow things down and take more memory instead.
 With -W rsync becomes an intelligent copier (in terms of figuring out
 what changed), but that's about it.

Actually, the lack of -W isn't helping me at all.  The reason is that
even for the stuff I do over the network, 99% of it is compressed with
gzip or bzip2.  If the files change, the originals were changed and a
new compression is made, and usually most of the file is different.

It definitely helped for transferring ISO images where the whole image
would be changed if some files changed.  I set the chunk size to 2048
for that.  Why it defaults to 700 seems odd to me.

There is a feature I would like, and I notice that even with -c this
does not happen, but I think it could based on the way rsync works.
What I'd like to have is when a whole file is moved from one directory
to another, rsync would detect a new file with the same checksum as an
existing (potentially to be deleted) file, and copy, move, or link, as
appropriate.  In theory this should apply to anything anywhere in the
whole file tree being processed.

-- 
-
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/ |
-




RE: problems encountered in 2.4.6

2001-05-25 Thread David Bolen

[EMAIL PROTECTED] [[EMAIL PROTECTED]] writes:

 Actually, the lack of -W isn't helping me at all.  The reason is that
 even for the stuff I do over the network, 99% of it is compressed with
 gzip or bzip2.  If the files change, the originals were changed and a
 new compression is made, and usually most of the file is different.

Just to clarify, when you say over the network you mean in true
client/server rsync (or across an rsh/ssh stream) and not just using
one rsync with references using network mount points, right?  In the
latter case, not having -W is hurting you, never helping.

But yes, any format (e.g., encryption, compression) that effectively
distributes changes randomly over a file is going to be a killer for
rsync.

For the case of gzip'd files when a client and server rsync are in
use, you may want to look back through the archives of this list -
there was a reference to a patch for the gzip sources that created
rsync-friendly gzip's.  Not as great as the non-gzip'd version, but
far better than normal gzip.

Ah yes - here was the URL:

http://antarctica.penguincomputing.com/~netfilter/diary/gzip.rsync.patch2

At the time when I tried it (1/2001), here were some test results:

For comparison, here's a database file (delta between one day and the
next), both uncompressed and gzip'd (normal and -9).  For the
uncompressed I also transferred with a fixed 1K blocksize since I know
that's the page size for the database - the others are default
computations (I tried the 1K with the gzip'd version but it was
worse, as expected).

             Normal    Normal+1K       gzip     gzip-9
Size       54206464     54206464   21867539   21845091
Wrote       2902182      1011490    3169864    3214740
Read          60176       317648      60350      60290
Total       2962358      1329138    3230214    3275030

Speedup       18.30        40.78       6.77       6.67
Compression    1.00         1.00      2.479      2.481
Normalized    18.30        40.78      16.78      16.54

And in terms of size:
   
As Rusty's page comments, they are slightly larger, but not
tremendously so.  In my one case:

Normal gzip:         21627629
gzip --rsyncable:    21867539
gzip -9 --rsyncable: 21845091

So about a 1-1.1% hit in compressed size.


Personally, here we end up just leaving the major stuff we transfer
uncompressed - as we're using slow analog lines, the cost recovery was
easily worth the cost in disk space, particularly in cases like our
databases where knowledge of the page size and method of change goes a
long way.

 It definitely helped for transferring ISO images where the whole image
 would be changed if some files changed.  I set the chunk size to 2048
 for that.  Why it defaults to 700 seems odd to me.

Not sure - perhaps some early empirical work.  When I'm moving files
that I know something about I definitely control the block size
myself, so for example, when moving databases with a 1K page size, I
always use a multiple of that (since I know a priori that's how the
database dirties the file), and then I scale that up a bit based on
database size, to get a reasonable tradeoff between block overhead and
extra transfer upon a change detection.
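
Written down, that rule of thumb looks roughly like this; the square-root
scaling is my own illustrative assumption (in the spirit of scaling with
file size, not rsync's exact default computation):

#include <math.h>
#include <stdint.h>

static uint32_t pick_block_size(uint64_t file_size, uint32_t page_size)
{
    /* Start from roughly the square root of the file size... */
    uint32_t block = (uint32_t)sqrt((double)file_size);

    if (block < page_size)
        block = page_size;

    /* ...then round up to a multiple of the page size, so a dirtied
     * page never straddles two blocks. */
    block = ((block + page_size - 1) / page_size) * page_size;

    return block;
}

For the 54 MB database in the table above that comes out to 8192 bytes, i.e.
eight 1K pages per block.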

-- David

/---\
 \   David Bolen\   E-mail: [EMAIL PROTECTED]  /
  | FitLinxx, Inc.\  Phone: (203) 708-5192|
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150 \
\---/