RE: Rsync: Re: patch to enable faster mirroring of large filesystems

2001-11-29 Thread Keating, Tim

I was at first, but then removed it. The results were still insufficiently
fast.

 Were you using the -c option of rsync?  It sounds like you were and it's
 extremely slow.  I knew somebody who once went to extraordinary lengths to
 avoid the overhead of -c, making a big patch to rsync to cache checksums,
 when all he had to do was not use -c.




Re: Rsync: Re: patch to enable faster mirroring of large filesystems

2001-11-29 Thread Dave Dykstra

On Thu, Nov 29, 2001 at 12:59:00PM -0600, Keating, Tim wrote:
 I was at first, but then removed it. The results were still insufficiently
 fast.
 
  Were you using the -c option of rsync?  It sounds like you were and it's
  extremely slow.  I knew somebody who once went to extraordinary lengths to
  avoid the overhead of -c, making a big patch to rsync to cache checksums,
  when all he had to do was not use -c.

23 minutes to check 3200 files is definitely unexpected.  What options did
you end up using?  Normally rsync will only check the modification
timestamps and the sizes of the files on both sides (that is, only a
stat()) and if they match it will not do anything else.
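
A rough sketch of that size-and-timestamp check, for illustration only. It
is not rsync's code: the function name is made up, both paths are assumed to
be local (rsync compares against metadata sent from the other side), and
timestamps are truncated to whole seconds as a simplifying assumption.

```python
import os


def quick_check_matches(src_path, dst_path):
    """Return True when size and mtime agree, so the file would be skipped.

    Hypothetical helper: only stat() is called, file contents are never read.
    """
    try:
        src = os.stat(src_path)
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return False  # a missing file always needs a transfer
    same_size = src.st_size == dst.st_size
    same_mtime = int(src.st_mtime) == int(dst.st_mtime)
    return same_size and same_mtime
```

Two files with equal size and timestamp are treated as identical whatever
their contents, which is why this check is cheap and why -c exists.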

- Dave Dykstra




Re: Rsync: Re: patch to enable faster mirroring of large filesystems

2001-11-29 Thread Alberto Accomazzi


It seems to me the new options --read-batch and --write-batch should go 
a long way towards reducing any time spent in creation of checksums and
file lists, so you should definitely give 2.4.7pre4 a try.  This is just
a guess since I haven't actually used those options myself, but seems
worth looking into.

BTW, could we please have some real documentation about these options?  What's
in the man page doesn't come close to explaining what is cached and
how to make use of it.  Some examples of how people are using these options
may be illuminating for those of us who don't have the time or inclination
to figure it out from the code.


-- Alberto


In message [EMAIL PROTECTED],
 Keating, Tim writes:

 I was at first, but then removed it. The results were still insufficiently
 fast.
 
  Were you using the -c option of rsync?  It sounds like you were and it's
  extremely slow.  I knew somebody who once went to extraordinary lengths to
  avoid the overhead of -c, making a big patch to rsync to cache checksums,
  when all he had to do was not use -c.
 




Alberto Accomazzi  mailto:[EMAIL PROTECTED]
NASA Astrophysics Data System  http://adsabs.harvard.edu
Harvard-Smithsonian Center for Astrophysics  http://cfawww.harvard.edu
60 Garden Street, MS 83, Cambridge, MA 02138 USA





RE: Rsync: Re: patch to enable faster mirroring of large filesystems

2001-11-29 Thread David Bolen

Keating, Tim [[EMAIL PROTECTED]] writes:

  - If there's a mismatch, the client sends over the entire .checksum
 file.  The server does the compare and sends back a list of files to
 delete and a list of files to update. (And now I think of it, it
 would probably be better if the server just sent the client back the
 list of files and let the client figure out what it needed, since
 this would distribute the work better.)
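
The compare step quoted above could look roughly like the sketch below. The
on-disk layout of the .checksum file is not given in the thread, so each
side's manifest is assumed here to be a plain dict mapping a relative path
to a checksum string; the function name is hypothetical.

```python
def diff_manifests(server, client):
    """Return (to_update, to_delete) for bringing the client in line.

    `server` and `client` are assumed to be dicts of path -> checksum.
    A path is updated when it is missing on the client or its checksum
    differs, and deleted when the server no longer has it.
    """
    to_update = sorted(
        path for path, digest in server.items() if client.get(path) != digest
    )
    to_delete = sorted(path for path in client if path not in server)
    return to_update, to_delete
```

The function is symmetric in where it runs: given both manifests, either the
server or the client can compute the two lists, which is the trade-off the
parenthetical remark is about.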

Whenever caching checksums comes up I'm always curious - how do you
figure out if your checksum cache is still valid (e.g., properly
associated with its file) without re-checksumming the files?

Are you just trusting size/timestamp?  I know in my case I've got
database files that don't change timestamp/size and yet have different
contents.  Thus I'd always have to do full checksums so I'm not sure
what a cache would buy.
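
To make the question concrete, here is a minimal sketch of a checksum cache
that trusts size and timestamp. It is not the patch under discussion: the
function name and cache layout are assumptions, and SHA-256 is an arbitrary
stand-in for whatever checksum is really used.

```python
import hashlib
import os


def cached_checksum(path, cache):
    """Return the file's checksum, re-reading it only if size or mtime moved.

    `cache` is a dict of path -> (size, mtime_ns, hexdigest), a hypothetical
    layout chosen for this sketch.
    """
    st = os.stat(path)
    entry = cache.get(path)
    if entry is not None and entry[0] == st.st_size and entry[1] == st.st_mtime_ns:
        return entry[2]  # trusted without reading the file
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    cache[path] = (st.st_size, st.st_mtime_ns, digest)
    return digest
```

A file rewritten in place with the same size and timestamp gets the stale
digest back, which is exactly the database-file case described above.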

-- David

/---\
 \   David Bolen\   E-mail: [EMAIL PROTECTED]  /
  | FitLinxx, Inc.\  Phone: (203) 708-5192|
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150 \
\---/




RE: Rsync: Re: patch to enable faster mirroring of large filesystems

2001-11-29 Thread Keating, Tim

In my particular case, it is reasonable to assume that the size and
timestamp will change when the file is updated. (We are looking at it as a
patching mechanism.)

Right now it's actually using update time only; I should modify it to check
the file size as well.

Is there a way you could query your database to tell you which extents have
data that has been modified within a certain timeframe?

 -Original Message-
 From: David Bolen [mailto:[EMAIL PROTECTED]]
 Sent: Thursday, November 29, 2001 2:12 PM
 To: 'Keating, Tim'
 Cc: [EMAIL PROTECTED]
 Subject: RE: Rsync: Re: patch to enable faster mirroring of large filesystems
 
 
 Keating, Tim [[EMAIL PROTECTED]] writes:
 
   - If there's a mismatch, the client sends over the entire .checksum
  file.  The server does the compare and sends back a list of files to
  delete and a list of files to update. (And now I think of it, it
  would probably be better if the server just sent the client back the
  list of files and let the client figure out what it needed, since
  this would distribute the work better.)
 
 Whenever caching checksums comes up I'm always curious - how do you
 figure out if your checksum cache is still valid (e.g., properly
 associated with its file) without re-checksumming the files?
 
 Are you just trusting size/timestamp?  I know in my case I've got
 database files that don't change timestamp/size and yet have different
 contents.  Thus I'd always have to do full checksums so I'm not sure
 what a cache would buy.
 
 -- David
 
 /---\
  \   David Bolen\   E-mail: [EMAIL PROTECTED]  /
   | FitLinxx, Inc.\  Phone: (203) 708-5192|
  /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150 \
 \---/
 




RE: Rsync: Re: patch to enable faster mirroring of large filesystems

2001-11-29 Thread David Bolen

Keating, Tim [[EMAIL PROTECTED]] writes:

 Is there a way you could query your database to tell you which
 extents have data that has been modified within a certain timeframe?

Not in any practical way that I know of.  It's not normally a major
hassle for us since rsync is used for a central backup that occurs on
a large enough time scale that the timestamp does normally change from
the prior time.  So our controlling script just does its own timestamp
comparison and only activates the -c rsync option (which definitely
increases overhead) if they happen to match.
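
A sketch of what such a controlling script might do, under stated
assumptions: the real script is not shown in this thread, the function name
is invented, the -a flag is only a placeholder for whatever options are
actually used, and the timestamp recorded at the previous backup is assumed
to be available as a number of seconds.

```python
import os


def build_rsync_command(src, dst, last_backup_mtime):
    """Build an rsync argv, adding -c only when the source mtime has not moved.

    `last_backup_mtime` is assumed to be the source file's mtime as recorded
    at the previous backup. An unchanged timestamp proves nothing for files
    rewritten in place, so only then is the expensive -c pass requested.
    """
    cmd = ["rsync", "-a"]  # -a is an assumed baseline, not from the thread
    if int(os.stat(src).st_mtime) == int(last_backup_mtime):
        cmd.append("-c")
    cmd += [src, dst]
    return cmd
```

The returned list can be handed to subprocess.run(); building the argv
separately keeps the decision easy to check without running rsync.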

Although I will say that the whole behavior (the transaction log
always has an appropriate timestamp, it's just the raw database file
itself that doesn't) sure caught me by surprise in the beginning, after
finding that what I thought was a valid backup wouldn't load :-)

-- David

/---\
 \   David Bolen\   E-mail: [EMAIL PROTECTED]  /
  | FitLinxx, Inc.\  Phone: (203) 708-5192|
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150 \
\---/