RE: Rsync: Re: patch to enable faster mirroring of large filesystems
I was at first, but then removed it. The results were still insufficiently fast.

> Were you using the -c option of rsync? It sounds like you were and it's
> extremely slow. I knew somebody who once went to extraordinary lengths to
> avoid the overhead of -c, making a big patch to rsync to cache checksums,
> when all he had to do was not use -c.
Re: Rsync: Re: patch to enable faster mirroring of large filesystems
On Thu, Nov 29, 2001 at 12:59:00PM -0600, Keating, Tim wrote:
> I was at first, but then removed it. The results were still insufficiently
> fast.
>
> > Were you using the -c option of rsync? It sounds like you were and it's
> > extremely slow. I knew somebody who once went to extraordinary lengths
> > to avoid the overhead of -c, making a big patch to rsync to cache
> > checksums, when all he had to do was not use -c.

23 minutes to check 3200 files is definitely unexpected. What options did you end up using? Normally rsync will only check the modification timestamps and the sizes of the files on both sides (that is, only a stat()) and if they match it will not do anything else.

- Dave Dykstra
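For readers following the thread, the size-and-timestamp test described above can be sketched roughly as follows. This is only an illustration in Python, not rsync's actual code: the function name `quick_check_matches` is hypothetical, and comparing timestamps at whole-second granularity is an assumption.

```python
import os


def quick_check_matches(src_path, dst_path):
    """Return True when both files have the same size and modification time.

    Only stat() is called; file contents are never read. A True result is
    the case in which the file would be skipped under this kind of check.
    """
    try:
        src = os.stat(src_path)
        dst = os.stat(dst_path)
    except FileNotFoundError:
        # A missing file on either side can never match.
        return False
    same_size = src.st_size == dst.st_size
    # Assumption: compare at whole-second granularity.
    same_mtime = int(src.st_mtime) == int(dst.st_mtime)
    return same_size and same_mtime
```

Because nothing but metadata is touched, the cost per file is a pair of stat() calls, which is why a long run over a few thousand files suggests something else (such as -c) is in play.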
Re: Rsync: Re: patch to enable faster mirroring of large filesystems
It seems to me the new options --read-batch and --write-batch should go a long way towards reducing any time spent in creation of checksums and file lists, so you should definitely give 2.4.7pre4 a try. This is just a guess since I haven't actually used those options myself, but seems worth looking into.

BTW, could we please have some real documentation about these options? What's in the man page doesn't come nearly close to telling what is cached and how to make use of it. Some examples of how people are using this option may be illuminating for those of us who don't have the time or inclination to figure it out from the code.

-- Alberto

In message [EMAIL PROTECTED], Keating, Tim writes:
> I was at first, but then removed it. The results were still insufficiently
> fast.
>
> > Were you using the -c option of rsync? It sounds like you were and it's
> > extremely slow. I knew somebody who once went to extraordinary lengths
> > to avoid the overhead of -c, making a big patch to rsync to cache
> > checksums, when all he had to do was not use -c.

Alberto Accomazzi                          mailto:[EMAIL PROTECTED]
NASA Astrophysics Data System              http://adsabs.harvard.edu
Harvard-Smithsonian Center for Astrophysics  http://cfawww.harvard.edu
60 Garden Street, MS 83, Cambridge, MA 02138 USA
RE: Rsync: Re: patch to enable faster mirroring of large filesystems
Keating, Tim [[EMAIL PROTECTED]] writes:

> - If there's a mismatch, the client sends over the entire .checksum file.
>   The server does the compare and sends back a list of files to delete and
>   a list of files to update. (And now I think of it, it would probably be
>   better if the server just sent the client back the list of files and let
>   the client figure out what it needed, since this would distribute the
>   work better.)

Whenever caching checksums comes up I'm always curious - how do you figure out if your checksum cache is still valid (e.g., properly associated with its file) without re-checksumming the files? Are you just trusting size/timestamp?

I know in my case I've got database files that don't change timestamp/size and yet have different contents. Thus I'd always have to do full checksums so I'm not sure what a cache would buy.

-- David

David Bolen                              E-mail: [EMAIL PROTECTED]
FitLinxx, Inc.                           Phone: (203) 708-5192
860 Canal Street, Stamford, CT 06902     Fax: (203) 316-5150
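The validity question raised above can be made concrete with a small sketch. It is not the patch under discussion: the names `file_digest` and `cached_digest`, the in-memory dict used as the cache, and the choice of SHA-256 are all assumptions made for illustration. The cache trusts a stored digest whenever size and modification time are unchanged, which is exactly the shortcut that fails for files rewritten in place with the same size and timestamp.

```python
import hashlib
import os


def file_digest(path):
    """Read the whole file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_digest(path, cache):
    """Return a digest for path, reusing the cached value when the
    file's size and mtime match what was recorded with it.

    cache maps path -> ((size, mtime_ns), digest). Hypothetical layout.
    """
    st = os.stat(path)
    key = (st.st_size, st.st_mtime_ns)
    entry = cache.get(path)
    if entry is not None and entry[0] == key:
        # Trusted without reading the file: stale if the contents changed
        # while size and mtime stayed the same.
        return entry[1]
    digest = file_digest(path)
    cache[path] = (key, digest)
    return digest
```

If a file is overwritten with different bytes of the same length and its mtime is restored, `cached_digest` keeps returning the old digest while `file_digest` returns a new one. For data that behaves that way, the cache saves no work that can safely be skipped.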
RE: Rsync: Re: patch to enable faster mirroring of large filesystems
In my particular case, it is reasonable to assume that the size and timestamp will change when the file is updated. (We are looking at it as a patching mechanism.) Right now it's actually using update time only; I should modify it to check the file size as well.

Is there a way you could query your database to tell you which extents have data that has been modified within a certain timeframe?

-----Original Message-----
From: David Bolen [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 29, 2001 2:12 PM
To: 'Keating, Tim'
Cc: [EMAIL PROTECTED]
Subject: RE: Rsync: Re: patch to enable faster mirroring of large filesystems

Keating, Tim [[EMAIL PROTECTED]] writes:

> - If there's a mismatch, the client sends over the entire .checksum file.
>   The server does the compare and sends back a list of files to delete and
>   a list of files to update. (And now I think of it, it would probably be
>   better if the server just sent the client back the list of files and let
>   the client figure out what it needed, since this would distribute the
>   work better.)

Whenever caching checksums comes up I'm always curious - how do you figure out if your checksum cache is still valid (e.g., properly associated with its file) without re-checksumming the files? Are you just trusting size/timestamp?

I know in my case I've got database files that don't change timestamp/size and yet have different contents. Thus I'd always have to do full checksums so I'm not sure what a cache would buy.

-- David

David Bolen                              E-mail: [EMAIL PROTECTED]
FitLinxx, Inc.                           Phone: (203) 708-5192
860 Canal Street, Stamford, CT 06902     Fax: (203) 316-5150
RE: Rsync: Re: patch to enable faster mirroring of large filesystems
Keating, Tim [[EMAIL PROTECTED]] writes:

> Is there a way you could query your database to tell you which extents
> have data that has been modified within a certain timeframe?

Not in any practical way that I know of. It's not normally a major hassle for us since rsync is used for a central backup that occurs on a large enough time scale that the timestamp does normally change from the prior time. So our controlling script just does its own timestamp comparison and only activates the -c rsync option (which definitely increases overhead) if they happen to match.

Although I will say that the whole behavior (the transaction log always has an appropriate timestamp, it's just the raw database file itself that doesn't) sure caught me by surprise in the beginning after finding what I thought was a valid backup wouldn't load :-)

-- David

David Bolen                              E-mail: [EMAIL PROTECTED]
FitLinxx, Inc.                           Phone: (203) 708-5192
860 Canal Street, Stamford, CT 06902     Fax: (203) 316-5150
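A controlling script along the lines described above might look something like the sketch below. It is a guess at the shape of such a script, not the one actually in use: the function name `build_rsync_args`, the `last_mtime` parameter (the source file's modification time as recorded at the previous backup), and the use of -a as the base option are all assumptions. Only the conditional use of -c comes from the message.

```python
import os


def build_rsync_args(src, dest, last_mtime):
    """Build an rsync command line, adding -c only when needed.

    last_mtime is the source file's modification time (in seconds) as
    recorded at the previous backup. Hypothetical bookkeeping.
    """
    args = ["rsync", "-a"]
    current_mtime = int(os.stat(src).st_mtime)
    if current_mtime == int(last_mtime):
        # The timestamp did not move since the last run, so size/mtime
        # cannot be trusted to reveal a change: force full checksums.
        args.append("-c")
    args += [src, dest]
    return args
```

The resulting list could be handed to `subprocess.run`. The expensive checksum pass is then paid only on the runs where the timestamp comparison gives no information.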