Eric Whiting wrote:
I've learned some good things from this discussion. Thanks.

Kenny, I have one concern/idea -- the original post says the 'disk is
fairly slow'. That is one bottleneck that should probably be examined a
little more closely. How fast are your disks? How fast is your network? An
IDE disk with DMA disabled might run 5 MB/s, while the same disk with DMA
enabled can reach 45 MB/s. Perhaps this root cause has already
been looked at, but it may be worth checking again. Also, does the
destination have enough RAM to cache the file across the multiple reads
it gets? That might also help.
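
A quick way to check, sketched for a Linux box with an IDE disk (the
device and file names below are placeholders for your own):

    # Show whether DMA is currently enabled on the drive
    hdparm -d /dev/hda
    # Turn DMA on if it is off
    hdparm -d1 /dev/hda
    # Time buffered disk reads and cached reads as a rough benchmark
    hdparm -tT /dev/hda
    # Rough sequential write speed through the filesystem (1 GB test file)
    dd if=/dev/zero of=/tmp/testfile bs=1024k count=1024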
eric
The disks on the destination side are a stripe on a Sun 5200; I'm not sure of the underlying disk sizes, etc., at this point. On top of that we have the VxFS file system with caching enabled. The box has 12 GB of RAM, and it does indeed cache the second read of the file, so that's good.
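
An easy way to confirm the cache is doing its job is to time two back-to-back reads of the same file (the path below is just an example; the first read comes off the disk, the second should be served from cache and be much faster):

    # First read comes from disk; second should be served from the cache
    time dd if=/data/oracle/system01.dbf of=/dev/null bs=1024k
    time dd if=/data/oracle/system01.dbf of=/dev/null bs=1024k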

We are going to speed up the disks and try this again very soon (it's going to take a while for us to get the 2 TB file system rebuilt ;-)

thx for the tips!

-kg




jw schultz wrote:

On Tue, Feb 04, 2003 at 11:29:48AM -0800, Kenny Gorman wrote:

I am rsyncing 1 TB of data each day.  In my testing I am finding that
removing the target files each day and then rsyncing is actually faster
than comparing the source and target files and rsyncing over the delta
blocks.  This is because we have a fast link between the two boxes and
our disk is fairly slow. I am finding that the creation of the temp
file (the 'dot file') is actually the slowest part of the operation.
This has to happen for every file, because the timestamp and at least a
couple of blocks are guaranteed to have changed (Oracle data files).
As others have mentioned, -W (--whole-file) will help here.
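
For example, something along these lines (the paths and host name are
placeholders):

    # -a preserves perms/times; -W/--whole-file skips the delta algorithm
    # and rewrites each changed file straight from the network stream
    rsync -av --whole-file /data/oracle/ remotehost:/data/oracle/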

The reason the temp-file is so slow is that it is reading
blocks from the disk and writing them to other blocks on the
same disk.  This means every unchanged block must be
transferred twice over the interface, whereas changed blocks are
only transferred once.  If the files are very large this is
guaranteed to cause a seek storm.

Further, all of this happens after the entire file has been
read once to generate the block checksums.  Unless your
tree is smallish, the reads from the checksum pass will have been
flushed from cache by the time you do the final transfer.
--whole-file eliminates most of the disk activity: you no
longer do the block checksum pass, and the local copying
(read+write) is replaced with a simple write from the network.

Most likely your network is faster than the disks.  For
files that change but change very little, your disk subsystem
would have to be more than triple the speed of your network
for the rsync algorithm (as opposed to the utility) to be of
benefit.  If the files change a lot then you merely need
double the speed.
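
(Rough arithmetic behind those numbers, assuming the receiving disk
carries all the local I/O: for a file of size S with few changed
blocks, the delta path costs about 3S of disk traffic on the receiver,
one read for the checksum pass, one read of the old file to build the
temp file, and one write of the temp file.  --whole-file costs about
1S, a single write fed from the network.  So the delta algorithm only
wins when the disk can move data more than three times as fast as the
network.  If most blocks have changed, the old-file reads largely
disappear, disk traffic drops to about 2S, and the break-even falls
to double.)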

--
________________________________________________________________
       J.W. Schultz            Pegasystems Technologies
       email address:          [EMAIL PROTECTED]

               Remember Cernan and Schmitt
--
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
