Re: Nice little performance improvement

2009-10-17 Thread Mike Connell


Hi,


> Interesting.  If you're not using incremental recursion (the default in
> rsync >= 3.0.0), I can see that the du would help by forcing the
> destination I/O to overlap the file-list building in time.  But with
> incremental recursion, the du shouldn't be necessary because rsync
> actually overlaps the checking of destination files with the file-list
> building on the source.


Setting incremental recursion aside for a moment: wouldn't anything that
can warm up the file cache before it is needed be beneficial?
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Nice little performance improvement

2009-10-17 Thread Mike Connell

> No, not if the file cache isn't large enough for the number of files.
> E.g. if you have 20 million files and only 256MB RAM, it's likely a bad
> idea.



Splitting down to the subsub (two levels down) directory level lets a single
subsub rsync fit in the cache for me. Warming the cache is beneficial here;
I didn't say it was in every situation.




Nice little performance improvement

2009-10-15 Thread Mike Connell
Hi,

In my situation I'm using rsync to back up a server with (currently) about
570,000 files. These are all small files, and in any 15-minute period maybe
0.1% of them change or are newly added.

I've split the main tree up so rsync can run on sub-subdirectories of the
main tree, and it processes these sub-subdirectories sequentially. I would
have liked to run some of them in parallel, but that seems to increase I/O
on the main server too much.


Today I tried the following:

For each subsub directory:
a) Fork a du -s on the destination copy of the subsubdirectory
b) Run rsync on the subsubdirectory
c) Repeat until done
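The du trick above can be sketched in Python as follows. This is a minimal illustration, not Mike's actual script: it assumes the loop runs on the destination host and pulls from a source given as an rsync path such as primary:/data (hypothetical), and the rsync options -a --delete are an assumption based on the later mention of --delete. The popen/run parameters exist only so the commands can be inspected without actually running du or rsync.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def warm_then_sync(
    subdirs: Sequence[str],
    src_root: str,
    dest_root: str,
    rsync_opts: Sequence[str] = ("-a", "--delete"),  # assumed options
    popen: Callable = subprocess.Popen,
    run: Callable = subprocess.run,
) -> None:
    """For each subsubdirectory, start `du -s` on the destination copy in the
    background, then run rsync for that subsubdirectory and wait for it."""
    for sub in subdirs:
        dest = str(Path(dest_root) / sub)
        # a) Fork du. Its output is discarded; only its stat() calls matter,
        #    since they pull directory entries and inodes into the cache.
        warmer = popen(["du", "-s", dest],
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        # b) Run rsync on the same subsubdirectory (trailing slash = contents).
        run(["rsync", *rsync_opts, f"{src_root}/{sub}/", dest + "/"], check=True)
        # Reap du so it does not linger as a zombie process.
        warmer.wait()
        # c) The loop repeats until every subsubdirectory has been synced.
```

Because du is started just before rsync, both run concurrently; du only gets ahead because it does nothing but walk the destination tree while rsync is still building its file list.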

This seems to have cut the run time by about 25-30%. It looks like the du
can run ahead of rsync, so that while rsync is building its file list, the
du is warming up the file cache on the destination. Then when rsync checks
what it needs to do on the destination, it can do so more efficiently.

Looks like a keeper so far. Any other suggestions? (I was thinking of a
previous suggestion of setting /proc/sys/vm/vfs_cache_pressure to a low value.)
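For reference, the sysctl mentioned above can be read and changed from Python as in this sketch. The kernel default is 100; lower values make the kernel prefer to keep dentry and inode caches, and 0 stops their reclaim entirely, which can lead to out-of-memory conditions. Writing the real file needs root, the change does not survive a reboot, and sysctl -w vm.vfs_cache_pressure=... is the usual shell equivalent. The path parameter exists only so the helpers can be exercised on an ordinary file.

```python
from pathlib import Path

SYSCTL = Path("/proc/sys/vm/vfs_cache_pressure")


def read_cache_pressure(path: Path = SYSCTL) -> int:
    """Return the current vfs_cache_pressure value."""
    return int(path.read_text().strip())


def set_cache_pressure(value: int, path: Path = SYSCTL) -> int:
    """Write a new value and return the previous one."""
    if value <= 0:
        # 0 means dentries/inodes are never reclaimed under memory pressure.
        raise ValueError("vfs_cache_pressure <= 0 risks running out of memory")
    old = read_cache_pressure(path)
    path.write_text(f"{value}\n")
    return old
```

A call such as set_cache_pressure(50), run as root, would be one way to try the idea; 50 is a hypothetical starting point, not a recommended value.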

Thanks,

Mike

Re: Nice little performance improvement

2009-10-15 Thread Mike Connell

Hi,


> In order to expeditiously move these new files offsite, we use a modified
> version of pyinotify to log all added/altered files across the entire
> filesystem(s) and then every five minutes feed the list to rsync with the
> --files-from option. This works very effectively and quickly.


Interesting...

How do you tell rsync to delete files that were deleted from the source, 
or is that not part of your use case?
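The quoted --files-from approach can be sketched as below. This assumes the watcher (pyinotify in the quoted setup, not shown here) has already collected absolute paths of added or altered files; the helper names, the /data root and the backup:/data destination are hypothetical. Note that --files-from implies --relative, and -a does not imply -r when --files-from is used. As the question above points out, nothing in this batch step removes files that were deleted on the source.

```python
import os
from typing import Iterable, List


def write_files_from(changed: Iterable[str], src_root: str, list_path: str) -> List[str]:
    """Turn absolute paths of changed files into a deduplicated, sorted list of
    paths relative to src_root, written one per line for --files-from."""
    root = os.path.abspath(src_root)
    rel = set()
    for p in changed:
        p = os.path.abspath(p)
        if os.path.commonpath([root, p]) != root:
            continue  # ignore events outside the tree being backed up
        rel.add(os.path.relpath(p, root))
    entries = sorted(rel)
    with open(list_path, "w") as f:
        f.writelines(e + "\n" for e in entries)
    return entries


def files_from_command(src_root: str, dest: str, list_path: str) -> List[str]:
    # --files-from implies --relative, so each listed path is recreated under
    # dest. A listed file that vanished on the source before rsync runs is
    # reported as an error; it is not deleted from dest.
    return ["rsync", "-a", f"--files-from={list_path}", src_root + "/", dest]
```

Every five minutes the collected set would be written out and passed to rsync, after which the watcher starts a fresh set.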


Thanks,

Mike


can a .rsync-filter improve performance?

2009-10-06 Thread Mike Connell
Hi,

I know certain subtrees I want to back up are written once
and never deleted.

So to reduce the time it takes rsync to run, I was thinking
of putting the following .rsync-filter in each of these subtrees:

P /**

I can see this stops the files on the receiver side from being 
deleted.
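For a per-directory .rsync-filter file to take effect at all, rsync has to be told to merge it; -F is shorthand for --filter='dir-merge /.rsync-filter'. The sketch below assumes that is how the filter files are being picked up, and shows one way to drop the rule into a subtree once it has been written. The helper names are hypothetical, and the -a --delete options are an assumption.

```python
from pathlib import Path
from typing import List

FILTER_NAME = ".rsync-filter"


def mark_subtree_protected(subtree: str, rule: str = "P /**") -> Path:
    """Write a per-directory filter file into a finished subtree.
    In a dir-merge file, a leading "/" anchors the pattern at the directory
    holding the file, so "P /**" protects everything below it."""
    path = Path(subtree) / FILTER_NAME
    path.write_text(rule + "\n")
    return path


def rsync_with_filters(src: str, dest: str) -> List[str]:
    # -F makes rsync read a .rsync-filter file in every directory it scans.
    # Without it, the file is copied like any other and its rules are ignored.
    return ["rsync", "-a", "--delete", "-F", src, dest]
```

A protect rule only shields receiver-side files from deletion; as the follow-up in this thread observes, rsync still scans the protected subtree and checks whether its files are up to date.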

Does this filter also improve performance?

It looks like it does. If I do a manual test and delete
a file in this subtree on the sender side, the output
from building the file list does not show the deleted file.

So will this help improve performance (the time it takes to build the file
list and the time it takes to update the receiver)?

Is there a better way to prune subtrees that are only written once,
so that rsync doesn't need to keep visiting them?

Thanks,

Mike

Re: can a .rsync-filter improve performance?

2009-10-06 Thread Mike Connell
Trying rsync with the --verbose and --progress flags, it looks like the
.rsync-filter only complicates matters. The filter first causes rsync to
protect the files/directories that match, and then rsync later checks them
to see whether they are up to date.

Is there a way to stop rsync from visiting a subtree entirely? I was thinking
I could add this (whatever it is) dynamically after the subtree had been written.

Thanks,

Mike
  - Original Message - 
  From: Mike Connell 
  To: rsync@lists.samba.org 
  Sent: Monday, October 05, 2009 11:08 PM
  Subject: can a .rsync-filter improve performance?

Re: sync performance falls off a cliff

2009-07-15 Thread Mike Connell

Hi,

Here is an update. I haven't deployed a new version of rsync into
production. Instead I split my current rsync up into 10 independent
subdirectories of the main directory, and I run them serially, one after
the other.

I'm up to 404,000 files and the total sync time doesn't seem to be falling
off a cliff (yet).

In my case only about 0.1% of my files change, so I'm sure it isn't an rsync
memory issue. But with the results I'm getting so far, I strongly suspect it
is a matter of how many directories and inodes can be kept cached in memory.
The largest of the 10 subdirectory rsyncs is about 75,000 files, so this
would seem to put less pressure on that cache.
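To see how large each split job is (for example, to check that the biggest one stays around the 75,000 files mentioned above), a small counting helper like the one below could be run against the source tree. It is a sketch that assumes the split is on the immediate subdirectories of one root; it counts non-directory entries, including symlinks, and does not follow directory symlinks.

```python
import os
from typing import Dict


def files_per_subdir(root: str) -> Dict[str, int]:
    """Count non-directory entries under each immediate subdirectory of root."""
    counts: Dict[str, int] = {}
    with os.scandir(root) as it:
        for entry in it:
            if not entry.is_dir(follow_symlinks=False):
                continue  # files directly in root are not part of any split
            total = 0
            # os.walk does not follow directory symlinks by default.
            for _dirpath, _dirnames, filenames in os.walk(entry.path):
                total += len(filenames)
            counts[entry.name] = total
    return counts
```

The largest chunk is then max(counts, key=counts.get); if it keeps growing, that subdirectory is the candidate for splitting one level further down.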

Thanks,

Mike 




sync performance falls off a cliff

2009-06-29 Thread Mike Connell
Hi,

I've got identical servers. One is the primary and the other is a backup
receiving rsyncs from the primary. I'm backing up a file system to disk;
the files are small and there are lots of directories.

The overall problem seems to be the total number of files.
When I had ~375,000 files, the total rsync time was under a minute.
With ~425,000 files, the total rsync time is 10 minutes.

Last Friday when we were at 425,000 files, the rsync time was 10 minutes.
Today I was able to delete 50,000 unneeded files and the rsync time went
back down to under a minute.

So why the huge change in total rsync time for a relatively small change
in the total number of files? I'm afraid that as the total number of files
keeps increasing, the total rsync time is going to grow exponentially.

With the --progress flag on, the time is roughly divided evenly between
building the file list and going through the file list. The files themselves
are really small (~16K) and I'm not seeing any problem with anything
other than how long it takes rsync to make a pass through all the files. I do
use the --delete option.

The servers are Dell 2950s with built-in RAID 10 disks and 4 GB of RAM.
The OS is CentOS 5.1. I'm running rsync 2.6.8, protocol version 29.
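Since incremental recursion only arrived in rsync 3.0.0, a 2.6.8 setup always builds the complete file list before doing anything else. The sketch below parses the first line of rsync --version output on each end and reports whether incremental recursion could apply. The banner format is taken from typical rsync output, the requirement that both ends run 3.0.0 or newer is an assumption stated in the code, and certain options (for example --delete-after or --delay-updates) disable incremental recursion even on new versions.

```python
import re
from typing import Tuple

_BANNER = re.compile(r"version\s+v?(\d+(?:\.\d+)*)\S*\s+protocol version\s+(\d+)")


def parse_rsync_version(banner: str) -> Tuple[Tuple[int, ...], int]:
    """Extract (version tuple, protocol number) from a banner such as
    'rsync  version 2.6.8  protocol version 29'."""
    m = _BANNER.search(banner)
    if m is None:
        raise ValueError(f"unrecognised rsync banner: {banner!r}")
    version = tuple(int(part) for part in m.group(1).split("."))
    return version, int(m.group(2))


def incremental_recursion_available(local: str, remote: str) -> bool:
    # Assumption: both ends must run rsync 3.x (i.e. >= 3.0.0).
    return all(parse_rsync_version(b)[0][0] >= 3 for b in (local, remote))
```

With the 2.6.8 banner from this post on either end, incremental_recursion_available returns False.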

This smells to me like some sort of caching problem. Is there something
in the kernel or in rsync itself that I can tweak?

Thanks,

Mike