jw schultz wrote:
On Sun, Jun 22, 2003 at 11:42:46AM +0200, Ron Arts wrote:

Dear all,

I am implementing a backup system in which thousands of PostgreSQL
databases (max 1 GB in size), on as many clients, need to be backed
up nightly across ISDN lines.

Because of the limited bandwidth, rsync is the prime candidate of
course.


Only if you are updating an existing file on the backup
server with sufficient commonality from one version to the
next. pg_dump --format=t would be good. Avoid the built-in
compression in pg_dump as it defeats rsync.
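
A minimal sketch of that advice, assuming a hypothetical database name, dump path and rsync destination (none of these come from the thread). It only builds the two command lines: an uncompressed tar-format dump written to the same file name every night, followed by an rsync push with -v and --stats so the amount of data actually sent can be inspected.

```python
import subprocess

def dump_command(dbname, outfile):
    # Tar-format dump; no pg_dump compression option is passed,
    # so rsync can still find unchanged blocks.
    return ["pg_dump", "--format=t", f"--file={outfile}", dbname]

def rsync_command(outfile, dest):
    # --compress compresses the data stream in transit only;
    # the file on disk stays uncompressed.
    return ["rsync", "-v", "--stats", "--compress", outfile, dest]

def nightly_backup(dbname, outfile, dest):
    # Overwrite the same dump file each night so the backup server
    # already holds a previous version to diff against.
    subprocess.run(dump_command(dbname, outfile), check=True)
    subprocess.run(rsync_command(outfile, dest), check=True)

if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(dump_command("customerdb", "/var/backups/customerdb.tar"))
    print(rsync_command("/var/backups/customerdb.tar",
                        "backuphost::backups/client42/"))
```

Whether the tar-format dump really shares enough data from night to night is something only a trial run with --stats on real databases will show.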

Restore time is significant, so I think I need a straight mirror of the database files on the client. I think importing a multi-gigabyte SQL dump will take too long for us (one hour is the limit). I have not tried that yet on PostgreSQL, though.

gzip with the
rsyncable patch and bzip2 are OK if you must compress.


So unpatched bzip2 is OK? Nice to know. Maybe I can tar an LVM snapshot and bzip2 that before rsyncing. Thanks for that one.

The other issue is individual file size. Rsync versions
prior to what is in CVS start having some performance issues
with files larger than the 200-500MB range.


I'll keep that in mind.


Potential problems I see are server load (I/O and CPU), and filesystem limits.


Most of the load is on the sender.  Over ISDN, even with
rsync compressing the datastream, no single update should be a
CPU or I/O issue.  The issue is scheduling so you don't have too
many running simultaneously.


As I understand the algorithm, the server creates a list of checksums (around 1% of the size of the original file), which is not really CPU-intensive, and sends it to the client; the client then does a lot of work finding blocks that match the server's file.

So the server at least reads every file in the rsync tree
completely, am I correct? In my case that means a lot of disk I/O,
given the total size of all databases (multiple TB).

Please correct me if I'm wrong.
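
The checksum exchange described above can be sketched roughly as follows. This is a toy illustration, not rsync's actual code: the block size is tiny, SHA-256 stands in for rsync's strong checksum, and the weak checksum is recomputed at every offset instead of being rolled. All function names are made up for the example. Note that signatures() has to read the whole old file, which matches the point about the receiving side reading every file completely.

```python
import hashlib

BLOCK = 4  # tiny block size, for demonstration only

def weak_sum(data):
    # Cheap Adler-style checksum (a stand-in, not rsync's exact sum).
    a = sum(data) & 0xFFFF
    b = sum((len(data) - i) * x for i, x in enumerate(data)) & 0xFFFF
    return (b << 16) | a

def signatures(old, block=BLOCK):
    # Receiver side: one weak + strong checksum per block of the old file.
    sigs = {}
    for off in range(0, len(old), block):
        chunk = old[off:off + block]
        strong = hashlib.sha256(chunk).digest()
        sigs.setdefault(weak_sum(chunk), []).append((strong, off // block))
    return sigs

def delta(new, sigs, block=BLOCK):
    # Sender side: slide over the new file looking for known blocks.
    ops, literal, i = [], bytearray(), 0
    while i < len(new):
        chunk = new[i:i + block]
        match = None
        if len(chunk) == block:
            for strong, idx in sigs.get(weak_sum(chunk), []):
                if strong == hashlib.sha256(chunk).digest():
                    match = idx
                    break
        if match is None:
            literal.append(new[i])  # no match: send this byte literally
            i += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))  # receiver already has this block
            i += block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops

def patch(old, ops, block=BLOCK):
    # Receiver side: rebuild the new file from old blocks plus literals.
    out = bytearray()
    for kind, val in ops:
        out += old[val * block:(val + 1) * block] if kind == "copy" else val
    return bytes(out)
```

With old = b"abcdefghijkl" and new = b"abcdXefghijkl", only the single inserted byte travels as literal data; the three unchanged blocks are sent as block references.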

The easiest way to manage the scheduling is to have the
server pull.  If that isn't possible then you will need to
use an rsync wrapper that keeps the simultaneous runs within
limits or put a good deal of smarts into the clients.
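
One way such a wrapper might look on a Unix backup server, as a sketch only: each run must grab one of a fixed number of lock files before the real command starts, and exits with a temporary-failure code if all slots are busy so the client can retry later. The lock directory, slot count and exit code 75 are assumptions for illustration, not anything rsync itself defines.

```python
import fcntl
import os
import subprocess

def acquire_slot(lock_dir, max_slots):
    # Try each slot file in turn; return an open fd holding the lock,
    # or None if every slot is already taken.
    os.makedirs(lock_dir, exist_ok=True)
    for n in range(max_slots):
        path = os.path.join(lock_dir, f"slot{n}.lock")
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except OSError:
            os.close(fd)  # slot busy, try the next one
    return None

def run_limited(cmd, lock_dir, max_slots):
    # Run cmd only if a slot is free; 75 signals "try again later".
    fd = acquire_slot(lock_dir, max_slots)
    if fd is None:
        return 75
    try:
        return subprocess.call(cmd)
    finally:
        os.close(fd)  # closing the fd releases the lock
```

The lock is released automatically if the wrapper dies, since the kernel drops it when the file descriptor is closed.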


Yeah, pulling is out of the question, because the server can't activate the ISDN link. The clients' rsync start time will need to be hashed across the night.
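
A rough sketch of how the start times could be hashed across the night, assuming each client has some stable identifier and that the backup window is six hours long (both are assumptions for the example). Each client derives its own offset locally, so no coordination with the server is needed.

```python
import hashlib
from datetime import timedelta

def start_offset(client_id, window_minutes=360):
    # Hash the client ID and map it to a minute inside the window.
    # The same ID always yields the same offset.
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    minute = int.from_bytes(digest[:8], "big") % window_minutes
    return timedelta(minutes=minute)

if __name__ == "__main__":
    # Hypothetical client names, for illustration only.
    for cid in ("client-0001", "client-0002", "client-0003"):
        print(cid, "starts", start_offset(cid), "after the window opens")
```

Hashing spreads the load evenly on average but does not cap the number of simultaneous runs, so a limit on the server side is probably still worth having.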


Does anyone have experience with such setups?


Unlikely on that scale over that sort of link.

I'd suggest experimenting with the -v and --stats options turned on.


I will, thanks.

Ron

--
Netland Internet Services
bedrijfsmatige internetoplossingen

http://www.netland.nl   Kruislaan 419              1098 VA Amsterdam
info: 020-5628282       servicedesk: 020-5628280   fax: 020-5628281

Useless Invention: Leather cutlery.
