Alle 21:37, venerd� 5 marzo 2004, Chris Ruprecht ha scritto:
> I am wondering how you guys back up your databases. Say, I have a 20 GB
> database, data and indexes. If I run pg_dump on this, it backs up the
> schema and the data. When I have to restore this, I whould have to run this
> through psql which would then re-build the indexes after it has inserted
> the records. That's not really a feasable option as it takes too long.

When you have large databases and you need consistent backups, an interesting 
solution is master-slave replication. Just use a second server of your LAN as 
a slave replica of your main database server. Should you have any problem 
with the main server, just switch the IP addresses of the master and slave 
servers and you are up and running again. The switching takes a few 
milliseconds and can even be performed automatically by a few specific 
programs (look for "fault-tolerant linux systems" with google). While your 
backup server serves your users, you have all the time to install a new hard 
disk into the main server, restore your data (from the running slave server 
or from a tape) and come back to service.

When you need to put your data into a tape or a CD-ROM, just stop the 
master-slave mechanism, make a "static" backup (a pg_dump or a file system 
copy) of your _slave_ server and restart the replication program. The 
master-slave replication mechanism will take care to re-align the two servers 
in a few minutes. This way, you never need to stop the main server.

> If I back up the pgdata directory, I will get inconsistent data, as
> somebody might update data in 2 tables while my backup is running, one
> change in a table I have already backed up, one in a table I have not. The
> solution would be to shut the database server down, do the backup and then
> start everything back up. But that's not really an option in a 24/7
> environment.

Making a copy of your data directory while your server is running is 
definitely a bad idea. Forget it. Backing up the file system used by a RDBMS 
usually requires to put the server off-line and flush all the unwritten data 
to the disk. This cannot be done in a 24/7 environment.

> What I'd like to see is a transaction aware backup which backs up indexes
> as well as data.

Have a look at one of the master-slave replication systems available on the 
net. For this task you do not need anything as sophisticated as a 
multi-master replication system (Like the Bettina Kemme "PG-Replicator" 
project). You could even be able to write down your own client program with 
Perl o Python. The program just have to connect to your running master 
server, read the relevant data and insert them into the database on the  
slave server. Using a SERIAL primary key it is quite easy to spot which data 
of a table have not been copied to the slave server yet. Using a timer, you 
can define the time granularity of the replication mechanism that best fit 
your need. Using transactions you can be sure to get just consistent data. 
Reading and saving system tables, you can backup even the most hidden and 
subtle things of your DB.

Beware that restoring your data is up to you, if you use a custom-made system. 
Test the restore system before using your program in a production 
environment.

Good luck.
-----------------------------------------
Alessandro Bottoni and Silvana Di Martino
[EMAIL PROTECTED]
[EMAIL PROTECTED]

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Reply via email to