Hi Scott,
Why not replicate this master to the other location(s) using another
method such as Bucardo? That way you can pick exactly the tables you
really want replicated there.
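As a rough illustration only (the CLI has changed between Bucardo
releases, and the database, table and sync names here are just
placeholders), picking out a few tables looks something like:

    # register the source and target databases with Bucardo
    bucardo add db prod   dbname=mydb host=master.example.com
    bucardo add db office dbname=mydb host=office.example.com
    # add only the tables that need to be copied
    bucardo add table public.customers db=prod relgroup=support_tables
    bucardo add table public.orders    db=prod relgroup=support_tables
    # define and start a one-way sync from prod to the office server
    bucardo add sync support_sync relgroup=support_tables dbs=prod:source,office:target
    bucardo start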
For the backups, switch to a hot backup (tar of $PGDATA) plus WAL
archiving; it's easier, faster and more efficient than a logical copy
with pg_dump.
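Roughly, on 9.x (8.x differs a bit, and the paths here are just
examples), that means:

    # postgresql.conf on the master
    wal_level = archive                      # 9.x only; not present on 8.x
    archive_mode = on
    archive_command = 'cp %p /archive/%f'    # or push to the remote site

    # nightly base backup
    psql -c "SELECT pg_start_backup('nightly');"
    tar czf /backups/base-$(date +%F).tar.gz $PGDATA
    psql -c "SELECT pg_stop_backup();"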
A.A
On 04/25/2012 09:11 AM, Scott Whitney wrote:
Hello, everyone. I want to throw a scenario out there to see what
y'all think.
Soon, my cluster backups will be increasing in size inordinately.
They're going to immediately go to 3x as large as they currently are
with the potential to be about 20x within a year or so.
My current setup uses a single PG 8.x server doing nightly dumps (not
ideal but sufficient for the moment, and one of the main reasons to
move to PG 9) which are then downloaded from my hosting center to our
offices for DR purposes. Each night I pull down roughly 5GB of
compressed pg_dump data. Dumping this takes about 1.5hrs. Downloading
this at 15Mbps takes about an hour. Soon I'll be looking at somewhere
around 7hrs for the dumps to complete and downloading a 12GB file
(which will take about 3 hrs). Oh, and I'll have to pay for
significant bandwidth overage, since I'm billed at the 95th percentile;
an hour a day does NOT push my 95th-percentile usage up to 15Mbps, but
3hrs per night certainly will, so there's a real cost associated with
this strategy as well.
While the time of the actual dumps is not a huge issue, the time of
the download IS a large concern, especially since my support folks use
that file daily to extract and restore individual customer databases
when they're working support issues.
So, while today my pg_dumps finish around 2AM and are downloaded to my
local network by about 3AM, with the increase in our database sizes
the pg_dump won't be complete until around 7AM, and the download won't
be complete until around 10AM, best-case scenario. Add into that
support trying to restore a database...more on that in a moment.
My _new_ setup will instead be 2 PG 9.x servers with hot-standby
enabled (at my hosting center) and a 3rd PG 9.x server at my local
office also replicating off of the master. Each of those servers will
perform its own pg_dumps of the individual databases for
backup/disaster-recovery purposes. While the dumps from different
servers might not be consistent with one another, each SERVER's dumps
will be consistent with itself, which is viable for our situation, and
it does not require me to download 12GB (or more) each night with all
of the associated nightmares, costs and other problems.
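For reference, the 9.x plumbing I'm planning for that layout is
roughly this (hostnames, the replication user and the WAL-retention
number are placeholders):

    # master postgresql.conf
    wal_level = hot_standby
    max_wal_senders = 3            # one per standby, plus a little headroom
    wal_keep_segments = 256        # keep WAL around for standbys that lag

    # master pg_hba.conf
    host  replication  replicator  10.0.0.0/24  md5

    # each standby postgresql.conf
    hot_standby = on

    # each standby recovery.conf
    standby_mode = 'on'
    primary_conninfo = 'host=master.example.com user=replicator password=...'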
Alright, well, I've got that part all thought out, and it seems to me
like a good way to do it, but I'm _still_ running into the situation
that the pg_dump takes 8hrs-ish no matter where it runs, and when my
support folks need it (which they do daily), this
basically means that if they have to have a customer database up NOW
NOW NOW for support reasons, they simply cannot have it within an hour
in many cases. Specifically, one database takes between 2 and 7.5hrs
to pg_dump depending on which format I use, so if they need a CURRENT
copy, they're at least 4 hours out. Additionally, they can't work
directly on the replicating server at my local office, because
reproducing the problems the customers are having involves pesky
things like INSERT, UPDATE and DELETE, which a read-only hot standby
won't accept, so they have to restore this data to another internal PG
backend.
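(The round trip for one customer database is basically a dump on one
of the servers and a restore onto a scratch box support can write to;
"custdb" below is just a stand-in name:)

    # custom-format dump of a single customer database
    pg_dump -Fc custdb -f /backups/custdb.dump

    # restore onto the internal, writable backend
    createdb custdb_support
    pg_restore -d custdb_support /backups/custdb.dump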
Enter my outside-the-box thinking.
I rather assume that you cannot do a start/stop backup on a
hot-standby server. HOWEVER, what if....
I set up a 4th database server internally at my office. Each night I
stop PG on my 3rd server (the local one replicating off of the master)
and rsync my pg_data directory to this new 4th server. I bring up the
4th server NOT as a standby, but as a master. They would then have all
customer data on an internal, usable PG system from the time of the
rsync, and while it might not reflect the immediate state of the
database, that's pretty well always true, and they're used to that,
since whenever they "clone" a site, they're using the dumps done
around midnight anyway.
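In shell terms, the nightly job I have in mind is roughly this (paths
and hostnames are placeholders):

    # on server #3, the local standby
    pg_ctl -D $PGDATA stop -m fast
    rsync -a --delete $PGDATA/ server4:/var/lib/pgsql/data/
    pg_ctl -D $PGDATA start                 # standby comes back up

    # on server #4
    rm /var/lib/pgsql/data/recovery.conf    # start as a master, not a standby
    pg_ctl -D /var/lib/pgsql/data start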
I believe, then, that when I restart server #3 (the standby who is
replicating), he'll say "oh, geez, I was down, let me catch up on all
that crap that happened while I was out of the loop," he'll replay the
WAL files that were written while he was down, and then he'll catch
back up.
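(I gather that catch-up only works if the WAL he missed is still on
the master or in an archive, which on 9.x means something like the
following; the number and path are placeholders:)

    # master postgresql.conf: keep enough WAL to cover the standby's downtime
    wal_keep_segments = 512

    # or, in the standby's recovery.conf, fall back to the WAL archive
    restore_command = 'cp /archive/%f %p'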
Does this sound like a viable option? Or does someone have additional
suggestions?