Hi Scott,
Why not replicate this master to the other location(s) using another
method such as Bucardo? That way you can pick exactly the tables you
really want replicated there.
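As a rough illustration only (the CLI has changed between Bucardo
releases, and the database, table and sync names here are just
placeholders), picking out a few tables looks something like:

    # register the source and target databases with Bucardo
    bucardo add db prod   dbname=mydb host=master.example.com
    bucardo add db office dbname=mydb host=office.example.com
    # add only the tables that need to be copied
    bucardo add table public.customers db=prod relgroup=support_tables
    bucardo add table public.orders    db=prod relgroup=support_tables
    # define and start a one-way sync from prod to the office server
    bucardo add sync support_sync relgroup=support_tables dbs=prod:source,office:target
    bucardo start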
For the backups, switch to a hot backup (tar of $PGDATA) plus WAL
archiving; it's easier, faster and more efficient than a logical copy
with pg_dump.
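Roughly, on 9.x (8.x differs a bit, and the paths here are just
examples), that means:

    # postgresql.conf on the master
    wal_level = archive                      # 9.x only; not present on 8.x
    archive_mode = on
    archive_command = 'cp %p /archive/%f'    # or push to the remote site

    # nightly base backup
    psql -c "SELECT pg_start_backup('nightly');"
    tar czf /backups/base-$(date +%F).tar.gz $PGDATA
    psql -c "SELECT pg_stop_backup();"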
A.A
On 04/25/2012 09:11 AM, Scott Whitney wrote:
Hello, everyone. I want to throw a scenario out there to see what
y'all think.
Soon, my cluster backups will be increasing in size inordinately.
They're going to immediately go to 3x as large as they currently are
with the potential to be about 20x within a year or so.
My current setup uses a single PG 8.x server doing nightly dumps (not
ideal but sufficient for the moment, and one of the main reasons to
move to PG 9) which are then downloaded from my hosting center to our
offices for DR purposes. Each night I pull down roughly 5GB of
compressed pg_dump data. Dumping this takes about 1.5hrs. Downloading
this at 15Mbps takes about an hour. Soon I'll be looking at somewhere
around 7hrs for the dumps to complete and downloading a 12GB file
(which will take about 3 hrs). Oh, and I'll have to pay for
significant bandwidth overage, since I'm billed at the 95th percentile;
an hour a day does NOT push my 95th-percentile usage up to 15Mbps, but
3hrs per night certainly will, so there's a real cost associated with
this strategy as well.
While the time of the actual dumps is not a huge issue, the time of
the download IS a large concern, especially since my support folks use
that file daily to extract and restore individual customer databases
when they're working support issues.
So, while today my pg_dumps finish around 2AM and are downloaded to my
local network by about 3AM, with the increase in our database sizes
the pg_dump won't be complete until around 7AM, and the download won't
be complete until around 10AM, best-case scenario. Add into that
support trying to restore a database...more on that in a moment.
My _new_ setup will instead be 2 PG 9.x servers with hot-standby
enabled (at my hosting center) and a 3rd PG 9.x server at my local
office also replicating off of the master. Each of those servers will
perform its own pg_dumps of the individual databases for
backup/disaster-recovery purposes. While the dumps from different
servers might not be consistent with one another, each SERVER's dumps
will be consistent with itself, which is viable for our situation, and
it does not require me to download 12GB (or more) each night with all
of the associated nightmares, costs and other problems.
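For reference, the 9.x plumbing I'm planning for that layout is
roughly this (hostnames, the replication user and the WAL-retention
number are placeholders):

    # master postgresql.conf
    wal_level = hot_standby
    max_wal_senders = 3            # one per standby, plus a little headroom
    wal_keep_segments = 256        # keep WAL around for standbys that lag

    # master pg_hba.conf
    host  replication  replicator  10.0.0.0/24  md5

    # each standby postgresql.conf
    hot_standby = on

    # each standby recovery.conf
    standby_mode = 'on'
    primary_conninfo = 'host=master.example.com user=replicator password=...'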
Alright, well, I've got that part all thought out, and it seems to me
like a good way to do it, but I'm _still_ running into the situation
that the pg_dump takes 8hrs-ish no matter where it runs, and when my
support folks need it (which they do daily), this
basically means that if they have to have a customer database up NOW
NOW NOW for support reasons, they simply cannot have it within an hour
in many cases. Specifically, one database takes between 2 and 7.5hrs
to pg_dump depending on which format I use, so if they need a CURRENT
copy, they're at least 4 hours out. Additionally, they can't work
directly on the replicating server at my local office, because
reproducing the problems the customers are having involves pesky
things like INSERT, UPDATE and DELETE, which a read-only hot standby
won't accept, so they have to restore this data to another internal PG
backend.
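(The round trip for one customer database is basically a dump on one
of the servers and a restore onto a scratch box support can write to;
"custdb" below is just a stand-in name:)

    # custom-format dump of a single customer database
    pg_dump -Fc custdb -f /backups/custdb.dump

    # restore onto the internal, writable backend
    createdb custdb_support
    pg_restore -d custdb_support /backups/custdb.dump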
Enter my outside-the-box thinking.
I rather assume that you cannot do a start/stop backup on a
hot-standby server. HOWEVER, what if....
I set up a 4th database server internally at my office. Each night I
stop PG on my 3rd server (the local one replicating off of the master)
and rsync my pg_data directory to this new 4th server. I bring up the
4th server NOT as a standby, but as a master. They would then have all
customer data on an internal, usable PG system from the time of the
rsync, and while it might not reflect the immediate state of the
database, that's pretty well always true, and they're used to that,
since whenever they "clone" a site, they're using the dumps done
around midnight anyway.
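In shell terms, the nightly job I have in mind is roughly this (paths
and hostnames are placeholders):

    # on server #3, the local standby
    pg_ctl -D $PGDATA stop -m fast
    rsync -a --delete $PGDATA/ server4:/var/lib/pgsql/data/
    pg_ctl -D $PGDATA start                 # standby comes back up

    # on server #4
    rm /var/lib/pgsql/data/recovery.conf    # start as a master, not a standby
    pg_ctl -D /var/lib/pgsql/data start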
I believe, then, that when I restart server #3 (the standby who is
replicating), he'll say "oh, geez, I was down, let me catch up on all
that crap that happened while I was out of the loop," he'll replay the
WAL files that were written while he was down, and then he'll catch
back up.
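(I gather that catch-up only works if the WAL he missed is still on
the master or in an archive, which on 9.x means something like the
following; the number and path are placeholders:)

    # master postgresql.conf: keep enough WAL to cover the standby's downtime
    wal_keep_segments = 512

    # or, in the standby's recovery.conf, fall back to the WAL archive
    restore_command = 'cp /archive/%f %p'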
Does this sound like a viable option? Or does someone have additional
suggestions?