Both good points, thanks, although I suspect that a direct network copy of the pg_data directory will be faster than a tar/untar event.
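For the sake of comparison, here's roughly what I mean by the two approaches. The hostname ("standby3") and the data directory path are made-up placeholders, not my actual layout, and I haven't benchmarked either form:

    # Direct network copy of the data directory (incremental on later runs):
    rsync -a --delete standby3:/var/lib/pgsql/9.1/data/ /var/lib/pgsql/9.1/data/

    # Roughly equivalent tar/untar streamed over ssh (ships the whole cluster every time):
    ssh standby3 'tar -C /var/lib/pgsql/9.1 -cf - data' \
        | tar -C /var/lib/pgsql/9.1 -xf -

The appeal of the rsync form is that, after the first night, it only has to transfer the files that changed, while the tar pipe re-sends everything on every run.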
----- Original Message -----
> Hi Scott,
>
> Why don't you replicate this master to the other location(s) using other methods like Bucardo? You can pick just the tables you really want replicated there.
>
> For the backups, switch to a hot backup (tar $PGDATA) plus WAL archiving; it's easier, faster and more efficient than a logical copy with pg_dump.
>
> A.A
>
> On 04/25/2012 09:11 AM, Scott Whitney wrote:
> > Hello, everyone. I want to throw a scenario out there to see what y'all think.
> >
> > Soon, my cluster backups will be increasing in size inordinately. They're going to immediately go to 3x as large as they currently are, with the potential to be about 20x within a year or so.
> >
> > My current setup uses a single PG 8.x server doing nightly dumps (not ideal but sufficient for the moment, and one of the main reasons to move to PG 9), which are then downloaded from my hosting center to our offices for DR purposes. Each night I pull down roughly 5GB of compressed pg_dump data. Dumping this takes about 1.5hrs. Downloading it at 15Mbps takes about an hour. Soon I'll be looking at somewhere around 7hrs for the dumps to complete and downloading a 12GB file (which will take about 3hrs). Oh, and I'll have to pay for significant bandwidth overage, since I'm billed on 95th percentile usage: an hour a day does NOT kick me up to 15Mbps at the 95%, but 3hrs per night certainly will, so there's a real cost associated with this strategy as well.
> >
> > While the time of the actual dumps is not a huge issue, the time of the download IS a large concern, especially since my support folks use that file daily to extract individual customer databases for restore when assisting with customer support issues.
> >
> > So, while right now my pg_dumps are completed around 2AM and downloaded to my local network by about 3AM, with the increase in our database sizes the pg_dump will not be completed until around 7AM, and the download would not be completed until around 10AM, best-case scenario. Add into that support trying to restore a database... more on that in a moment.
> >
> > My _new_ setup will instead be 2 PG 9.x servers with hot standby enabled (at my hosting center) and a 3rd PG 9.x server at my local office also replicating off of the master. Each of those servers will perform its own pg_dumps of the individual databases for backup/disaster-recovery purposes. While the dumps might not be consistent with one another, each SERVER will have dumps consistent with itself, which is viable for our situation, and it does not require me to download 12GB (or more) each night with all of the associated nightmares, costs and other problems.
> >
> > Alright, well, I've got that part all thought out, and it seems like a good way to do it to me, but I'm _still_ running into the situation that I've got to take 8hrs-ish to run the pg_dump no matter where it runs, and when my support folks need it (which they do daily), this basically means that if they have to have a customer database up NOW NOW NOW for support reasons, they simply cannot have it within an hour in many cases. Specifically, one database takes between 2 and 7.5hrs to pg_dump depending on which format I use, so if they need a CURRENT copy, they're at least 4 hours out.
> > Additionally, they can't directly use the replicating server at my local office, because they need to test the problems the customers are having, which include pesky things like INSERT, UPDATE and DELETE, so they have to restore this data to another internal PG backend.
> >
> > Enter my outside-the-box thinking.
> >
> > I rather assume that you cannot do a start/stop backup on a hot-standby server. HOWEVER, what if...
> >
> > I set up a 4th database server internally at my office. Each night I stop PG on my 3rd server (the local one replicating off of the master) and rsync my pg_data directory to this new 4th server. I bring up the 4th server NOT as a standby, but as a master. They would then have all customer data on an internal, usable PG system from the time of the rsync, and while it might not reflect the immediate state of the database, that's pretty well always true, and they're used to that, since whenever they "clone" a site, they're using the dumps done around midnight anyway.
> >
> > I believe, then, that when I restart server #3 (the standby who is replicating), he'll say "oh, geez, I was down, let me catch up on all that crap that happened while I was out of the loop," he'll replay the WAL files that were written while he was down, and then he'll catch back up.
> >
> > Does this sound like a viable option? Or does someone have additional suggestions?
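To make the rsync-and-restart idea quoted above concrete, here is roughly the nightly sequence I'm picturing, run from the 4th (internal) box. None of this is tested; the hostname ("standby3"), the data directory path, the log file location and the recovery.conf handling are placeholders and assumptions rather than my actual setup:

    #!/bin/sh
    # Untested sketch: clone the local standby (#3) onto this box (#4) each night.
    PGDATA=/var/lib/pgsql/9.1/data

    # 1. Shut down last night's copy here (if it's running) and stop Postgres
    #    on the standby so its data directory is quiescent while we copy it.
    pg_ctl stop -D "$PGDATA" -m fast || true
    ssh standby3 "pg_ctl stop -D $PGDATA -m fast"

    # 2. Pull the standby's data directory across the LAN.
    rsync -a --delete "standby3:$PGDATA/" "$PGDATA/"

    # 3. Restart the standby; it reconnects to the master and replays the WAL
    #    it missed while it was down.
    ssh standby3 "pg_ctl start -D $PGDATA -l /tmp/pg_restart.log"

    # 4. Bring the local copy up read/write rather than as another standby:
    #    with the 9.x layout, removing recovery.conf before the first start
    #    makes the server finish recovery and open as a normal master.
    rm -f "$PGDATA/recovery.conf"
    pg_ctl start -D "$PGDATA" -l /tmp/pg_restart.log

Whether #3 can always catch back up after the copy window presumably depends on the master keeping enough WAL around (wal_keep_segments in 9.x, or a WAL archive it can fall back on), so that seems worth sizing deliberately.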