Thank you both for your input... this is indeed the type of analysis I was looking for. Now I have to read and understand them more carefully. For now, thanks again, and hopefully I can trouble you both with some follow-ups in the future.
Cheers
Medi

On 8/16/07, Chander Ganesan <[EMAIL PROTECTED]> wrote:
>
> Hannes Dorbath wrote:
> > On 15.08.2007 21:30, Medi Montaseri wrote:
> >> I am looking for some suggestions to optimize the following
> >> problem/solution.
> >>
> >> Given two nodes A and B (two computers) in an active-passive mode,
> >> where A is running the PG engine, B is simply standing by, and there
> >> is a common (twin-tailed) storage (from pg_ctl's point of view,
> >> -D /common/data), I am looking for a faster solution during the
> >> takeover where A has crashed and B is to start the PG engine and
> >> run with it.
> >>
> >> My current solution is to start the PG engine, which should take
> >> little time to read the configuration files and /common/data and
> >> fork a few children. But I am still interested in optimizing this
> >> start-up cost.
> >>
> >> For example, would I gain anything by starting the PG engine on
> >> both A and B, but on B pointing it to /common/dummy, and during the
> >> takeover somehow telling it to read from /common/data instead? For
> >> example, having two postmaster.conf files or PGDATA directories and
> >> then using pg_ctl reload.
> >
> > Starting up PostgreSQL should be very fast, given no recovery to be
> > done and decent hardware.
> >
> > PostgreSQL does not fork a lot unless it is accepting new
> > connections, and if reading a config file is slow on your system,
> > something else is broken.
> >
> > In an active/passive setup you should be able to switch over in
> > under 3 seconds. If there was a lot of load on the failed node, the
> > recovery time on the new active node increases. The only thing you
> > can do about that is getting faster disks.
>
> Your startup time in the event of a failure will be predicated on the
> number of WAL files that need to be replayed in order to perform
> auto-recovery.
> For example, if you've set your checkpoint_segments to some high
> number, PostgreSQL will need to replay those WAL files to ensure that
> transactions that were committed to disk are correctly represented in
> your "on disk" object data (tables, indexes, etc.).
>
> Note that in the case of a "graceful" shutdown, you'll find that
> PostgreSQL doesn't need to replay WAL files (since it checkpoints
> prior to shutting down), and as such the startup time is pretty fast.
>
> You can decrease the amount of time it takes to recover by decreasing
> the number of segments per checkpoint; however, this may result in
> more frequent checkpointing and, as a result, a reduction in overall
> performance on your active node.
>
> Basically, you're balancing the "cost" of auto crash recovery against
> the frequency of checkpointing.
>
> An alternative (and perhaps faster) method of failover would be to
> have a cluster in "warm standby" mode (this wouldn't rely on a shared
> pgdata directory). In such cases you would only have to wait for the
> last few WAL files to be replayed in order to recover. With a high
> checkpoint_segments number this would let you recover quickly from
> failures, but it would introduce PITR overhead (copying WAL files as
> they fill up). That's a pretty small cost, though, and you may
> already be paying it if you do PITR backups.
>
> There is no way to have PostgreSQL "switch" data directories to speed
> up startup.
>
> --
> Chander Ganesan
> Open Technology Group, Inc.
> One Copley Parkway, Suite 210
> Morrisville, NC 27560
> 919-463-0999/866-229-3386
> http://www.otg-nc.com
>
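To make the recovery-time vs. checkpoint-frequency trade-off above concrete, here is a postgresql.conf sketch for a PostgreSQL 8.x-era server. The settings are real, but the values are illustrative assumptions for a fast-failover bias, not tuned recommendations:

```conf
# postgresql.conf (PostgreSQL 8.x era; values are illustrative only)

# Fewer WAL segments between checkpoints means less WAL to replay
# after a crash, so the standby starts faster after takeover; the
# cost is more frequent checkpoints and more I/O on the active node.
checkpoint_segments = 3       # low value biases toward fast recovery

# Also force a checkpoint at least this often, even under light load.
checkpoint_timeout = 300      # seconds
```

Raising checkpoint_segments shifts the balance the other way: better steady-state write throughput, longer replay after a crash.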

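For reference, a minimal sketch of the warm-standby arrangement Chander describes, assuming an 8.x-era setup with a shared WAL archive directory. The path /shared/wal_archive is made up for illustration, and pg_standby stands in for any restore command that waits for the next archived WAL file to appear:

```conf
# On the primary: postgresql.conf
# Copy each completed WAL segment into the shared archive.
archive_command = 'cp %p /shared/wal_archive/%f'

# On the standby: recovery.conf
# pg_standby blocks until the requested WAL file arrives, keeping the
# standby in continuous recovery until failover is triggered.
restore_command = 'pg_standby /shared/wal_archive %f %p'
```

At failover time the standby only has to replay whatever WAL has not yet been applied, rather than everything since the last checkpoint on the failed node.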