Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
> Either that, or you configure your system for asynchronous replication
> first, and flip the switch to synchronous only after the standby has caught
> up. Setting up the first standby happens only once when you initially set up
> the system, or if you're recovering from a catastrophic loss of the
> standby.

Or if the standby is lagging and the master's wal_keep_segments is not
sized big enough. Is that a catastrophic loss of the standby too?

>> It's all about the standard case you're building, sync rep, and how to
>> manage errors. In most cases I want flexibility. Alert says standby is
>> down, you lost your durability requirements, so now I'm building a new
>> standby. Does it mean my applications are all off and the master
>> refusing to work?
>
> Yes. That's why you want to have at least two standbys if you care about
> availability. Or if durability isn't that important to you after all, use
> asynchronous replication.

Agreed, that's a nice simple use case.

Another one is to say that I want sync rep when the standby is
available, but I don't have the budget for more. So I prefer a good
alerting system and low-budget-no-guarantee when the standby is down,
that's my risk evaluation.

> Of course, if in the heat of the moment the admin is willing to forge ahead
> without the standby, he can temporarily change the configuration in the
> master. If you want the standby to be rebuilt automatically, you can even
> incorporate that configuration change in the scripts too. The important
> point is that you or your scripts are in control, and you know at all times
> whether you can trust the standby or not. If the master makes such decisions
> automatically, you don't know if the standby is trustworthy (ie. guaranteed
> up-to-date) or not.

My proposal is that the master has the information to make the decision,
and the behavior is something you set up. Default to security, so wait
forever and block the applications, but it could be set to ignore
standbys that have not yet reached at least a given state.

I don't see that you can make everybody happy without a knob here, and I
don't see how we can deliver one without a clear state diagram of the
standby's possible current states and transitions.

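To make the knob idea concrete, here is a minimal sketch in Python. It is an illustration only, not anything that exists in PostgreSQL: the state names (DISCONNECTED, BASE_BACKUP, CATCHING_UP, IN_SYNC), the transition table, and the `ignore_below` setting are all assumptions made up for this example. The point is only that once the states are ordered, the "wait forever" default and the "ignore standbys that have not reached state X" option reduce to one comparison.

```python
from enum import Enum


class StandbyState(Enum):
    # Hypothetical states, ordered from least to most trustworthy.
    DISCONNECTED = 0
    BASE_BACKUP = 1
    CATCHING_UP = 2
    IN_SYNC = 3


# Hypothetical transitions of the state diagram: state -> reachable states.
TRANSITIONS = {
    StandbyState.DISCONNECTED: {StandbyState.BASE_BACKUP, StandbyState.CATCHING_UP},
    StandbyState.BASE_BACKUP: {StandbyState.CATCHING_UP, StandbyState.DISCONNECTED},
    StandbyState.CATCHING_UP: {StandbyState.IN_SYNC, StandbyState.DISCONNECTED},
    StandbyState.IN_SYNC: {StandbyState.CATCHING_UP, StandbyState.DISCONNECTED},
}


def can_transition(old, new):
    """Return True if the sketch's diagram allows moving from old to new."""
    return new in TRANSITIONS[old]


def commit_waits_for(state, ignore_below=None):
    """Return True if a commit on the master must wait for this standby.

    ignore_below=None models the safe default: always wait, whatever the
    standby's state. Otherwise, standbys that have not reached at least
    `ignore_below` are ignored and do not block the commit.
    """
    if ignore_below is None:
        return True
    return state.value >= ignore_below.value
```

With `ignore_below=StandbyState.IN_SYNC`, a standby that is still catching up does not block commits, while the default keeps the master waiting even for a disconnected standby.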
The other alternative is to just not care and accept the timeout as an
option with the quorum, so that you simply don't wait for the quorum if
you so choose. It's much more dynamic and dangerous, but with a good
alerting system it'll be very popular I guess.

> I don't see anything wrong with having tools for admins to deal with the
> unexpected. I'm not sure overriding individual transactions is very useful
> though, more likely you'll want to take the whole server offline, or you
> want to change the config to allow all transactions to continue without the
> synchronous standby.

The question then is, should the new configuration alter running
transactions? My implicit assumption was that it should not, and then I
need another facility, such as

  SELECT pg_cancel_quorum_wait(procpid)
    FROM pg_stat_activity
   WHERE waiting_quorum;

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers