Re: [HACKERS] Issues with Quorum Commit

Dimitri Fontaine Thu, 07 Oct 2010 03:33:22 -0700

Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
> Either that, or you configure your system for asynchronous replication
> first, and flip the switch to synchronous only after the standby has caught
> up. Setting up the first standby happens only once when you initially set up
> the system, or if you're recovering from a catastrophic loss of the
> standby.


Or if the standby is lagging and the master's wal_keep_segments is not
set high enough. Is that a catastrophic loss of the standby too?

>> It's all about the standard case you're building, sync rep, and how to
>> manage errors. In most cases I want flexibility. Alert says standby is
>> down, you lost your durability requirements, so now I'm building a new
>> standby. Does it mean my applications are all off and the master
>> refusing to work?
>
> Yes. That's why you want to have at least two standbys if you care about
> availability. Or if durability isn't that important to you after all, use
> asynchronous replication.

Agreed, that's a nice simple use case.

Another one is to say that I want sync rep when the standby is
available, but I don't have the budget for more. So I prefer a good
alerting system and low-budget-no-guarantee when the standby is down;
that's my risk evaluation.

> Of course, if in the heat of the moment the admin is willing to forge ahead
> without the standby, he can temporarily change the configuration in the
> master. If you want the standby to be rebuilt automatically, you can even
> incorporate that configuration change in the scripts too. The important
> point is that you or your scripts are in control, and you know at all times
> whether you can trust the standby or not. If the master makes such decisions
> automatically, you don't know if the standby is trustworthy (ie. guaranteed
> up-to-date) or not.

My proposal is that the master has the information to make the decision,
and the behavior is something you set up. Default to safety, so wait
forever and block the applications, but it could be set to ignore
standbys that have not at least reached a given state.
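
To make the knob concrete, here is a rough sketch in Python. The state
names, their ordering and the function are all invented for illustration;
none of this is an existing PostgreSQL setting or API. With no threshold
the master waits for every registered standby (the safe default); with a
threshold it ignores standbys that have not reached that state yet.

```python
from enum import IntEnum


class StandbyState(IntEnum):
    # Hypothetical states, ordered from least to most caught up.
    DISCONNECTED = 0
    BASE_BACKUP = 1
    CATCHING_UP = 2
    IN_SYNC = 3


def standbys_to_wait_for(standbys, min_state=None):
    """Return the sorted names of the standbys a commit must wait for.

    standbys maps a standby name to its current StandbyState.
    min_state=None models the safe default: every registered standby
    counts, so a commit blocks for as long as any of them is behind.
    Otherwise, standbys below min_state are ignored.
    """
    if min_state is None:
        return sorted(standbys)
    return sorted(name for name, state in standbys.items()
                  if state >= min_state)


standbys = {
    "standby_a": StandbyState.IN_SYNC,
    "standby_b": StandbyState.CATCHING_UP,
}

# Safe default: wait for both, even though standby_b is still catching up.
wait_default = standbys_to_wait_for(standbys)

# Knob set: only standbys that are already in sync are waited for.
wait_in_sync = standbys_to_wait_for(standbys, StandbyState.IN_SYNC)
```

If the second call returns an empty list, nothing is waited for at all,
and that is exactly the case where the alerting has to be good.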

I don't see that you can make everybody happy without a knob here, and I
don't see how we can deliver one without a clear state diagram of the
standby's possible states and transitions.

The other alternative is to not care and accept the timeout as an
option with the quorum, so that you don't wait for the quorum if you
so choose. It's much more dynamic and dangerous, but with a good
alerting system I guess it'll be very popular.
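
A small Python sketch of what that timeout would mean for a single
commit. The function name, its arguments and the outcome strings are
hypothetical, made up only to show the trade-off; this is not how any
actual patch is written.

```python
def commit_outcome(ack_times, quorum, timeout=None):
    """Classify one commit under a quorum with an optional timeout.

    ack_times: for each standby, seconds after the commit at which it
               acknowledges, or None if it never does.
    quorum:    number of acknowledgements required.
    timeout:   None means wait forever; otherwise give up waiting
               after this many seconds.
    """
    arrived = sorted(t for t in ack_times if t is not None)
    if len(arrived) >= quorum:
        # Time at which the quorum-th acknowledgement arrives.
        reached_at = arrived[quorum - 1]
        if timeout is None or reached_at <= timeout:
            return "durable"
    # Quorum never reached, or reached only after the timeout expired.
    return "blocked" if timeout is None else "released-without-quorum"


# One of two standbys is down and we need two acknowledgements.
no_timeout = commit_outcome([0.1, None], quorum=2)
with_timeout = commit_outcome([0.1, None], quorum=2, timeout=5)
```

Without a timeout the commit blocks; with one the application carries
on, but the commit was acknowledged without the durability you asked for.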

> I don't see anything wrong with having tools for admins to deal with the
> unexpected. I'm not sure overriding individual transactions is very useful
> though, more likely you'll want to take the whole server offline, or you
> want to change the config to allow all transactions to continue without the
> synchronous standby.

The question then is, should the new configuration alter running
transactions? My implicit assumption was that it should not, and then
I need another facility, such as

  SELECT pg_cancel_quorum_wait(procpid)
    FROM pg_stat_activity
   WHERE waiting_quorum;

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
