All,

Let me clarify and consolidate this discussion. Again, my goal is that this thread identify only the problems and desired behaviors for sync rep with more than one sync standby. There are several issues that remain unresolved even with a single sync standby, but I believe we should discuss those on a separate thread, for clarity.
I also strongly believe that we should get single-standby functionality committed and tested *first*, before working further on multi-standby.

So, to summarize earlier discussion on this thread, there are two reasons to have more than one sync standby:

1) To increase durability above the level of a single sync standby, even at the cost of availability.

2) To increase availability without decreasing durability below the level offered by a single sync standby.

The "pure" setup for each of these options, where N is the number of standbys and k is the number of acks required from standbys, is:

1) k = N, N > 1, apply
2) k = 1, N > 1, recv

(Timeouts are a specific compromise of durability for availability on *one* server, and as such will not be discussed here. BTW, I was the one who suggested a timeout, rather than Simon, so if you don't like the idea, harass me about it.)

Any other configuration (call it case 3) is a specific compromise between durability and availability. For example:

3a) k = 2, N = 3, fsync
3b) k = 3, N = 10, recv

... should give you better durability than case (2) and better availability than case (1). (A rough sketch of this kind of k-of-N ack counting appears below.)

While it's nice to dismiss case (1) as an edge case, consider the likelihood of someone running PostgreSQL with fsync=off on cloud hosting. In that case, having k = N = 5 does not seem like an unreasonable arrangement if you want to ensure durability via replication. It's what the CAP databases do.

After eliminating some of my issues as non-issues, here's what we're left with for problems on the above:

(1), (3) Accounting/Registration: implementing any of these cases would seem to require some form of accounting and/or registration on the master -- at a minimum, a count of acks for each data send. More likely we will need, as proposed on other threads, a register of standbys and the sync state of each (rough sketch below). Not only will this accounting/registration be hard code to write, it will have at least *some* performance overhead. Whether that overhead is minor or substantial can only be determined through testing. Further, there's the issue of whether, and how, we transmit this register to the standbys so that they can be promoted.

(2), (3) Degradation: (Jeff) these two cases make sense only if we give DBAs the tools they need to monitor which standbys are falling behind, and to drop and replace those standbys. Otherwise we risk giving DBAs false confidence that they have better-than-1-standby reliability when actually they don't. Current tools are not really adequate for this.

(1), (3) Dynamic Re-configuration: we need the ability to add and remove standbys at runtime. We also need a verdict on how to handle the case where a transaction is pending, per Heikki.

(2), (3) Promotion: all multi-standby high-availability cases make sense only if we provide tools to promote the most current standby to be the new master (sketch below). Otherwise the whole cluster still goes down whenever we have to replace the master. We should also provide some mechanism for promoting an async standby to sync; this has already been discussed.

(1) Consistency: this is another DBA-false-confidence issue. DBAs who implement (1) are liable to do so thinking that they are guaranteeing not only the consistency of every standby with the master, but the consistency of every standby with every other standby -- a kind of dummy multi-master. They are not, so it will take multiple reminders and workarounds in the docs to explain this. And we'll get complaints anyway.

(1), (2), (3) Initialization: (Dimitri) we need a process whereby a standby can go from cloned to synced to being a sync rep standby, and possibly from degraded to synced again and back (sketch below).
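To make the k-of-N cases above concrete, here is a minimal sketch, in Python and purely illustrative -- the ack levels and function names are invented and don't correspond to any real GUCs or code -- of how a master might decide that a commit has met its quorum:

    # Hypothetical ack levels, weakest to strongest; not real GUC values.
    ACK_LEVELS = ("recv", "fsync", "apply")

    def commit_is_safe(acks, k, level):
        """acks: list of (standby_name, ack_level) received for one commit.
        True once at least k standbys have acked at `level` or stronger."""
        need = ACK_LEVELS.index(level)
        got = sum(1 for _, lvl in acks if ACK_LEVELS.index(lvl) >= need)
        return got >= k

    acks = [("s1", "apply"), ("s2", "apply"), ("s3", "fsync")]
    # Case 1 (k = N = 3, apply): s3 hasn't applied yet, so not safe.
    print(commit_is_safe(acks, k=3, level="apply"))   # False
    # Case 2 (k = 1, N = 3, recv): any single ack is enough.
    print(commit_is_safe(acks, k=1, level="recv"))    # True
    # Case 3a (k = 2, N = 3, fsync): two fsync-or-better acks suffice.
    print(commit_is_safe(acks, k=2, level="fsync"))   # True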
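For the accounting/registration, degradation, and dynamic re-configuration items, here is a rough sketch of what the standby register might track. Same caveats: all names are invented, and WAL positions are modeled as plain integers rather than real LSNs:

    class StandbyRegister:
        """Toy register of standbys and their last-acked WAL positions."""

        def __init__(self, lag_limit):
            self.standbys = {}          # name -> last acked position
            self.lag_limit = lag_limit  # lag before we call it degraded

        def register(self, name):       # add a standby at runtime
            self.standbys[name] = 0

        def unregister(self, name):     # drop a standby at runtime
            self.standbys.pop(name, None)

        def ack(self, name, pos):       # record an ack from a standby
            self.standbys[name] = max(self.standbys.get(name, 0), pos)

        def degraded(self, master_pos):
            """Standbys the DBA should be alerted about."""
            return [n for n, p in self.standbys.items()
                    if master_pos - p > self.lag_limit]

    reg = StandbyRegister(lag_limit=500)
    reg.register("s1"); reg.register("s2")
    reg.ack("s1", 900); reg.ack("s2", 100)
    print(reg.degraded(master_pos=950))   # ['s2'] -- s2 is 850 behind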
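For promotion, "most current standby" reduces to comparing the last known WAL positions of the surviving standbys. A trivial sketch, under the same assumptions:

    def promotion_candidate(standbys):
        """standbys: dict of name -> last replayed WAL position.
        Returns the most current standby, or None if there are none."""
        if not standbys:
            return None
        return max(standbys, key=standbys.get)

    print(promotion_candidate({"s1": 900, "s2": 100, "s3": 910}))  # s3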
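And for initialization, the lifecycle Dimitri describes is essentially a small state machine. One possible shape, with state and event names invented for illustration:

    # Invented state and event names; the real set would need discussion.
    TRANSITIONS = {
        "cloned":       {"caught_up": "synced"},
        "synced":       {"made_sync": "sync_standby",
                         "fell_behind": "degraded"},
        "sync_standby": {"fell_behind": "degraded"},
        "degraded":     {"caught_up": "synced"},
    }

    def step(state, event):
        try:
            return TRANSITIONS[state][event]
        except KeyError:
            raise ValueError("illegal transition: %s + %s" % (state, event))

    state = "cloned"
    for event in ("caught_up", "made_sync", "fell_behind", "caught_up"):
        state = step(state, event)
        print(state)   # synced, sync_standby, degraded, synced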
--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com