Re: [HACKERS] Support for N synchronous standby servers - take 2

Fujii Masao Thu, 04 Feb 2016 06:35:34 -0800

On Wed, Feb 3, 2016 at 11:00 AM, Robert Haas <[email protected]> wrote:
> On Tue, Feb 2, 2016 at 8:48 PM, Fujii Masao <[email protected]> wrote:
>> So you disagree with only third version that I proposed, i.e.,
>> adding some hooks for sync replication? If yes and you're OK
>> with the first and second versions, ISTM that we almost reached
>> consensus on the direction of multiple sync replication feature.
>> The first version can cover "one local and one remote sync standbys" case,
>> and the second can cover "one local and at least one from several remote
>> standbys" case. I'm thinking to focus on the first version now,
>> and then we can work on the second to support the quorum commit
>
> Well, I think the only hard part of the third problem is deciding on
> what syntax to use.  It seems like a waste of time to me to go to a
> bunch of trouble to implement #1 and #2 using one syntax and then have
> to invent a whole new syntax for #3.  Seriously, this isn't that hard:
> it's not a technical problem.  It's just that we've got a bunch of
> people who can't agree on what syntax to use.  IMO, you should just
> pick something.  You're presumably the committer for this patch, and I
> think you should just decide which of the 47,123 things proposed so
> far is best and insist on that.  I trust that you will make a good
> decision even if it's different than the decision that I would have
> made.


If we use one syntax for every case, the possible approaches are a
mini-language, JSON, etc. Since my previous proposal covers only very
simple cases, extra syntax would need to be supported for more complicated
cases. My plan was to add hooks so that developers could choose their own
syntax, but that might confuse users.

Now I'm thinking that a mini-language is the better choice. JSON has some
good points, but its big problem is that the setting value is likely to be
very long. For example, when the master needs to wait for one local standby
and at least one of three remote standbys in the London data center, the
setting value (synchronous_standby_names) would be

  s_s_names = '{"priority":2, "nodes":["local1", {"quorum":1,
"nodes":["london1", "london2", "london3"]}]}'

OTOH, the value in the mini-language is simple and not so long, as follows.

  s_s_names = '2[local1, 1(london1, london2, london3)]'
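To make the comparison concrete, here is a minimal Python sketch that
renders the JSON form above as the mini-language form. It is only an
illustration, not part of any patch. It assumes that a "priority" group maps
to N[...] and a "quorum" group maps to N(...), which is what the two
examples above suggest; the function name to_mini is hypothetical.

```python
import json

def to_mini(node):
    """Render a JSON-style node as a mini-language string.

    Assumption: "priority" groups become N[...] and "quorum" groups
    become N(...), following the two examples in this mail.
    """
    if isinstance(node, str):
        return node                      # a plain standby name
    if "priority" in node:
        count, opener, closer = node["priority"], "[", "]"
    else:
        count, opener, closer = node["quorum"], "(", ")"
    members = ", ".join(to_mini(child) for child in node["nodes"])
    return f"{count}{opener}{members}{closer}"

json_value = ('{"priority":2, "nodes":["local1", {"quorum":1, '
              '"nodes":["london1", "london2", "london3"]}]}')
mini_value = to_mini(json.loads(json_value))

print(mini_value)   # -> 2[local1, 1(london1, london2, london3)]
print(len(json_value), len(mini_value))
```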

This is why I'm now thinking that a mini-language is better. But it's not
easy to implement the mini-language completely. There seem to be many
problems that we need to resolve. For example, imagine the case where the
master needs to wait for at least one of two standbys, "tokyo1" and
"tokyo2", in the Tokyo data center. If the Tokyo data center fails, the
master needs to wait for at least one of two standbys, "london1" and
"london2", in the London data center instead. This case can be configured
as follows in the mini-language.

  s_s_names = '1[1(tokyo1, tokyo2), 1(london1, london2)]'
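As a rough sketch of how such a nested value could be parsed and evaluated,
the Python below may help. The semantics are my assumptions, not a settled
design: N[...] waits for the first N members (in listed order) that are
currently available, N(...) waits for any N members, and a group counts as
available when at least N of its members are available. The function names
and the "alive"/"acked" sets are hypothetical, and standby names are assumed
to be simple identifiers.

```python
import re

NAME = re.compile(r"[A-Za-z_]\w*")

def parse(text):
    """Parse a mini-language value into a name or (kind, count, members)."""
    # The trailing "" is a sentinel so lookahead never runs off the end.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[\[\](),]", text) + [""]
    pos = 0

    def item():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if NAME.fullmatch(tok):
            return tok                          # plain standby name
        if not tok.isdigit() or tokens[pos] not in ("[", "("):
            raise ValueError(f"unexpected token {tok!r}")
        kind = "priority" if tokens[pos] == "[" else "quorum"
        closer = "]" if kind == "priority" else ")"
        pos += 1                                # skip the opening bracket
        members = [item()]
        while tokens[pos] == ",":
            pos += 1
            members.append(item())
        if tokens[pos] != closer:
            raise ValueError(f"expected {closer!r}, got {tokens[pos]!r}")
        pos += 1
        return (kind, int(tok), members)

    tree = item()
    if tokens[pos] != "":
        raise ValueError("trailing input")
    return tree

def available(node, alive):
    """A name is available if connected; a group if >= count members are."""
    if isinstance(node, str):
        return node in alive
    _, count, members = node
    return sum(available(m, alive) for m in members) >= count

def satisfied(node, alive, acked):
    """Return True if the acknowledgements received so far are enough."""
    if isinstance(node, str):
        return node in acked
    kind, count, members = node
    if kind == "quorum":
        return sum(satisfied(m, alive, acked) for m in members) >= count
    # Priority group: wait for the first `count` available members.
    chosen = [m for m in members if available(m, alive)][:count]
    return len(chosen) == count and all(
        satisfied(m, alive, acked) for m in chosen)

tree = parse("1[1(tokyo1, tokyo2), 1(london1, london2)]")
everyone = {"tokyo1", "tokyo2", "london1", "london2"}
london_only = {"london1", "london2"}

print(satisfied(tree, everyone, {"london1"}))     # Tokyo is up, so keep waiting
print(satisfied(tree, everyone, {"tokyo2"}))      # one Tokyo ack is enough
print(satisfied(tree, london_only, {"london1"}))  # Tokyo is down, London counts
```

Under these assumptions the master ignores a London acknowledgement while
Tokyo is reachable and falls back to London only when neither Tokyo standby
is connected. The sketch does not say what sync_state each standby should
report.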

One problem here is: what pg_stat_replication.sync_state value should be
shown for each standby? Which standbys should be marked as sync? As
potential? As some other value like quorum? The current design of
pg_stat_replication doesn't fit complicated sync replication cases, so maybe
we need to separate it into several views. It's almost impossible to
resolve all of those problems.

My current plan for 9.6 is to support a minimal subset of the mini-language:
the simple syntax "<number>[name, ...]". "<number>" specifies the number of
sync standbys that the master needs to wait for. "[name, ...]" specifies
the priorities of the listed standbys. This first version supports neither
quorum commit nor nested sync replication configurations like
"<number>[name, <number>[name, ...]]". It supports only a very simple
"1-level" configuration.
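For illustration, the "1-level" subset could be handled roughly like the
Python sketch below. It is not the patch itself. It assumes that names
listed earlier have higher priority, that the first <number> connected
standbys are reported as "sync", and that the remaining connected standbys
in the list are reported as "potential". The function names parse_simple and
sync_states are hypothetical.

```python
import re

def parse_simple(value):
    """Parse '<number>[name, ...]' and reject anything more complex."""
    m = re.fullmatch(r"\s*(\d+)\s*\[\s*(.*?)\s*\]\s*", value)
    if m is None:
        raise ValueError(f"not of the form <number>[name, ...]: {value!r}")
    names = [n.strip() for n in m.group(2).split(",")]
    # Plain names only; this also rejects nested groups and empty lists.
    if not all(re.fullmatch(r"\w+", n) for n in names):
        raise ValueError(f"only a flat list of names is supported: {value!r}")
    return int(m.group(1)), names

def sync_states(value, connected):
    """Map each connected, listed standby to 'sync' or 'potential'."""
    count, names = parse_simple(value)
    states = {}
    chosen = 0
    for name in names:                  # listed order is priority order
        if name not in connected:
            continue
        if chosen < count:
            states[name] = "sync"
            chosen += 1
        else:
            states[name] = "potential"
    return states

setting = "2[local1, london1, london2]"
print(sync_states(setting, {"local1", "london1", "london2"}))
print(sync_states(setting, {"london1", "london2"}))
```

With all three standbys connected, local1 and london1 are the sync standbys
and london2 is potential. If local1 disconnects, london1 and london2 become
the sync standbys.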

Regards,

-- 
Fujii Masao


