On Wed, Nov 18, 2015 at 2:06 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Tue, Nov 17, 2015 at 7:52 PM, Kyotaro HORIGUCHI
> <horiguchi.kyot...@lab.ntt.co.jp> wrote:
>> Oops.
>> At Tue, 17 Nov 2015 19:40:10 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>> <horiguchi.kyot...@lab.ntt.co.jp> wrote in 
>> <20151117.194010.17198448.horiguchi.kyot...@lab.ntt.co.jp>
>>> Hello,
>>> At Tue, 17 Nov 2015 18:13:11 +0900, Masahiko Sawada <sawada.m...@gmail.com> 
>>> wrote in 
>>> <CAD21AoC=an+dkynwsjp6coz-6qmhxxuenxvpisxgpxcuxmp...@mail.gmail.com>
>>> > >> One question is that what is different between the leading "n" in
>>> > >> s_s_names and the leading "n" of "n-priority"?
>>> > >
>>> > > Ah. Sorry for the ambiguous description. 'n' in s_s_names
>>> > > representing an arbitrary integer number and that in "n-priority"
>>> > > is literally an "n", meaning "a format with any number of
>>> > > priority hosts" as a whole. As an instance,
>>> > >
>>> > > synchronous_replication_method = "n-priority"
>>> > > synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"
>>> > >
>>> > > I added "n-" of "n-priority" to distinguish with "1-priority" so
>>> > > if we won't provide "1-priority" for backward compatibility,
>>> > > "priority" would be enough to represent the type.
>>> > >
>>> > > By the way, s_r_method is not essentially necessary but it would
>>> > > be important to avoid complexity of autodetection of formats
>>> > > including currently undefined ones.
>>> >
>>> > Thank you for your explanation, I understood that.
>>> >
>>> > It means that the format of s_s_names will be changed, which would not be
>>> > good.
>>> I believe that the format of definition of "replication set"(?)
>>> is not fixed and it would be more complex format to support
>>> nested definition. This should be in very different format from
>>> the current simple list of names. This is a selection among three
>>> or possibly more designs in order to be tolerable for future
>>> changes, I suppose.
>>> 1. Additional formats of definition in future will be stored in
>>>    elsewhere of s_s_names.
>>> 2. Additional format will be stored in s_s_names, the format will
>>>    be automatically detected.
>>> 3. (ditto), the format is designated by s_r_method.
>>> 4. Any other way?
>>> I chose the third way. What do you think about future expansion
>>> of the format?
> I agree with #3 way and the s_s_name format you suggested.
> I think that it's extensible and is tolerable for future changes.
> I'm going to implement the patch based on this idea if other hackers
> agree with this design.

Please find the attached draft patch which supports multi sync replication.
This patch adds a GUC parameter, synchronous_replication_method, which
represents the method of synchronous replication.

[Design of replication method]
synchronous_replication_method has two values for now: 'priority' and
'1-priority'.
We can add more values (e.g., 'quorum', 'json', etc.) in the future.

* s_r_method = '1-priority'
This method is for backward compatibility, so the syntax of s_s_names
is the same as today.
The behavior is the same as well.

* s_r_method = 'priority'
This method is for multiple synchronous replication using the priority method.
The syntax of s_s_names is:
   <number of sync standbys>, <standby name> [, ...]
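
As a rough illustration of this syntax (not the code in the attached patch), the sketch below splits a value such as '2, node1, node2, node3' into the leading count and the standby names in listed order. SyncStandbyConfig, parse_priority_names, copy_trimmed and the MAX_* limits are all hypothetical names and limits chosen for the example; the real GUC parsing in PostgreSQL works differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STANDBYS 16     /* illustrative limit, not from the patch */
#define MAX_NAME_LEN 64     /* illustrative limit, not from the patch */

/* Hypothetical container for a parsed 'priority'-format s_s_names value. */
typedef struct SyncStandbyConfig
{
    int  num_sync;                          /* <number of sync standbys> */
    int  num_names;                         /* how many names followed it */
    char names[MAX_STANDBYS][MAX_NAME_LEN]; /* names in listed order */
} SyncStandbyConfig;

/*
 * Copy src[0..len) into dst with surrounding whitespace removed.
 * Returns the trimmed length, or -1 if it does not fit in dst.
 */
static int
copy_trimmed(char *dst, size_t dstsize, const char *src, size_t len)
{
    while (len > 0 && isspace((unsigned char) *src))
    {
        src++;
        len--;
    }
    while (len > 0 && isspace((unsigned char) src[len - 1]))
        len--;
    if (len >= dstsize)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return (int) len;
}

/* Parse "<n>, name1, name2, ..."; returns 0 on success, -1 if malformed. */
static int
parse_priority_names(const char *value, SyncStandbyConfig *cfg)
{
    const char *p = value;
    char        token[MAX_NAME_LEN];
    int         ntokens = 0;

    assert(value != NULL && cfg != NULL);
    cfg->num_sync = 0;
    cfg->num_names = 0;

    for (;;)
    {
        const char *comma = strchr(p, ',');
        size_t      len = comma ? (size_t) (comma - p) : strlen(p);

        if (copy_trimmed(token, sizeof(token), p, len) <= 0)
            return -1;          /* empty or oversized element */

        if (ntokens == 0)
        {
            /* The first element must be a positive integer. */
            char *end;
            long  n = strtol(token, &end, 10);

            if (*end != '\0' || n < 1 || n > MAX_STANDBYS)
                return -1;
            cfg->num_sync = (int) n;
        }
        else
        {
            if (cfg->num_names >= MAX_STANDBYS)
                return -1;
            strcpy(cfg->names[cfg->num_names++], token);
        }
        ntokens++;

        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return cfg->num_names > 0 ? 0 : -1;
}
```

With this sketch, parse_priority_names("2, node1, node2, node3", &cfg) fills cfg.num_sync with 2 and cfg.names with node1, node2, node3, while a value without a leading number is rejected.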

For example, s_r_method = 'priority' and s_s_names = '2, node1, node2,
node3' means that the master waits for acknowledgement from the 2
connected standbys with the lowest priority numbers (that is, those
listed earliest).
If 4 standbys (node1 - node4) are available, the master server waits
for acknowledgement from 'node1' and 'node2'.
The status of each WAL sender is:

=# select application_name, sync_state from pg_stat_replication order by application_name;
application_name | sync_state
node1            | sync
node2            | sync
node3            | potential
node4            | async
(4 rows)

After 'node2' crashes, the master waits for acknowledgement from
'node1' and 'node3'.
The status of each WAL sender is:

=# select application_name, sync_state from pg_stat_replication order by application_name;
application_name | sync_state
node1            | sync
node3            | sync
node4            | async
(3 rows)
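
The selection shown in the two outputs above can be sketched roughly as follows. This is only an illustration under the assumption that priority follows the order of names in s_s_names and that a standby is matched by its application_name; choose_sync_standbys is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Pick up to num_sync connected standbys, walking the listed names in
 * priority order. 'chosen' receives indexes into 'connected'; the return
 * value is the number of standbys chosen (it can be less than num_sync
 * when too few listed standbys are connected).
 */
static int
choose_sync_standbys(const char *const *listed, int nlisted,
                     const char *const *connected, int nconnected,
                     int num_sync, int *chosen)
{
    int nchosen = 0;

    assert(listed != NULL && connected != NULL && chosen != NULL);

    for (int i = 0; i < nlisted && nchosen < num_sync; i++)
    {
        for (int j = 0; j < nconnected; j++)
        {
            if (strcmp(listed[i], connected[j]) == 0)
            {
                chosen[nchosen++] = j;
                break;          /* take each listed name at most once */
            }
        }
    }
    return nchosen;
}
```

With node1 to node4 connected and num_sync = 2, this picks node1 and node2; once node2 is gone from the connected list, it picks node1 and node3, which matches the sync_state columns above. node4 is never picked because it is not listed in s_s_names.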

[Changing replication method]
To change the replication method, we first have to change s_r_method
and then call pg_reload_conf().
After the replication method has changed, we can change s_s_names.

[Expanding replication method]
To add a new replication method, we need to implement two functions
for it:
* int SyncRepGetSynchronousStandbysXXX(int *sync_standbys)
  This function obtains the list of standbys considered synchronous
  at that time, and returns its length.
* bool SyncRepGetSyncLsnXXX(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
  This function obtains the LSNs (write, flush) considered synced.
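
One way to picture this pair of functions per method is a small table of callbacks looked up by the s_r_method value. The sketch below is only an illustration: SyncRepMethod, lookup_sync_rep_method and the toy_* names are hypothetical, XLogRecPtr is a stand-in typedef, the standby positions are made-up numbers, and taking the minimum position across the sync standbys is my assumption about what "considered as synced" means for the priority method.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's LSN type */

/* Hypothetical per-method callbacks, shaped like the two functions above. */
typedef struct SyncRepMethod
{
    const char *name;
    int         (*get_sync_standbys) (int *sync_standbys);
    bool        (*get_sync_lsn) (XLogRecPtr *write_pos, XLogRecPtr *flush_pos);
} SyncRepMethod;

/* Toy state standing in for the WAL sender array (made-up positions). */
static const XLogRecPtr toy_write[] = {500, 420};
static const XLogRecPtr toy_flush[] = {480, 400};

/* Toy 'priority' implementation: both toy standbys are synchronous. */
static int
toy_priority_standbys(int *sync_standbys)
{
    sync_standbys[0] = 0;
    sync_standbys[1] = 1;
    return 2;
}

/* Assumption: the synced LSN is the minimum over all sync standbys. */
static bool
toy_priority_lsn(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
{
    int ids[2];
    int n = toy_priority_standbys(ids);

    if (n == 0)
        return false;           /* no sync standby, nothing is synced */

    *write_pos = toy_write[ids[0]];
    *flush_pos = toy_flush[ids[0]];
    for (int i = 1; i < n; i++)
    {
        if (toy_write[ids[i]] < *write_pos)
            *write_pos = toy_write[ids[i]];
        if (toy_flush[ids[i]] < *flush_pos)
            *flush_pos = toy_flush[ids[i]];
    }
    return true;
}

/* A new method would add one more row here. */
static const SyncRepMethod sync_rep_methods[] = {
    {"priority", toy_priority_standbys, toy_priority_lsn},
};

/* Find the callbacks for a given s_r_method value, or NULL if unknown. */
static const SyncRepMethod *
lookup_sync_rep_method(const char *name)
{
    size_t n = sizeof(sync_rep_methods) / sizeof(sync_rep_methods[0]);

    assert(name != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(sync_rep_methods[i].name, name) == 0)
            return &sync_rep_methods[i];
    }
    return NULL;
}
```

In this toy setup, looking up "priority" and calling get_sync_lsn yields the smaller of the two write positions and the smaller of the two flush positions, while an unregistered name such as "quorum" returns NULL.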

Also, debug code still remains in this patch; you can debug this
behavior by enabling the DEBUG_REPLICATION macro.

Please give me feedback.


Masahiko Sawada

Attachment: 000_multi_sync_replication_v1.patch
Description: Binary data
