Re: [HACKERS] Dealing with logical replication

2017-07-20 Thread Tatsuo Ishii
> On 19 July 2017 at 16:34, Tatsuo Ishii  wrote:
> 
>> Now that we are going to have logical replication in PostgreSQL 10, I
>> have started thinking how Pgpool-II can deal with it. For example, the
>> logical replication does not replicate DDLs. Isn't it convenient for
>> users to do it automatically in Pgpool-II? Or even doing it for
>> TRUNCATE?
>>
>> Or are they against the design philosophy of the logical replication?
>>
> 
> (Disclaimer - Petr Jelinek and Peter Eisentraut were the main ones working
> on the in-core logical rep, and I haven't had time to play with it much).
> 
> TL;DR: a pooler can only solve a limited subset of the problem, fairly
> unreliably, and this is really something that needs further work in core.
> 
> Not replicating TRUNCATE and schema changes is more an implementation
> limitation than anything else. It's desirable to get to the point where
> logical replication can transparently replicate DDL. There are some hurdles
> for doing so, but it'll be possible to get something in place that's good
> enough when time/release progress permits.
> 
> Similarly, with TRUNCATE, AFAIK support just didn't make the cut for pg10.
> 
> A pooler could well help in the meantime, but you have to consider
> ordering with care. For example, given "U" upstream, "D" downstream:
> 
> U: CREATE TABLE x (a integer, b integer);
> D: CREATE TABLE x (a integer, b integer); -- user script syncs
> U: INSERT INTO x (a,b) VALUES (1,0);
> D: [applies INSERT 1,0]
> U: INSERT INTO x (a,b) VALUES (2,0);
> U: ALTER TABLE x DROP COLUMN b;
> D: ALTER TABLE x DROP COLUMN b; -- user script syncs
> U: INSERT INTO x (a) VALUES (3);
> D: [ERROR on INSERT of 2,0: no column 'b']
> 
> Because the DDL here is transported out of band vs the row data, you can
> easily create situations where the schema change is applied before the last
> committed-but-not-yet-replicated data from the upstream that was based on
> the old schema.

What I am thinking for now is to disallow issuing DDL on the
downstream.

> To achieve correct ordering, the simplest approach is to record DDL in a
> table when you perform it on the upstream, and replay it when you see rows
> in that table appear on the downstream.  You know it's safe to replay it
> now. This is the essence of what BDR and pglogical do with their DDL
> replication, but they handle DDL in the middle of transactions that also
> make row data changes by intercepting writes to the queue table and
> performing the DDL at the exact point in the transaction where it happened
> on the upstream. I don't think that's possible with the in-core logical
> replication yet, and certainly not something a pooler can do.
> 
> To do it externally, you have to take note of when a schema change happened
> on the upstream and apply it on the downstream at or after the point where
> the downstream has replayed and confirmed up to the upstream lsn where the
> schema change happened. Then apply the schema change.
> 
> A pooler trying to help here must also be very aware of the impact of
> multi-statements. If you send a single simple query message with a mixture
> of schema change commands and normal DML, you probably don't want to repeat
> the DML on downstream nodes or you'll get duplicate rows etc. But unless
> it embeds transaction control commands, a simple query message executes in
> a single implicit transaction, so if you extract just the DDL you'll again
> have ordering issues between upstream and downstream.

Pgpool-II will avoid the problem by rejecting such multi-statements.

> There are even a variety of difficulties to overcome with doing it in core:
> event triggers don't capture ddl command text and have no facility to turn
> the internal command representation back into SQL command text, nor do we
> have any way to turn the internal representation back into a parsenode tree
> for execution on a downstream's standard_ProcessUtility. However, we can
> capture raw command text with ProcessUtility_hook now that we have
> byte-ranges for the query-parts of a multi-part query (yay!), and that
> works well enough if you also capture the active search_path and apply with
> the same search_path. It can match the wrong object if extra objects with
> the same name are present earlier on the search_path on the downstream than
> on the upstream, so it's not ideal, but that's a weird corner case.
> 
> If we had a hook in the logical apply worker's insert or wal-message
> routines it'd be possible to write an extension to do this for pg10, but
> AFAICS we don't.
> 
> So schema changes in logical replication currently require more care than
> in physical replication.

Thank you for the explanation regarding those difficulties.

I now understand the reason why logical replication does not support
DDL and TRUNCATE: it is not a design decision, but a consequence of
implementation difficulties.

Probably that means those limitations will be solved in the future.
Then the motivation for implementing such workarounds in Pgpool-II
would be limited, since they would only be a temporary measure until
the limitations are removed in core.

Re: [HACKERS] Dealing with logical replication

2017-07-19 Thread Craig Ringer
On 19 July 2017 at 16:34, Tatsuo Ishii  wrote:

> Now that we are going to have logical replication in PostgreSQL 10, I
> have started thinking how Pgpool-II can deal with it. For example, the
> logical replication does not replicate DDLs. Isn't it convenient for
> users to do it automatically in Pgpool-II? Or even doing it for
> TRUNCATE?
>
> Or are they against the design philosophy of the logical replication?
>

(Disclaimer - Petr Jelinek and Peter Eisentraut were the main ones working
on the in-core logical rep, and I haven't had time to play with it much).

TL;DR: a pooler can only solve a limited subset of the problem, fairly
unreliably, and this is really something that needs further work in core.

Not replicating TRUNCATE and schema changes is more an implementation
limitation than anything else. It's desirable to get to the point where
logical replication can transparently replicate DDL. There are some hurdles
for doing so, but it'll be possible to get something in place that's good
enough when time/release progress permits.

Similarly, with TRUNCATE, AFAIK support just didn't make the cut for pg10.

A pooler could well help in the meantime, but you have to consider
ordering with care. For example, given "U" upstream, "D" downstream:

U: CREATE TABLE x (a integer, b integer);
D: CREATE TABLE x (a integer, b integer); -- user script syncs
U: INSERT INTO x (a,b) VALUES (1,0);
D: [applies INSERT 1,0]
U: INSERT INTO x (a,b) VALUES (2,0);
U: ALTER TABLE x DROP COLUMN b;
D: ALTER TABLE x DROP COLUMN b; -- user script syncs
U: INSERT INTO x (a) VALUES (3);
D: [ERROR on INSERT of 2,0: no column 'b']

Because the DDL here is transported out of band vs the row data, you can
easily create situations where the schema change is applied before the last
committed-but-not-yet-replicated data from the upstream that was based on
the old schema.

To achieve correct ordering, the simplest approach is to record DDL in a
table when you perform it on the upstream, and replay it when you see rows
in that table appear on the downstream.  You know it's safe to replay it
now. This is the essence of what BDR and pglogical do with their DDL
replication, but they handle DDL in the middle of transactions that also
make row data changes by intercepting writes to the queue table and
performing the DDL at the exact point in the transaction where it happened
on the upstream. I don't think that's possible with the in-core logical
replication yet, and certainly not something a pooler can do.
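
As a rough sketch of that queue-table idea with the in-core facilities
(all object names below are made up, and the downstream is assumed to run
a small polling script rather than a trigger):

-- upstream: a queue table that is itself part of the publication
CREATE TABLE ddl_queue (
    id          bigserial PRIMARY KEY,
    ddl_command text NOT NULL
);
ALTER PUBLICATION mypub ADD TABLE ddl_queue;
-- the downstream needs the same table, plus
-- ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;

-- upstream: perform the DDL and queue it in the same transaction
BEGIN;
ALTER TABLE x DROP COLUMN b;
INSERT INTO ddl_queue (ddl_command) VALUES ('ALTER TABLE x DROP COLUMN b');
COMMIT;

-- downstream: a script polls ddl_queue; transactions are applied in commit
-- order, so once this row appears every row change committed before the DDL
-- has been applied and it is safe to execute ddl_command locally
-- (:last_applied is whatever bookmark the script keeps).
SELECT id, ddl_command FROM ddl_queue WHERE id > :last_applied ORDER BY id;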

To do it externally, you have to take note of when a schema change happened
on the upstream and apply it on the downstream at or after the point where
the downstream has replayed and confirmed up to the upstream lsn where the
schema change happened. Then apply the schema change.
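
A minimal sketch of that check, done from the upstream side ('mysub' is an
assumed name; with the in-core logical replication the subscription's slot
is named after the subscription by default):

-- upstream, immediately after the schema change commits:
SELECT pg_current_wal_lsn();   -- note the value, e.g. '0/16B3D80'

-- upstream, poll until the subscriber has confirmed flushing past it:
SELECT confirmed_flush_lsn >= '0/16B3D80'::pg_lsn AS safe_to_apply
  FROM pg_replication_slots
 WHERE slot_name = 'mysub';

-- once safe_to_apply is true, run the same schema change on the downstream.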

A pooler trying to help here must also be very aware of the impact of
multi-statements. If you send a single simple query message with a mixture
of schema change commands and normal DML, you probably don't want to repeat
the DML on downstream nodes or you'll get duplicate rows etc. But unless it
embeds transaction control commands, a simple query message executes in a
single implicit transaction, so if you extract just the DDL you'll again
have ordering issues between upstream and downstream.
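
For example, a client might send something like this as a single simple
query message (statements invented for illustration):

ALTER TABLE x DROP COLUMN b; INSERT INTO x (a) VALUES (4);

If the pooler runs the whole string on the downstream, the INSERT gets
applied twice; if it extracts and runs only the ALTER, the downstream may
execute it before it has applied earlier-committed rows that still carry
column b, which is the same ordering failure as in the example above.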

There are even a variety of difficulties to overcome with doing it in core:
event triggers don't capture ddl command text and have no facility to turn
the internal command representation back into SQL command text, nor do we
have any way to turn the internal representation back into a parsenode tree
for execution on a downstream's standard_ProcessUtility. However, we can
capture raw command text with ProcessUtility_hook now that we have
byte-ranges for the query-parts of a multi-part query (yay!), and that
works well enough if you also capture the active search_path and apply with
the same search_path. It can match the wrong object if extra objects with
the same name are present earlier on the search_path on the downstream than
on the upstream, so it's not ideal, but that's a weird corner case.
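
To illustrate that corner case (schema names made up): with the captured
search_path replayed as-is, the same command text can resolve to a
different table on the downstream:

-- upstream: there is no staging.x, so this resolves to public.x
SET search_path = staging, public;
ALTER TABLE x ADD COLUMN c integer;

-- downstream: same captured search_path, but an extra table staging.x
-- exists here, so the replayed text silently alters staging.x instead
SET search_path = staging, public;
ALTER TABLE x ADD COLUMN c integer;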

If we had a hook in the logical apply worker's insert or wal-message
routines it'd be possible to write an extension to do this for pg10, but
AFAICS we don't.

So schema changes in logical replication currently require more care than
in physical replication.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Dealing with logical replication

2017-07-19 Thread Tatsuo Ishii
Now that we are going to have logical replication in PostgreSQL 10, I
have started thinking how Pgpool-II can deal with it. For example, the
logical replication does not replicate DDLs. Isn't it convenient for
users to do it automatically in Pgpool-II? Or even doing it for
TRUNCATE?

Or are they against the design philosophy of the logical replication?

Comments are welcome.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

