Hi Craig,

Thanks for your explanation. I have to agree with your argument that, in the general case, replicating DDL statements using logical decoding is problematic. But we are mostly considering logical decoding in a rather limited context: replication between identical Postgres database nodes (multimaster).

Do you think that in this case replication of DDL can be done as a sequence of low-level operations on the system catalog tables, including manipulation of locks? So in your example with the ALTER TABLE statement, can we correctly replicate it to other nodes as a request to take an exclusive lock plus some manipulations of the catalog tables and the data table itself?
If so, instead of full support for DDL in logical decoding we would only need to:

1. Add an option controlling whether operations on system catalog tables are included in logical replication.
2. Make it possible to replicate lock requests (this can be useful not only for DDL).

I looked at how DDL is implemented in BDR and did it in a similar way in our multimaster. But it is awful: we need two different channels for propagating changes. Additionally, in multimaster we want to enforce cluster-wide ACID, which certainly includes operations on metadata. That will be very difficult to implement if replication of DML and DDL is done in two different ways...

Let me ask one more question concerning logical replication: how difficult would it be, from your point of view, to support two-phase commit in logical replication? Are there fundamental problems?
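To be concrete, I mean decoding transactions that use the existing two-phase commit commands, along these lines (table and gid names are just for illustration):

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    PREPARE TRANSACTION 'mm-tx-1';
    -- the coordinator would need to see the prepared transaction in the
    -- logical decoding stream before all nodes vote, and only then:
    COMMIT PREPARED 'mm-tx-1';

The interesting part is whether the decoding plugin can be handed the transaction at PREPARE time rather than only after the final commit.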

Thanks in advance,
Konstantin




On 17.02.2016 04:33, Craig Ringer wrote:
On 17 February 2016 at 00:54, Oleg Bartunov <obartu...@gmail.com> wrote:


    DDL support is what's missing for now.


TBH, based on experience with DDL replication and deparse in BDR, it's going to be missing for a while yet too, or at least not comprehensively present without caveats or exceptions.


Some DDL operations don't translate well to a series of replicatable actions. The case I hit the most is

ALTER TABLE mytable ADD COLUMN somecolumn sometype NOT NULL DEFAULT some_function();

This is executed (simplified) by taking an ACCESS EXCLUSIVE lock, changing the catalogs but not making the changes visible yet, rewriting the table, and committing to make the rewritten table and the catalog changes visible.

That won't work well with logical replication. We currently capture DDL with event triggers and log it to a table for later logical decoding and replay - that's the "recognised" way. The trouble is that replaying that statement results in an unnecessary full table rewrite on the downstream. Then we have to decode and send the stream of changes to a table called pg_temp_<oid_of_mytable>, truncate the copy of mytable on the downstream that we just rewrote, and apply those rows instead.
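For illustration, the capture side looks roughly like this (a simplified sketch, not BDR's actual code; here the raw statement text from current_query() is logged rather than the deparse output, and the table/function names are made up):

    CREATE TABLE ddl_queue (id serial PRIMARY KEY, cmd text);

    CREATE FUNCTION queue_ddl() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        -- insert the statement so logical decoding of ddl_queue picks it up
        INSERT INTO ddl_queue(cmd) VALUES (current_query());
    END;
    $$;

    CREATE EVENT TRIGGER capture_ddl ON ddl_command_end
        EXECUTE PROCEDURE queue_ddl();

The downstream then executes the queued command when it reaches that point in the change stream, which is exactly how the redundant table rewrite sneaks in.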

Of course all that only works sensibly if you have exactly one upstream and the downstream copy of the table is treated as (or enforced as) read-only.

Improving this probably needs DDL deparse to be smarter. Rather than just emitting something that can be reconstructed into the SQL text of the DDL it needs to emit one or more steps that are semantically the same but allow us to skip the rewrite. Along the lines of:

* ALTER TABLE mytable ADD COLUMN somecolumn sometype;
* ALTER TABLE mytable ALTER COLUMN somecolumn SET DEFAULT some_function();
* <wait for rewrite data for mytable>
* ALTER TABLE mytable ALTER COLUMN somecolumn SET NOT NULL;

Alternatively, the downstream would need a hook that lets it intercept and prevent table rewrites caused by ALTER TABLE and similar, so that it can instead just truncate and wait for the new rows to come from the master.

Note that all this means the standby has to hold an ACCESS EXCLUSIVE lock on the table for the whole replay. That shouldn't be necessary; all we really need is an EXCLUSIVE lock, since concurrent SELECTs are fine. No idea how to do that.



Deparse is also just horribly complicated to get right. There are so many clauses and subclauses and variants of statements. Each of which must be perfect.



Not everything has a simple and obvious mapping on the downstream side either. TRUNCATE ... CASCADE is the obvious one. You do a cascaded truncate on the master - do you want that to replicate as a cascaded truncate on the replica, or as a truncate of only those tables that actually got truncated on the master? If the replica has additional tables with FKs pointing at the replicated tables, the TRUNCATE will truncate those too if you replicate it as CASCADE; if you don't, the truncate will fail instead. Really, both are probably wrong as far as the user is concerned, but we can't truncate just the tables truncated on the master, ignore the FK relationships, and leave dangling FK references either.
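A minimal illustration of the dilemma (hypothetical table names):

    -- on the master:
    CREATE TABLE parent(id int PRIMARY KEY);
    CREATE TABLE child(pid int REFERENCES parent(id));
    TRUNCATE parent CASCADE;    -- truncates parent and child

    -- the replica additionally has a local table:
    CREATE TABLE local_notes(pid int REFERENCES parent(id));
    -- replaying "TRUNCATE parent CASCADE" empties local_notes too;
    -- replaying "TRUNCATE parent, child" fails on local_notes's FK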


All this means that DDL replication is probably only going to make sense in scenarios where there's exactly one master and the replica obeys some rules like "don't create FKs pointing from non-replicated tables to tables replicated from somewhere else" - a constraint we currently have no way to express or enforce, the way we do for persistent-to-UNLOGGED FKs.



Then there are global objects. Something as simple as:

CREATE ROLE fred;

CREATE TABLE blah(...);
ALTER TABLE blah OWNER TO fred;

will break replication, because we only see the CREATE TABLE, not the CREATE ROLE. If we instead replayed the CREATE ROLE and there were multiple connections between different DBs on an upstream and downstream, apply would fail on all but one of them. But we can't do that anyway, since there's no way to capture that CREATE ROLE from any DB except the one it was executed in, which might not even be one of the ones doing replication.

I strongly suspect we'll need logical decoding to be made aware of such global DDL and decode it from the WAL writes to the system catalogs. Which will be fun - but at least modifications to the shared catalogs are a lot simpler than the sort of gymnastics done by ALTER TABLE, etc.



--
 Craig Ringer http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
