Ok, what about the following plan:

1. Support custom WAL records (as far as I know 2ndQuadrant has such patch).
2. Add one more function to logical decoding allowing to deal with custom records.

So the idea is that we somehow record DDL in WAL (for example using executor hook), then them are proceeded using logical decoding, calling special logical deocding plugin function to handle this records. For example we can store DDL in WAL just as SQL statements and so easily replay them.

In this case DDL will be replicated using the same mechanism and through the same channel as DML.

On 17.02.2016 12:16, Craig Ringer wrote:
On 17 February 2016 at 16:24, Konstantin Knizhnik <k.knizh...@postgrespro.ru <mailto:k.knizh...@postgrespro.ru>> wrote:

    Thanks for your explanation. I have to agree with your arguments
    that in general case replication of DDL statement using logical
    decoding seems to be problematic. But we are mostly considering
    logical decoding in quite limited context: replication between two
    identical Postgres database nodes (multimaster).

Yep, much like BDR. Where all this infrastructure came from and is/was aimed at.

    Do you think that it in this case replication of DLL can be done
    as sequence of low level operations with system catalog tables
    including manipulation with locks?


For one thing logical decoding doesn't see catalog tuple changes right now. Though I imagine that could be changed easily enough.

More importantly - oids. You add a column to a table:

ALTER TABLE mytable ADD COLUMN mycolumn some_type UNIQUE NOT NULL DEFAULT some_function()

This writes to catalogs including:

pg_class (for the index relation)

... probably more. It also refers to pg_class (for the definition of mytable), pg_type (definition of some_type), pg_proc (definition of some_function), the b-tree operator class for some_type in pg_opclass, the b-tree indexam in pg_am, ... more.

Everything is linked by oids, and the oids are all node local. You can't just blindly re-use them. If "some_type" is hstore, the oid of hstore in pg_type might be different on the upstream and downstream. The only exception is the oids of built-in types and even then that's not guaranteed across major versions.

So if you blindly replicate catalog row changes you'll get a horrible mess. That's before considering a table's relfilenode, which is initially the same as its oid, but subject to change if truncated or rewritten.

To even begin to do this half-sanely you'd have to maintain a mapping of upstream object oids->names on the downstream, with invalidations replicated from the upstream. That's only the beginning. There's handling of extensions and lots more fun.

    So in your example with ALTER TABLE statement, can we correctly
    replicate it to other nodes
    as request to set exclusive lock + some manipulations with catalog
    tables and data table itself?

Nope. No hope, not unless "some manipulations with catalog tables and data table its self" is a lot more comprehensive than I think you mean.

    1. Add option whether to include operations on system catalog
    tables in logical replication or not.

I would like to have this anyway.

    2. Make it possible to replicate lock requests (can be useful not
    only for DDLs)

I have no idea how you'd even begin to do that.

    I looked how DDL was implemented in BDR and did it in similar way
    in our multimaster.
    But it is awful: we need to have two different channels for
    propagating changes.

Yeah, it's not beautiful, but maybe you misunderstood something? The DDL is written to a table, and that table's changes are replayed along with everything else. It's consistent and keeps DDL changes as part of the xact that performed them. Maybe you misunderstood how it works in BDR and missed the indirection via a table?

    Additionally, in multimaster we want to enforce cluster wide ACID.
    It certainly includes operations with metadata. It will be very
    difficult to implement if replication of DML and DDL is done in
    two different ways...

That's pretty much why BDR does it this way, warts and all. Though it doesn't offer cluster-wide ACID it does need atomic commit of xacts that may contain DML, DDL, or some mix of the two.

    Let me ask one more question concerning logical replication: how
    difficult it will be from your point of view to support two phase
    commit in logical replication? Are there some principle problems?

I haven't looked closely yet. Andres will know more.

I very, very badly want to be able to decode 2PC prepared xacts myself.

The main issue I'm aware of is locking - specifically the inability to impersonate another backend and treat locks held by that backend (which might be a fake backend for a pg_prepared_xacts entry) as held by ourselves for the purpose of being able to access relations, etc.

The work Robert is doing on group locking looks absolutely ideal for this, but won't land before 9.7.

(Closely related, I also want to be able to hook into commit and transform a normal COMMIT into a PREPARE TRANSACTION, <do some stuff>, COMMIT PREPARED with the application that issued the commit none the wiser. This will allow pessimistic 2PC-based conflict handling for must-succeed xacts like those that do DDL).

 Craig Ringer http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Reply via email to