On 17 February 2016 at 00:54, Oleg Bartunov <obartu...@gmail.com> wrote:
> DDL support is what it's missed for now.
TBH, based on experience with DDL replication and deparse in BDR, it's
going to be missing for a while yet too, or at least not comprehensively
present without caveats or exceptions.
Some DDL operations don't translate well to a series of replicatable
actions. The case I hit the most is
ALTER TABLE mytable ADD COLUMN somecolumn sometype NOT NULL DEFAULT
This is executed (simplified) by taking an ACCESS EXCLUSIVE lock, changing
the catalogs but not making the changes visible yet, rewriting the table,
and committing to make the rewritten table and the catalog changes visible.
That won't work well with logical replication. We currently capture DDL
with event triggers and log them to a table for later logical decoding and
replay - that's the "recognised" way. The trouble being that replaying that
statement will result in an unnecessary full table rewrite on the
downstream. Then we have to decode and send stream of changes to a table
called pg_temp_<oid_of_mytable>, truncate the copy of mytable on the
downstream that we just rewrote and apply those rows instead.
Of course all that only works sensibly if you have exactly one upstream and
the downstream copy of the table is treated as (or enforced as) read-only.
Improving this probably needs DDL deparse to be smarter. Rather than just
emitting something that can be reconstructed into the SQL text of the DDL
it needs to emit one or more steps that are semantically the same but allow
us to skip the rewrite. Along the lines of:
* ALTER TABLE mytable ADD COLUMN somecolumn sometype;
* ALTER TABLE mytable ALTER COLUMN somecolumn DEFAULT some_function();
* <wait for rewrite data for mytable>
* ALTER TABLE mytable ALTER COLUMN somecolumn NOT NULL;
Alternately the downstream would need a hook that lets it intercept and
prevent table rewrites caused by ALTER TABLE and similar. So it can instead
just do a truncate and wait for the new rows to come from the master.
Note that all this means the standby has to hold an ACCESS EXCLUSIVE lock
on the table during all of replay. That shouldn't be necessary, all we
really need is an EXCLUSIVE lock since concurrent SELECTs are fine. No idea
how to do that.
Deparse is also just horribly complicated to get right. There are so many
clauses and subclauses and variants of statements. Each of which must be
Not everything has a simple and obvious mapping on the downstream side
either. TRUNCATE ... CASCADE is the obvious one. You do a cascade truncate
on the master - do you want that to replicate as a cascaded truncate on the
replica, or a truncate of only those tables that actually got truncated on
the master? If the replica has additional tables with FKs pointing at
tables replica the TRUNCATE would truncate those too if you replicate it as
CASCADE; if you don't the truncate will fail instead. Really, both are
probably wrong as far as the user is concerned, but we can't truncate just
the tables truncated on the master, ignore the FK relationships, and leave
dangling FK references either.
All this means that DDL replication is probably only going to make sense in
scenarios where there's exactly one master and the replica obeys some rules
like "don't create FKs pointing from non-replicated tables to tables
replicated from somewhere else". A concept we currently have no way to
express or enforce like we do persistent-to-UNLOGGED FKs.
Then there's global objects. Something as simple as:
CREATE ROLE fred;
CREATE TABLE blah(...) OWNER fred;
will break replication because we only see the CREATE TABLE, not the CREATE
ROLE. If we instead replayed the CREATE ROLE and there were multiple
connections between different DBs on an upstream and downstream apply would
fail on all but one. But we can't anyway since there's no way to capture
that CREATE ROLE from any DB except the one it was executed in, which might
not even be one of the ones doing replication.
I strongly suspect we'll need logical decoding to be made aware of such
global DDL and decode it from the WAL writes to the system catalogs. Which
will be fun - but at least modifications to the shared catalogs are a lot
simpler than the sort of gymnastics done by ALTER TABLE, etc.
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services