On Fri, Jul 8, 2016 at 5:47 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
>> DDL is our standard way of getting things into the system catalogs.
>> We have no system catalog metadata that is intended to be populated by
>> any means other than DDL.
>
> Replication slots? (Arguably not catalogs, I guess)
>
> Replication origins?

Those things aren't catalogs, are they?  I mean, as I said in the
other email I just sent in reply to Simon, if you did a pg_dump and a
pg_restore, I don't think it would be useful to preserve replication
slot LSNs afterwards.  If I'm wrong, and that is a useful thing to do,
then we should have a pg_dump flag to do it.  Either way, I think we
do have some work to do figuring out how you can dump, restore, and
then resume logical replication, probably by establishing a new slot
and then incrementally resynchronizing without having to copy
unchanged rows.

That having been said, I think the choice not to use DDL for slots was
somewhat unfortunate.  We now have CREATE_REPLICATION_SLOT that can be
used via the replication protocol but there is no corresponding CREATE
REPLICATION SLOT for the regular protocol; I think that's kinda
strange.

>> If you want to add a column to a table, you
>> say ALTER TABLE .. ADD COLUMN.  If you want to add a column to an
>> extension, you say ALTER EXTENSION .. ADD TABLE.  If you want to add
>> an option to a foreign table, you say ALTER FOREIGN TABLE .. OPTIONS
>> (ADD ..).  Therefore, I think it is entirely reasonable and obviously
>> consistent with existing practice that if you want to add a table to a
>> replication set, you should write ALTER REPLICATION SET .. ADD TABLE.
>> I don't understand why logical replication should be the one feature
>> that departs from the way that all of our other features work.
>
> Because unlike all the other features, it can work usefully *across
> versions*.

So what?

> We have no extension points for DDL.
>
> For function interfaces, we do.
>
> That, alone, makes a function based interface overwhelmingly compelling
> unless there are specific things we *cannot reasonably do* without DDL.

I don't understand this.  We add new DDL in new releases, and we avoid
changing the meaning of existing DDL.
Using function interfaces won't make it possible to change the meaning
of existing syntax, and it won't make it any more possible to add new
syntax.  It will just make replication commands be spelled differently
from everything else.

> In many cases it's actively undesirable to dump and restore logical
> replication state. Most, I'd say. There probably are cases where it's
> desirable to retain logical replication state such that restoring a dump
> resumes replication, but I challenge you to come up with any sensible and
> sane way that can actually be implemented. Especially since you must
> obviously consider the possibility of both upstream and downstream being
> restored from dumps.

Yes, these issues need lots of thought, but I think that replication
set definitions, at least, are sensible to dump and reload.

> IMO the problem mostly devolves to making sure dumps taken of different DBs
> are consistent so new replication sessions can be established safely. And
> really, I think it's a separate feature to logical replication its self.

I think what is needed has more to do with coping with the situation
when the snapshots aren't consistent.  Having a way to make sure they
are consistent is a great idea, but there WILL be situations when
replication between two 10TB databases gets broken and it will not be
good if the only way to recover is to reclone.

> To what extent are you approaching this from the PoV of wanting to use this
> in FDW sharding? It's unclear what vision for users you have behind the
> things you say must be done, and I'd like to try to move to more concrete
> ground. You want DDL? OK, what should it look like? What does it add over a
> function based interface? What's cluster-wide and what's DB-local? etc.

I've thought about that question, a little bit, but it's not really
what underlies my concerns here.
I'm concerned about dump-and-restore preserving as much state as is
usefully possible, because I think that's critical for the user
experience, and I'm concerned with having the commands we use to
manage replication not be spelled totally differently than our other
commands.

However, as far as sharding is concerned, no matter how it gets
implemented, I think logical replication is a key feature.
Postgres-XC/XL has the idea of "replicated" tables which are present
on every data node, and that's very important for efficient
implementation of joins.  If you do a join between a little table and
a big sharded table, you want to be able to push that down to the
shards, and you can only do that if the entirety of the little table
is present on every shard or by creating a temporary copy on every
shard.  In many cases, the former will be preferable.  So, I think it's
important for sharding that logical replication is fully integrated
into core in such a manner as to be available as a building block for
other features.  At the least, I'm guessing that we'll want a way for
whatever code is planning join execution to figure out which tables
have up-to-date copies on servers that are involved in the query.

As far as the FDW-based approach to sharding is concerned, one thing
to think about is whether postgres_fdw and logical replication could
share one notion of where the remote servers are.

> FWIW, Petr is working on some code in the area, but I don't know how far
> along the work is.

OK, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company