On Thu, Dec 12, 2013 at 10:49 AM, Andres Freund <and...@2ndquadrant.com> wrote:
>> I hadn't realized that the options were going to be different for
>> logical vs. physical.
>
> I don't see how we could avoid that, there just are some differences
> between both.

Right, I'm not complaining, just observing that it was a point I had overlooked.

>> So you could
>> also do ACQUIRE_LOGICAL_SLOT, ACQUIRE_PHYSICAL_SLOT,
>> START_REPLICATION, START_LOGICAL_REPLICATION, and RELEASE_SLOT. I'm
>> not sure whether that's better.
>
> Not sure either, but I slightly favor keeping the toplevel slot
> commands the same. I think we'll want one namespace for both and
> possibly similar reporting functions and that seems less surprising if
> they are treated more similar.

OK.

> If we were to start out streaming changes before the last running
> transaction has finished, they would be visible in that exported
> snapshot and you couldn't use it to roll forward from anymore.

Actually, you could. You'd just have to throw away any transactions whose XIDs are visible to the exported snapshot. In other words, you begin replication at time T0, and all transactions which begin after that time are included in the change stream. At some later time T1, all transactions in progress at time T0 have ended, and now you can export a snapshot at that time, or any later time, from which you can roll forward. Any change-stream entries for XIDs which would be visible to that snapshot shouldn't be replayed when rolling forward from it, though.

I think it sucks (that's the technical term) to have to wait for all currently-running transactions to terminate before being able to begin streaming changes, because that could take a long time. And you might well know that the long-running transaction which is rolling up enormous table A that you don't care about is never going to touch table B which you actually want to replicate. Now, ideally, the DBA would have a way to ignore that long-running transaction and force replication to start, perhaps with the caveat that if that long-running transaction actually does touch B after all then we have to resync.
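
To make the roll-forward rule above concrete, here is a rough sketch in Python. It is purely illustrative and not how the backend represents anything: the Snapshot class and changes_to_replay() are made-up names, the visibility test is the usual xmin/xmax/in-progress-list check, and it assumes every XID in the stream committed and that XIDs never wrap around.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Simplified stand-in for an exported snapshot (hypothetical type)."""
    xmin: int                     # every XID below this had already finished
    xmax: int                     # every XID at or above this had not started
    in_progress: FrozenSet[int]   # XIDs in [xmin, xmax) that were still running

    def sees(self, xid: int) -> bool:
        """True if xid's effects are already reflected in this snapshot.

        Assumes xid committed; ignores aborts and XID wraparound.
        """
        if xid < self.xmin:
            return True
        if xid >= self.xmax:
            return False
        return xid not in self.in_progress


def changes_to_replay(
    stream: Iterable[Tuple[int, str]], snapshot: Snapshot
) -> Iterator[Tuple[int, str]]:
    """Yield only the (xid, change) entries the snapshot does not already contain."""
    for xid, change in stream:
        if not snapshot.sees(xid):
            yield xid, change
```

With a snapshot of xmin=100, xmax=105 and XID 102 still in progress, a stream carrying XIDs 101, 102, 104 and 105 would replay only 102 and 105; 101 and 104 are already in the copied data and get thrown away.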

Your model's fine when we want to replicate the whole database, but a big part of why I want this feature is to allow finer-grained replication, down to the table level, or even slices of tables. So imagine this. After initiating logical replication, a replication solution either briefly x-locks a table it wants to replicate, so that there can't be anyone else touching it, or it observes who has a lock >= RowExclusiveLock and waits for all of those locks to drop away. At that point, it knows that no currently-in-progress transaction can have modified the table prior to the start of replication, and begins copying the table. If a transaction that began before the start of replication subsequently modifies the table, a WAL record will be written, and the core logical decoding support could let the plugin know by means of an optional callback (hey, btw, a change I can't decode just hit table XYZ). The plugin will need to respond by recopying the table, which sucks, but it was the plugin's decision to be optimistic in the first place, and that will in many cases be a valid policy decision. If no such callback arrives before the safe-snapshot point, then the plugin made the right bet and will reap the just rewards of its optimism.

>> I don't have a problem with the behavior. Seems useful. One useful
>> addition might be to provide an option to stream out up to X changes
>> but without consuming them, so that the DBA can peek at the
>> replication stream. I think it's a safe bet DBAs will want to do
>> things like that, so it'd be nice to make it easy, if we can.
>
> It's not too difficult to provide an option to do that. What I've been
> thinking of was to correlate the confirmation of consumption with the
> transaction the SRF is running in. So, confirm the data as consumed if
> it commits, and don't if not. I think we could do that relatively easily
> by registering a XACT_EVENT_COMMIT.

That's a bit too accident-prone for my taste.
I'd rather the DBA had some equivalent of peek_at_replication(nchanges int).

>> Sounds about right, but I think we need to get religion about figuring
>> out what terminology to use. At the moment it seems to vary quite a
>> bit between "logical", "logical decoding", and "decoding". Not sure
>> how to nail that down.
>
> Agreed. Perhaps we should just avoid both logical and decoding entirely
> and go for "changestream" or similar?

So wal_level=changestream? Not feeling it. Of course we don't have to be 100% rigid about this but we should try to make our terminology correspond to natural semantic boundaries. Maybe we should call the process logical decoding, and the results logical streams, or something like that.

>> As a more abstract linguistic question, what do we think the
>> difference is between logical *replication* and logical *decoding*?
>> Are they the same or different? If different, how?
>
> For me "logical decoding" can be the basis of "logical replication", but
> also for other features.

Such as?

>> > I wonder if we should let the output plugin tell us whether it will
>> > output data in binary? I think it generally would be a good idea to let
>> > the output plugin's _init() function return some configuration
>> > data. That will make extending the interface to support more features
>> > easier.
>>
>> Maybe, but you've got to consider the question of encoding, too. You
>> could make the choices "binary" and "the database encoding", I
>> suppose.
>
> Yes, I think that should be the choice. There seems little justification
> for an output plugin to produce textual output in anything but the
> server encoding.
>
> I am not sure if we want to verify that in !USE_ASSERT? That'd be quite
> expensive...

Since it's all C code anyway, it's probably fine to push the responsibility back onto the output plugin, as long as that's a documented part of the API contract.
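
As a rough illustration of what that contract might amount to, here is a small Python model (not the C API; PluginOptions, honors_contract() and emit() are invented names). The plugin declares at init time whether it emits binary; if it does not, its output is expected to be valid text in the server encoding, and the core only verifies that when checking is switched on, much like an assert-enabled build.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginOptions:
    """Hypothetical configuration an output plugin hands back from its init hook."""
    binary_output: bool


def honors_contract(output: bytes, opts: PluginOptions, server_encoding: str) -> bool:
    """Binary output is opaque; textual output must decode in the server encoding."""
    if opts.binary_output:
        return True
    try:
        output.decode(server_encoding)
    except UnicodeDecodeError:
        return False
    return True


def emit(output: bytes, opts: PluginOptions, server_encoding: str,
         checked: bool = False) -> bytes:
    """Pass plugin output through, verifying the contract only when asked to.

    With checked=False the responsibility stays entirely with the plugin,
    which is the cheap path discussed above.
    """
    if checked and not honors_contract(output, opts, server_encoding):
        raise ValueError("textual plugin output is not valid in the server encoding")
    return output
```

The point of the split is that the expensive validation is optional; in the unchecked path a misbehaving plugin simply produces garbage, and that is on the plugin.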

>> > As far as I have been thinking of, this would be another catalog table like
>> > pg_decoding_plugin(oid, dpname name, dpload regproc).
>>
>> Instead of adding another catalog table, I think we should just define
>> a new type. Again, please look at the way that foreign data wrappers
>> do this:
>
> I don't really see what the usage of a special type has to do with this,
> but I think that's beside your main point. What you're saying is that
> the output plugin is just defined by a function name, possibly schema
> prefixed. That has an elegance to it. +1

Well, file_fdw_handler returns type fdw_handler. That's nice, because we can validate that we've got the right sort of object when what we want is an FDW handler. If it just returned type internal, it would be too easy to mix it up with something unrelated that passed back some other kind of binary goop.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company