On Thu, Dec 12, 2013 at 10:49 AM, Andres Freund <and...@2ndquadrant.com> wrote:
>> I hadn't realized that the options were going to be different for
>> logical vs. physical.
> I don't see how we could avoid that, there just are some differences
> between both.

Right, I'm not complaining, just observing that it was a point I had overlooked.

>> So you could
>> not sure whether that's better.
> Not sure either, but I slightly favor keeping the toplevel slot
> commands the same. I think we'll want one namespace for both and
> possibly similar reporting functions, and that seems less surprising if
> they are treated more similarly.


> If we were to start out streaming changes before the last running
> transaction has finished, they would be visible in that exported
> snapshot and you couldn't use it to roll forward from anymore.

Actually, you could.  You'd just have to throw away any transactions
whose XIDs are visible to the exported snapshot.  In other words, you
begin replication at time T0, and all transactions which begin after
that time are included in the change stream.  At some later time T1,
all transactions in progress at time T0 have ended, and now you can
export a snapshot at that time, or any later time, from which you can
roll forward.  Any change-stream entries for XIDs which would be
visible to that snapshot shouldn't be replayed when rolling forward
from it, though.
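To make the filtering rule concrete, here's a rough sketch in C of the visibility test I mean. All names here are invented for illustration; this is not the actual snapshot code, just the skip-or-replay decision under a simplified xmin/xmax/xip model:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified stand-in for an exported MVCC snapshot (hypothetical names). */
typedef struct {
    TransactionId xmin;       /* all xids < xmin had finished */
    TransactionId xmax;       /* all xids >= xmax had not started */
    const TransactionId *xip; /* in-progress xids in [xmin, xmax) */
    int xcnt;
} Snapshot;

/* True if a committed xid's effects are already visible to the snapshot,
 * meaning its change-stream entries must NOT be replayed. */
static bool
xid_visible_in_snapshot(TransactionId xid, const Snapshot *snap)
{
    if (xid >= snap->xmax)
        return false;             /* started after the snapshot: replay */
    if (xid < snap->xmin)
        return true;              /* finished before the snapshot: skip */
    for (int i = 0; i < snap->xcnt; i++)
        if (snap->xip[i] == xid)
            return false;         /* still running at snapshot time: replay */
    return true;                  /* committed before the snapshot: skip */
}

static bool
should_replay(TransactionId xid, const Snapshot *snap)
{
    return !xid_visible_in_snapshot(xid, snap);
}
```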

I think it sucks (that's the technical term) to have to wait for all
currently-running transactions to terminate before being able to begin
streaming changes, because that could take a long time.  And you might
well know that the long-running transaction which is rolling up
enormous table A that you don't care about is never going to touch
table B which you actually want to replicate.  Now, ideally, the DBA
would have a way to ignore that long-running transaction and force
replication to start, perhaps with the caveat that if that
long-running transaction actually does touch B after all then we have
to resync.  Your model's fine when we want to replicate the whole
database, but a big part of why I want this feature is to allow
finer-grained replication, down to the table level, or even slices of tables.

So imagine this.  After initiating logical replication, a replication
solution either briefly x-locks a table it wants to replicate, so that
there can't be anyone else touching it, or it observes who has a
lock >= RowExclusiveLock and waits for all of those locks to drop
away.  At
that point, it knows that no currently-in-progress transaction can
have modified the table prior to the start of replication, and begins
copying the table.  If a transaction that began before the start of
replication subsequently modifies the table, a WAL record will be
written, and the core logical decoding support could let the plugin
know by means of an optional callback (hey, btw, a change I can't
decode just hit table XYZ).  The plugin will need to respond by
recopying the table, which sucks, but it was the plugin's decision to
be optimistic in the first place, and that will in many cases be a
valid policy decision.  If no such callback arrives before the
safe-snapshot point, then the plugin made the right bet and will reap
the just rewards of its optimism.
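The optional callback could look something like this, sketched with made-up names. The point is just that the core only invokes the hook if the plugin registered one, and the plugin's handler flags the table for recopy:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Hypothetical plugin callback table; only the relevant hook is shown. */
typedef struct {
    /* Optional: invoked when a change for relation 'relid' cannot be
     * decoded, e.g. it came from a transaction predating the snapshot
     * the plugin copied the table from. */
    void (*undecodable_change_cb)(Oid relid, void *plugin_private);
} DecodingCallbacks;

/* Example plugin state: tables flagged for recopy. */
typedef struct {
    Oid needs_resync[8];
    int nresync;
} PluginState;

static void
plugin_undecodable_change(Oid relid, void *private_data)
{
    PluginState *state = (PluginState *) private_data;
    if (state->nresync < 8)
        state->needs_resync[state->nresync++] = relid;
}

/* Core side: notify the plugin only if it registered the optional hook. */
static void
report_undecodable_change(const DecodingCallbacks *cbs, Oid relid,
                          void *plugin_private)
{
    if (cbs->undecodable_change_cb != NULL)
        cbs->undecodable_change_cb(relid, plugin_private);
}
```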

>> I don't have a problem with the behavior.  Seems useful.  One useful
>> addition might be to provide an option to stream out up to X changes
>> but without consuming them, so that the DBA can peek at the
>> replication stream.  I think it's a safe bet DBAs will want to do
>> things like that, so it'd be nice to make it easy, if we can.
> It's not too difficult to provide an option to do that. What I've been
> thinking of was to correlate the confirmation of consumption with the
> transaction the SRF is running in. So, confirm the data as consumed if
> it commits, and don't if not. I think we could do that relatively easily
> by registering a XACT_EVENT_COMMIT.

That's a bit too accident-prone for my taste.  I'd rather the DBA had
some equivalent of peek_at_replication(nchanges int).
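Roughly something like this (hypothetical names throughout): a peek copies up to nchanges pending entries without advancing the confirmed position, while a consuming read does both, so the DBA can't accidentally eat the stream by inspecting it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical change queue: 'confirmed' is the consumer position that a
 * peek must leave untouched, unlike a normal consuming read. */
typedef struct {
    const char **changes;
    int          total;
    int          confirmed;   /* changes[0..confirmed) already consumed */
} ChangeQueue;

/* Copy up to nchanges pending entries into 'out' without confirming them. */
static int
peek_at_changes(const ChangeQueue *q, const char **out, int nchanges)
{
    int avail = q->total - q->confirmed;
    int n = nchanges < avail ? nchanges : avail;
    for (int i = 0; i < n; i++)
        out[i] = q->changes[q->confirmed + i];
    return n;                 /* q->confirmed deliberately unchanged */
}

/* Consuming variant: same copy, but advances the confirmed position. */
static int
consume_changes(ChangeQueue *q, const char **out, int nchanges)
{
    int n = peek_at_changes(q, out, nchanges);
    q->confirmed += n;
    return n;
}
```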

>> Sounds about right, but I think we need to get religion about figuring
>> out what terminology to use.  At the moment it seems to vary quite a
>> bit between "logical", "logical decoding", and "decoding".  Not sure
>> how to nail that down.
> Agreed. Perhaps we should just avoid both logical and decoding entirely
> and go for "changestream" or similar?

So wal_level=changestream?  Not feeling it.  Of course we don't have
to be 100% rigid about this, but we should try to make our terminology
correspond to natural semantic boundaries.  Maybe we should call the
process logical decoding, and the results logical streams, or
something like that.

>> As a more abstract linguistic question, what do we think the
>> difference is between logical *replication* and logical *decoding*?
>> Are they the same or different?  If different, how?
> For me "logical decoding" can be the basis of "logical replication", but
> also for other features.

Such as?

>> > I wonder if we should let the output plugin tell us whether it will
>> > output data in binary? I think it generally would be a good idea to let
>> > the output plugin's _init() function return some configuration
>> > data. That will make extending the interface to support more features
>> > easier.
>> Maybe, but you've got to consider the question of encoding, too.  You
>> could make the choices "binary" and "the database encoding", I
>> suppose.
> Yes, I think that should be the choice. There seems little justification
> for an output plugin to produce textual output in anything but the
> server encoding.
> I am not sure if we want to verify that in !USE_ASSERT? That'd be quite
> expensive...

Since it's all C code anyway, it's probably fine to push the
responsibility back onto the output plugin, as long as that's a
documented part of the API contract.
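For instance, the contract might look roughly like this (illustrative names only, not a real API): the plugin's init callback declares binary vs. textual output, where "textual" is a promise that the bytes are in the server encoding that the core never verifies, and a text-only consumer refuses a binary plugin up front rather than emitting garbage:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical output-mode declaration returned by a plugin's _init():
 * either raw binary, or text the plugin promises is in the server
 * encoding (the core does not verify this; it's an API contract). */
typedef enum {
    OUTPUT_TEXTUAL,   /* must be in the server encoding */
    OUTPUT_BINARY
} OutputMode;

typedef struct {
    OutputMode mode;
} OutputPluginConfig;

/* Example plugin init callback declaring its output mode. */
static void
my_plugin_init(OutputPluginConfig *config)
{
    config->mode = OUTPUT_BINARY;
}

/* Core side: a consumer (say, an SQL-level SRF) that can only handle
 * text rejects a binary plugin before any data flows. */
static bool
consumer_accepts(const OutputPluginConfig *config, bool consumer_wants_text)
{
    if (consumer_wants_text && config->mode == OUTPUT_BINARY)
        return false;
    return true;
}
```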

>> > As far as I have been thinking of, this would be another catalog table like
>> > pg_decoding_plugin(oid, dpname name, dpload regproc).
>> Instead of adding another catalog table, I think we should just define
>> a new type.  Again, please look at the way that foreign data wrappers
>> do this:
> I don't really see what the usage of a special type has to do with this,
> but I think that's besides your main point. What you're saying is that
> the output plugin is just defined by a function name, possibly schema
> prefixed. That has an elegance to it. +1

Well, file_fdw_handler returns type fdw_handler.  That's nice, because
we can validate that we've got the right sort of object when what we
want is an FDW handler.  If it just returned type internal, it would
be too easy to mix it up with something unrelated that passed back
some other kind of binary goop.
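In miniature, the pattern is something like this (names invented; the dedicated SQL return type plays the role the struct tag plays here, letting us reject unrelated pointers at the boundary instead of crashing on binary goop later):

```c
#include <assert.h>
#include <stddef.h>

/* The fdw_handler pattern in miniature: a handler function returns a
 * pointer to a typed callback table.  All names are illustrative, not
 * the actual proposed API. */
typedef struct DecodingHandler {
    void (*begin_txn)(void);
    void (*change)(void);
    void (*commit_txn)(void);
} DecodingHandler;

static void demo_begin(void)  {}
static void demo_change(void) {}
static void demo_commit(void) {}

/* Analogue of file_fdw_handler(): returns the typed callback table. */
static const DecodingHandler *
demo_decoding_handler(void)
{
    static const DecodingHandler handler = {
        demo_begin, demo_change, demo_commit
    };
    return &handler;
}
```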

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
