On 2013-12-12 10:01:21 -0500, Robert Haas wrote:
> On Thu, Dec 12, 2013 at 7:04 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> > I think there'll always be a bit of a difference between slots for
> > physical and logical data, even if 90% of the implementation is the
> > same. We can signal that difference by specifying logical/physical as an
> > option or having two different sets of commands.
> >
> > Maybe?
> >
> > ACQUIRE_REPLICATION_SLOT slot_name PHYSICAL physical_opts
> > ACQUIRE_REPLICATION_SLOT slot_name LOGICAL logical_opts
> > -- already exists without slot, PHYSICAL arguments
> > START_REPLICATION [SLOT slot] [PHYSICAL] RECPTR opt_timeline
> > START_REPLICATION SLOT LOGICAL slot plugin_options
> > RELEASE_REPLICATION_SLOT slot_name
>
> I assume you meant START_REPLICATION SLOT slot LOGICAL plugin_options,
> but basically this seems OK to me.
Uh, yes.

> I hadn't realized that the options were going to be different for
> logical vs. physical.

I don't see how we could avoid that, there just are some differences
between the two.

> So you could
> also do ACQUIRE_LOGICAL_SLOT, ACQUIRE_PHYSICAL_SLOT,
> START_REPLICATION, START_LOGICAL_REPLICATION, and RELEASE_SLOT. I'm
> not sure whether that's better.

Not sure either, but I slightly favor keeping the toplevel slot
commands the same. I think we'll want one namespace for both and
possibly similar reporting functions, and that seems less surprising if
they are treated more similarly.

> > So what you could get is something that starts streaming you changes
> > sometime after you asked it to start streaming, without a guarantee that
> > you can restart at exactly the position you stopped. If that's useful,
> > we can do it, but I am not sure what the usecase would be?
>
> I haven't yet looked closely at the snapshot-building stuff, but my
> thought is that you ought to be able to decode any transactions that
> start after you make the connection. You might not be able to decode
> transactions that are already in progress at that point, because you
> might have already missed XID assignment records, catalog changes,
> etc. that they've performed. But transactions that begin after that
> point ought to be OK.

It works mostly like that, yes. At least on primaries. When we start
decoding, we jot down the current xlog insertion pointer to know where
to start decoding from, then trigger an xl_running_xacts record to be
logged so we have enough information. Then we start reading from that
point onwards. On standbys the process is the same, except that we have
to wait for the primary to issue an xl_running_xacts record. (I had
considered starting with information from the procarray, but it turns
out that's hard to do without race conditions.)

We only decode changes in transactions that commit after the last
transaction that was in-progress when we started observing has finished
though.
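To make that rule concrete, here's a minimal plain-Python sketch (not
PostgreSQL code; the event representation is invented for illustration)
of which commits get decoded: `running` stands in for the xid set from
the xl_running_xacts record, and decoding only starts producing output
once that set has drained.

```python
# Sketch only: which transactions' commits are decoded when logical
# decoding starts while other transactions are still in progress.
# 'running' is the set of xids in progress at the moment decoding
# starts; 'events' is the ordered stream of (xid, action) pairs
# subsequently observed in the WAL.

def decoded_commits(running, events):
    running = set(running)
    consistent = not running      # consistent once all old xacts finished
    decoded = []
    for xid, action in events:
        if action in ("commit", "abort"):
            # Transactions ending before the consistent point are
            # skipped, including the initially-running ones themselves.
            if consistent and action == "commit":
                decoded.append(xid)
            running.discard(xid)
            if not running:
                consistent = True
    return decoded
```

For instance, with xids 1 and 2 in progress at startup, a later
transaction 3 that commits before both of them have finished is skipped,
while transaction 4, committing afterwards, is decoded.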
That allows us to export a snapshot once the last still-running
transaction has finished, showing a state of the database that can be
rolled forward exactly by the changes contained in the changestream. I
think that's a useful property for the majority of cases. If we were to
start streaming out changes before the last running transaction has
finished, those changes would be visible in the exported snapshot and
you couldn't use it to roll forward from anymore.

It'd be pretty easy to optionally decode the transactions we currently
skip if we want that feature later. That would remove the option to
export a snapshot in many cases though (think suboverflowed snapshots).

> I have a feeling you're going to tell me it
> doesn't work like that, but maybe it should, because there's a whole
> lot of benefit in having decoding start up quickly, and a whole lot of
> benefit also to having the rules for that be easy to understand.

I am not sure whether the above qualifies as "doesn't work like that";
if it does, sometimes the correct thing isn't the immediately obvious
thing. I think "all transactions that were running when decoding was
initiated need to finish" is reasonably easy to explain.

> > I am also open to different behaviour for the SRF, but I am not sure
> > what that could be. There's just no sensible way to stream data on the
> > SQL level afaics.
>
> I don't have a problem with the behavior. Seems useful. One useful
> addition might be to provide an option to stream out up to X changes
> but without consuming them, so that the DBA can peek at the
> replication stream. I think it's a safe bet DBAs will want to do
> things like that, so it'd be nice to make it easy, if we can.

It's not too difficult to provide an option to do that. What I've been
thinking of is to correlate the confirmation of consumption with the
transaction the SRF is running in: confirm the data as consumed if it
commits, and don't if it doesn't.
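As an illustration of those semantics, here is a plain-Python sketch
(not PostgreSQL code; all names are invented) of a slot whose confirmed
position only advances when the transaction that consumed the changes
commits, while peeking never advances it:

```python
# Sketch only: peek vs. consume semantics for a change slot, with
# confirmation deferred to the end of the consuming transaction.

class Slot:
    def __init__(self, changes):
        self.changes = list(changes)
        self.confirmed = 0        # index of first unconsumed change
        self.pending = None       # position to confirm at commit

    def peek(self, n):
        """Look at up to n changes without consuming them."""
        return self.changes[self.confirmed:self.confirmed + n]

    def get(self, n):
        """Read up to n changes; confirmation is deferred to commit."""
        out = self.changes[self.confirmed:self.confirmed + n]
        self.pending = self.confirmed + len(out)
        return out

    def on_xact_end(self, committed):
        """Analogue of a transaction-end callback: confirm on commit,
        forget the pending position on abort."""
        if committed and self.pending is not None:
            self.confirmed = self.pending
        self.pending = None
```

Under this model an aborted transaction leaves the slot exactly where it
was, so the same changes are handed out again on the next read.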
I think we could do that relatively easily by registering a
XACT_EVENT_COMMIT callback.

> > What about pg_decoding_slot_get_[binary_]changes()?
>
> Sounds about right, but I think we need to get religion about figuring
> out what terminology to use. At the moment it seems to vary quite a
> bit between "logical", "logical decoding", and "decoding". Not sure
> how to nail that down.

Agreed. Perhaps we should just avoid both "logical" and "decoding"
entirely and go for "changestream" or similar?

> As a more abstract linguistic question, what do we think the
> difference is between logical *replication* and logical *decoding*?
> Are they the same or different? If different, how?

For me "logical decoding" can be the basis of "logical replication",
but also of other features.

> > I wonder if we should let the output plugin tell us whether it will
> > output data in binary? I think it generally would be a good idea to let
> > the output plugin's _init() function return some configuration
> > data. That will make extending the interface to support more features
> > easier.
>
> Maybe, but you've got to consider the question of encoding, too. You
> could make the choices "binary" and "the database encoding", I
> suppose.

Yes, I think that should be the choice. There seems little
justification for an output plugin to produce textual output in
anything but the server encoding. I am not sure whether we want to
verify that in !USE_ASSERT builds, though? That'd be quite expensive...

> > As far as I have been thinking of, this would be another catalog table like
> > pg_decoding_plugin(oid, dpname name, dpload regproc).
>
> Instead of adding another catalog table, I think we should just define
> a new type. Again, please look at the way that foreign data wrappers
> do this:

I don't really see what the usage of a special type has to do with
this, but I think that's beside your main point. What you're saying is
that the output plugin is just defined by a function name, possibly
schema prefixed.
That has an elegance to it. +1

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services