On Thu, Dec 12, 2013 at 7:04 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> I think there'll always be a bit of a difference between slots for
> physical and logical data, even if 90% of the implementation is the
> same. We can signal that difference by specifying logical/physical as an
> option or having two different sets of commands.
> Maybe?
> -- already exists without slot, PHYSICAL arguments

I assume you meant START_REPLICATION SLOT slot LOGICAL plugin_options,
but basically this seems OK to me.  I hadn't realized that the options
were going to be different for logical vs. physical.  So you could
do it either way; I'm not sure whether that's better.

>> It also strikes me that just as it's possible to stream WAL without
>> allocating a slot first (since we don't at present have slots),
>> perhaps it ought also to be possible to stream logical replication
>> data without acquiring a slot first.  You could argue that it was a
>> mistake not to introduce slots in the first place, but the stateless
>> nature of WAL streaming definitely has some benefits, and it's unclear
>> to me why you shouldn't be able to do the same thing with logical
>> decoding.
> I think it would be quite a bit harder for logical decoding. The
> difference is that, from the perspective of the walsender, for plain WAL
> streaming, all that needs to be checked is whether the WAL is still
> there. For decoding though, we need to be sure that a) the catalog xmin
> is still low enough and has been all along, and b) that we are able to
> instantly build a historical MVCC snapshot from the point we want to start
> streaming.
> Both a) and b) are solved by keeping the xmin and the point where to
> reread WAL from in the slot data and by serializing data about
> historical snapshots to disk. But those are removed if there isn't a
> slot around requiring them...
> So what you could get is something that starts streaming you changes
> sometime after you asked it to start streaming, without a guarantee that
> you can restart at exactly the position you stopped. If that's useful,
> we can do it, but I am not sure what the use case would be?

I haven't yet looked closely at the snapshot-building stuff, but my
thought is that you ought to be able to decode any transactions that
start after you make the connection.  You might not be able to decode
transactions that are already in progress at that point, because you
might have already missed XID assignment records, catalog changes,
etc. that they've performed.  But transactions that begin after that
point ought to be OK.  I have a feeling you're going to tell me it
doesn't work like that, but maybe it should, because there's a whole
lot of benefit in having decoding start up quickly, and a whole lot of
benefit also to having the rules for that be easy to understand.

Now if you have that, then I think ad-hoc decoding is potentially
useful.  Granted, you're not going to want to build a full-fledged
replication solution that way, but you might want to just connect and
watch the world stream by... or you might imagine an application that
opens a replication connection and a regular connection, copies a
table, and then applies the stream of changes made to that table after
the fact.  When that completes, the table is sync'd between the two
machines as of the end of the copy.  Useful enough to bother with?  I
don't know.  But not obviously useless.

> I am also open to different behaviour for the SRF, but I am not sure
> what that could be. There's just no sensible way to stream data on the
> SQL level afaics.

I don't have a problem with the behavior.  Seems useful.  One useful
addition might be to provide an option to stream out up to X changes
but without consuming them, so that the DBA can peek at the
replication stream.  I think it's a safe bet DBAs will want to do
things like that, so it'd be nice to make it easy, if we can.

> What about pg_decoding_slot_get_[binary_]changes()?

Sounds about right, but I think we need to get religion about figuring
out what terminology to use.  At the moment it seems to vary quite a
bit between "logical", "logical decoding", and "decoding".  Not sure
how to nail that down.

As a more abstract linguistic question, what do we think the
difference is between logical *replication* and logical *decoding*?
Are they the same or different?  If different, how?

> I wonder if we should let the output plugin tell us whether it will
> output data in binary? I think it generally would be a good idea to let
> the output plugin's _init() function return some configuration
> data. That will make extending the interface to support more features
> easier.

Maybe, but you've got to consider the question of encoding, too.  You
could make the choices "binary" and "the database encoding", I
suppose.

>> Now you provide a function RegisterOutputPlugin(output_plugin *).  If
>> there are any output plugins built into core, core will call
>> RegisterOutputPlugin once for each one.  If a shared library
>> containing an output plugin is loaded, the libraries _PG_init function
>> does the same thing.  When someone tries to use a plugin, they ask for
>> it by name.  We go iterate through the data saved by all previous
>> calls to RegisterOutputPlugin() until we find one with a matching
>> name, and then we use the callbacks embedded in that struct.
> But if we don't pass in a .so's name, how can additional plugins be
> registered except by adding them to [shared|local]_preload_libraries? If
> we do pass in one, it seems confusing if you suddenly get a plugin
> implemented somewhere else.

I don't see what the confusion is.  You can use any method you like
for loading those libraries, including shared_preload_libraries,
local_preload_libraries, or the LOAD command.  The replication grammar
might have to grow support for LOAD.  But I think the winner may be
the next option.  Some backends might not actually use the library in
question, but that's true of any library you preload.

>> > IV) Make output plugins a SQL-level object/catalog table where a plugin
>> > can be registered, and the callbacks are normal pg_proc entries. It's
>> > more in line with other stuff, but has the disadvantage that we need to
>> > register plugins on the primary, even if we only stream from a
>> > standby. But then, we're used to that with CREATE EXTENSION et al.
>> I don't think I'd make every callback a pg_proc entry; I'd make a
>> single pg_proc entry that returns a struct of function pointers, as we
>> do for FDWs.  But I think this has merit.  One significant advantage
>> of it over (III) is that execution of a function in pg_proc can
>> trigger a library load without any extra pushups, which is nice.
> So I guess this is? It has the advantage that an output plugin can
> create any additional functionality it needs in the course of its [...]

Yes, that's elegant.

> As far as I have been thinking of, this would be another catalog table like
> pg_decoding_plugin(oid, dpname name, dpload regproc).

Instead of adding another catalog table, I think we should just define
a new type.  Again, please look at the way that foreign data wrappers
do this:

rhaas=# \df file_fdw_handler
                              List of functions
 Schema |       Name       | Result data type | Argument data types |  Type
--------+------------------+------------------+---------------------+--------
 public | file_fdw_handler | fdw_handler      |                     | normal
(1 row)

Is there any reason not to slavishly copy that design?  Note that if
you do it this way, you don't need any special DDL or pg_dump support;
a separate catalog table will raise the bar considerably.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
