Hello Robert,

On 2013-12-11 22:29:46 -0500, Robert Haas wrote:
> On Wed, Dec 4, 2013 at 7:15 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> > There's basically three major 'verbs' that can be performed on a
> > stream, currently named (walsender names):
> > * INIT_LOGICAL_REPLICATION "name" "output_plugin"
> > * START_LOGICAL_REPLICATION "name" last_received ("option_name" value,...)
> >
> > The SQL variant currently has:
> > * init_logical_replication(name, plugin)
> > * start_logical_replication(name, stream_upto, options[])
> > * stop_logical_replication(name)
> >
> > You might have noticed the slight inconsistency...
> I think this naming is probably not the greatest.

Completely agreed.

> When I hear "init",
> I don't think "permanently allocate a resource that will never be
> released unless I explicitly throw it away", and when I hear "stop", I
> don't think "free that resource".  I suggest naming based around
> create/drop, register/unregister, or acquire/release.  Since, as
> previously noted, I'm gunning for these slots to apply to ordinary
> replication as well, I kind of like ACQUIRE_REPLICATION_SLOT and
> RELEASE_REPLICATION_SLOT.  If we're going to make them specific to
> logical replication, then your suggestion of CREATE_DECODING_SLOT and

I think there'll always be a bit of a difference between slots for
physical and logical data, even if 90% of the implementation is the
same. We can signal that difference either by specifying logical/physical
as an option or by having two different sets of commands.
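As a minimal sketch of the shared-slot idea (all names here are hypothetical, not the actual PostgreSQL structures): one struct could serve both kinds, with an explicit kind field standing in for either a PHYSICAL/LOGICAL option or two command sets, and only logical slots carrying an output plugin:

```c
#include <stddef.h>
#include <string.h>

typedef enum { SLOT_PHYSICAL, SLOT_LOGICAL } SlotKind;

typedef struct ReplicationSlot {
    char     name[64];
    SlotKind kind;
    char     plugin[64];        /* only meaningful for SLOT_LOGICAL */
} ReplicationSlot;

/* Returns 0 on success, -1 if the arguments don't fit the slot kind. */
static int
slot_create(ReplicationSlot *slot, const char *name,
            SlotKind kind, const char *plugin)
{
    if (kind == SLOT_LOGICAL && plugin == NULL)
        return -1;              /* logical slots need an output plugin */
    if (kind == SLOT_PHYSICAL && plugin != NULL)
        return -1;              /* physical slots must not name one */

    strncpy(slot->name, name, sizeof(slot->name) - 1);
    slot->name[sizeof(slot->name) - 1] = '\0';
    slot->kind = kind;
    slot->plugin[0] = '\0';
    if (plugin != NULL)
    {
        strncpy(slot->plugin, plugin, sizeof(slot->plugin) - 1);
        slot->plugin[sizeof(slot->plugin) - 1] = '\0';
    }
    return 0;
}
```

Whether the kind ends up as an option to one command or as two command families, the shared state would look much the same.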


(For the physical case, streaming already exists without a slot; a PHYSICAL argument could mark that variant.)

> It also strikes me that just as it's possible to stream WAL without
> allocating a slot first (since we don't at present have slots),
> perhaps it ought also to be possible to stream logical replication
> data without acquiring a slot first.  You could argue that it was a
> mistake not to introduce slots in the first place, but the stateless
> nature of WAL streaming definitely has some benefits, and it's unclear
> to me why you shouldn't be able to do the same thing with logical
> decoding.

I think it would be quite a bit harder for logical decoding. The
difference is that, from the perspective of the walsender, plain WAL
streaming only needs to check whether the WAL is still there. For
decoding though, we need to be sure that a) the catalog xmin is still
low enough, and has been all along, and b) that we are able to instantly
build a historical MVCC snapshot from the point where we want to start.
Both a) and b) are solved by keeping the xmin, and the point from which
to reread WAL, in the slot data, and by serializing data about
historical snapshots to disk. But those are removed if there isn't a
slot around requiring them...
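The dependency in a) can be sketched like this (illustrative names only, not the real structures): each slot pins a catalog xmin and a WAL reread point, and the horizon vacuum may clean up to is the minimum over all slots. With no slot, nothing holds the horizon back, which is exactly why restart guarantees vanish:

```c
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

typedef struct DecodingSlot {
    TransactionId catalog_xmin;  /* catalog rows needed from here on */
    XLogRecPtr    restart_lsn;   /* reread WAL from here to rebuild
                                  * the historical snapshot */
    int           in_use;
} DecodingSlot;

/* Vacuum must not remove catalog rows newer than the minimum
 * catalog_xmin over all in-use slots.  Without any slot, the
 * horizon is just the current xmin, and old catalog rows go away. */
static TransactionId
slots_catalog_xmin_horizon(const DecodingSlot *slots, int n,
                           TransactionId current_xmin)
{
    TransactionId horizon = current_xmin;

    for (int i = 0; i < n; i++)
        if (slots[i].in_use && slots[i].catalog_xmin < horizon)
            horizon = slots[i].catalog_xmin;
    return horizon;
}
```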

So what you could get is something that starts streaming changes to you
at some point after you asked it to start, without any guarantee that
you can restart at exactly the position where you stopped. If that's
useful, we can do it, but I am not sure what the use case would be?

> > b) Decide which of the SQL functions should be in a contrib module, and
> >    which in core. Currently init_logical_replication() and
> >    stop_logical_replication() are in core, whereas
> >    start_logical_replication() is in the 'test_logical_decoding'
> >    extension. The reasoning behind that is that init/stop ones are
> >    important to the DBA and the start_logical_replication() SRF isn't
> >    all that useful in the real world because our SRFs don't support
> >    streaming changes out.
> Seems pretty arbitrary to me.  If start_logical_replication() is
> usable with any output plugin, then it ought to be in core.

Ok, I certainly have no problem with that.

> I think
> the name isn't great, though; the actual functionality seems to be
> more or less decode-from-last-position-up-to-present, which doesn't
> sound much like "start".

The name originally comes from the START_REPLICATION walsender command,
but I agree, there's not much point in trying to keep that name.

I am also open to different behaviour for the SRF, but I am not sure
what that could be. There's just no sensible way to stream data at the
SQL level, afaics.

What about pg_decoding_slot_get_[binary_]changes()?

> > c) Which data-types does start_logical_replication() return. Currently
> > it's OUT location text, OUT xid bigint, OUT data text. Making the 'data'
> > column text has some obvious disadvantages though - there's obvious
> > usecases for output plugins that return binary data. But making it bytea
> > sucks, because the output is harder to read by default...

> I think having two functions might be sensible.  I'm not sure what
> happens if the text function is used and the plugin outputs something
> that's not valid in the database encoding, though.  I guess you'd
> better check for that and error out.

I wonder if we should let the output plugin tell us whether it will
output data in binary? I think it generally would be a good idea to let
the output plugin's _init() function return some configuration data;
that would also make it easier to extend the interface to support more
features later.
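As a sketch of that idea (every name here is an assumption, not a committed API): _init() could fill an options struct that the caller inspects up front, so the text-returning SQL function can refuse binary plugins outright instead of validating every datum against the database encoding:

```c
typedef enum { OUTPUT_PLUGIN_TEXTUAL, OUTPUT_PLUGIN_BINARY } OutputMode;

typedef struct OutputPluginOptions {
    OutputMode output_mode;     /* declared once, at init time */
} OutputPluginOptions;

typedef void (*plugin_init_cb)(OutputPluginOptions *opt);

/* Two toy plugins declaring their output mode. */
static void
textual_plugin_init(OutputPluginOptions *opt)
{
    opt->output_mode = OUTPUT_PLUGIN_TEXTUAL;
}

static void
binary_plugin_init(OutputPluginOptions *opt)
{
    opt->output_mode = OUTPUT_PLUGIN_BINARY;
}

/* The text-returning SRF can check the declaration once instead of
 * validating each change against the database encoding. */
static int
text_changes_allowed(plugin_init_cb init)
{
    OutputPluginOptions opt;

    init(&opt);
    return opt.output_mode == OUTPUT_PLUGIN_TEXTUAL;
}
```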

> > III) Keep the output plugin a library, but simply rely on _PG_init()
> > calling a function to register all callbacks. Imo it's worse than I) and
> > II) because it basically prohibits using the library in
> > shared_preload_libraries as well, because then it's _PG_init() doesn't
> > get called when starting to stream, and another library might have
> > registered other callbacks.

> I don't understand the disadvantage that you're describing here.  What
> I'm imagining is that you have some struct that looks like this:

> Now you provide a function RegisterOutputPlugin(output_plugin *).  If
> there are any output plugins built into core, core will call
> RegisterOutputPlugin once for each one.  If a shared library
> containing an output plugin is loaded, the libraries _PG_init function
> does the same thing.  When someone tries to use a plugin, they ask for
> it by name.  We go iterate through the data saved by all previous
> calls to RegisterOutputPlugin() until we find one with a matching
> name, and then we use the callbacks embedded in that struct.

But if we don't pass in a .so's name, how can additional plugins be
registered except by adding them to [shared|local]_preload_libraries? If
we do pass in one, it seems confusing if you suddenly get a plugin
implemented somewhere else.
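For concreteness, the registration scheme quoted above might look roughly like this (hypothetical names throughout); note that lookup by name simply fails if nothing ever loaded the library, which is the problem with relying on preload:

```c
#include <stddef.h>
#include <string.h>

typedef struct OutputPlugin {
    const char *name;
    void (*begin_cb)(void);
    void (*change_cb)(void);
    void (*commit_cb)(void);
} OutputPlugin;

#define MAX_PLUGINS 16
static const OutputPlugin *registered[MAX_PLUGINS];
static int nregistered;

/* Called by core for built-in plugins, or from a shared library's
 * _PG_init() when the library is loaded. */
static void
RegisterOutputPlugin(const OutputPlugin *plugin)
{
    if (nregistered < MAX_PLUGINS)
        registered[nregistered++] = plugin;
}

/* Returns NULL if no previous RegisterOutputPlugin() call matched,
 * e.g. because the library was never loaded. */
static const OutputPlugin *
FindOutputPlugin(const char *name)
{
    for (int i = 0; i < nregistered; i++)
        if (strcmp(registered[i]->name, name) == 0)
            return registered[i];
    return NULL;
}
```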

> > IV) Make output plugins a SQL-level object/catalog table where a plugin
> > can be registered, and the callbacks are normal pg_proc entries. It's
> > more in line with other stuff, but has the disadvantage that we need to
> > register plugins on the primary, even if we only stream from a
> > standby. But then, we're used to that with CREATE EXTENSION et al.
> I don't think I'd make every callback a pg_proc entry; I'd make a
> single pg_proc entry that returns a struct of function pointers, as we
> do for FDWs.  But I think this has merit.  One significant advantage
> of it over (III) is that execution of a function in pg_proc can
> trigger a library load without any extra pushups, which is nice.

So I guess this is your preference? It has the advantage that an output
plugin can create any additional functionality it needs.

As far as I have thought about it, this would be another catalog table like
pg_decoding_plugin(oid, dpname name, dpload regproc).
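A rough sketch of the FDW-style variant (illustrative names only): the regproc in dpload points at a single handler function, so calling it can trigger the library load, and all the callbacks hang off the struct it returns:

```c
#include <stddef.h>

typedef struct DecodingCallbacks {
    void (*begin_cb)(void);
    void (*change_cb)(void);
    void (*commit_cb)(void);
} DecodingCallbacks;

/* Toy callbacks standing in for a real plugin's implementation. */
static void my_begin(void)  {}
static void my_change(void) {}
static void my_commit(void) {}

/* The one function visible through the catalog (dpload regproc).
 * Like an FDW handler, it hands back a struct of function pointers. */
static const DecodingCallbacks *
my_plugin_handler(void)
{
    static const DecodingCallbacks cbs = { my_begin, my_change, my_commit };

    return &cbs;
}
```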

I guess that we at least need a function for adding output plugins
there, one that also creates proper pg_depend entries? We could also go
for full DDL commands, although I have a bit of a hard time finding
suitable, already-reserved keywords... If we're not worried about that,
maybe full DDL is the way to go.

Thanks for the input!

Andres Freund

 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)