On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira <eu...@eulerto.com> wrote:
> >
> > On Mon, Mar 20, 2023, at 10:10 PM, Kumar, Sachin wrote:
> > >
> > > From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
> > > Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> > > On 2023-Mar-15, Kumar, Sachin wrote:
> > >
> > > > 1. In CreateSubscription(), when we create the replication
> > > > slot (walrcv_create_slot()), we should use CRS_EXPORT_SNAPSHOT, so
> > > > that we can use this snapshot later in the pg_dump.
> > > >
> > > > 2. Now we can call pg_dump with the above snapshot from CreateSubscription.
> > >
> > > Overall I'm not on board with the idea that logical replication would
> > > depend on pg_dump; that seems like it could run into all sorts of
> > > trouble (what if calling external binaries requires additional security
> > > setup? what about pg_hba connection requirements? what about
> > > max_connections in tight circumstances?).
> >
> > > what if calling external binaries requires additional security setup
> > I am not sure what kind of security restriction would apply in this
> > case; maybe the pg_dump binary can be changed?
> >
> > Using pg_dump as part of this implementation is not acceptable because we
> > expect the backend to be decoupled from the client. Besides that, pg_dump
> > provides all table dependencies (such as tablespaces, privileges, security
> > labels, comments); not all dependencies should be replicated.
>
> I agree that in the initial version we may not support sync of all
> objects, but why shouldn't that be possible in later versions?
>
> > You should exclude them by removing these objects from the TOC before
> > running pg_restore, or by adding a few pg_dump options to exclude these
> > objects. Another issue is related to different versions.
> > Let's say the publisher has a version ahead of the subscriber version;
> > a new table syntax can easily break your logical replication setup. IMO
> > pg_dump doesn't seem like a good solution for initial synchronization.
> >
> > Instead, the backend should provide infrastructure to obtain the
> > required DDL commands for the specific (set of) tables. This can work
> > around the issues from the previous paragraph:
> >
> ...
> > * don't need to worry about different versions.
>
> AFAICU, some of the reasons why pg_dump is not allowed to dump from a
> newer version are as follows: (a) there could be more columns in the
> newer version of the system catalog, and then SELECT *-style queries
> won't work because the client has no knowledge of the additional
> columns; (b) the newer version could have new features (represented by,
> say, new columns in existing catalogs or new catalogs) that the older
> version of pg_dump has no knowledge of, so it will fail to get that data
> and hence produce an inconsistent dump. The subscriber will easily get
> out of sync due to that.
>
> Now, how do we avoid these problems even if we have our own version of
> functionality similar to pg_dump for selected objects? I guess we will
> face similar problems.
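(As an aside, for anyone wanting to try the pg_dump-based approach by hand: the two steps proposed upthread amount to roughly the sketch below. The slot, database, and snapshot names are made up for illustration; CREATE_REPLICATION_SLOT ... EXPORT_SNAPSHOT and pg_dump's --snapshot option are existing features.)

```shell
# Rough sketch of the flow proposed upthread (slot/database names are
# hypothetical; the snapshot name is whatever CREATE_REPLICATION_SLOT
# actually returns).

# 1. On a replication connection, create a logical slot and export its
#    snapshot.  The result set includes a snapshot name such as
#    "00000003-00000002-1".
psql "dbname=src replication=database" \
     -c "CREATE_REPLICATION_SLOT sub_slot LOGICAL pgoutput EXPORT_SNAPSHOT;"

# 2. While that replication connection stays open, run a schema-only dump
#    using the exported snapshot, so the dumped definitions are consistent
#    with the slot's start point, and restore on the subscriber side.
pg_dump --schema-only --snapshot=00000003-00000002-1 src | psql dst
```

Note that the exported snapshot is only usable while the originating replication connection remains open, which is one more operational constraint an in-backend mechanism would avoid.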
Right. I think that such functionality needs to return DDL commands that
can be executed on the requested version.

> If so, we may need to deny schema sync in any such case.

Yes. Do we have any concrete use case where the subscriber is an older
version, in the first place?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com