Re: Initial Schema Sync for Logical Replication

Amit Kapila Mon, 22 May 2023 22:32:08 -0700

On Mon, May 22, 2023 at 6:37 AM Masahiko Sawada <[email protected]> wrote:
>
> On Fri, Apr 28, 2023 at 4:16 PM Masahiko Sawada <[email protected]> wrote:
> > Yes, in this approach, we need to dump/restore objects while
> > specifying with fine granularity. Ideally, the table sync worker dumps
> > and restores the table schema, does copy the initial data, and then
> > creates indexes, and triggers and table-related objects are created
> > after that. So if we go with the pg_dump approach to copy the schema
> > of individual tables, we need to change pg_dump (or libpgdump needs to
> > be able to do) to support it.
>
> We have been discussing how to sync schema but I'd like to step back a
> bit and discuss use cases and requirements of this feature.
>
> Suppose that a table belongs to a publication, what objects related to
> the table we want to sync by the initial schema sync features? IOW, do
> we want to sync table's ACLs, tablespace settings, triggers, and
> security labels too?
>
> If we want to replicate the whole database, e.g. when using logical
> replication for major version upgrade, it would be convenient if it
> synchronizes all table-related objects. However, if we have only this
> option, it could be useless in some cases. For example, in a case
> where users have different database users on the subscriber than the
> publisher, they might want to sync only CREATE TABLE, and set ACL etc
> by themselves. In this case, it would not be necessary to sync ACL and
> security labels.
>
> What use case do we want to support by this feature? I think the
> implementation could be varied depending on how to select what objects
> to sync.
>
> One possible idea is to select objects to sync depending on how DDL
> replication is set in the publisher. It's straightforward but I'm not
> sure the design of DDL replication syntax has been decided. Also, even
> if we create a publication with ddl = 'table' option, it's not clear
> to me that we want to sync table-dependent triggers, indexes, and
> rules too by the initial sync feature.
>


I think it is better to keep the initial sync the same as the
replication. So, if the publication specifies 'table' then we should
just synchronize tables. Otherwise, it will look odd that the initial
sync has synchronized say index-related DDLs but then later
replication didn't replicate it. OTOH, if we want to do initial sync
of table-dependent objects like triggers, indexes, rules, etc. when
the user has specified ddl = 'table' then the replication should also
follow the same. The main reason to exclude the other objects during
replication is to reduce the scope of deparsing patch but if we have a
finite set of objects (say all dependent on the table) then we can
probably try to address those.

> Second idea is to make it configurable by users so that they can
> specify what objects to sync. But it would make the feature complex
> and I'm not sure users can use it properly.
>
> Third idea is that since the use case of synchronizing the whole
> database can be achievable even by pg_dump(all), we support
> synchronizing only tables (+ indexes) in the initial sync feature,
> which can not be achievable by pg_dump.
>

Can't we add some switch to dump only the table and not its dependents
if we want to go with that approach?

-- 
With Regards,
Amit Kapila.

Re: Initial Schema Sync for Logical Replication

Reply via email to