RE: Support logical replication of DDLs

Phil Florent Mon, 03 Apr 2023 06:25:52 -0700

Hi,
Sorry about the list. Since it was a question about the specifications I 
thought I had to ask it first in the general list. I will reply in the hackers 
list only for new features.

Replicating from orcl to postgres was difficult. You mentionned renaming of 
columns, the ordinal position of a column is reused with a drop/add column in 
orcl and you can wrongly think it is a renaming from an external point of view. 
Only "advantage" with orcl is that you can drop/add columns thousands of times 
if you want, not with postgres.
 From PostgreSQL to PostgreSQL it's now easier of course but difficulty is that 
we have to separate DDL things. The "+" things have to be executed first on the 
replicated db (new tables, new columns, enlargement of columns). The "-" things 
have to be executed first on the source db (dropped tables, dropped columns, 
downsize of columns). DSS and OLTP teams are different, OLTP teams cannot or 
don't want to deal with DSS concerns etc.  If replication is delayed it's not 
so trivial anway to know when you can drop a table on the replicated db for 
example. DSS team has in fact to build a system that detects a posteriori why 
the subscription is KO if something goes wrong. It can also be a human mistake, 
e.g a "create table very_important_table_to_save as select * from 
very_important_table;" and the replication is KO if the _save table is created 
in the published schema.
I had read too fast. I read the proposals and Vignesh suggestion & syntax seem 
very promising. If I understand well an existing "for all tables" / "tables in 
schema" DML publication would have be to altered with
ALTER PUBLICATION 
simple_applicative_schema_replication_that_wont_be_interrupted_for_an_rdbms_reason
 WITH (ddl = 'table:create,alter'); to get rid of the majority of possible 
interruptions.

> Additionally, there could be some additional problems to deal with
> like say if the column list has been specified then we ideally
> shouldn't send those columns even in DDL. For example, if one uses
> Alter Table .. Rename Column and the new column name is not present in
> the published column list then we shouldn't send it.
Perhaps I miss something but the names are not relevant here. The column is 
part of the publication and the corresponding DDL has to be sent, the column is 
not part of the publication and the DDL should not be sent. Dependencies are 
not based on names, it currently works like that with DML publication but also 
with views for example.
Quick test :

bas=# \dRp+

                                         Publication test_phil

Propriétaire | Toutes les tables | Insertions | Mises à jour | Suppressions | 
Tronque | Via la racine

--------------+-------------------+------------+--------------+--------------+---------+---------------

postgres     | f                 | t          | t            | t            | t 
      | f

Tables :

    "test_phil.t1" (c1, c2)

bas=# alter table test_phil.t1 rename column c2 to c4;

ALTER TABLE

bas=# \dRp+

                                         Publication test_phil

Propriétaire | Toutes les tables | Insertions | Mises à jour | Suppressions | 
Tronque | Via la racine

--------------+-------------------+------------+--------------+--------------+---------+---------------

postgres     | f                 | t          | t            | t            | t 
      | f

Tables :

    "test_phil.t1" (c1, c4)

"rename column" DDL has to be sent and the new name is not relevant in the 
decision to send it. If "rename column" DDL had concerned a column that is not 
part of the publication you wouldn't have to send the DDL, no matter the new 
name. Drop is not a problem. You cannot drop an existing column that is part of 
a publication without a "cascade". What could be problematic is a "add column" 
DDL and after that the column is added to the publication via "alter 
publication set". Such a case is difficult to deal with I guess. But the 
initial DDL to create a table is also not sent anyway right ?  It could be a 
known limitation.

I usually only test things in beta to report but I will try to have a look 
earlier at this patch since it is very interesting. That and the TDE thing but 
TDE is an external obligation and not a real interest. I obtained a delay but 
hopefully we will have this encryption thing or perhaps we will be obliged to 
go back to the proprietary RDBMS for some needs even if the feature is in fact 
mostly useless...

Best regards,
Phil
________________________________
De : Amit Kapila <amit.kapil...@gmail.com>
Envoyé : lundi 3 avril 2023 06:07
À : Phil Florent <philflor...@hotmail.com>
Cc : vignesh C <vignes...@gmail.com>; houzj.f...@fujitsu.com 
<houzj.f...@fujitsu.com>; Ajin Cherian <itsa...@gmail.com>; 
wangw.f...@fujitsu.com <wangw.f...@fujitsu.com>; Runqi Tian 
<runqi...@gmail.com>; Peter Smith <smithpb2...@gmail.com>; Tom Lane 
<t...@sss.pgh.pa.us>; li jie <ggys...@gmail.com>; Dilip Kumar 
<dilipbal...@gmail.com>; Alvaro Herrera <alvhe...@alvh.no-ip.org>; Masahiko 
Sawada <sawada.m...@gmail.com>; Japin Li <japi...@hotmail.com>; rajesh 
singarapu <rajesh.rs0...@gmail.com>; Zheng Li <zhengl...@gmail.com>; PostgreSQL 
Hackers <pgsql-hackers@lists.postgresql.org>
Objet : Re: Support logical replication of DDLs

On Sun, Apr 2, 2023 at 3:25 PM Phil Florent <philflor...@hotmail.com> wrote:
>
> As an end-user, I am highly interested in the patch 
> https://commitfest.postgresql.org/42/3595/ but I don't fully get its main 
> goal in its first version.
> It's "for all tables"  that will be implemented ?
> If one needs a complete replication of a cluster, a hot standby will always 
> be more efficient than a publication right ? I use both for different needs 
> in public hospitals I work for (hot standby for disaster recovery & logical 
> replication for dss)
> The main interest of a publication is to be able to filter things on the 
> publisher and to add stuff on the replicated cluster.
> If you compare PostgreSQL with less avanced RDBMS that don't really implement 
> schemas (typically Oracle), the huge advantage of Postgre is that many things 
> (e.g security) can be dynamically implemented via schemas.
> Developers just have put a table in the "good" schema and that's all. Logical 
> DML replication now fully implements this logic since PostgreSQL 15. Only 
> remaining problem is that a "for tables in schema" publication has to be 
> owned by a superuser (because a normal user can have tables that don't belong 
> to him in a schema it owns ?) If DDL replication only works with "FOR ALL 
> TABLES " and not "FOR TABLES IN SCHEMA" it reduces its interest anyway.
>

I don't see any major issue with supporting it for both "FOR ALL
TABLES" and "FOR ALL TABLES IN SCHEMA". However, if we want to support
it with the "FOR TABLE .." variant then we will probably need to apply
some restrictions as we can only support 'alter' and 'drop'.
Additionally, there could be some additional problems to deal with
like say if the column list has been specified then we ideally
shouldn't send those columns even in DDL. For example, if one uses
Alter Table .. Rename Column and the new column name is not present in
the published column list then we shouldn't send it.

BTW, we have some proposals related to the specification of this
feature in emails [1][2][3]. See, if you have any suggestions on the
same?

Note- It seems you have copied this thread to pgsql-general. Is it
because you are not subscribed to pgsql-hackers? As this is a
development project so better to keep the discussion on pgsql-hackers.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2B%2BY7a2SQq55DXT6neghZgj3j%2BpQ74%3D8zfT3k8Tkdj0ZA%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAA4eK1KZqvJsTt7OkS8AkxOKVvSpkQkPwsqzNmo10mFaVAKeZg%40mail.gmail.com
[3] - 
https://www.postgresql.org/message-id/OS0PR01MB571646874A3E165D93999A9D94889%40OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

RE: Support logical replication of DDLs

Reply via email to