Hello!Thank you very much, the information you just sent to me is very, very valuable! Best regards, Cristi Boboc
On Wednesday, November 23, 2022 at 04:45:09 AM GMT+2, Peter Smith <smithpb2...@gmail.com> wrote: On Fri, Nov 18, 2022 at 4:50 AM PG Doc comments form <nore...@postgresql.org> wrote: > > The following documentation comment has been logged on the website: > > Page: https://www.postgresql.org/docs/15/logical-replication-row-filter.html > Description: Hi, FYI - I have forwarded this post to the hacker's list, where I think it will receive more attention. I am not sure why that (above) page was cited -- the section "31.3 Row Filters" is specifically about row filtering, whereas the items you reported seem unrelated to row filters, but are generic for all Logical Replication. > > There are several things missing here and some of them I found to be highly > important: > 1. How can I find why a logical replication failed. Currently I only can see > it "does nothing" in pg_stat_subscriptions. There should be logs reporting any replication conflicts etc. See [1] for example logs. See also the answer for #2 below. > 2. In case of copying the existing data: how can I find which tables or > partitions were processed and which are on the processing queue (while > monitoring I have observed no specific order or rule). There is no predictable processing queue or order - The initial tablesyncs might be happening in multiple asynchronous processes according to the GUC max_sync_workers_per_subscription [2]. Below I show examples of replicating two tables (tab1 and tab2). ~~ >From the logs you should see which table syncs have completed OK: e.g. (the initial copy is all good) test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost dbname=test_pub' PUBLICATION pub1; NOTICE: created replication slot "sub1" on publisher CREATE SUBSCRIPTION test_sub=# 2022-11-23 12:23:18.501 AEDT [27961] LOG: logical replication apply worker for subscription "sub1" has started 2022-11-23 12:23:18.513 AEDT [27963] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has started 2022-11-23 12:23:18.524 AEDT [27965] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has started 2022-11-23 12:23:18.593 AEDT [27963] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has finished 2022-11-23 12:23:18.611 AEDT [27965] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has finished e.g. (where there is conflict in table tab2) test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost dbname=test_pub' PUBLICATION pub1; NOTICE: created replication slot "sub1" on publisher CREATE SUBSCRIPTION test_sub=# 2022-11-23 12:40:56.794 AEDT [23401] LOG: logical replication apply worker for subscription "sub1" has started 2022-11-23 12:40:56.808 AEDT [23403] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has started 2022-11-23 12:40:56.819 AEDT [23405] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has started 2022-11-23 12:40:56.890 AEDT [23405] ERROR: duplicate key value violates unique constraint "tab2_pkey" 2022-11-23 12:40:56.890 AEDT [23405] DETAIL: Key (id)=(1) already exists. 2022-11-23 12:40:56.890 AEDT [23405] CONTEXT: COPY tab2, line 1 2022-11-23 12:40:56.891 AEDT [3233] LOG: background worker "logical replication worker" (PID 23405) exited with exit code 1 2022-11-23 12:40:56.902 AEDT [23403] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has finished ... ~~ Alternatively, you can use some SQL query to discover which tables of the subscription had attained a READY state. The READY state (denoted by 'r') means that the initial COPY was completed ok. The table replication state is found in the 'srsubstate' column. See [3] e.g. (the initial copy is all good) test_sub=# select sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s where ut.relid = sr.srrelid and s.oid=sr.srsubid; srsubid | srrelid | subname | relname | srsubstate ---------+---------+---------+---------+------------ 16418 | 16409 | sub1 | tab1 | r 16418 | 16402 | sub1 | tab2 | r (2 rows) e.g. (where it has a conflict in table tab2, so it did not get to READY state) test_sub=# select sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s where ut.relid = sr.srrelid and s.oid=sr.srsubid;2022-11-23 12:41:37.686 AEDT [24501] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has started 2022-11-23 12:41:37.774 AEDT [24501] ERROR: duplicate key value violates unique constraint "tab2_pkey" 2022-11-23 12:41:37.774 AEDT [24501] DETAIL: Key (id)=(1) already exists. 2022-11-23 12:41:37.774 AEDT [24501] CONTEXT: COPY tab2, line 1 2022-11-23 12:41:37.775 AEDT [3233] LOG: background worker "logical replication worker" (PID 24501) exited with exit code 1 srsubid | srrelid | subname | relname | srsubstate ---------+---------+---------+---------+------------ 16423 | 16409 | sub1 | tab1 | r 16423 | 16402 | sub1 | tab2 | d > 3. In case of copying the existing data there is no option to update the row > based on the Primary Key if it already exists at the destination. The COPY > only fails. Yes, the conflicts section [1] describes this -- "A conflict will produce an error and will stop the replication; it must be resolved manually by the user." > 4. Is it possible to restart an interrupted logical replication. If yes, > then how? Taking in account the already existing data!!! You can use the SUBSCRIPTION copy_data=false parameter to avoid re-copying initial data. But this applies to all tables of the subscription so if you have a situation where there are some tables copied and some not copied then you might have to either truncate the tables and start again, or you might want to create additional temporary subscriptions with appropriate copy_data=true/false parameter. I guess the best course of action depends if you had conflicts with 1 or 2 tables or 10000 tables. > > IMHO there are some big functionality features still missing, but they can > be added. > I am not sure if there is missing functionality, but perhaps there is some information that is harder to find than it ought to be, so I would like to help first address that part. ------ [1] conflicts. https://www.postgresql.org/docs/current/logical-replication-conflicts.html [2] max_sync_workers_per_subscription. https://www.postgresql.org/docs/current/runtime-config-replication.html [3] srsubstate. https://www.postgresql.org/docs/current/catalog-pg-subscription-rel.html Kind Regards, Peter Smith. Fujitsu Australia