Re: Add contrib/pg_logicalsnapinspect
My review comments for v8-0001

==
contrib/pg_logicalinspect/pg_logicalinspect.c

1.
+/*
+ * Lookup table for SnapBuildState.
+ */
+
+#define SNAPBUILD_STATE_INCR 1
+
+static const char *const SnapBuildStateDesc[] = {
+	[SNAPBUILD_START + SNAPBUILD_STATE_INCR] = "start",
+	[SNAPBUILD_BUILDING_SNAPSHOT + SNAPBUILD_STATE_INCR] = "building",
+	[SNAPBUILD_FULL_SNAPSHOT + SNAPBUILD_STATE_INCR] = "full",
+	[SNAPBUILD_CONSISTENT + SNAPBUILD_STATE_INCR] = "consistent",
+};
+
+/*

nit - the SNAPBUILD_STATE_INCR made this code appear more complicated
than it is. Please take a look at the attachment for an alternative
implementation which includes an explanatory comment. YMMV. Feel free
to ignore it.

==
src/include/replication/snapbuild.h

2.
+ * Please keep SnapBuildStateDesc[] (located in the pg_logicalinspect module)
+ * updated should a change needs to be done in SnapBuildState.

nit - "...should a change needs to be done" -- the word "needs" is
incorrect here. How about:

"...if a change needs to be made to SnapBuildState."
"...if a change is made to SnapBuildState."
"...if SnapBuildState is changed."

==
Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/contrib/pg_logicalinspect/pg_logicalinspect.c b/contrib/pg_logicalinspect/pg_logicalinspect.c
index 100b82f..2419df1 100644
--- a/contrib/pg_logicalinspect/pg_logicalinspect.c
+++ b/contrib/pg_logicalinspect/pg_logicalinspect.c
@@ -24,19 +24,6 @@ PG_FUNCTION_INFO_V1(pg_get_logical_snapshot_meta);
 PG_FUNCTION_INFO_V1(pg_get_logical_snapshot_info);
 
 /*
- * Lookup table for SnapBuildState.
- */
-
-#define SNAPBUILD_STATE_INCR 1
-
-static const char *const SnapBuildStateDesc[] = {
-	[SNAPBUILD_START + SNAPBUILD_STATE_INCR] = "start",
-	[SNAPBUILD_BUILDING_SNAPSHOT + SNAPBUILD_STATE_INCR] = "building",
-	[SNAPBUILD_FULL_SNAPSHOT + SNAPBUILD_STATE_INCR] = "full",
-	[SNAPBUILD_CONSISTENT + SNAPBUILD_STATE_INCR] = "consistent",
-};
-
-/*
+/*
  * NOTE: For any code change or issue fix here, it is highly recommended to
  * give a thought about doing the same in SnapBuildRestore() as well.
  */
@@ -186,6 +173,16 @@ Datum
 pg_get_logical_snapshot_info(PG_FUNCTION_ARGS)
 {
 #define PG_GET_LOGICAL_SNAPSHOT_INFO_COLS 14
+	/*
+	 * Lookup table for SnapBuildState. The lookup index is offset by 1
+	 * because the consecutive SnapBuildState enum values start at -1.
+	 */
+	static const char *const SnapBuildStateDesc[] = {
+		[1 + SNAPBUILD_START] = "start",
+		[1 + SNAPBUILD_BUILDING_SNAPSHOT] = "building",
+		[1 + SNAPBUILD_FULL_SNAPSHOT] = "full",
+		[1 + SNAPBUILD_CONSISTENT] = "consistent",
+	};
 	SnapBuildOnDisk ondisk;
 	XLogRecPtr	lsn;
 	HeapTuple	tuple;
@@ -209,8 +206,7 @@ pg_get_logical_snapshot_info(PG_FUNCTION_ARGS)
 
 	memset(nulls, 0, sizeof(nulls));
 
-	values[i++] = CStringGetTextDatum(SnapBuildStateDesc[ondisk.builder.state +
-														 SNAPBUILD_STATE_INCR]);
+	values[i++] = CStringGetTextDatum(SnapBuildStateDesc[ondisk.builder.state + 1]);
 	values[i++] = TransactionIdGetDatum(ondisk.builder.xmin);
 	values[i++] = TransactionIdGetDatum(ondisk.builder.xmax);
 	values[i++] = LSNGetDatum(ondisk.builder.start_decoding_at);
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 78df2d1..e844a89 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -17,7 +17,7 @@
 /*
  * Please keep SnapBuildStateDesc[] (located in the pg_logicalinspect module)
- * updated should a change needs to be done in SnapBuildState.
+ * updated if a change needs to be made to SnapBuildState.
  */
 typedef enum
 {
Re: Pgoutput not capturing the generated columns
On Fri, Sep 20, 2024 at 2:26 PM Amit Kapila wrote:
>
> On Fri, Sep 20, 2024 at 4:16 AM Peter Smith wrote:
> >
> > On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada
> > wrote:
> > >
> > > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila
> > > wrote:
> > > >
> > > >
> > > > Users can use a publication like "create publication pub1 for table
> > > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > > but not for t2. They can specify the generated column name in the
> > > > column list of t1 in that case even though the rest of the tables
> > > > won't publish generated columns.
> > >
> > > Agreed.
> > >
> > > I think that users can use the publish_generated_column option when
> > > they want to publish all generated columns, instead of specifying all
> > > the columns in the column list. It's another advantage of this option
> > > that it will also include the future generated columns.
> > >
> >
> > OK. Let me give some examples below to help understand this idea.
> > Please correct me if these are incorrect.
> >
> > Examples, when publish_generated_columns=true:
> >
> > CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> > (publish_generated_columns=true)
> > t1 -> publishes a, b, gen2 (e.g. what column list says)
> > t2 -> publishes c, d + ALSO gen1, gen2
> >
> > CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH
> > (publish_generated_columns=true)
> > t1 -> publishes a, b + ALSO gen1, gen2
> > t2 -> publishes gen1 (e.g. what column list says)
> >
>
> These two could be controversial because one could expect that if
> "publish_generated_columns=true" then publish generated columns
> irrespective of whether they are mentioned in column_list. I am of the
> opinion that column_list should take priority and the results should
> be as mentioned by you, but let us see if anyone thinks otherwise.
>
> >
> > ==
> >
> > The idea LGTM, although now the parameter name
> > ('publish_generated_columns') seems a bit misleading since sometimes
> > generated columns get published "irrespective of the option".
> >
> > So, I think the original parameter name 'include_generated_columns'
> > might be better here because IMO "include" seems more like "add them
> > if they are not already specified", which is exactly what this idea is
> > doing.
> >
>
> I still prefer 'publish_generated_columns' because it matches with
> other publication option names. One can also deduce from
> 'include_generated_columns' that add all the generated columns even
> when some of them are specified in column_list.
>

Fair point. Anyway, to avoid surprises it will be important for the
precedence rules to be documented clearly (probably with some examples).

==
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Pgoutput not capturing the generated columns
On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada wrote:
>
> On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila wrote:
> > ...
> > I think that the column list should take priority and we should
> > publish the generated column if it is mentioned in irrespective of
> > the option.
>
> Agreed.
>
> > ...
> >
> > Users can use a publication like "create publication pub1 for table
> > t1(c1, c2), t2;" where they want t1's generated column to be published
> > but not for t2. They can specify the generated column name in the
> > column list of t1 in that case even though the rest of the tables
> > won't publish generated columns.
>
> Agreed.
>
> I think that users can use the publish_generated_column option when
> they want to publish all generated columns, instead of specifying all
> the columns in the column list. It's another advantage of this option
> that it will also include the future generated columns.
>

OK. Let me give some examples below to help understand this idea.
Please correct me if these are incorrect.

==

Assuming these tables:
t1(a,b,gen1,gen2)
t2(c,d,gen1,gen2)

Examples, when publish_generated_columns=false:

CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH (publish_generated_columns=false)
t1 -> publishes a, b, gen2 (e.g. what column list says)
t2 -> publishes c, d

CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes gen1 (e.g. what column list says)

CREATE PUBLICATION pub1 FOR t1, t2 WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes c, d

CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes c, d

~~

Examples, when publish_generated_columns=true:

CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH (publish_generated_columns=true)
t1 -> publishes a, b, gen2 (e.g. what column list says)
t2 -> publishes c, d + ALSO gen1, gen2

CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes gen1 (e.g. what column list says)

CREATE PUBLICATION pub1 FOR t1, t2 WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes c, d + ALSO gen1, gen2

CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes c, d + ALSO gen1, gen2

==

The idea LGTM, although now the parameter name
('publish_generated_columns') seems a bit misleading since sometimes
generated columns get published "irrespective of the option".

So, I think the original parameter name 'include_generated_columns'
might be better here because IMO "include" seems more like "add them
if they are not already specified", which is exactly what this idea is
doing.

Thoughts?

==
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Add contrib/pg_logicalsnapinspect
Thanks for the updated patch. Here are a few more trivial comments for
the patch v7-0001.

==

1. Should the extension descriptions all be identical?

I noticed small variations:

contrib/pg_logicalinspect/Makefile
+PGFILEDESC = "pg_logicalinspect - functions to inspect logical decoding components"

contrib/pg_logicalinspect/meson.build
+  '--FILEDESC', 'pg_logicalinspect - functions to inspect contents of logical snapshots',])

contrib/pg_logicalinspect/pg_logicalinspect.control
+comment = 'functions to inspect logical decoding components'

==
.../expected/logical_inspect.out

2.
+step s1_get_logical_snapshot_info: SELECT (pg_get_logical_snapshot_info(f.name::pg_lsn)).state,(pg_get_logical_snapshot_info(f.name::pg_lsn)).catchange_count,array_length((pg_get_logical_snapshot_info(f.name::pg_lsn)).catchange_xip,1),(pg_get_logical_snapshot_info(f.name::pg_lsn)).committed_count,array_length((pg_get_logical_snapshot_info(f.name::pg_lsn)).committed_xip,1) FROM (SELECT replace(replace(name,'.snap',''),'-','/') AS name FROM pg_ls_logicalsnapdir()) AS f ORDER BY 2;
+state|catchange_count|array_length|committed_count|array_length
+-----+---------------+------------+---------------+------------
+    2|              0|            |              2|           2
+    2|              2|           2|              0|
+(2 rows)
+

2a. Would it be better to rearrange those columns so 'committed' stuff
comes before 'catchange' stuff, to make this table order consistent
with the structure/code?

~

2b. Maybe those 2 'array_length' columns could have aliases to
uniquely identify them? e.g. 'catchange_array_length' and
'committed_array_length'.

==
contrib/pg_logicalinspect/pg_logicalinspect.c

3.
+/*
+ * Validate the logical snapshot file.
+ */
+static void
+ValidateAndRestoreSnapshotFile(XLogRecPtr lsn, SnapBuildOnDisk *ondisk,
+							   const char *path)

Since the name was updated then should the function comment also be
updated to include something like the SnapBuildRestoreContents
function comment? e.g. "Validate the logical snapshot file, and read
the contents of the serialized snapshot to 'ondisk'."

~~~

pg_get_logical_snapshot_info:

4.
nit - Add/remove some blank lines to help visually associate the array
counts with their arrays.

==
.../specs/logical_inspect.spec

5.
+setup
+{
+DROP TABLE IF EXISTS tbl1;
+CREATE TABLE tbl1 (val1 integer, val2 integer);
+ CREATE EXTENSION pg_logicalinspect;
+}
+
+teardown
+{
+DROP TABLE tbl1;
+SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+ DROP EXTENSION pg_logicalinspect;
+}

Different indentation for the CREATE/DROP EXTENSION?

==

The attached file shows the whitespace nit (#4)

==
Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/contrib/pg_logicalinspect/pg_logicalinspect.c b/contrib/pg_logicalinspect/pg_logicalinspect.c
index 185f36a..308c653 100644
--- a/contrib/pg_logicalinspect/pg_logicalinspect.c
+++ b/contrib/pg_logicalinspect/pg_logicalinspect.c
@@ -205,8 +205,8 @@ pg_get_logical_snapshot_info(PG_FUNCTION_ARGS)
 	values[i++] = BoolGetDatum(ondisk.builder.in_slot_creation);
 	values[i++] = LSNGetDatum(ondisk.builder.last_serialized_snapshot);
 	values[i++] = TransactionIdGetDatum(ondisk.builder.next_phase_at);
-	values[i++] = Int64GetDatum(ondisk.builder.committed.xcnt);
 
+	values[i++] = Int64GetDatum(ondisk.builder.committed.xcnt);
 	if (ondisk.builder.committed.xcnt > 0)
 	{
 		Datum	   *arrayelems;
@@ -223,7 +223,6 @@ pg_get_logical_snapshot_info(PG_FUNCTION_ARGS)
 		nulls[i++] = true;
 
 	values[i++] = Int64GetDatum(ondisk.builder.catchange.xcnt);
-
 	if (ondisk.builder.catchange.xcnt > 0)
 	{
 		Datum	   *arrayelems;
Re: Add contrib/pg_logicalsnapinspect
Hi, here are some mostly minor review comments for the patch v5-0001.

==
Commit message

1.
Do you think you should also name the new functions here?

==
contrib/pg_logicalinspect/pg_logicalinspect.c

2.
Regarding the question about static function declarations:

Shveta wrote: I was somehow under the impression that this is the way
in the postgres i.e. not add redundant declarations. Will be good to
know what others think on this.

FWIW, my understanding is the convention is just to be consistent with
whatever the module currently does. If it declares static functions,
then declare them all (redundant or not). If it doesn't declare static
functions, then don't add one. But, in the current case, since this is
a new module, I guess it is entirely up to you whatever you want to
do.

~~~

3.
+/*
+ * NOTE: For any code change or issue fix here, it is highly recommended to
+ * give a thought about doing the same in SnapBuildRestore() as well.
+ */
+

nit - I think this NOTE should be part of this module's header
comment. (e.g. like the tablesync.c NOTES)

~~~

ValidateSnapshotFile:

4.
+ValidateSnapshotFile(XLogRecPtr lsn, SnapBuildOnDisk *ondisk, const char *path)
+{
+	int			fd;
+	Size		sz;

nit - The 'sz' is overwritten a few times. I think declaring it at
each scope where used would be tidier.

~~~

5.
+	fsync_fname(path, false);
+	fsync_fname(PG_LOGICAL_SNAPSHOTS_DIR, true);
+
+

nit - remove some excessive blank lines

~~~

6.
+	/* read statically sized portion of snapshot */
+	SnapBuildRestoreContents(fd, (char *) ondisk, SnapBuildOnDiskConstantSize, path);

Should that say "fixed size portion"?

~~~

pg_get_logical_snapshot_info:

7.
+	if (ondisk.builder.committed.xcnt > 0)
+	{
+		Datum	   *arrayelems;
+		int			narrayelems;
+
+		arrayelems = (Datum *) palloc(ondisk.builder.committed.xcnt * sizeof(Datum));
+		narrayelems = 0;
+
+		for (narrayelems = 0; narrayelems < ondisk.builder.committed.xcnt; narrayelems++)
+			arrayelems[narrayelems] = Int64GetDatum((int64) ondisk.builder.committed.xip[narrayelems]);

nit - Why the double assignment of narrayelems = 0? It is simpler to
assign at the declaration and then remove both others.

~~~

8.
+	if (ondisk.builder.catchange.xcnt > 0)
+	{
+		Datum	   *arrayelems;
+		int			narrayelems;
+
+		arrayelems = (Datum *) palloc(ondisk.builder.catchange.xcnt * sizeof(Datum));
+		narrayelems = 0;
+
+		for (narrayelems = 0; narrayelems < ondisk.builder.catchange.xcnt; narrayelems++)
+			arrayelems[narrayelems] = Int64GetDatum((int64) ondisk.builder.catchange.xip[narrayelems]);

nit - ditto previous comment

==
doc/src/sgml/pglogicalinspect.sgml

9.
+
+  The pg_logicalinspect module provides SQL functions
+  that allow you to inspect the contents of logical decoding components. It
+  allows to inspect serialized logical snapshots of a running
+  PostgreSQL database cluster, which is useful
+  for debugging or educational purposes.
+

nit - /It allows to inspect/It allows the inspection of/

~~~

10.
+   example:

nit - /example:/For example:/ (this is in a couple of places)

==
src/include/replication/snapbuild_internal.h

11.
+#ifndef INTERNAL_SNAPBUILD_H
+#define INTERNAL_SNAPBUILD_H

Shouldn't these be SNAPBUILD_INTERNAL_H to match the filename?

~~~

12.
The contents of the snapbuild.c that got moved into
snapbuild_internal.h also got shuffled around a bit. e.g.
originally the typedef struct SnapBuildOnDisk:

+/*
+ * We store current state of struct SnapBuild on disk in the following manner:
+ *
+ * struct SnapBuildOnDisk;
+ * TransactionId * committed.xcnt; (*not xcnt_space*)
+ * TransactionId * catchange.xcnt;
+ *
+ */
+typedef struct SnapBuildOnDisk

was directly beneath the comment:

-/* ---
- * Snapshot serialization support
- * ---
- */
-

The macros were also defined immediately after the SnapBuildOnDisk
fields they referred to.

Wasn't that original ordering better than how it is now ordered in
snapbuild_internal.h?

==
Please also see the attachment, which implements some of those nits
mentioned above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/contrib/pg_logicalinspect/pg_logicalinspect.c b/contrib/pg_logicalinspect/pg_logicalinspect.c
index dc9041a..2111202 100644
--- a/contrib/pg_logicalinspect/pg_logicalinspect.c
+++ b/contrib/pg_logicalinspect/pg_logicalinspect.c
@@ -1,13 +1,17 @@
 /*-
  *
  * pg_logicalinspect.c
- *	  Functions to inspect contents of PostgreSQL logical snapshots
+ *	  Functions to inspect contents of PostgreSQL logical snapshots
  *
  * Copyright (c) 2024, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
- *	  contrib/pg_logicalinspect/pg_logicalinspect.c
+ *	  contrib/pg_logicalinspect/pg_logicalinspect.c
  *
+ *
+ * NOTES
+ *	  For any co
Re: Pgoutput not capturing the generated columns
Hi, here are my review comments for patch v31-0002.

==

1. General.

IMO patches 0001 and 0002 should be merged when next posted. IIUC the
reason for the split was only because there were 2 different authors
but that seems to be not relevant anymore.

==
Commit message

2.
When 'copy_data' is true, during the initial sync, the data is
replicated from the publisher to the subscriber using the COPY
command. The normal COPY command does not copy generated columns, so
when 'publish_generated_columns' is true, we need to copy using the
syntax: 'COPY (SELECT column_name FROM table_name) TO STDOUT'.

~

2a. Should clarify that 'copy_data' is a SUBSCRIPTION parameter.

2b. Should clarify that 'publish_generated_columns' is a PUBLICATION
parameter.

==
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:

3.
-	for (i = 0; i < rel->remoterel.natts; i++)
+	desc = RelationGetDescr(rel->localrel);
+	localgenlist = palloc0(rel->remoterel.natts * sizeof(bool));

Each time I review this code I am tricked into thinking it is wrong to
use rel->remoterel.natts here for the localgenlist. AFAICT the code is
actually fine because you do not store *all* the subscriber gencols in
'localgenlist' -- you only store those with matching names on the
publisher table. It might be good if you could add an explanatory
comment about that to prevent any future doubts.

~~~

4.
+			if (!remotegenlist[remote_attnum])
+				ereport(ERROR,
+						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						 errmsg("logical replication target relation \"%s.%s\" has a generated column \"%s\" "
+								"but corresponding column on source relation is not a generated column",
+								rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));

This error message has lots of good information. OTOH, I think when
copy_data=false the error would report the subscriber column just as
"missing", which is maybe less helpful.
Perhaps that other copy_data=false "missing" case can be improved to
share the same error message that you have here.

~~~

fetch_remote_table_info:

5.
IIUC, this logic needs to be more sophisticated to handle the case
that was being discussed earlier with Sawada-san [1]. e.g. when the
same table has gencols but there are multiple subscribed publications
where the 'publish_generated_columns' parameter differs.

Also, you'll need test cases for this scenario, because it is too
difficult to judge correctness just by visual inspection of the code.

6.
nit - Change 'hasgencolpub' to 'has_pub_with_pubgencols' for
readability, and initialize it to 'false' to make it easy to use
later.

~~~

7.
-	 * Get column lists for each relation.
+	 * Get column lists for each relation and check if any of the publication
+	 * has generated column option.

and

+	/* Check if any of the publication has generated column option */
+	if (server_version >= 18)

nit - tweak the comments to name the publication parameter properly.

~~~

8.
	foreach(lc, MySubscription->publications)
	{
		if (foreach_current_index(lc) > 0)
			appendStringInfoString(&pub_names, ", ");
		appendStringInfoString(&pub_names, quote_literal_cstr(strVal(lfirst(lc))));
	}

I know this is existing code, but shouldn't all this be done by using
the purpose-built function 'get_publications_str'?

~~~

9.
+			ereport(ERROR,
+					errcode(ERRCODE_CONNECTION_FAILURE),
+					errmsg("could not fetch gencolumns information from publication list: %s",
+						   pub_names.data));

and

+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("failed to fetch tuple for gencols from publication list: %s",
+						   pub_names.data));

nit - /gencolumns information/generated column publication
information/ to make the errmsg more human-readable

~~~

10.
+		bool		gencols_allowed = server_version >= 18 && hasgencolpub;
+
+		if (!gencols_allowed)
+			appendStringInfo(&cmd, " AND a.attgenerated = ''");

Can the 'gencols_allowed' var be removed, and the condition just be
replaced with if (!has_pub_with_pubgencols)?
It seems equivalent unless I am mistaken.

==

Please refer to the attachment which implements some of the nits
mentioned above.

==
[1] https://www.postgresql.org/message-id/CAD21AoBun9crSWaxteMqyu8A_zme2ppa2uJvLJSJC2E3DJxQVA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 723c44c..6d17984 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -850,7 +850,7 @@ fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist_res,
 	Oid			qualRow[] = {TEXTOID};
 	bool		isnull;
 	bool
Re: Pgoutput not capturing the generated columns
Review comments for v31-0001.

(I tried to give only new comments, but there might be some overlap
with comments I previously made for v30-0001)

==
src/backend/catalog/pg_publication.c

1.
+
+	if (publish_generated_columns_given)
+	{
+		values[Anum_pg_publication_pubgencolumns - 1] = BoolGetDatum(publish_generated_columns);
+		replaces[Anum_pg_publication_pubgencolumns - 1] = true;
+	}

nit - unnecessary whitespace above here.

==
src/backend/replication/pgoutput/pgoutput.c

2. prepare_all_columns_bms

+			/* Iterate the cols until generated columns are found. */
+			cols = bms_add_member(cols, i + 1);

How does the comment relate to the statement that follows it?

~~~

3.
+			 * Skip generated column if pubgencolumns option was not
+			 * specified.

nit - /pubgencolumns option/publish_generated_columns parameter/

==
src/bin/pg_dump/pg_dump.c

4. getPublications:

nit - /i_pub_gencolumns/i_pubgencols/ (it's the same information but simpler)

==
src/bin/pg_dump/pg_dump.h

5.
+	bool		pubgencolumns;
 } PublicationInfo;

nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)

==
src/bin/psql/describe.c

6.
 	bool		has_pubviaroot;
+	bool		has_pubgencol;

nit - /has_pubgencol/has_pubgencols/ (plural consistency)

==
src/include/catalog/pg_publication.h

7.
+	/* true if generated columns data should be published */
+	bool		pubgencolumns;
 } FormData_pg_publication;

nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)

~~~

8.
+	bool		pubgencolumns;
 	PublicationActions pubactions;
 } Publication;

nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)

==
src/test/regress/sql/publication.sql

9.
+-- Test the publication with or without 'PUBLISH_GENERATED_COLUMNS' parameter
+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION pub1 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=1);
+\dRp+ pub1
+
+CREATE PUBLICATION pub2 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=0);
+\dRp+ pub2

9a.
nit - Use lowercase for the parameters.

~

9b.
nit - Fix the comment to say what the test is actually doing:
"Test the publication 'publish_generated_columns' parameter enabled or
disabled"

==
src/test/subscription/t/031_column_list.pl

10.
Later I think you should add another test here to cover the scenario
that I was discussing with Sawada-San -- e.g. when there are 2
publications for the same table subscribed by just 1 subscription but
having different values of the 'publish_generated_columns' for the
publications.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 2e7804e..cca54bc 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -515,8 +515,8 @@ CREATE TABLE people (
     Generated columns may be skipped during logical replication according to the
-    CREATE PUBLICATION option
-
+    CREATE PUBLICATION parameter
+
     publish_generated_columns.
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index e133dc3..cd20bd4 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -223,7 +223,7 @@ CREATE PUBLICATION name
 
-
+
        publish_generated_columns (boolean)
 
@@ -231,14 +231,6 @@ CREATE PUBLICATION name
         associated with the publication should be replicated. The default is
         false.
 
-
-        This option is only available for replicating generated column data from the publisher
-        to a regular, non-generated column in the subscriber.
-
-
-        This parameter can only be set true if copy_data is
-        set to false.
-
 
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 272b6a1..7ebb851 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -999,7 +999,7 @@ GetPublication(Oid pubid)
 	pub->pubactions.pubdelete = pubform->pubdelete;
 	pub->pubactions.pubtruncate = pubform->pubtruncate;
 	pub->pubviaroot = pubform->pubviaroot;
-	pub->pubgencolumns = pubform->pubgencolumns;
+	pub->pubgencols = pubform->pubgencols;
 
 	ReleaseSysCache(tup);
 
@@ -1211,7 +1211,7 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
 			if (att->attisdropped)
 				continue;
 
-			if (att->attgenerated && !pub->pubgencolumns)
+			if (att->attgenerated && !pub->pubgencols)
 				continue;
 
 			attnums[nattnums++] = att->attnum;
diff --git a/src/backend/commands/public
Re: Pgoutput not capturing the generated columns
Here are some review comments for v31-0001 (for the docs only)

There may be some overlap here with some comments already made for
v30-0001 which are not yet addressed in v31-0001.

==
Commit message

1.
When introducing the 'publish_generated_columns' parameter, you must
also say this is a PUBLICATION parameter.

~~~

2.
With this enhancement, users can now include the
'include_generated_columns' option when querying logical replication
slots using either the pgoutput plugin or the test_decoding plugin.
This option, when set to 'true' or '1', instructs the replication
system to include generated column information and data in the
replication stream.

~

The above is stale information because it still refers to the old name
'include_generated_columns', and to test_decoding which was already
removed in this patch.

==
doc/src/sgml/ddl.sgml

3.
+    Generated columns may be skipped during logical replication according to the
+    CREATE PUBLICATION option
+
+    publish_generated_columns.

3a.
nit - The linkend is based on the old name instead of the new name.

3b.
nit - Better to call this a parameter instead of an option because
that is what the CREATE PUBLICATION docs call it.

==
doc/src/sgml/protocol.sgml

4.
+
+      publish_generated_columns
+
+
+       Boolean option to enable generated columns. This option controls
+       whether generated columns should be included in the string
+       representation of tuples during logical decoding in PostgreSQL.
+
+
+

Is this even needed anymore? Now that the implementation is using a
PUBLICATION parameter, isn't everything determined just by that
parameter? I don't see the reason why a protocol change is needed
anymore. And, if there is no protocol change needed, then this
documentation change is also not needed.

5.
-      Next, the following message part appears for each column included in
-      the publication (except generated columns):
+      Next, the following message parts appear for each column included in
+      the publication (generated columns are excluded unless the parameter
+
+      publish_generated_columns specifies otherwise):

Like the previous comment above, I think everything is now determined
by the PUBLICATION parameter. So maybe this should just be referring
to that instead.

==
doc/src/sgml/ref/create_publication.sgml

6.
+
+       publish_generated_columns (boolean)
+

nit - the ID is based on the old parameter name.

~

7.
+
+        This option is only available for replicating generated column data from the publisher
+        to a regular, non-generated column in the subscriber.
+

IMO remove this paragraph. I really don't think you should be
mentioning the subscriber here at all. AFAIK this parameter is only
for determining if the generated column will be published or not. What
happens at the other end (e.g. logic whether it gets ignored or not by
the subscriber) is more like a matrix of behaviours that could be
documented in the "Logical Replication" section. But not here. (I
removed this in my nitpicks attachment)

~~~

8.
+
+        This parameter can only be set true if copy_data is
+        set to false.
+

IMO remove this paragraph too. The user can create a PUBLICATION
before a SUBSCRIPTION even exists so to say it "can only be set..." is
not correct. Sure, your patch 0001 does not support the COPY of
generated columns but if you want to document that then it should be
documented in the CREATE SUBSCRIBER docs. But not here. (I removed
this in my nitpicks attachment)

TBH, it would be better if patches 0001 and 0002 were merged then you
can avoid all this. IIUC they were only separate in the first place
because 2 different people wrote them. It is not making reviews easier
with them split.

==
Please see the attachment which implements some of the nits above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 2e7804e..cca54bc 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -515,8 +515,8 @@ CREATE TABLE people (
     Generated columns may be skipped during logical replication according to the
-    CREATE PUBLICATION option
-
+    CREATE PUBLICATION parameter
+
     publish_generated_columns.
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index e133dc3..cd20bd4 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -223,7 +223,7 @@ CREATE PUBLICATION name
 
-
+
        publish_generated_columns (boolean)
 
@@ -231,14 +231,6 @@ CREATE PUBLICATION name
         associated with the publication should
Re: Pgoutput not capturing the generated columns
On Tue, Sep 17, 2024 at 4:15 PM Masahiko Sawada wrote: > > On Mon, Sep 16, 2024 at 8:09 PM Peter Smith wrote: > > > > I thought that the option "publish_generated_columns" is more related > > to "column lists" than "row filters". > > > > Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'. > > > > > And > > PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false); > > is equivalent to > > PUBLICATION pub2 FOR TABLE t1(a,b,c); > > This makes sense to me as it preserves the current behavior. > > > Then: > > PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true); > > is equivalent to > > PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2); > > This also makes sense. It would also include future generated columns. > > > So, I would expect this to fail because the SUBSCRIPTION docs say > > "Subscriptions having several publications in which the same table has > > been published with different column lists are not supported." > > So I agree that it would raise an error if users subscribe to both > pub1 and pub2. > > And looking back at your examples, > > > > > e.g.1 > > > > - > > > > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = > > > > true); > > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = > > > > false); > > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2; > > > > - > > > > > > > > e.g.2 > > > > - > > > > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns > > > > = true); > > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = > > > > false); > > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2; > > > > - > > Both examples would not be supported. > > > > > > > > > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having > > > > several publications in which the same table has been published with > > > > different column lists are not supported." 
> > > > > > > > Perhaps the user is supposed to deduce that the example above would > > > > work OK if table 't1' has no generated cols. OTOH, if it did have > > > > generated cols then the PUBLICATION column lists must be different and > > > > therefore it is "not supported" (??). > > > > > > With the patch, how should this feature work when users specify a > > > generated column to the column list and set publish_generated_column = > > > false, in the first place? raise an error (as we do today)? or always > > > send NULL? > > > > For this scenario, I suggested (see [1] #3) that the code could give a > > WARNING. As I wrote up-thread: This combination doesn't seem > > like something a user would do intentionally, so just silently > > ignoring it (which the current patch does) is likely going to give > > someone unexpected results/grief. > > It gives a WARNING, and then publishes the specified generated column > data (even if publish_generated_column = false)? If so, it would mean > that specifying the generated column to the column list means to > publish its data regardless of the publish_generated_column parameter > value. > No. I meant only that it can give the WARNING to tell the user "Hey, there is a conflict here because you said publish_generated_column = false, but you also specified gencols in the column list". But it is always the option "publish_generated_column" that determines the final publishing behaviour. So if it says publish_generated_column=false then it would NOT publish generated columns even if there are gencols in the column list. I think this makes sense because when there is no column list specified then that implicitly means "all columns" and the table might have some gencols, but still 'publish_generated_columns' is what determines the behaviour. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
On Tue, Sep 17, 2024 at 7:02 AM Masahiko Sawada wrote: > > On Wed, Sep 11, 2024 at 10:30 PM Peter Smith wrote: > > > > Because this feature is now being implemented as a PUBLICATION option, > > there is another scenario that might need consideration; I am thinking > > about where the same table is published by multiple PUBLICATIONS (with > > different option settings) that are subscribed by a single > > SUBSCRIPTION. > > > > e.g.1 > > - > > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = > > true); > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = > > false); > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2; > > - > > > > e.g.2 > > - > > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = > > true); > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = > > false); > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2; > > - > > > > Do you know if this case is supported? If yes, then which publication > > option value wins? > > I would expect these option values are processed with OR. That is, we > publish changes of the generated columns if at least one publication > sets publish_generated_columns to true. It seems to me that we treat > multiple row filters in the same way. > I thought that the option "publish_generated_columns" is more related to "column lists" than "row filters". Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'. Then: PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true); is equivalent to PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2); And PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false); is equivalent to PUBLICATION pub2 FOR TABLE t1(a,b,c); So, I would expect this to fail because the SUBSCRIPTION docs say "Subscriptions having several publications in which the same table has been published with different column lists are not supported." 
~~ Here's another example: PUBLICATION pub3 FOR TABLE t1(a,b); PUBLICATION pub4 FOR TABLE t1(c); Won't it be strange (e.g. difficult to explain) why pub1 and pub2 table column lists are allowed to be combined in one subscription, but pub3 and pub4 in one subscription are not supported due to the different column lists? > > > > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having > > several publications in which the same table has been published with > > different column lists are not supported." > > > > Perhaps the user is supposed to deduce that the example above would > > work OK if table 't1' has no generated cols. OTOH, if it did have > > generated cols then the PUBLICATION column lists must be different and > > therefore it is "not supported" (??). > > With the patch, how should this feature work when users specify a > generated column to the column list and set publish_generated_column = > false, in the first place? raise an error (as we do today)? or always > send NULL? For this scenario, I suggested (see [1] #3) that the code could give a WARNING. As I wrote up-thread: This combination doesn't seem like something a user would do intentionally, so just silently ignoring it (which the current patch does) is likely going to give someone unexpected results/grief. == [1] https://www.postgresql.org/message-id/CAHut%2BPuaitgE4tu3nfaR%3DPCQEKjB%3DmpDtZ1aWkbwb%3DJZE8YvqQ%40mail.gmail.com Kind Regards, Peter Smith Fujitsu Australia
Re: Introduce XID age and inactive timeout based replication slot invalidation
Here are a few comments for the patch v46-0001. == src/backend/replication/slot.c 1. ReportSlotInvalidation On Mon, Sep 16, 2024 at 8:01 PM Bharath Rupireddy wrote: > > On Mon, Sep 9, 2024 at 1:11 PM Peter Smith wrote: > > 3. ReportSlotInvalidation > > > > I didn't understand why there was a hint for: > > "You might need to increase \"%s\".", "max_slot_wal_keep_size" > > > > Why aren't these similar cases consistent? > > It looks misleading and not very useful. What happens if the removed > WAL (that's needed for the slot) is put back into pg_wal somehow (by > manually copying from archive or by some tool/script)? Can the slot > invalidated due to wal_removed start sending WAL to its clients? > > > But you don't have an equivalent hint for timeout invalidation: > > "You might need to increase \"%s\".", "replication_slot_inactive_timeout" > > I removed this per review comments upthread. IIUC the errors are quite similar, so my previous review comment was mostly about the unexpected inconsistency of why one of them has a hint and the other one does not. I don't have a strong opinion about whether they should both *have* or *not have* hints, so long as they are treated the same. If you think the current code hint is not useful then maybe we need a new thread to address that existing issue. For example, maybe it should be removed or reworded. ~~~ 2. InvalidatePossiblyObsoleteSlot: + case RS_INVAL_INACTIVE_TIMEOUT: + + if (!SlotInactiveTimeoutCheckAllowed(s)) + break; + + /* + * Check if the slot needs to be invalidated due to + * replication_slot_inactive_timeout GUC. + */ + if (TimestampDifferenceExceeds(s->inactive_since, now, +replication_slot_inactive_timeout * 1000)) nit - it might be tidier to avoid multiple breaks by just combining these conditions. See the nitpick attachment. ~~~ 3. 
* - RS_INVAL_WAL_LEVEL: is logical + * - RS_INVAL_INACTIVE_TIMEOUT: inactive timeout occurs nit - use comment wording "inactive slot timeout has occurred", to make it identical to the comment in slot.h == src/test/recovery/t/050_invalidate_slots.pl 4. +# Despite inactive timeout being set, the synced slot won't get invalidated on +# its own on the standby. So, we must not see invalidation message in server +# log. +$standby1->safe_psql('postgres', "CHECKPOINT"); +ok( !$standby1->log_contains( + "invalidating obsolete replication slot \"sync_slot1\"", + $logstart), + 'check that synced slot sync_slot1 has not been invalidated on standby' +); + It seems kind of brittle to check the logs for something that is NOT there because any change to the message will make this accidentally pass. Apart from that, it might anyway be more efficient just to check the pg_replication_slots again to make sure the 'invalidation_reason' remains NULL. == Please see the attachment which implements some of the nit changes mentioned above. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 851120e..0076e4b 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1716,15 +1716,12 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, invalidation_cause = cause; break; case RS_INVAL_INACTIVE_TIMEOUT: - - if (!SlotInactiveTimeoutCheckAllowed(s)) - break; - /* * Check if the slot needs to be invalidated due to * replication_slot_inactive_timeout GUC. 
*/ - if (TimestampDifferenceExceeds(s->inactive_since, now, + if (SlotInactiveTimeoutCheckAllowed(s) && + TimestampDifferenceExceeds(s->inactive_since, now, replication_slot_inactive_timeout * 1000)) { invalidation_cause = cause; @@ -1894,7 +1891,7 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, * - RS_INVAL_HORIZON: requires a snapshot <= the given horizon in the given * db; dboid may be InvalidOid for shared relations * - RS_INVAL_WAL_LEVEL: is logical - * - RS_INVAL_INACTIVE_TIMEOUT: inactive timeout occurs + * - RS_INVAL_INACTIVE_TIMEOUT: inactive slot timeout has occurred
Re: Pgoutput not capturing the generated columns
> > > > > > > > I think we can create a publication for a single table, so what we can > > > > > do with this feature can be done also by the idea you described below. > > > > > > > > > > > Yet another idea is to keep this as a publication option > > > > > > (include_generated_columns or publish_generated_columns) similar to > > > > > > "publish_via_partition_root". Normally, "publish_via_partition_root" > > > > > > is used when tables on either side have different partitions > > > > > > hierarchies which is somewhat the case here. > > > > > > > > > > It sounds more useful to me. > > > > > > > > > > > > > Fair enough. Let's see if anyone else has any preference among the > > > > proposed methods or can think of a better way. > > > > > > I have fixed the current issue. I have added the option > > > 'publish_generated_columns' to the publisher side and created the new > > > test cases accordingly. > > > The attached patches contain the desired changes. > > > > > > > Thank you for updating the patches. I have some comments: > > > > Do we really need to add this option to test_decoding? I think it > > would be good if this improves the test coverage. Otherwise, I'm not > > sure we need this part. If we want to add it, I think it would be > > better to have it in a separate patch. > > > > I have removed the option from the test_decoding file. > > > --- > > + > > + If the publisher-side column is also a generated column > > then this option > > + has no effect; the publisher column will be filled as normal > > with the > > + publisher-side computed or default data. > > + > > > > I don't understand this description. Why does this option have no > > effect if the publisher-side column is a generated column? > > > > The documentation was incorrect. Currently, replicating from a > publisher table with a generated column to a subscriber table with a > generated column will result in an error. This has now been updated. 
> > > --- > > + > > + This parameter can only be set true if > > copy_data is > > + set to false. > > + > > > > If I understand this patch correctly, it doesn't disallow to set > > copy_data to true when the publish_generated_columns option is > > specified. But do we want to disallow it? I think it would be more > > useful and understandable if we allow to use both > > publish_generated_columns (publisher option) and copy_data (subscriber > > option) at the same time. > > > > Support for tablesync with generated columns was not included in the > initial patch, and this was reflected in the documentation. The > functionality for syncing generated column data has been introduced > with the 0002 patch. > Since nothing was said otherwise, I assumed my v30-0001 comments were addressed in v31, but the new code seems to have quite a few of my suggested changes missing. If you haven't addressed my review comments for patch 0001 yet, please say so. OTOH, please give reasons for any rejected comments. > The attached v31 patches contain the changes for the same. I won't be > posting the test patch for now. I will share it once this patch has > been stabilized. How can the patch become "stabilized" without associated tests to verify the behaviour is not broken? e.g. I can write a stable function that says 2+2=5. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Because this feature is now being implemented as a PUBLICATION option, there is another scenario that might need consideration; I am thinking about where the same table is published by multiple PUBLICATIONS (with different option settings) that are subscribed by a single SUBSCRIPTION. e.g.1 - CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true); CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false); CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2; - e.g.2 - CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true); CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false); CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2; - Do you know if this case is supported? If yes, then which publication option value wins? The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having several publications in which the same table has been published with different column lists are not supported." Perhaps the user is supposed to deduce that the example above would work OK if table 't1' has no generated cols. OTOH, if it did have generated cols then the PUBLICATION column lists must be different and therefore it is "not supported" (??). I have not tried this to see what happens, but even if it behaves as expected, there should probably be some comments/docs/tests for this scenario to clarify it for the user. Notice that "publish_via_partition_root" has a similar conundrum, but in that case, the behaviour is documented in the CREATE PUBLICATION docs [2]. So, maybe "publish_generated_columns" should be documented a bit like that. == [1] https://www.postgresql.org/docs/devel/sql-createsubscription.html [2] https://www.postgresql.org/docs/devel/sql-createpublication.html Kind Regards, Peter Smith. Fujitsu Australia
Remove shadowed declaration warnings
o’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:841:46: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:961:47: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1104:41: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1142:41: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1182:45: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1242:54: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1272:56: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1314:70: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1363:60: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1553:57: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1627:55: warning: 
declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1681:64: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1739:69: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1830:64: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] == Kind Regards, Peter Smith. Fujitsu Australia controldata_utils.c:52:29: warning: declaration of ‘DataDir’ shadows a global declaration [-Wshadow] ../../src/include/miscadmin.h:172:26: warning: shadowed declaration is here [-Wshadow] controldata_utils.c:189:32: warning: declaration of ‘DataDir’ shadows a global declaration [-Wshadow] ../../src/include/miscadmin.h:172:26: warning: shadowed declaration is here [-Wshadow] brin.c:685:16: warning: declaration of ‘tmp’ shadows a previous local [-Wshadow] brin.c:579:11: warning: shadowed declaration is here [-Wshadow] gistbuild.c:1159:23: warning: declaration of ‘splitinfo’ shadows a previous local [-Wshadow] gistbuild.c:1059:11: warning: shadowed declaration is here [-Wshadow] xlogdesc.c:40:26: warning: declaration of ‘wal_level’ shadows a global declaration [-Wshadow] ../../../../src/include/access/xlog.h:96:24: warning: shadowed declaration is here [-Wshadow] xlogdesc.c:165:9: warning: declaration of ‘wal_level’ shadows a global declaration [-Wshadow] ../../../../src/include/access/xlog.h:96:24: warning: shadowed declaration is here [-Wshadow] xlogreader.c:106:24: warning: declaration of ‘wal_segment_size’ shadows a global declaration [-Wshadow] ../../../../src/include/access/xlog.h:37:24: warning: shadowed declaration is here [-Wshadow] 
xlogrecovery.c:1210:13: warning: declaration of ‘backupEndRequired’ shadows a global declaration [-Wshadow] xlogrecovery.c:284:13: warning: shadowed declaration is here [-Wshadow] xlogrecovery.c:1920:33: warning: declaration of ‘xlogreader’ shadows a global declaration [-Wshadow] xlogrecovery.c:189:25: warning: shadowed declaration is here [-Wshadow] xlogrecovery.c:3144:28: warning: declaration of ‘xlogprefetcher’ shadows a global declaration [-Wshadow] xlogrecovery.c:192:24: warning: shadowed declaration is here [-Wshadow] xlogrecovery.c:3148:19: warning: declaration of ‘xlogreader’ shadows a global declaration [-Wshadow] xlogrecovery.c:189:25: warning: shadowed declaration is here [-Wshadow] xlogrecovery.c:3311:31: warning: declaration of ‘xlogreader’ shadows a global declaration [-Wshadow] xlogrecovery.c:189:25: warning: shadowed declaration is here [-Wshadow] xlogrecovery.c:4062:38: warning: declaration of ‘xlogprefetcher’ shadows a global declaration [
Re: Pgoutput not capturing the generated columns
Hi Shubham, Here are my general comments about the v30-0002 TAP test patch. == 1. As mentioned in a previous post [1] there are still several references to the old 'include_generated_columns' option remaining in this patch. They need replacing. ~~~ 2. +# Furthermore, all combinations are tested for publish_generated_columns=false +# (see subscription sub1 of database 'postgres'), and +# publish_generated_columns=true (see subscription sub2 of database +# 'test_igc_true'). Those 'see subscription' notes and 'test_igc_true' are from the old implementation. Those need fixing. BTW, 'test_pgc_true' is a better name for the database now that the option name is changed. In the previous implementation, the TAP test environment was: - a common publication pub, on the 'postgres' database - a subscription sub1 with option include_generated_columns=false, on the 'postgres' database - a subscription sub2 with option include_generated_columns=true, on the 'test_igc_true' database Now it is like: - a publication pub1, on the 'postgres' database, with option publish_generated_columns=false - a publication pub2, on the 'postgres' database, with option publish_generated_columns=true - a subscription sub1, on the 'postgres' database for publication pub1 - a subscription sub2, on the 'test_pgc_true' database for publication pub2 It would be good to document the above convention because knowing how the naming/numbering works makes it a lot easier to read the subsequent test cases. Of course, it is really important to name/number everything consistently, otherwise these tests become hard to follow. AFAICT it is mostly OK, but the generated -> generated publication should be called 'regress_pub2_gen_to_gen' ~~~ 3. +# Create table. +$node_publisher->safe_psql( + 'postgres', qq( + CREATE TABLE tab_gen_to_nogen (a int, b int GENERATED ALWAYS AS (a * 2) STORED); + INSERT INTO tab_gen_to_nogen (a) VALUES (1), (2), (3); +)); + +# Create publication with publish_generated_columns=false. 
+$node_publisher->safe_psql('postgres', + "CREATE PUBLICATION regress_pub1_gen_to_nogen FOR TABLE tab_gen_to_nogen WITH (publish_generated_columns = false)" +); + +# Create table and subscription with copy_data=true. +$node_subscriber->safe_psql( + 'postgres', qq( + CREATE TABLE tab_gen_to_nogen (a int, b int); + CREATE SUBSCRIPTION regress_sub1_gen_to_nogen CONNECTION '$publisher_connstr' PUBLICATION regress_pub1_gen_to_nogen WITH (copy_data = true); +)); + +# Create publication with publish_generated_columns=true. +$node_publisher->safe_psql('postgres', + "CREATE PUBLICATION regress_pub2_gen_to_nogen FOR TABLE tab_gen_to_nogen WITH (publish_generated_columns = true)" +); + The code can be restructured to be simpler. Both publications are always created on the 'postgres' database at the publisher node, so let's just create them at the same time as creating the publisher table. It also makes readability much better e.g. # Create table, and publications $node_publisher->safe_psql( 'postgres', qq( CREATE TABLE tab_gen_to_nogen (a int, b int GENERATED ALWAYS AS (a * 2) STORED); INSERT INTO tab_gen_to_nogen (a) VALUES (1), (2), (3); CREATE PUBLICATION regress_pub1_gen_to_nogen FOR TABLE tab_gen_to_nogen WITH (publish_generated_columns = false); CREATE PUBLICATION regress_pub2_gen_to_nogen FOR TABLE tab_gen_to_nogen WITH (publish_generated_columns = true); )); AFAICT this same simplification can be repeated multiple times in this TAP file. ~~ Similarly, it would be neater to combine the DROP PUBLICATIONs together too. ~~~ 4. Hopefully, the generated column 'copy_data' can be implemented again soon for subscriptions, and then the initial sync tests here can be properly implemented instead of the placeholders currently in patch 0002. == [1] https://www.postgresql.org/message-id/CAHut%2BPuDJToG%3DV-ogTi9_6fnhhn2S0%2BsVRGPynhcf9mEh0Q%3DLA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Here are some more review comments for patch v30-0001. == src/sgml/ref/create_publication.sgml 1. + + If the publisher-side column is also a generated column then this option + has no effect; the publisher column will be filled as normal with the + publisher-side computed or default data. + It should say "subscriber-side"; not "publisher-side". The same was already reported by Sawada-San [1]. ~~~ 2. + + This parameter can only be set true if copy_data is + set to false. + IMO this limitation should be addressed by patch 0001 like it was already done in the previous patches (e.g. v22-0002). I think Sawada-san suggested the same [1]. Anyway, 'copy_data' is not a PUBLICATION option, so the fact it is mentioned like this without any reference to the SUBSCRIPTION seems like a cut/paste error from the previous implementation. == src/backend/catalog/pg_publication.c 3. pub_collist_validate - if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated) - ereport(ERROR, - errcode(ERRCODE_INVALID_COLUMN_REFERENCE), - errmsg("cannot use generated column \"%s\" in publication column list", -colname)); - Instead of just removing this ERROR entirely here, I thought it would be more user-friendly to give a WARNING if the PUBLICATION's explicit column list includes generated cols when the option "publish_generated_columns" is false. This combination doesn't seem like something a user would do intentionally, so just silently ignoring it (like the current patch does) is likely going to give someone unexpected results/grief. == src/backend/replication/logical/proto.c 4. logicalrep_write_tuple, and logicalrep_write_attrs: - if (att->attisdropped || att->attgenerated) + if (att->attisdropped) continue; Why aren't you also checking the new PUBLICATION option here and skipping all gencols if the "publish_generated_columns" option is false? Or is the BMS of pgoutput_column_list_init handling this case? Maybe there should be an Assert for this? == src/backend/replication/pgoutput/pgoutput.c 5. 
send_relation_and_attrs - if (att->attisdropped || att->attgenerated) + if (att->attisdropped) continue; Same question as #4. ~~~ 6. prepare_all_columns_bms and pgoutput_column_list_init + if (att->attgenerated && !pub->pubgencolumns) + cols = bms_del_member(cols, i + 1); IIUC, the algorithm seems overly tricky, filling the BMS with all columns, before straight away conditionally removing the generated columns. Can't it be refactored to assign all the correct columns up-front, to avoid calling bms_del_member()? == src/bin/pg_dump/pg_dump.c 7. getPublications IIUC, there is lots of missing SQL code here (for all older versions) that should be saying "false AS pubgencolumns". e.g. compare the SQL with how "false AS pubviaroot" is used. == src/bin/pg_dump/t/002_pg_dump.pl 8. Missing tests? I expected to see a pg_dump test for this new PUBLICATION option. == src/test/regress/sql/publication.sql 9. Missing tests? How about adding another test case that checks this new option must be "Boolean"? ~~~ 10. Missing tests? --- error: generated column "d" can't be in list +-- ok: generated columns can be in the list too ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d); +ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5; (see my earlier comment #3) IMO there should be another test case for a WARNING here if the user attempts to include generated column 'd' in an explicit PUBLICATION column list while the "publish_generated_columns" is false. == [1] https://www.postgresql.org/message-id/CAD21AoA-tdTz0G-vri8KM2TXeFU8RCDsOpBXUBCgwkfokF7%3DjA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Disallow altering invalidated replication slots
On Wed, Sep 11, 2024 at 3:54 AM Bharath Rupireddy wrote: > > Hi, > > Thanks for reviewing. > > On Tue, Sep 10, 2024 at 8:40 AM Peter Smith wrote: > > > > Commit message > > > > 1. > > ALTER_REPLICATION_SLOT on invalidated replication slots is unnecessary > > as there is no way... > > > > suggestion: > > ALTER_REPLICATION_SLOT for invalid replication slots should not be > > allowed because there is no way... > > Modified. > > > == > > 2. Missing docs update > > > > Should this docs page [1] be updated to say ALTER_REPLICATION_SLOT is > > not allowed for invalid slots? > > Haven't noticed for other ERROR cases in the docs, e.g. slots being > synced, temporary slots. Not sure if it's worth adding every ERROR > case to the docs. > OK. > > == > > src/backend/replication/slot.c > > > > 3. > > + if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE) > > + ereport(ERROR, > > + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > + errmsg("cannot alter replication slot \"%s\"", name), > > + errdetail("This replication slot was invalidated due to \"%s\".", > > + SlotInvalidationCauses[MyReplicationSlot->data.invalidated])); > > + > > > > I thought including the reason "invalid" (e.g. "cannot alter invalid > > replication slot \"%s\"") in the message might be better, but OTOH I > > see the patch message is the same as an existing one. Maybe see what > > others think. > > Changed. > > > == > > src/test/recovery/t/035_standby_logical_decoding.pl > > > > 3. > > There is already a comment about this test: > > ## > > # Recovery conflict: Invalidate conflicting slots, including in-use slots > > # Scenario 1: hot_standby_feedback off and vacuum FULL > > # > > # In passing, ensure that replication slot stats are not removed when the > > # active slot is invalidated. > > ## > > > > IMO we should update that "In passing..." 
sentence to something like: > > > > In passing, ensure that replication slot stats are not removed when > > the active slot is invalidated, and check that an error occurs when > > attempting to alter the invalid slot. > > Added. But, keeping it closer to the test case doesn't hurt. > > Please find the attached v2 patch also having Shveta's review comments > addressed. > The v2 patch looks OK to me. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
IIUC, previously there was a subscriber side option 'include_generated_columns', but now since v30* there is a publisher side option 'publish_generated_columns'. Fair enough, but in the v30* patches I can still see remnants of the old name 'include_generated_columns' all over the place: - in the commit message - in the code (struct field names, param names etc) - in the comments - in the docs If the decision is to call the new PUBLICATION option 'publish_generated_columns', then can't we please use that one name *everywhere* -- e.g. replace all cases where any old name is still lurking? == Kind Regards, Peter Smith. Fujitsu Australia
Re: GUC names in messages
On Wed, Sep 4, 2024 at 3:54 PM Michael Paquier wrote: > ... > 0001 and 0004 have been applied with these tweaks. I am still not > sure about the changes for DateStyle and IntervalStyle in 0002 and > 0003. Perhaps others have an opinion that could drive to a consensus. > Thanks for pushing the patches 0001 and 0004. I have rebased the two remaining patches. See v12 attached. == Kind Regards, Peter Smith. Fujitsu Australia v12-0002-GUC-names-fix-case-datestyle.patch Description: Binary data v12-0001-GUC-names-fix-case-intervalstyle.patch Description: Binary data
Re: Disallow altering invalidated replication slots
Hi, here are some review comments for patch v1. == Commit message 1. ALTER_REPLICATION_SLOT on invalidated replication slots is unnecessary as there is no way... suggestion: ALTER_REPLICATION_SLOT for invalid replication slots should not be allowed because there is no way... == 2. Missing docs update Should this docs page [1] be updated to say ALTER_REPLICATION_SLOT is not allowed for invalid slots? == src/backend/replication/slot.c 3. + if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE) + ereport(ERROR, + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot alter replication slot \"%s\"", name), + errdetail("This replication slot was invalidated due to \"%s\".", + SlotInvalidationCauses[MyReplicationSlot->data.invalidated])); + I thought including the reason "invalid" (e.g. "cannot alter invalid replication slot \"%s\"") in the message might be better, but OTOH I see the patch message is the same as an existing one. Maybe see what others think. == src/test/recovery/t/035_standby_logical_decoding.pl 3. There is already a comment about this test: ## # Recovery conflict: Invalidate conflicting slots, including in-use slots # Scenario 1: hot_standby_feedback off and vacuum FULL # # In passing, ensure that replication slot stats are not removed when the # active slot is invalidated. ## IMO we should update that "In passing..." sentence to something like: In passing, ensure that replication slot stats are not removed when the active slot is invalidated, and check that an error occurs when attempting to alter the invalid slot. == [1] docs - https://www.postgresql.org/docs/devel/protocol-replication.html Kind Regards, Peter Smith. Fujitsu Australia
Re: Introduce XID age and inactive timeout based replication slot invalidation
~~~ 16. +{ + my ($node, $slot, $offset, $inactive_timeout) = @_; + my $name = $node->name; The variable $name seems too vague. How about $node_name? ~~~ 17. + # Wait for invalidation reason to be set + $node->poll_query_until( + 'postgres', qq[ + SELECT COUNT(slot_name) = 1 FROM pg_replication_slots + WHERE slot_name = '$slot' AND + invalidation_reason = 'inactive_timeout'; + ]) + or die + "Timed out while waiting for invalidation reason of slot $slot to be set on node $name"; 17a. nit - /# Wait for invalidation reason to be set/# Check that the invalidation reason is 'inactive_timeout'/ IIUC, the 'trigger_slot_invalidation' function has already invalidated the slot at this point, so we are not really "Waiting..."; we are "Checking..." that the reason was correctly set. ~ 17b. I think this code fragment maybe would be better put inside the 'trigger_slot_invalidation' function. (I've done this in the nitpicks attachment) ~~~ 18. + # Check that invalidated slot cannot be acquired + my ($result, $stdout, $stderr); + + ($result, $stdout, $stderr) = $node->psql( + 'postgres', qq[ + SELECT pg_replication_slot_advance('$slot', '0/1'); + ]); 18a. s/Check that invalidated slot/Check that an invalidated slot/ ~ 18b. nit - Remove some blank lines, because the comment applies to all below it. == sub trigger_slot_invalidation 19. +# Trigger slot invalidation and confirm it in server log +sub trigger_slot_invalidation nit - s/confirm it in server log/confirm it in the server log/ ~ 20. +{ + my ($node, $slot, $offset, $inactive_timeout) = @_; + my $name = $node->name; + my $invalidated = 0; (same as the other subroutine) nit - The variable $name seems too vague. How about $node_name? == Please refer to the attached nitpicks top-up patch which implements most of the above nits. == Kind Regards, Peter Smith. 
Fujitsu Australia diff --git a/src/test/recovery/t/050_invalidate_slots.pl b/src/test/recovery/t/050_invalidate_slots.pl index 669d6cc..34b46d5 100644 --- a/src/test/recovery/t/050_invalidate_slots.pl +++ b/src/test/recovery/t/050_invalidate_slots.pl @@ -12,10 +12,10 @@ use Time::HiRes qw(usleep); # = # Testcase start # -# Invalidate streaming standby slot and logical failover slot on primary due to -# inactive timeout. Also, check logical failover slot synced to standby from -# primary doesn't invalidate on its own, but gets the invalidated state from the -# primary. +# Invalidate a streaming standby slot and logical failover slot on the primary +# due to inactive timeout. Also, check that a logical failover slot synced to +# the standby from the primary doesn't invalidate on its own, but gets the +# invalidated state from the primary. # Initialize primary my $primary = PostgreSQL::Test::Cluster->new('primary'); @@ -45,7 +45,7 @@ primary_slot_name = 'sb_slot1' primary_conninfo = '$connstr dbname=postgres' )); -# Create sync slot on primary +# Create sync slot on the primary $primary->psql('postgres', q{SELECT pg_create_logical_replication_slot('sync_slot1', 'test_decoding', false, false, true);} ); @@ -57,13 +57,13 @@ $primary->safe_psql( $standby1->start; -# Wait until standby has replayed enough data +# Wait until the standby has replayed enough data $primary->wait_for_catchup($standby1); -# Sync primary slot to standby +# Sync the primary slots to the standby $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();"); -# Confirm that logical failover slot is created on standby +# Confirm that the logical failover slot is created on the standby is( $standby1->safe_psql( 'postgres', q{SELECT count(*) = 1 FROM pg_replication_slots @@ -73,24 +73,24 @@ is( $standby1->safe_psql( 'logical slot sync_slot1 has synced as true on standby'); my $logstart = -s $primary->logfile; -my $inactive_timeout = 1; -# Set timeout so that next checkpoint will invalidate 
inactive slot +# Set timeout GUC so that the next checkpoint will invalidate inactive slots +my $inactive_timeout = 1; $primary->safe_psql( 'postgres', qq[ ALTER SYSTEM SET replication_slot_inactive_timeout TO '${inactive_timeout}s'; ]); $primary->reload; -# Check for logical failover slot to become inactive on primary. Note that +# Wait for logical failover slot to become inactive on the primary. Note that # nobody has acquired slot yet, so it must get invalidated due to # inactive timeout. -check_for_slot_invalidation($primary, 'sync_slot1', $logstart, +wait_for_slot_invalidation($primary, 'sync_slot1', $logstart, $ina
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, here are some review comments for v45-0001 (excluding the test code) == doc/src/sgml/config.sgml 1. +Note that the inactive timeout invalidation mechanism is not +applicable for slots on the standby server that are being synced +from primary server (i.e., standby slots having nit - /from primary server/from the primary server/ == src/backend/replication/slot.c 2. ReplicationSlotAcquire + errmsg("can no longer get changes from replication slot \"%s\"", + NameStr(s->data.name)), + errdetail("This slot has been invalidated because it was inactive for longer than the amount of time specified by \"%s\".", +"replication_slot_inactive_timeout."))); nit - "replication_slot_inactive_timeout." - should be no period inside that GUC name literal ~~~ 3. ReportSlotInvalidation I didn't understand why there was a hint for: "You might need to increase \"%s\".", "max_slot_wal_keep_size" But you don't have an equivalent hint for timeout invalidation: "You might need to increase \"%s\".", "replication_slot_inactive_timeout" Why aren't these similar cases consistent? ~~~ 4. RestoreSlotFromDisk + /* Use the same inactive_since time for all the slots. */ + if (now == 0) + now = GetCurrentTimestamp(); + Is the deferred assignment really necessary? Why not just unconditionally assign the 'now' just before the for-loop? Or even at the declaration? e.g. The 'replication_slot_inactive_timeout' is measured in seconds so I don't think 'inactive_since' being wrong by a millisecond here will make any difference. == src/include/replication/slot.h 5. ReplicationSlotSetInactiveSince +/* + * Set slot's inactive_since property unless it was previously invalidated due + * to inactive timeout. 
+ */ +static inline void +ReplicationSlotSetInactiveSince(ReplicationSlot *s, TimestampTz *now, + bool acquire_lock) +{ + if (acquire_lock) + SpinLockAcquire(&s->mutex); + + if (s->data.invalidated != RS_INVAL_INACTIVE_TIMEOUT) + s->inactive_since = *now; + + if (acquire_lock) + SpinLockRelease(&s->mutex); +} Is the logic correct? What if the slot was already invalid due to some reason other than RS_INVAL_INACTIVE_TIMEOUT? Is an Assert needed? == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 27b2285..97b4fb5 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4582,7 +4582,7 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows Note that the inactive timeout invalidation mechanism is not applicable for slots on the standby server that are being synced -from primary server (i.e., standby slots having +from the primary server (i.e., standby slots having pg_replication_slots.synced value true). Synced slots are always considered to be inactive because they don't diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index d92b92b..8cc67b4 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -640,7 +640,7 @@ retry: errmsg("can no longer get changes from replication slot \"%s\"", NameStr(s->data.name)), errdetail("This slot has been invalidated because it was inactive for longer than the amount of time specified by \"%s\".", - "replication_slot_inactive_timeout."))); + "replication_slot_inactive_timeout"))); } /*
Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description
On Mon, Sep 9, 2024 at 12:20 PM David G. Johnston wrote: > > > > On Sun, Sep 8, 2024, 18:55 Peter Smith wrote: >> >> Saying "The time..." is fine, but the suggestions given seem backwards to me: >> - The time this slot was inactivated >> - The time when the slot became inactive. >> - The time when the slot was deactivated. >> >> e.g. It is not like light switch. So, there is no moment when the >> active slot "became inactive" or "was deactivated". > > > While this is plausible the existing wording and the name of the field > definitely fail to convey this. > >> >> Rather, the 'inactive_since' timestamp field is simply: >> - The time the slot was last active. >> - The last time the slot was active. > > > I see your point but that wording is also quite confusing when an active slot > returns null for this field. > > At this point I'm confused enough to need whatever wording is taken to be > supported by someone explaining the code that interacts with this field. > Me too. I created this thread primarily to get the description changed to clarify this field represents a moment in time, rather than a duration. So I will be happy with any wording that addresses that. > I suppose I'm expecting something like: The time the last activity finished, > or null if an activity is in-progress. == Kind Regards, Peter Smith. Fujitsu Australia
Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description
Saying "The time..." is fine, but the suggestions given seem backwards to me: - The time this slot was inactivated - The time when the slot became inactive. - The time when the slot was deactivated. e.g. It is not like a light switch. So, there is no moment when the active slot "became inactive" or "was deactivated". Rather, the 'inactive_since' timestamp field is simply: - The time the slot was last active. - The last time the slot was active. == Kind Regards, Peter Smith. Fujitsu Australia
Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description
On Tue, Sep 3, 2024 at 4:12 PM shveta malik wrote: > ... > Shall we make the change in code-comment as well: > > typedef struct ReplicationSlot > { > ... > /* The time since the slot has become inactive */ > TimestampTz inactive_since; > } > Hi Shveta, Yes, I think so. I hadn't bothered to include this in the v1 patch only because the docs are user-facing, but this code comment isn't. However, now that you've mentioned it, I made the same change here also. Thanks. See patch v2. == Kind Regards, Peter Smith. Fujitsu Australia v2-0001-fix-description-for-inactive_since.patch Description: Binary data
Re: Collect statistics about conflicts in logical replication
On Tue, Sep 3, 2024 at 9:23 PM Zhijie Hou (Fujitsu) wrote: > > On Tuesday, September 3, 2024 7:12 PM Amit Kapila > wrote: > > > > Testing the stats for all types of conflicts is not required for this patch > > especially because they increase the timings by 3-4s. We can add tests for > > one > > or two types of conflicts. > > ... > > Thanks for the comments. I have addressed the comments and adjusted the tests. > In the V6 patch, Only insert_exists and delete_missing are tested. > > I confirmed that it only increased the testing time by 1 second on my machine. > > Best Regards, > Hou zj It seems a pity to throw away perfectly good test cases just because they increase how long the suite takes to run. This seems like yet another example of where we could have made good use of the 'PG_TEST_EXTRA' environment variable. I have been trying to propose adding "subscription" support for this in another thread [1]. By using this variable to make some tests conditional, we could have the best of both worlds. e.g. - retain all tests, but - by default, only run a subset of those tests (to keep default test execution time low). I hope that if the idea to use PG_TEST_EXTRA for "subscription" tests gets accepted then later we can revisit this, and put all the removed extra test cases back in again. == [1] https://www.postgresql.org/message-id/CAHut%2BPsgtnr5BgcqYwD3PSf-AsUtVDE_j799AaZeAjJvE6HGtA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: GUC names in messages
On Tue, Sep 3, 2024 at 4:35 PM Michael Paquier wrote: > > On Tue, Sep 03, 2024 at 12:00:19PM +1000, Peter Smith wrote: > > Here is the rebased patch set v10*. Everything is the same as before > > except now there are only 7 patches instead of 8. The previous v9-0001 > > ("bool") patch no longer exists because those changes are now already > > present in HEAD. > > > > I hope these might be pushed soon to avoid further rebasing. > > 0001~0004 could just be merged, they're the same thing, for different > GUC types. The consensus mentioned in 17974ec25946 makes that clear. > > 0007 is a good thing for translators, indeed.. I'll see about doing > something here, at least. > -- > Michael Hi Michael, thanks for your interest. I have merged the patches 0001-0004 as suggested. Please see v11 attachments. == Kind Regards, Peter Smith. Fujitsu Australia v11-0001-Add-quotes-for-GUCs.patch Description: Binary data v11-0003-GUC-names-fix-case-datestyle.patch Description: Binary data v11-0004-GUC-names-make-common-translatable-messages.patch Description: Binary data v11-0002-GUC-names-fix-case-intervalstyle.patch Description: Binary data
Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description
On Tue, Sep 3, 2024 at 4:35 PM Bertrand Drouvot wrote: > > Hi, > > On Tue, Sep 03, 2024 at 10:43:14AM +0530, Amit Kapila wrote: > > On Mon, Sep 2, 2024 at 9:14 AM shveta malik wrote: > > > > > > On Mon, Sep 2, 2024 at 5:47 AM Peter Smith wrote: > > > > > > > > > > > > To summarize, the current description wrongly describes the field as a > > > > time duration: > > > > "The time since the slot has become inactive." > > > > > > > > I suggest replacing it with: > > > > "The slot has been inactive since this time." > > > > > > > > > > +1 for the change. If I had read the document without knowing about > > > the patch, I too would have interpreted it as a duration. > > > > > > > The suggested change looks good to me as well. I'll wait for a day or > > two before pushing to see if anyone thinks otherwise. > > I'm not 100% convinced the current wording is confusing because: > > - the field type is described as a "timestamptz". > - there is no duration unit in the wording (if we were to describe a duration, > we would probably add an unit to it, like "The time (in s)..."). > Hmm. I assure you it is confusing because in English "The time since" implies duration, and that makes the sentence contrary to the timestamptz field type. Indeed, I cited Chat-GPT's interpretation above specifically so that people would not just take this as my opinion. > That said, if we want to highlight that this is not a duration, what about? > > "The time (not duration) since the slot has become inactive." > There is no need to "highlight" anything. To avoid confusion we only need to say what we mean. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Introduce XID age and inactive timeout based replication slot invalidation
idation here, because it is the other subroutine doing the waiting for the invalidation message in the logs. Instead, here I think you are just confirming the 'invalidation_reason' got set correctly. The comment should say what it is really doing. == sub check_for_slot_invalidation_in_server_log 10. +# Check for invalidation of slot in server log +sub check_for_slot_invalidation_in_server_log +{ I think the main function of this subroutine is the CHECKPOINT and the waiting for the server log to say invalidation happened. It is doing a loop of a) CHECKPOINT then b) inspecting the server log for the slot invalidation, and c) waiting for a bit. Repeat 10 times. A comment describing the logic for this subroutine would be helpful. The most important side-effect of this function is the CHECKPOINT because without that nothing will ever get invalidated due to inactivity, but this key point is not obvious from the subroutine name. IMO it would be better to name this differently to reflect what it is really doing: e.g. "CHECKPOINT_and_wait_for_slot_invalidation_in_server_log" == Kind Regards, Peter Smith. Fujitsu Australia
Re: GUC names in messages
Hi. The cfbot was reporting my patches needed to be rebased. Here is the rebased patch set v10*. Everything is the same as before except now there are only 7 patches instead of 8. The previous v9-0001 ("bool") patch no longer exists because those changes are now already present in HEAD. I hope these might be pushed soon to avoid further rebasing. == Kind Regards, Peter Smith. Fujitsu Australia v10-0001-Add-quotes-for-GUCs-int.patch Description: Binary data v10-0004-Add-quotes-for-GUCs-enum.patch Description: Binary data v10-0003-Add-quotes-for-GUCs-string.patch Description: Binary data v10-0002-Add-quotes-for-GUCs-real.patch Description: Binary data v10-0005-GUC-names-fix-case-intervalstyle.patch Description: Binary data v10-0006-GUC-names-fix-case-datestyle.patch Description: Binary data v10-0007-GUC-names-make-common-translatable-messages.patch Description: Binary data
Re: Collect statistics about conflicts in logical replication
On Mon, Sep 2, 2024 at 1:28 PM shveta malik wrote: > > On Mon, Sep 2, 2024 at 4:20 AM Peter Smith wrote: > > > > On Fri, Aug 30, 2024 at 4:24 PM shveta malik wrote: > > > > > > On Fri, Aug 30, 2024 at 10:53 AM Peter Smith > > > wrote: > > > > > > ... > > > > 2. Arrange all the counts into an intuitive/natural order > > > > > > > > There is an intuitive/natural ordering for these counts. For example, > > > > the 'confl_*' count fields are in the order insert -> update -> > > > > delete, which LGTM. > > > > > > > > Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not > > > > in a good order. > > > > > > > > IMO it makes more sense if everything is ordered as: > > > > 'sync_error_count', then 'apply_error_count', then all the 'confl_*' > > > > counts. > > > > > > > > This comment applies to lots of places, e.g.: > > > > - docs (doc/src/sgml/monitoring.sgml) > > > > - function pg_stat_get_subscription_stats (pg_proc.dat) > > > > - view pg_stat_subscription_stats (src/backend/catalog/system_views.sql) > > > > - TAP test SELECTs (test/subscription/t/026_stats.pl) > > > > > > > > As all those places are already impacted by this patch, I think it > > > > would be good if (in passing) we (if possible) also swapped the > > > > sync/apply counts so everything is ordered intuitively top-to-bottom > > > > or left-to-right. > > > > > > Not sure about this though. It does not seem to belong to the current > > > patch. > > > > > > > Fair enough. But, besides being inappropriate to include in the > > current patch, do you think the suggestion to reorder them made sense? > > If it has some merit, then I will propose it again as a separate > > thread. > > > > Yes, I think it makes sense. 
With respect to internal code, it might > be still okay as is, but when it comes to pg_stat_subscription_stats, > I think it is better if user finds it in below order: > subid | subname | sync_error_count | apply_error_count | confl_* > > rather than the existing one: > subid | subname | apply_error_count | sync_error_count | confl_* > Hi Shveta, Thanks. FYI - I created a new thread for this here [1]. == [1] https://www.postgresql.org/message-id/CAHut+PvbOw90wgGF4aV1HyYtX=6pjWc+pn8_fep7L=alxwx...@mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
pg_stat_subscription_stats order of the '*_count' columns
Hi, While reviewing another thread I was looking at the 'pg_stat_subscription_stats' view, in particular at the order of its "*_count" columns. IMO there is an intuitive/natural ordering for the logical replication operations (LR) being counted. For example, LR "initial tablesync" always comes before LR "apply". I propose that the columns of the view should also be in this same intuitive order: Specifically, "sync_error_count" should come before "apply_error_count" (left-to-right in view, top-to-bottom in docs). Currently, they are not arranged that way. The view today has only 2 count columns in HEAD, so this proposal seems trivial, but there is another patch [2] soon to be pushed, which will add more conflict count columns. As the number of columns increases IMO it becomes more important that each column is where you would intuitively expect to find it. ~ Changes would be needed in several places: - docs (doc/src/sgml/monitoring.sgml) - function pg_stat_get_subscription_stats (pg_proc.dat) - view pg_stat_subscription_stats (src/backend/catalog/system_views.sql) - TAP test SELECTs (test/subscription/t/026_stats.pl) ~ Thoughts? == [1] docs - https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-SUBSCRIPTION-STATS [2] stats for conflicts - https://www.postgresql.org/message-id/flat/OS0PR01MB57160A07BD575773045FC214948F2%40OS0PR01MB5716.jpnprd01.prod.outlook.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Introduce XID age and inactive timeout based replication slot invalidation
test/recovery/t/050_invalidate_slots.pl ~~~ Please refer to the attached file which implements some of the nits mentioned above. == [1] v43 review - https://www.postgresql.org/message-id/CAHut%2BPuFzCHPCiZbpoQX59kgZbebuWT0gR0O7rOe4t_sdYu%3DOA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 970b496..0537714 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4564,8 +4564,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows -Invalidates replication slots that are inactive for longer than -specified amount of time. If this value is specified without units, +Invalidate replication slots that are inactive for longer than this +amount of time. If this value is specified without units, it is taken as seconds. A value of zero (which is default) disables the timeout mechanism. This parameter can only be set in the postgresql.conf file or on the server @@ -4573,11 +4573,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows -This invalidation check happens either when the slot is acquired -for use or during checkpoint. The time since the slot has become -inactive is known from its -inactive_since value using which the -timeout is measured. +Slot invalidation due to inactivity timeout occurs during checkpoint. +The duration of slot inactivity is calculated using the slot's +inactive_since field value. @@ -4585,9 +4583,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows applicable for slots on the standby server that are being synced from primary server (i.e., standby slots having synced field true). -Because such synced slots are typically considered not active -(for them to be later considered as inactive) as they don't perform -logical decoding to produce the changes. +Synced slots are always considered to be inactive because they don't +perform logical decoding to produce changes. 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index acc0370..bb06592 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -551,12 +551,11 @@ ReplicationSlotName(int index, Name name) * An error is raised if nowait is true and the slot is currently in use. If * nowait is false, we sleep until the slot is released by the owning process. * - * An error is raised if check_for_invalidation is true and the slot has been + * An error is raised if error_if_invalid is true and the slot has been * invalidated previously. */ void -ReplicationSlotAcquire(const char *name, bool nowait, - bool check_for_invalidation) +ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid) { ReplicationSlot *s; int active_pid; @@ -635,11 +634,10 @@ retry: MyReplicationSlot = s; /* -* Error out if the slot has been invalidated previously. Because there's -* no use in acquiring the invalidated slot. +* An error is raised if error_if_invalid is true and the slot has been +* invalidated previously. */ - if (check_for_invalidation && - s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT) + if (error_if_invalid && s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT) { Assert(s->inactive_since > 0); ereport(ERROR, @@ -1565,6 +1563,7 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause, _("You might need to increase \"%s\"."), "max_slot_wal_keep_size"); break; } + case RS_INVAL_HORIZON: appendStringInfo(&err_detail, _("The slot conflicted with xid horizon %u."), snapshotConflictHorizon); @@ -1573,6 +1572,7 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause, case RS_INVAL_WAL_LEVEL: appendStringInfoString(&err_detail, _("Logical decoding on standby requires \"wal_level\" >= \"logical\" on the primary server.")); break; + case RS_INVAL_INACTIVE_TIMEOUT: Assert(inactive_since > 0); appendStringInfo(&err_detail, @@ -1584,6 +15
DOCS - pg_replication_slot . Fix the 'inactive_since' description
Hi hackers. While reviewing another thread I had cause to look at the docs for the pg_replication_slot 'inactive_since' field [1] for the first time. I was confused by the description, which is as follows: inactive_since timestamptz The time since the slot has become inactive. NULL if the slot is currently being used. Then I found the github history for the patch [3], and the accompanying long thread discussion [2] about the renaming of that field. I have no intention to re-open that can-of-worms, but OTOH I feel the first sentence of the field description is wrong and needs fixing. Specifically, IMO describing something as "The time since..." means some amount of elapsed time since some occurrence, but that is not the correct description for this timestamp field. This is not just a case of me being pedantic. For example, here is what Chat-GPT had to say: I asked: What does "The time since the slot has become inactive." mean? ChatGPT said: "The time since the slot has become inactive" refers to the duration that has passed from the moment a specific slot (likely a database replication slot or a similar entity) stopped being active. In other words, it measures how much time has elapsed since the slot transitioned from an active state to an inactive state. For example, if a slot became inactive 2 hours ago, "the time since the slot has become inactive" would be 2 hours. To summarize, the current description wrongly describes the field as a time duration: "The time since the slot has become inactive." I suggest replacing it with: "The slot has been inactive since this time." The attached patch makes this suggested change. == [1] docs - https://www.postgresql.org/docs/devel/view-pg-replication-slots.html [2] thread - https://www.postgresql.org/message-id/ca+tgmob_ta-t2ty8qrkhbgnnlrf4zycwhghgfsuuofraedw... [3] push - https://github.com/postgres/postgres/commit/6d49c8d4b4f4a20eb5b4c501d78cf894fa13c0ea Kind Regards, Peter Smith. 
Fujitsu Australia v1-0001-fix-description-for-inactive_since.patch Description: Binary data
Re: Collect statistics about conflicts in logical replication
On Fri, Aug 30, 2024 at 4:24 PM shveta malik wrote: > > On Fri, Aug 30, 2024 at 10:53 AM Peter Smith wrote: > > ... > > 2. Arrange all the counts into an intuitive/natural order > > > > There is an intuitive/natural ordering for these counts. For example, > > the 'confl_*' count fields are in the order insert -> update -> > > delete, which LGTM. > > > > Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not > > in a good order. > > > > IMO it makes more sense if everything is ordered as: > > 'sync_error_count', then 'apply_error_count', then all the 'confl_*' > > counts. > > > > This comment applies to lots of places, e.g.: > > - docs (doc/src/sgml/monitoring.sgml) > > - function pg_stat_get_subscription_stats (pg_proc.dat) > > - view pg_stat_subscription_stats (src/backend/catalog/system_views.sql) > > - TAP test SELECTs (test/subscription/t/026_stats.pl) > > > > As all those places are already impacted by this patch, I think it > > would be good if (in passing) we (if possible) also swapped the > > sync/apply counts so everything is ordered intuitively top-to-bottom > > or left-to-right. > > Not sure about this though. It does not seem to belong to the current patch. > Fair enough. But, besides being inappropriate to include in the current patch, do you think the suggestion to reorder them made sense? If it has some merit, then I will propose it again as a separate thread. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Collect statistics about conflicts in logical replication
On Fri, Aug 30, 2024 at 6:36 PM shveta malik wrote: > > On Fri, Aug 30, 2024 at 12:15 PM Zhijie Hou (Fujitsu) > wrote: > > > > > > Here is V5 patch which addressed above and Shveta's[1] comments. > > > > The patch looks good to me. > Patch v5 LGTM. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Collect statistics about conflicts in logical replication
Hi Hou-San. Here are my review comments for v4-0001. == 1. Add links in the docs IMO it would be good for all these confl_* descriptions (in doc/src/sgml/monitoring.sgml) to include links back to where each of those conflict types was defined [1]. Indeed, when links are included to the original conflict type information then I think you should remove mentioning "track_commit_timestamp": + counted only when the + track_commit_timestamp + option is enabled on the subscriber. It should be obvious that you cannot count a conflict if the conflict does not happen, but I don't think we should scatter/duplicate those rules in different places saying when certain conflicts can/can't happen -- we should just link everywhere back to the original description for those rules. ~~~ 2. Arrange all the counts into an intuitive/natural order There is an intuitive/natural ordering for these counts. For example, the 'confl_*' count fields are in the order insert -> update -> delete, which LGTM. Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not in a good order. IMO it makes more sense if everything is ordered as: 'sync_error_count', then 'apply_error_count', then all the 'confl_*' counts. This comment applies to lots of places, e.g.: - docs (doc/src/sgml/monitoring.sgml) - function pg_stat_get_subscription_stats (pg_proc.dat) - view pg_stat_subscription_stats (src/backend/catalog/system_views.sql) - TAP test SELECTs (test/subscription/t/026_stats.pl) As all those places are already impacted by this patch, I think it would be good if (in passing) we (if possible) also swapped the sync/apply counts so everything is ordered intuitively top-to-bottom or left-to-right. == [1] https://www.postgresql.org/docs/devel/logical-replication-conflicts.html#LOGICAL-REPLICATION-CONFLICTS Kind Regards, Peter Smith. Fujitsu Australia
Re: Introduce XID age and inactive timeout based replication slot invalidation
sign now = GetCurrentTimestamp(); here? ~ 11. + * Note that we don't invalidate synced slots because, + * they are typically considered not active as they don't + * perform logical decoding to produce the changes. nit - tweaked punctuation ~ 12. + * If the slot can be acquired, do so or if the slot is already ours, + * then mark it invalidated. Otherwise we'll signal the owning + * process, below, and retry. nit - tidied this comment. Suggestion: If the slot can be acquired, do so and mark it as invalidated. If the slot is already ours, mark it as invalidated. Otherwise, we'll signal the owning process below and retry. ~ 13. + if (active_pid == 0 || + (MyReplicationSlot != NULL && + MyReplicationSlot == s && + active_pid == MyProcPid)) You are already checking MyReplicationSlot == s here, so that extra check for MyReplicationSlot != NULL is redundant, isn't it? ~~~ 14. CheckPointReplicationSlots /* - * Flush all replication slots to disk. + * Flush all replication slots to disk. Also, invalidate slots during + * non-shutdown checkpoint. * * It is convenient to flush dirty replication slots at the time of checkpoint. * Additionally, in case of a shutdown checkpoint, we also identify the slots nit - /Also, invalidate slots/Also, invalidate obsolete slots/ == src/backend/utils/misc/guc_tables.c 15. + {"replication_slot_inactive_timeout", PGC_SIGHUP, REPLICATION_SENDING, + gettext_noop("Sets the amount of time to wait before invalidating an " + "inactive replication slot."), nit - that is maybe a bit misleading because IIUC there is no real "waiting" happening anywhere. Suggest: Sets the amount of time a replication slot can remain inactive before it will be invalidated. == Please take a look at the attached top-up patches. These include changes for many of the nits above. == Kind Regards, Peter Smith. 
Fujitsu Australia

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7009350..c96ae53 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -671,9 +671,10 @@ retry:
 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("can no longer get changes from replication slot \"%s\"",
 NameStr(s->data.name)),
-errdetail("This slot has been invalidated because it was inactive since %s for more than %d seconds specified by \"replication_slot_inactive_timeout\".",
+errdetail("The slot became invalid because it was inactive since %s, which is more than %d seconds ago.",
 timestamptz_to_str(s->inactive_since),
- replication_slot_inactive_timeout)));
+ replication_slot_inactive_timeout),
+errhint("You might need to increase \"%s\".", "replication_slot_inactive_timeout")));
 }
 }
@@ -1738,9 +1739,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 * is disabled or slot is currently being used or the slot
 * on standby is currently being synced from the primary.
 *
-* Note that we don't invalidate synced slots because,
-* they are typically considered not active as they don't
-* perform logical decoding to produce the changes.
+* Note that we don't invalidate synced slots because
+* they are typically considered not active, as they don't
+* perform logical decoding to produce changes.
 */
 if (replication_slot_inactive_timeout == 0 ||
 s->inactive_since == 0 ||
@@ -1789,9 +1790,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 active_pid = s->active_pid;
 /*
-* If the slot can be acquired, do so or if the slot is already ours,
-* then mark it invalidated. Otherwise we'll signal the owning
-* process, below, and retry.
+* If the slot can be acquired, do so and mark it as invalidated.
+* If the slot is already ours, mark it as inv
Re: Collect statistics about conflicts in logical replication
Hi Hou-San.

I tried an experiment where I deliberately violated a primary key
during initial table synchronization. For example:

test_sub=# create table t1(a int primary key);
CREATE TABLE
test_sub=# insert into t1 values(1);
INSERT 0 1
test_sub=# create subscription sub1 connection 'dbname=test_pub' publication pub1 with (enabled=false);
2024-08-29 09:53:21.172 AEST [24186] WARNING:  subscriptions created by regression test cases should have names starting with "regress_"
WARNING:  subscriptions created by regression test cases should have names starting with "regress_"
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# select * from pg_stat_subscription_stats;
 subid | subname | apply_error_count | sync_error_count | insert_exists_count | update_differ_count | update_exists_count | update_missing_count | delete_differ_count | delete_missing_count | stats_reset
-------+---------+-------------------+------------------+---------------------+---------------------+---------------------+----------------------+---------------------+----------------------+-------------
 16390 | sub1    |                 0 |                0 |                   0 |                   0 |                   0 |                    0 |                   0 |                    0 |
(1 row)

test_sub=# alter subscription sub1 enable;
ALTER SUBSCRIPTION
test_sub=# 2024-08-29 09:53:57.245 AEST [4345] LOG:  logical replication apply worker for subscription "sub1" has started
2024-08-29 09:53:57.258 AEST [4347] LOG:  logical replication table synchronization worker for subscription "sub1", table "t1" has started
2024-08-29 09:53:57.311 AEST [4347] ERROR:  duplicate key value violates unique constraint "t1_pkey"
2024-08-29 09:53:57.311 AEST [4347] DETAIL:  Key (a)=(1) already exists.
2024-08-29 09:53:57.311 AEST [4347] CONTEXT:  COPY t1, line 1
2024-08-29 09:53:57.312 AEST [23501] LOG:  background worker "logical replication tablesync worker" (PID 4347) exited with exit code 1
2024-08-29 09:54:02.385 AEST [4501] LOG:  logical replication table synchronization worker for subscription "sub1", table "t1" has started
2024-08-29 09:54:02.462 AEST [4501] ERROR:  duplicate key value violates unique constraint "t1_pkey"
2024-08-29 09:54:02.462 AEST [4501] DETAIL:  Key (a)=(1) already exists.
2024-08-29 09:54:02.462 AEST [4501] CONTEXT:  COPY t1, line 1
2024-08-29 09:54:02.463 AEST [23501] LOG:  background worker "logical replication tablesync worker" (PID 4501) exited with exit code 1
2024-08-29 09:54:07.512 AEST [4654] LOG:  logical replication table synchronization worker for subscription "sub1", table "t1" has started
2024-08-29 09:54:07.580 AEST [4654] ERROR:  duplicate key value violates unique constraint "t1_pkey"
2024-08-29 09:54:07.580 AEST [4654] DETAIL:  Key (a)=(1) already exists.
2024-08-29 09:54:07.580 AEST [4654] CONTEXT:  COPY t1, line 1
...
test_sub=# alter subscription sub1 disable;
ALTER SUBSCRIPTION
2024-08-29 09:55:10.329 AEST [4345] LOG:  logical replication worker for subscription "sub1" will stop because the subscription was disabled
test_sub=# select * from pg_stat_subscription_stats;
 subid | subname | apply_error_count | sync_error_count | insert_exists_count | update_differ_count | update_exists_count | update_missing_count | delete_differ_count | delete_missing_count | stats_reset
-------+---------+-------------------+------------------+---------------------+---------------------+---------------------+----------------------+---------------------+----------------------+-------------
 16390 | sub1    |                 0 |               15 |                   0 |                   0 |                   0 |                    0 |                   0 |                    0 |
(1 row)

~~~

Notice how after a while there were multiple (15) 'sync_error_count'
recorded. According to the docs: 'insert_exists' happens when
"Inserting a row that violates a NOT DEFERRABLE unique constraint.".
So why are there not the same number of 'insert_exists_count' recorded
in pg_stat_subscription_stats?

The 'insert_exists' is either not happening or is not being counted
during table synchronization. Either way, it was not what I was
expecting. If it is not a bug, maybe the docs need to explain more
about the rules for 'insert_exists' during the initial table sync.

==

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Collect statistics about conflicts in logical replication
On Wed, Aug 28, 2024 at 9:19 PM Amit Kapila wrote:
>
> On Wed, Aug 28, 2024 at 11:43 AM shveta malik wrote:
> >
> > Thanks for the patch. Just thinking out loud, since we have names like
> > 'apply_error_count', 'sync_error_count' which tells that they are
> > actually error-count, will it be better to have something similar in
> > conflict-count cases, like 'insert_exists_conflict_count',
> > 'delete_missing_conflict_count' and so on. Thoughts?
> >
>
> It would be better to have conflict in the names but OTOH it will make
> the names a bit longer. The other alternatives could be (a)
> insert_exists_confl_count, etc. (b) confl_insert_exists_count, etc.
> (c) confl_insert_exists, etc. These are based on the column names in
> the existing view pg_stat_database_conflicts [1]. The (c) looks better
> than other options but it will make the conflict-related columns
> different from error-related columns.
>
> Yet another option is to have a different view like
> pg_stat_subscription_conflicts but that sounds like going too far.
>
> [1] - https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-DATABASE-CONFLICTS-VIEW

Option (c) looked good to me.

Removing the suffix "_count" is OK. For example, try searching all of
Chapter 27 ("The Cumulative Statistics System") [1] for columns
described as "Number of ..." and you will find that a "_count" suffix
is used only rarely.

Adding the prefix "confl_" is OK. As mentioned, there is a precedent
for this. See "pg_stat_database_conflicts" [2].

Mixing column names where some have and some do not have "_count"
suffixes may not be ideal, but I see no problem because there are
precedents for that too. E.g. see "pg_stat_replication_slots" [3], and
"pg_stat_all_tables" [4].

==

[1] https://www.postgresql.org/docs/devel/monitoring-stats.html
[2] https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-DATABASE-CONFLICTS-VIEW
[3] https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-SLOTS-VIEW
[4] https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Conflict detection and logging in logical replication
On Wed, Aug 28, 2024 at 3:53 PM shveta malik wrote:
>
> On Wed, Aug 28, 2024 at 9:44 AM Zhijie Hou (Fujitsu) wrote:
> >
> > > > +1 on 'update_origin_differs' instead of 'update_origins_differ' as
> > > > the former is somewhat similar to other conflict names 'insert_exists'
> > > > and 'update_exists'.
> > >
> > > Since we reached a consensus on this, I am attaching a small patch to
> > > rename as suggested.
> >
> > Sorry, I attached the wrong patch. Here is correct one.
> >
> LGTM.
>

LGTM.

==

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Collect statistics about conflicts in logical replication
On Mon, Aug 26, 2024 at 10:13 PM Zhijie Hou (Fujitsu) wrote:
>
> On Monday, August 26, 2024 3:30 PM Peter Smith wrote:
> >
> > ==
> > src/include/replication/conflict.h
> >
> > nit - defined 'NUM_CONFLICT_TYPES' inside the enum (I think this way is
> > often used in other PG source enums)
>
> I think we have recently tended to avoid doing that, as it has been commented
> that this style is somewhat deceptive and can cause confusion. See a previous
> similar comment[1]. The current style follows the other existing examples
> like:
>
> #define IOOBJECT_NUM_TYPES (IOOBJECT_TEMP_RELATION + 1)
> #define IOCONTEXT_NUM_TYPES (IOCONTEXT_VACUUM + 1)
> #define IOOP_NUM_TYPES (IOOP_WRITEBACK + 1)
> #define BACKEND_NUM_TYPES (B_LOGGER + 1)

OK.

> > ==
> > src/test/subscription/t/026_stats.pl
> >
> > 1.
> > + # Delete data from the test table on the publisher. This delete
> > + operation # should be skipped on the subscriber since the table is already
> > empty.
> > + $node_publisher->safe_psql($db, qq(DELETE FROM $table_name;));
> > +
> > + # Wait for the subscriber to report tuple missing conflict.
> > + $node_subscriber->poll_query_until(
> > + $db,
> > + qq[
> > + SELECT update_missing_count > 0 AND delete_missing_count > 0 FROM
> > + pg_stat_subscription_stats WHERE subname = '$sub_name'
> > + ])
> > + or die
> > + qq(Timed out while waiting for tuple missing conflict for
> > subscription '$sub_name');
> >
> > Can you write a comment to explain why the replicated DELETE is
> > expected to increment both the 'update_missing_count' and the
> > 'delete_missing_count'?
>
> I think the comments several lines above the wait explained the reason[2]. I
> slightly modified the comments to make it clear.

1. Right, but it still was not obvious to me what caused the
'update_missing_count'. On further study, I see it was a hangover from
the earlier UPDATE test case which was still stuck in an ERROR loop
attempting to do the update operation. e.g. before it was giving the
expected 'update_exists' conflicts but after the subscriber table
TRUNCATE the update conflict changes to give a 'update_missing'
conflict instead.

I've updated the comment to reflect my understanding. Please have a
look to see if you agree.

2. I separated the tests for 'update_missing' and 'delete_missing',
putting the update_missing test *before* the DELETE. I felt the
expected results were much clearer when each test did just one thing.
Please have a look to see if you agree.

~~~

3.
+# Enable track_commit_timestamp to detect origin-differ conflicts in logical
+# replication. Reduce wal_retrieve_retry_interval to 1ms to accelerate the
+# restart of the logical replication worker after encountering a conflict.
+$node_subscriber->append_conf(
+ 'postgresql.conf', q{
+track_commit_timestamp = on
+wal_retrieve_retry_interval = 1ms
+});

Later, after CDR resolvers are implemented, it might be good to
revisit these conflict test cases and re-write them to use some
conflict resolvers like 'skip'. Then the subscriber won't give ERRORs
and restart apply workers all the time behind the scenes, so you won't
need the above configuration for accelerating the worker restarts. In
other words, running these tests might be more efficient if you can
avoid restarting workers all the time.

I suggest putting an XXX comment here as a reminder that these tests
should be revisited to make use of conflict resolvers in the future.

~~~

nit - not caused by this patch, but other comment inconsistencies
about "stats_reset timestamp" can be fixed in passing too.

==

Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index 0df31a6..d9589f0 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -134,24 +134,37 @@ sub create_sub_pub_w_errors
 or die qq(Timed out while waiting for update_exists conflict for subscription '$sub_name');

-	# Truncate the test table to ensure that the conflicting update operations
-	# are skipped, allowing the test to continue.
-	$node_subscriber->safe_psql($db, qq(TRUNCATE $table_name));
+# Truncate the subscriber side test table. Now that the table is empty,
+# the update conflict ('update_existing') ERRORs will stop happening. A
+# single update conflict 'update_missing' will be reported, but the update
+# will be skipped on the subscriber, allowing the test to continue.
+$node_subscriber->safe_psql($db, qq(TRUNCATE $table
Re: Conflict detection and logging in logical replication
On Mon, Aug 26, 2024 at 7:52 PM Amit Kapila wrote:
>
> On Thu, Aug 22, 2024 at 2:21 PM Amit Kapila wrote:
> >
> > On Thu, Aug 22, 2024 at 1:33 PM Peter Smith wrote:
> > >
> > > Do you think the documentation for the 'column_value' parameter of the
> > > conflict logging should say that the displayed value might be
> > > truncated?
> > >
> >
> > I updated the patch to mention this and pushed it.
> >
>
> Peter Smith mentioned to me off-list that the names of conflict types
> 'update_differ' and 'delete_differ' are not intuitive as compared to
> all other conflict types like insert_exists, update_missing, etc. The
> other alternative that comes to mind for those conflicts is to name
> them as 'update_origin_differ'/'delete_origin_differ'.
>

For things to "differ" there must be more than one of them. The plural
of origin is origins.
e.g. 'update_origins_differ'/'delete_origins_differ'.

OTOH, you could say "differs" instead of differ:
e.g. 'update_origin_differs'/'delete_origin_differs'.

==

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Collect statistics about conflicts in logical replication
Hi Hou-san. Here are some review comments for your patch v1-0001.

==
doc/src/sgml/logical-replication.sgml

nit - added a comma.

==
doc/src/sgml/monitoring.sgml

nit - use for 'apply_error_count'.
nit - added a period when there are multiple sentences.
nit - adjusted field descriptions using Chat-GPT clarification suggestions

==
src/include/pgstat.h

nit - change the param to 'type' -- ie. same as the implementation calls it

==
src/include/replication/conflict.h

nit - defined 'NUM_CONFLICT_TYPES' inside the enum (I think this way is
often used in other PG source enums)

==
src/test/subscription/t/026_stats.pl

1.
+ # Delete data from the test table on the publisher. This delete operation
+ # should be skipped on the subscriber since the table is already empty.
+ $node_publisher->safe_psql($db, qq(DELETE FROM $table_name;));
+
+ # Wait for the subscriber to report tuple missing conflict.
+ $node_subscriber->poll_query_until(
+ $db,
+ qq[
+ SELECT update_missing_count > 0 AND delete_missing_count > 0
+ FROM pg_stat_subscription_stats
+ WHERE subname = '$sub_name'
+ ])
+ or die
+ qq(Timed out while waiting for tuple missing conflict for subscription '$sub_name');

Can you write a comment to explain why the replicated DELETE is
expected to increment both the 'update_missing_count' and the
'delete_missing_count'?

~

nit - update several "Apply and Sync errors..." comments that did not
mention conflicts
nit - tweak comments wording for update_differ and delete_differ
nit - /both > 0/> 0/
nit - /both 0/0/

==

Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index f3e3641..f682369 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -1585,7 +1585,7 @@ test_sub=# SELECT * FROM t1 ORDER BY id;
-     Additional logging is triggered and the conflict statistics are collected (displayed in the
+     Additional logging is triggered, and the conflict statistics are collected (displayed in the
      pg_stat_subscription_stats view) in the following conflict cases:

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ea36d46..ac3c773 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2159,7 +2159,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
      Number of times an error occurred while applying changes. Note that
      any conflict resulting in an apply error will be counted in both
-     apply_error_count and the corresponding conflict count.
+     apply_error_count and the corresponding conflict count.
@@ -2179,8 +2179,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage
      Number of times a row insertion violated a
-     NOT DEFERRABLE unique constraint while applying
-     changes
+     NOT DEFERRABLE unique constraint during the
+     application of changes
@@ -2189,11 +2189,11 @@ description | Waiting for a newly initialized WAL file to reach durable storage
      update_differ_count bigint
-     Number of times an update was performed on a row that was previously
-     modified by another source while applying changes. This conflict is
+     Number of times an update was applied to a row that had been previously
+     modified by another source during the application of changes. This conflict is
      counted only when the track_commit_timestamp
-     option is enabled on the subscriber
+     option is enabled on the subscriber.
@@ -2202,9 +2202,9 @@ description | Waiting for a newly initialized WAL file to reach durable storage
      update_exists_count bigint
-     Number of times that the updated value of a row violated a
-     NOT DEFERRABLE unique constraint while applying
-     changes
+     Number of times that an updated row value violated a
+     NOT DEFERRABLE unique constraint during the
+     application of changes
@@ -2213,8 +2213,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage
      update_missing_count bigint
-     Number of times that the tuple to be updated was not found while applying
-     changes
+     Number of times the tuple to be updated was not found during the
+     application of changes
@@ -2223,11 +2223,11 @@ description | Waiting for a newly initialized WAL file to reach durable storage
      delete_differ_count bigint
-     Number of times a delete was performed on a row that was previously
-     modified by another source while applying changes. This conflict is
-     counted only when
Re: Conflict Detection and Resolution
On Thu, Aug 22, 2024 at 8:15 PM shveta malik wrote:
>
> On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond wrote:
> >
> > The patches have been rebased on the latest pgHead following the merge
> > of the conflict detection patch [1].
>
> Thanks for working on patches.
>
> Summarizing the issues which need some suggestions/thoughts.
>
> 1)
> For subscription based resolvers, currently the syntax implemented is:
>
> 1a)
> CREATE SUBSCRIPTION
> CONNECTION PUBLICATION
> CONFLICT RESOLVER
> (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3,...);
>
> 1b)
> ALTER SUBSCRIPTION CONFLICT RESOLVER
> (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3,...);
>
> Earlier the syntax suggested in [1] was:
> CREATE SUBSCRIPTION CONNECTION PUBLICATION
> CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
> CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';
>
> I think the currently implemented syntax is good as it has less
> repetition, unless others think otherwise.
>
> ~~
>
> 2)
> For subscription based resolvers, do we need a RESET command to reset
> resolvers to default? Any one of below or both?
>
> 2a) reset all at once:
> ALTER SUBSCRIPTION RESET CONFLICT RESOLVERS
>
> 2b) reset one at a time:
> ALTER SUBSCRIPTION RESET CONFLICT RESOLVER for 'conflict_type';
>
> The issue I see here is, to implement 1a and 1b, we have introduced
> the 'RESOLVER' keyword. If we want to implement 2a, we will have to
> introduce the 'RESOLVERS' keyword as well. But we can come up with
> some alternative syntax if we plan to implement these. Thoughts?
>

Hi Shveta,

I felt it would be better to keep the syntax similar to the existing
INSERT ... ON CONFLICT [1]. I'd suggest a syntax like this:

... ON CONFLICT ['conflict_type'] DO { 'conflict_action' | DEFAULT }

~~~

e.g.

To configure conflict resolvers for the SUBSCRIPTION:

CREATE SUBSCRIPTION subname CONNECTION coninfo PUBLICATION pubname
ON CONFLICT 'conflict_type1' DO 'conflict_action1',
ON CONFLICT 'conflict_type2' DO 'conflict_action2';

Likewise, for ALTER:

ALTER SUBSCRIPTION
ON CONFLICT 'conflict_type1' DO 'conflict_action1',
ON CONFLICT 'conflict_type2' DO 'conflict_action2';

To RESET all at once:

ALTER SUBSCRIPTION ON CONFLICT DO DEFAULT;

And, to RESET one at a time:

ALTER SUBSCRIPTION ON CONFLICT 'conflict_type1' DO DEFAULT;

~~~

Although your list format "('conflict_type1' = 'conflict_action1',
'conflict_type2' = 'conflict_action2')" is clear and without
repetition, I predict this terse style could end up being troublesome
because it does not offer much flexibility for whatever the future
might hold for CDR.

e.g. ability to handle the conflict with a user-defined resolver
e.g. ability to handle the conflict conditionally (e.g. with a WHERE clause...)
e.g. ability to handle all conflicts with a common resolver
etc.

Advantages of my suggestion:
- Close to existing SQL syntax
- No loss of clarity by removing the word "RESOLVER"
- No requirement for new keyword/s
- The commands now read more like English
- Offers more flexibility for any unknown future requirements
- The setup (via create subscription) and the alter/reset all look the same.

==

[1] https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Hi Shubham,

I have reviewed v28* and posted updated v29 versions of patches 0001
and 0002. If you are OK with these changes, the next task would be to
pg_indent them, then rebase the remaining patches (0003 etc.) and
include those with the next patchset version.

//

Patch v29-0001 changes:

nit - Fixed typo in comments.
nit - Removed an unnecessary format change for the unchanged
send_relation_and_attrs declaration.

//

Patch v29-0002 changes:

1. Made fixes to address Vignesh's review comments [1].

2. Added the missing test cases for tab_gen_to_gen, and tab_alter.

3. Multiple other modifications include:

nit - Renamed the test database /test/test_igc_true/ because 'test'
was too vague.
nit - This patch does not need to change most of the existing 'tab1'
test. So we should not be reformatting the existing test code for no
reason.
nit - I added a summary comment to describe the test combinations
nit - The "Testcase end" comments were unnecessary and prone to error,
so I removed them.
nit - Change comments /incremental sync/incremental replication/
nit - Added XXX notes about copy_data=false. These are reminders to
change code in later TAP patches
nit - Rearranged test steps so the publisher does not do incremental
INSERT until all initial sync tests are done
nit - Added initial sync tests even if copy_data=false. This is for
completeness - these will be handled in a later TAP patch
nit - The table names are self-explanatory, so some of the test
"messages" were simplified

==

[1] https://www.postgresql.org/message-id/CALDaNm31LZQfeR8Vv1qNCOREGffvZbgGDrTp%3D3h%3DEHiHTEO2pQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

v29-0001-Enable-support-for-include_generated_columns-opt.patch
Description: Binary data

v29-0002-Tap-tests-for-generated-columns.patch
Description: Binary data
Re: Conflict detection and logging in logical replication
Hi Hou-san.

I was experimenting with some conflict logging and found that large
column values are truncated in the log DETAIL. E.g. Below I have a
table where I inserted a 3000 character text value 'bigbigbig...'.
Then I caused a replication conflict.

test_sub=# delete fr2024-08-22 17:50:17.181 AEST [14901] LOG:  logical replication apply worker for subscription "sub1" has started
2024-08-22 17:50:17.193 AEST [14901] ERROR:  conflict detected on relation "public.t1": conflict=insert_exists
2024-08-22 17:50:17.193 AEST [14901] DETAIL:  Key already exists in unique index "t1_pkey", modified in transaction 780.
Key (a)=(k3); existing local tuple (k3, bigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigb...); remote tuple (k3, this will clash).

~

Do you think the documentation for the 'column_value' parameter of the
conflict logging should say that the displayed value might be
truncated?

==

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: CREATE SUBSCRIPTION - add missing test case
On Thu, Aug 22, 2024 at 8:54 AM Peter Smith wrote:
>
> On Wed, Aug 21, 2024 at 8:48 PM Amit Kapila wrote:
> >
> > On Fri, Aug 16, 2024 at 9:45 AM vignesh C wrote:
> > >
> > > On Thu, 15 Aug 2024 at 12:55, Peter Smith wrote:
> > > >
> > > > Hi Hackers,
> > > >
> > > > While reviewing another logical replication thread [1], I found an
> > > > ERROR scenario that seems to be untested.
> > > >
> > > > TEST CASE: Attempt CREATE SUBSCRIPTION where the subscriber table is
> > > > missing some expected column(s).
> > > >
> > > > Attached is a patch to add the missing test for this error message.
> > >
> > > I agree currently there is no test to hit this code.
> > >
> >
> > I also don't see a test for this error condition. However, it is not
> > clear to me how important is it to cover this error code path. This
> > code has existed for a long time and I didn't notice any bugs related
> > to this. There is a possibility that in the future we might break
> > something because of a lack of this test but not sure if we want to
> > cover every code path via tests as each additional test also has some
> > cost. OTOH, If others think it is important or a good idea to have
> > this test then I don't have any objection to the same.
>
> Yes, AFAIK there were no bugs related to this; The test was proposed
> to prevent accidental future bugs.
>
> BACKGROUND
>
> Another pending feature thread (replication of generated columns) [1]
> required many test combinations to confirm all the different expected
> results which are otherwise easily accidentally broken without
> noticing. This *current* thread test shares one of the same error
> messages, which is how it was discovered missing in the first place.
>
> ~~~
>
> PROPOSAL
>
> I think this is not the first time a logical replication test has been
> questioned due mostly to concern about creeping "costs".
>
> How about we create a new test file and put test cases like this one
> into it, guarded by code like the below using PG_TEST_EXTRA [2]?
>
> Doing it this way we can have better code coverage and higher
> confidence when we want it, but zero test cost overheads when we don't
> want it.
>
> e.g.
>
> src/test/subscription/t/101_extra.pl:
>
> if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bsubscription\b/)
> {
> plan skip_all =>
> 'Due to execution costs these tests are skipped unless subscription
> is enabled in PG_TEST_EXTRA';
> }
>
> # Add tests here...
>

To help strengthen the above proposal, here are a couple of examples
where TAP tests already use this strategy to avoid tests for various
reasons.

[1] Avoids some tests because of cost

# WAL consistency checking is resource intensive so require opt-in with the
# PG_TEST_EXTRA environment variable.
if ( $ENV{PG_TEST_EXTRA}
  && $ENV{PG_TEST_EXTRA} =~ m/\bwal_consistency_checking\b/)
{
  $node_primary->append_conf('postgresql.conf',
    'wal_consistency_checking = all');
}

[2] Avoids some tests because of safety

if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bload_balance\b/)
{
  plan skip_all =>
    'Potentially unsafe test load_balance not enabled in PG_TEST_EXTRA';
}

==

[1] https://github.com/postgres/postgres/blob/master/src/test/recovery/t/027_stream_regress.pl
[2] https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/t/004_load_balance_dns.pl

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Conflict detection and logging in logical replication
Hi Hou-San. Here is my review of the v20-0001 docs patch.

1. Restructure into sections

> I think that's a good idea. But I preferred to do that in a separate
> patch (maybe a third patch after the first and second are RFC), because AFAICS
> we would need to adjust some existing docs which falls outside the scope of
> the current patch.

OK. I thought deferring it would only make extra work/churn, given you
already know up-front that everything will require restructuring later
anyway.

~~~

2. Synopsis

2.1 synopsis wrapping.

> I thought about this, but wrapping the sentence would cause the words
> to be displayed in different lines after compiling. I think that's
> inconsistent with the real log which display the tuples in one line.

IMO the readability of the format is the most important objective for
the documentation. And, as you told Shveta, there is already a real
example where people can see the newlines if they want to.

nit - Anyway, FYI there are newline rendering problems here in v20.
Removed newlines to make all these optional parts appear on the same
line.

2.2 other stuff

nit - Add underscore to /detailed explanation/detailed_explanation/,
to make it more obvious this is a replacement parameter
nit - Added newline after for readability of the SGML file.

~~~

3. Case of literals

It's not apparent to me why the optional "Key" part should be
uppercase in the LOG but other (equally important?) literals of other
parts like "replica identity" are not. It seems inconsistent.

~~~

4. LOG parts

nit - IMO the "schema.tablename" and the "conflict_type" deserved to
have separate listitems.
nit - The "conflict_type" should have markup.

~~~

5. DETAIL parts

nit - added newline above this for readability of the SGML.
nit - Add underscore to detailed_explanation, and rearrange wording to
name the parameter up-front same as the other bullets do.
nit - change the case /key/Key/ to match the synopsis.

~~~

6.

+
+ The replica identity section includes the replica
+ identity key values that used to search for the existing local tuple to
+ be updated or deleted. This may include the full tuple value if the local
+ relation is marked with REPLICA IDENTITY FULL.
+

It might be good to also provide a link for that REPLICA IDENTITY
FULL. (I did this already in the attachment as an example)

~~~

7. Replacement parameters - column_name, column_value

I've included these for completeness. I think it is useful.

BTW, the column names seem sometimes optional but I did not know the
rules. It should be documented what makes these names be shown or not
shown.

~~~

Please see the attachment which implements most of the items mentioned
above.

==

Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 3df791a..a3a0eae 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -1670,11 +1670,10 @@ test_sub=# SELECT * FROM t1 ORDER BY id;
 The log format for logical replication conflicts is as follows:
 LOG: conflict detected on relation "schemaname.tablename": conflict=conflict_type
-DETAIL: detailed explanation.
-
-Key (column_name, ...)=(column_value, ...)
-; existing local tuple (column_name, ...)=(column_value, ...); remote tuple (column_name, ...)=(column_value, ...); replica identity {(column_name, ...)=(column_value, ...) | full (column_value, ...)}.
+DETAIL: detailed_explanation.
+Key (column_name, ...)=(column_value, ...); existing local tuple (column_name, ...)=(column_value, ...); remote tuple (column_name, ...)=(column_value, ...); replica identity {(column_name, ...)=(column_value, ...) | full (column_value, ...)}.
+
 The log provides the following information:
@@ -1683,28 +1682,34 @@ DETAIL: detailed explanation.
-    The name of the local relation involved in the conflict and the conflict
-    type (e.g., insert_exists,
-    update_exists).
+    schemaname.tablename
+    identifies the local relation involved in the conflict.
+
+
+
+
+    conflict_type is the type of conflict that occurred
+    (e.g., insert_exists, update_exists).
+
 DETAIL
-    The origin, transaction ID, and commit timestamp of the transaction that
-    modified the existing local tuple, if available, are included in the
-    detailed explanation.
+    detailed_explanation includes
+    the origin, transaction ID, and commit timestamp of the transaction that
+    modified the existing local tuple, if available.
-    The key secti
Re: CREATE SUBSCRIPTION - add missing test case
On Wed, Aug 21, 2024 at 8:48 PM Amit Kapila wrote: > > On Fri, Aug 16, 2024 at 9:45 AM vignesh C wrote: > > > > On Thu, 15 Aug 2024 at 12:55, Peter Smith wrote: > > > > > > Hi Hackers, > > > > > > While reviewing another logical replication thread [1], I found an > > > ERROR scenario that seems to be untested. > > > > > > TEST CASE: Attempt CREATE SUBSCRIPTION where the subscriber table is > > > missing some expected column(s). > > > > > > Attached is a patch to add the missing test for this error message. > > > > I agree currently there is no test to hit this code. > > > > I also don't see a test for this error condition. However, it is not > clear to me how important is it to cover this error code path. This > code has existed for a long time and I didn't notice any bugs related > to this. There is a possibility that in the future we might break > something because of a lack of this test but not sure if we want to > cover every code path via tests as each additional test also has some > cost. OTOH, If others think it is important or a good idea to have > this test then I don't have any objection to the same. Yes, AFAIK there were no bugs related to this. The test was proposed to prevent accidental future bugs. BACKGROUND Another pending feature thread (replication of generated columns) [1] required many test combinations to confirm all the different expected results, which are otherwise easily accidentally broken without noticing. The test in this *current* thread shares one of the same error messages, which is how it was discovered missing in the first place. ~~~ PROPOSAL I think this is not the first time a logical replication test has been questioned due mostly to concern about creeping "costs". How about we create a new test file and put test cases like this one into it, guarded by code like the below using PG_TEST_EXTRA [2]? Doing it this way we can have better code coverage and higher confidence when we want it, but zero test cost overheads when we don't want it.
e.g. src/test/subscription/t/101_extra.pl: if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bsubscription\b/) { plan skip_all => 'Due to execution costs these tests are skipped unless subscription is enabled in PG_TEST_EXTRA'; } # Add tests here... == [1] https://www.postgresql.org/message-id/flat/B80D17B2-2C8E-4C7D-87F2-E5B4BE3C069E%40gmail.com [2] https://www.postgresql.org/docs/devel/regress-run.html#REGRESS-ADDITIONAL Kind Regards, Peter Smith. Fujitsu Australia
Re: Conflict detection and logging in logical replication
Here are some review comments for the v19-0001 docs patch. The content seemed reasonable, but IMO it should be presented quite differently. 1. Use sub-sections I expect this logical replication "Conflicts" section is going to evolve into something much bigger. Surely, it's not going to be one humongous page of details, so it will be a section with lots of subsections like all the others in Chapter 29. IMO, you should be writing the docs in that kind of structure from the beginning. For example, I'm thinking something like below (this is just an example - surely lots more subsections will be needed for this topic): 29.6 Conflicts 29.6.1. Conflict types 29.6.2. Logging format 29.6.3. Examples Specifically, this v19-0001 patch information should be put into a subsection like the 29.6.2 shown above. ~~~ 2. Markup + +LOG: conflict detected on relation "schemaname.tablename": conflict=conflict_type +DETAIL: detailed explaination. +Key (column_name, ...)=(column_name, ...); existing local tuple (column_name, ...)=(column_name, ...); remote tuple (column_name, ...)=(column_name, ...); replica identity (column_name, ...)=(column_name, ...). + IMO this should be using markup more like the SQL syntax references - e.g. I suggest all the substitution parameters (e.g. detailed explanation, conflict_type, column_name, ...) in the log should use replaceable-parameter markup, and then use those same markups again wherever the parameters are mentioned later in these docs. ~ nit - typo /explaination/explanation/ ~ nit - The amount of scrolling needed makes this LOG format too hard to see. Try to wrap it better so it can fit without being so wide. ~~~ 3. Restructure the list + + I suggest restructuring all this to use a nested list like: LOG - conflict_type DETAIL - detailed_explanation - key - existing_local_tuple - remote_tuple - replica_identity Doing this means you can remove a great deal of the unnecessary junk words like "of the first sentence in the DETAIL", and "sentence of the DETAIL line" etc.
The result will be much less text but much simpler text too. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication of sequences
Hi Vignesh, Here are my only review comments for the latest patch set. v20240820-0003. nit - missing period for comment in FetchRelationStates nit - typo in function name 'ProcessSyncingTablesFoSync' == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/replication/logical/syncutils.c b/src/backend/replication/logical/syncutils.c index b8f9300..705a330 100644 --- a/src/backend/replication/logical/syncutils.c +++ b/src/backend/replication/logical/syncutils.c @@ -96,7 +96,7 @@ SyncProcessRelations(XLogRecPtr current_lsn) break; case WORKERTYPE_TABLESYNC: - ProcessSyncingTablesFoSync(current_lsn); + ProcessSyncingTablesForSync(current_lsn); break; case WORKERTYPE_APPLY: @@ -143,7 +143,7 @@ FetchRelationStates(bool *started_tx) *started_tx = true; } - /* Fetch tables that are in non-ready state */ + /* Fetch tables that are in non-ready state. */ rstates = GetSubscriptionRelations(MySubscription->oid, true); /* Allocate the tracking info in a permanent memory context. */ diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index ad92b84..c753f45 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -237,7 +237,7 @@ wait_for_worker_state_change(char expected_state) * SYNCDONE and finish. 
*/ void -ProcessSyncingTablesFoSync(XLogRecPtr current_lsn) +ProcessSyncingTablesForSync(XLogRecPtr current_lsn) { SpinLockAcquire(&MyLogicalRepWorker->relmutex); diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h index 24f74ab..6504b70 100644 --- a/src/include/replication/worker_internal.h +++ b/src/include/replication/worker_internal.h @@ -264,7 +264,7 @@ extern void UpdateTwoPhaseState(Oid suboid, char new_state); extern bool FetchRelationStates(bool *started_tx); extern bool WaitForRelationStateChange(Oid relid, char expected_state); -extern void ProcessSyncingTablesFoSync(XLogRecPtr current_lsn); +extern void ProcessSyncingTablesForSync(XLogRecPtr current_lsn); extern void ProcessSyncingTablesForApply(XLogRecPtr current_lsn); extern void SyncProcessRelations(XLogRecPtr current_lsn); extern void SyncInvalidateRelationStates(Datum arg, int cacheid,
Re: GUC names in messages
CFBot reported some failures, so I have attached the rebased patch set v9*. I'm hopeful the majority of these might be pushed to avoid more rebasing... == Kind Regards, Peter Smith. Fujitsu Australia v9-0001-Add-quotes-for-GUCs-bool.patch Description: Binary data v9-0005-Add-quotes-for-GUCs-enum.patch Description: Binary data v9-0003-Add-quotes-for-GUCs-real.patch Description: Binary data v9-0004-Add-quotes-for-GUCs-string.patch Description: Binary data v9-0002-Add-quotes-for-GUCs-int.patch Description: Binary data v9-0008-GUC-names-make-common-translatable-messages.patch Description: Binary data v9-0006-GUC-names-fix-case-intervalstyle.patch Description: Binary data v9-0007-GUC-names-fix-case-datestyle.patch Description: Binary data
Re: PG docs - Sequence CYCLE clause
On Tue, Aug 20, 2024 at 10:18 AM Bruce Momjian wrote: > > Great, patch applied to master. > Thanks for pushing! == Kind Regards, Peter Smith. Fujitsu Australia
Re: CREATE SUBSCRIPTION - add missing test case
On Fri, Aug 16, 2024 at 2:15 PM vignesh C wrote: > Thanks for the review. > > I agree currently there is no test to hit this code. I'm not sure if > this is the correct location for the test, should it be included in > the 008_diff_schema.pl file? Yes, that is a better home for this test. Done as suggested in attached patch v2. == Kind Regards, Peter Smith. Fujitsu Australia v2-0001-Add-missing-test-case.patch Description: Binary data
Re: Logical Replication of sequences
Hi Vignesh, Here are my review comments for the latest patchset v20240819-0001. No changes. No comments. v20240819-0002. No changes. No comments. v20240819-0003. See below. v20240819-0004. See below. v20240819-0005. No changes. No comments. /// PATCH v20240819-0003 == src/backend/replication/logical/syncutils.c 3.1. +typedef enum +{ + SYNC_RELATION_STATE_NEEDS_REBUILD, + SYNC_RELATION_STATE_REBUILD_STARTED, + SYNC_RELATION_STATE_VALID, +} SyncingRelationsState; + +static SyncingRelationsState relation_states_validity = SYNC_RELATION_STATE_NEEDS_REBUILD; There is some muddle of singular/plural names here. The typedef/values/var should all match: e.g. It could be like: SYNC_RELATION_STATE_xxx --> SYNC_RELATION_STATES_xxx SyncingRelationsState --> SyncRelationStates But, a more radical change might be better. typedef enum { RELATION_STATES_SYNC_NEEDED, RELATION_STATES_SYNC_STARTED, RELATION_STATES_SYNCED, } SyncRelationStates; ~~~ 3.2. GENERAL refactoring I don't think all of the functions moved into syncutils.c truly belong there. This new module was introduced to hold common/util functions for tablesync and sequencesync, but with each patchset it has been sucking in more and more functions that maybe do not quite belong here. For example, AFAIK these below have logic that is *solely* for TABLES (not for SEQUENCES). Perhaps it was convenient to dump them here because they are statically called, but I felt they still logically belong in tablesync.c: - process_syncing_tables_for_sync(XLogRecPtr current_lsn) - process_syncing_tables_for_apply(XLogRecPtr current_lsn) - AllTablesyncsReady(void) ~~~ 3.3. +static bool +FetchRelationStates(bool *started_tx) +{ If this function can remain static then the name should change to be like fetch_table_states, right? == src/include/replication/worker_internal.h 3.4.
+extern bool wait_for_relation_state_change(Oid relid, char expected_state); If this previously static function will be exposed now (it may not need to be if some other functions are returned to tablesync.c) then the function name should also be changed, right? PATCH v20240819-0004 == src/backend/replication/logical/syncutils.c 4.1 GENERAL refactoring (this is similar to review comment #3.2 above) Functions like below have logic that is *solely* for SEQUENCES (not for TABLES). I felt they logically belong in sequencesync.c, not here. - process_syncing_sequences_for_apply(void) ~~~ FetchRelationStates: nit - the comment change about "not-READY tables" (instead of relations) should already be in patch 0003. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Hi Shubham, here are my review comments for the TAP tests patch v27-0002 == Commit message Tap tests for 'include-generated-columns' ~ But, it's more than that -- these are the TAP tests for all combinations of replication related to generated columns. i.e. both with and without the 'include_generated_columns' option enabled. == src/test/subscription/t/011_generated.pl I was mistaken, thinking that the v27-0002 had already been refactored according to Vignesh's last review, but it is not done yet, so I am not going to post detailed review comments until the restructuring is completed. ~ OTOH, there are some problems I felt have crept into v26-0001 (TAP test is same as v27-0002), so maybe try to also take care of them (see below) in v28-0002. In no particular order: * I felt it is almost useless now to have the "combo" ("regress_pub_combo") publication. It used to have many tables when you first created it, but with every version posted it is publishing less and less, so now there are only 2 tables in it. Better to have a specific publication for each table now and forget about "combos". * The "TEST tab_gen_to_gen initial sync" seems to be not even checking the table data. Why not? e.g. Even if you expect no data, you should test for it. * The "TEST tab_gen_to_gen replication" seems to be not even checking the table data. Why not? * Multiple XXX comments like "... it needs more study to determine if the above result was actually correct, or a PG17 bug..." should be removed. AFAIK we should understand the expected results for all combinations well by now. * The "TEST tab_order replication" is now getting an error. Now, that may be the correct error for this situation, but in that case I think the test is no longer testing what it was intended to test (i.e. that column order does not matter). Probably the table definition needs adjusting to make sure we are testing what we want to test, and not just making some random scenario "PASS".
* The expected empty result of the "# TEST tab_alter" test also seems unhelpful. It might be related to the previous bullet. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Hi, Here are my review comments for v27-0001. == contrib/test_decoding/expected/generated_columns.out contrib/test_decoding/sql/generated_columns.sql +-- By default, 'include-generated-columns' is enabled, so the values for the generated column 'b' will be replicated even if it is not explicitly specified. nit - The "default" is only like this for "test_decoding" (e.g., the CREATE SUBSCRIPTION option is the opposite), so let's make the comment clearer about that. nit - Use sentence case in the comments. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/contrib/test_decoding/expected/generated_columns.out b/contrib/test_decoding/expected/generated_columns.out index 48f900f..9b03f6d 100644 --- a/contrib/test_decoding/expected/generated_columns.out +++ b/contrib/test_decoding/expected/generated_columns.out @@ -1,13 +1,14 @@ --- test decoding of generated columns +-- Test decoding of generated columns. SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding'); ?column? -- init (1 row) --- column b' is a generated column +-- Column b' is a generated column. CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED); --- By default, 'include-generated-columns' is enabled, so the values for the generated column 'b' will be replicated even if it is not explicitly specified. +-- For 'test_decoding' the parameter 'include-generated-columns' is enabled by default, +-- so the values for the generated column 'b' will be replicated even if it is not explicitly specified. INSERT INTO gencoltable (a) VALUES (1); SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); data @@ -17,7 +18,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc COMMIT (3 rows) --- when 'include-generated-columns' is enabled, the values of the generated column 'b' will be replicated. 
+-- When 'include-generated-columns' is enabled, the values of the generated column 'b' will be replicated. INSERT INTO gencoltable (a) VALUES (2); SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1'); data @@ -27,7 +28,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc COMMIT (3 rows) --- when 'include-generated-columns' is disabled, the values of the generated column 'b' will not be replicated. +-- When 'include-generated-columns' is disabled, the values of the generated column 'b' will not be replicated. INSERT INTO gencoltable (a) VALUES (3); SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '0'); data @@ -37,7 +38,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc COMMIT (3 rows) --- with REPLICA IDENTITY = FULL, to show old-key data includes generated columns data for updates. +-- When REPLICA IDENTITY = FULL, show old-key data includes generated columns data for updates. ALTER TABLE gencoltable REPLICA IDENTITY FULL; UPDATE gencoltable SET a = 10; SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1'); diff --git a/contrib/test_decoding/sql/generated_columns.sql b/contrib/test_decoding/sql/generated_columns.sql index fb156c2..7b455a1 100644 --- a/contrib/test_decoding/sql/generated_columns.sql +++ b/contrib/test_decoding/sql/generated_columns.sql @@ -1,23 +1,24 @@ --- test decoding of generated columns +-- Test decoding of generated columns. SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding'); --- column b' is a generated column +-- Column b' is a generated column. 
CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED); --- By default, 'include-generated-columns' is enabled, so the values for the generated column 'b' will be replicated even if it is not explicitly specified. +-- For 'test_decoding' the parameter 'include-generated-columns' is
Re: Logical Replication of sequences
Here are my review comments for the latest patchset v20240817-0001. No changes. No comments. v20240817-0002. No changes. No comments. v20240817-0003. See below. v20240817-0004. See below. v20240817-0005. No changes. No comments. // v20240817-0003 and 0004. (This is a repeat of the same comment as in previous reviews, but lots more functions seem affected now) IIUC, the LR code tries to follow function naming conventions (e.g. CamelCase/snake_case for exposed/static functions respectively), intended to make the code more readable. But, this only works if the conventions are followed. Now, patches 0003 and 0004 are shuffling more and more functions between modules while changing them from static to non-static (or vice versa). So, the function name conventions are being violated many times. IMO these functions ought to be renamed according to their new modifiers to avoid the confusion caused by ignoring the name conventions. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication of sequences
Hi Vignesh. I looked at the latest v20240815* patch set. I have only the following few comments for patch v20240815-0004, below. == Commit message. Please see the attachment for some suggested updates. == src/backend/commands/subscriptioncmds.c CreateSubscription: nit - fix wording in one of the XXX comments == .../replication/logical/sequencesync.c report_mismatched_sequences: nit - param name /warning_sequences/mismatched_seqs/ append_mismatched_sequences: nit - param name /warning_sequences/mismatched_seqs/ LogicalRepSyncSequences: nit - var name /warning_sequences/mismatched_seqs/ == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 534d385..e799a41 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -800,9 +800,9 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, * export it. * * XXX: If the subscription is for a sequence-only publication, -* creating this slot. It can be created later during the ALTER -* SUBSCRIPTION ... REFRESH command, if the publication is updated -* to include tables or tables in schema. +* creating this slot is unnecessary. It can be created later +* during the ALTER SUBSCRIPTION ... REFRESH command, if the +* publication is updated to include tables or tables in schema. */ if (opts.create_slot) { diff --git a/src/backend/replication/logical/sequencesync.c b/src/backend/replication/logical/sequencesync.c index 1aa5ab8..934646f 100644 --- a/src/backend/replication/logical/sequencesync.c +++ b/src/backend/replication/logical/sequencesync.c @@ -264,17 +264,17 @@ copy_sequence(WalReceiverConn *conn, Relation rel, * Report any sequence mismatches as a single warning log. 
*/ static void -report_mismatched_sequences(StringInfo warning_sequences) +report_mismatched_sequences(StringInfo mismatched_seqs) { - if (warning_sequences->len) + if (mismatched_seqs->len) { ereport(WARNING, errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("parameters differ for the remote and local sequences (%s) for subscription \"%s\"", - warning_sequences->data, MySubscription->name), + mismatched_seqs->data, MySubscription->name), errhint("Alter/Re-create local sequences to have the same parameters as the remote sequences.")); - resetStringInfo(warning_sequences); + resetStringInfo(mismatched_seqs); } } @@ -282,15 +282,15 @@ report_mismatched_sequences(StringInfo warning_sequences) * append_mismatched_sequences * * Appends details of sequences that have discrepancies between the publisher - * and subscriber to the warning_sequences string. + * and subscriber to the mismatched_seqs string. */ static void -append_mismatched_sequences(StringInfo warning_sequences, Relation seqrel) +append_mismatched_sequences(StringInfo mismatched_seqs, Relation seqrel) { - if (warning_sequences->len) - appendStringInfoString(warning_sequences, ", "); + if (mismatched_seqs->len) + appendStringInfoString(mismatched_seqs, ", "); - appendStringInfo(warning_sequences, "\"%s.%s\"", + appendStringInfo(mismatched_seqs, "\"%s.%s\"", get_namespace_name(RelationGetNamespace(seqrel)), RelationGetRelationName(seqrel)); } @@ -314,7 +314,7 @@ LogicalRepSyncSequences(void) boolstart_txn = true; Oid subid = MyLogicalRepWorker->subid; MemoryContext oldctx; - StringInfo warning_sequences = makeStringInfo(); + StringInfo mismatched_seqs = makeStringInfo(); /* * Synchronizing each sequence individually incurs overhead from starting @@ -425,15 +425,15 @@ LogicalRepSyncSequences(void) PG_CATCH(); { if (sequence_mismatch) - append_mismatched_sequences(warning_sequences, sequence_rel); + append_mismatched_sequences(mismatched_seqs, sequence_rel); - 
report_mismatched_sequences(warning_sequences); + report_mismatched_sequences(mismatched_seqs);
Re: Pgoutput not capturing the generated columns
On Tue, Jul 23, 2024 at 9:23 AM Peter Smith wrote: > > On Fri, Jul 19, 2024 at 4:01 PM Shlok Kyal wrote: > > > > On Thu, 18 Jul 2024 at 13:55, Peter Smith wrote: > > > > > > Hi, here are some review comments for v19-0002 > > > == > > > src/test/subscription/t/004_sync.pl > > > > > > 1. > > > This new test is not related to generated columns. IIRC, this is just > > > some test that we discovered missing during review of this thread. As > > > such, I think this change can be posted/patched separately from this > > > thread. > > > > > I have removed the test for this thread. > > > > I have also addressed the remaining comments for v19-0002 patch. > > Hi, I have no more review comments for patch v20-0002 at this time. > > I saw that the above test was removed from this thread as suggested, > but I could not find that any new thread was started to propose this > valuable missing test. > I still did not find any new thread for adding the missing test case, so I started one myself [1]. == [1] https://www.postgresql.org/message-id/CAHut+PtX8P0EGhsk9p=hqguhrzxecszanxsmkovyilx-ejd...@mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
CREATE SUBSCRIPTION - add missing test case
Hi Hackers, While reviewing another logical replication thread [1], I found an ERROR scenario that seems to be untested. TEST CASE: Attempt CREATE SUBSCRIPTION where the subscriber table is missing some expected column(s). Attached is a patch to add the missing test for this error message. == [1] https://www.postgresql.org/message-id/CAHut%2BPt5FqV7J9GnnWFRNW_Z1KOMMAZXNTRcRNdtFrfMBz_GLA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia v1-0001-Add-missing-test-case.patch Description: Binary data
Re: Logical Replication of sequences
Hi Vignesh, I have reviewed your latest patchset: v20240814-0001. No comments v20240814-0002. No comments v20240814-0003. No comments v20240814-0004. See below v20240814-0005. No comments // v20240814-0004. == src/backend/commands/subscriptioncmds.c CreateSubscription: nit - XXX comments AlterSubscription_refresh: nit - unnecessary parens in ereport AlterSubscription: nit - unnecessary parens in ereport fetch_sequence_list: nit - unnecessary parens in ereport == .../replication/logical/sequencesync.c 1. fetch_remote_sequence_data + * Returns: + * - TRUE if there are discrepancies between the sequence parameters in + * the publisher and subscriber. + * - FALSE if the parameters match. + */ +static bool +fetch_remote_sequence_data(WalReceiverConn *conn, Oid relid, Oid remoteid, +char *nspname, char *relname, int64 *log_cnt, +bool *is_called, XLogRecPtr *page_lsn, +int64 *last_value) IMO it is more natural to return TRUE for good results and FALSE for bad ones. (FYI, I have implemented this reversal in the nitpicks attachment). ~ nit - swapped columns seqmin and seqmax in the SQL to fetch them in the natural order nit - unnecessary parens in ereport ~~~ copy_sequence: nit - update function comment to document the output parameter nit - Assert that *sequence_mismatch is false on entry to this function nit - tweak wrapping and add \n in the SQL nit - unnecessary parens in ereport report_sequence_mismatch: nit - modify function comment nit - function name changed /report_sequence_mismatch/report_mismatched_sequences/ (now plural, and more like the other one) append_mismatched_sequences: nit - param name /rel/seqrel/ ~~~ 2. LogicalRepSyncSequences: + Relation sequence_rel; + XLogRecPtr sequence_lsn; + bool sequence_mismatch; The 'sequence_mismatch' variable must be initialized to false, otherwise we cannot trust it gets assigned.
~ LogicalRepSyncSequences: nit - unnecessary parens in ereport nit - move the for-loop variable declaration nit - remove a blank line process_syncing_sequences_for_apply: nit - variable declaration indent == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 9fff288..22115bd 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -726,10 +726,10 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, recordDependencyOnOwner(SubscriptionRelationId, subid, owner); /* -* XXX: If the subscription is for a sequence-only publication, -* creating this origin is unnecessary at this point. It can be created -* later during the ALTER SUBSCRIPTION ... REFRESH command, if the -* publication is updated to include tables or tables in schemas. +* XXX: If the subscription is for a sequence-only publication, creating +* this origin is unnecessary. It can be created later during the ALTER +* SUBSCRIPTION ... REFRESH command, if the publication is updated to +* include tables or tables in schemas. */ ReplicationOriginNameForLogicalRep(subid, InvalidOid, originname, sizeof(originname)); replorigin_create(originname); @@ -800,9 +800,9 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, * export it. * * XXX: If the subscription is for a sequence-only publication, -* creating this slot is not necessary at the moment. It can be -* created during the ALTER SUBSCRIPTION ... REFRESH command if the -* publication is updated to include tables or tables in schema. +* creating this slot. It can be created later during the ALTER +* SUBSCRIPTION ... REFRESH command, if the publication is updated +* to include tables or tables in schema. */ if (opts.create_slot) { @@ -1021,9 +1021,9 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data, copy_data ? 
SUBREL_STATE_INIT : SUBREL_STATE_READY, InvalidXLogRecPtr, true); ereport(DEBUG1, - (errmsg_internal("%s \"%s.%s\" added to subscription \"%s\"", - relkind == RELKIND_SEQUENCE ? "sequence" : "table", - rv->schemaname, rv->relname, sub->name))); +
Re: Logical Replication of sequences
Hi Vignesh, Here are my review comments for the latest patchset: Patch v20240813-0001. No comments Patch v20240813-0002. No comments Patch v20240813-0003. No comments Patch v20240813-0004. See below Patch v20240813-0005. No comments // Patch v20240813-0004 == src/backend/catalog/pg_subscription. GetSubscriptionRelations: nit - modify a condition for readability == src/backend/commands/subscriptioncmds.c fetch_sequence_list: nit - changed the WARNING message. /parameters differ between.../parameters differ for.../ (FYI, Chat-GPT agrees that 2nd way is more correct) nit - other minor changes to the message and hint == .../replication/logical/sequencesync.c 1. LogicalRepSyncSequences + ereport(DEBUG1, + errmsg("logical replication synchronization for subscription \"%s\", sequence \"%s\" has finished", +get_subscription_name(subid, false), get_rel_name(done_seq->relid))); DEBUG logs should use errmsg_internal. (fixed also in the nitpicks attachment). ~ nit - minor change to the log message counting the batched sequences ~~~ process_syncing_sequences_for_apply: nit - /sequence sync worker/sequencesync worker/ == src/backend/utils/misc/guc_tables.c nit - /max workers/maximum number of workers/ (for consistency because all other GUCs are verbose like this; nothing just says "max".) == src/test/subscription/t/034_sequences.pl nit - adjust the expected WARNING message (which was modified above) == Kind Regards, Peter Smith.
Fujitsu Australia diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c index d938e57..af2bfe1 100644 --- a/src/backend/catalog/pg_subscription.c +++ b/src/backend/catalog/pg_subscription.c @@ -565,9 +565,11 @@ GetSubscriptionRelations(Oid subid, bool get_tables, bool get_sequences, relkind = get_rel_relkind(subrel->srrelid); /* Skip sequences if they were not requested */ - if ((relkind == RELKIND_SEQUENCE && !get_sequences) || - /* Skip tables if they were not requested */ - (relkind != RELKIND_SEQUENCE && !get_tables)) + if (relkind == RELKIND_SEQUENCE && !get_sequences) + continue; + + /* Skip tables if they were not requested */ + if (relkind != RELKIND_SEQUENCE && !get_tables) continue; relstate = (SubscriptionRelState *) palloc(sizeof(SubscriptionRelState)); diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 4166210..7d0be40 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -2577,9 +2577,9 @@ fetch_sequence_list(WalReceiverConn *wrconn, char *subname, List *publications) */ ereport(WARNING, errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), - errmsg("parameters differ between the remote and local sequences %s for subscription \"%s\"", + errmsg("parameters differ for the remote and local sequences (%s) for subscription \"%s\"", warning_sequences->data, subname), - errhint("Alter/Re-create the sequence using the same parameter as in remote.")); + errhint("Alter/Re-create local sequences to have the same parameters as the remote sequences.")); pfree(warning_sequences); } diff --git a/src/backend/replication/logical/sequencesync.c b/src/backend/replication/logical/sequencesync.c index aaedba9..e2e0421 100644 --- a/src/backend/replication/logical/sequencesync.c +++ b/src/backend/replication/logical/sequencesync.c @@ -342,12 +342,12 @@ LogicalRepSyncSequences(void) done_seq = (SubscriptionRelState *) 
lfirst(list_nth_cell(sequences_not_synced, i)); ereport(DEBUG1, - errmsg("logical replication synchronization for subscription \"%s\", sequence \"%s\" has finished", + errmsg_internal("logical replication synchronization for subscription \"%s\", sequence \"%s\" has finished", get_subscription_name(subid, false), get_rel_name(done_seq->relid))); } ereport(LOG, - errmsg("logical replication synchronized %d sequences of %d sequences for subscription \"%s\" ", + errmsg("logical replication synch
Re: Logical Replication of sequences
On Tue, Aug 13, 2024 at 10:00 PM vignesh C wrote: > > On Tue, 13 Aug 2024 at 09:19, Peter Smith wrote: > > > > 3.1. GENERAL > > > > Hmm. I am guessing this was provided as a separate patch to aid review > > by showing that existing functions are moved? OTOH you can't really > > judge this patch properly without already knowing details of what will > > come next in the sequencesync. i.e. As a *standalone* patch without > > the sequencesync.c the refactoring doesn't make much sense. > > > > Maybe it is OK later to combine patches 0003 and 0004. Alternatively, > > keep this patch separated but give greater emphasis in the comment > > header to say this patch only exists separately in order to help the > > review. > > I have kept this patch only to show that this patch as such has no > code changes. If we move this to the next patch it will be difficult > for reviewers to know which is new code and which is old code. During > commit we can merge this with the next one. I felt it is better to add > it in the commit message instead of comment header so updated the > commit message. > Yes, I wrote "comment header" but it was a typo; I meant "commit header". What you did looks good now. Thanks. > > ~ > > > > 3.4. function names > > > > With the re-shuffling that this patch does, and changing several from > > static to not-static, should the function names remain as they are? > > They look random to me. > > - finish_sync_worker(void) > > - invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue) > > - FetchTableStates(bool *started_tx) > > - process_syncing_tables(XLogRecPtr current_lsn) > > > > I think using a consistent naming convention would be better. e.g. > > SyncFinishWorker > > SyncInvalidateTableStates > > SyncFetchTableStates > > SyncProcessTables > > One advantage with keeping the existing names the same wherever > possible will help while merging the changes to back-branches. So I'm > not making this change. 
> According to my understanding, the logical replication code tries to maintain naming conventions for static functions (snake_case) and for non-static functions (CamelCase) as an aid for code readability. I think we should either do our best to abide by those conventions, or we might as well just forget them and have a naming free-for-all. Since the new syncutils.c module is being introduced by this patch, my guess is that any future merging to back-branches will be affected regardless. IMO this is an ideal opportunity to try to nudge the function names in the right direction. YMMV. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication of sequences
" has finished 2024-08-13 16:41:47.436 AEST [11735] LOG: logical replication synchronization for subscription "sub3", sequence "seq_0566" has finished 2024-08-13 16:41:47.436 AEST [11735] LOG: logical replication synchronization for subscription "sub3", sequence "seq_0568" has finished 2024-08-13 16:41:47.436 AEST [11735] LOG: logical replication synchronization for subscription "sub3", sequence "seq_0569" has finished 2024-08-13 16:41:47.436 AEST [11735] LOG: logical replication synchronization for subscription "sub3", sequence "seq_0570" has finished 2024-08-13 16:41:47.436 AEST [11735] LOG: logical replication synchronization for subscription "sub3", sequence "seq_0571" has finished 2024-08-13 16:41:47.436 AEST [11735] LOG: logical replication synchronization for subscription "sub3", sequence "seq_0582" has finished ... Is there a way to refresh sequences in a more natural (e.g. alphabetical) order to make these logs more readable? == Kind Regards, Peter Smith. Fujitsu Australia
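For illustration only — one possible approach, assuming the patchset exposes the published sequences through a view analogous to pg_publication_tables (hypothetically named pg_publication_sequences here): if the publisher-side query that builds the sequence list ordered its result, the sync worker would process, and therefore log, the sequences alphabetically:

```sql
-- Hypothetical sketch: order the fetched sequence list so the
-- per-sequence "has finished" log lines appear in alphabetical order.
SELECT schemaname, sequencename
  FROM pg_publication_sequences
 WHERE pubname = 'pub1'
 ORDER BY schemaname, sequencename;
```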
PG docs - Sequence CYCLE clause
Hi hackers. While reviewing another thread I had cause to be looking more carefully at the SEQUENCE documentation. I found it curious that, unlike other clauses, the syntax of the CYCLE clause was not displayed on a line by itself. I have changed that, and at the same time I have moved the CYCLE syntax (plus its description) to be adjacent to MINVALUE/MAXVALUE, which IMO is where it naturally belongs. Please see the attached patch. Thoughts? == Kind Regards, Peter Smith. Fujitsu Australia docs_sequence_cycle.diff Description: Binary data
Re: Logical Replication of sequences
On Mon, Aug 12, 2024 at 11:07 PM vignesh C wrote: > > On Mon, 12 Aug 2024 at 10:40, Peter Smith wrote: > > > > Hi Vignesh, > > > > I found that when 2 subscriptions are both subscribing to a > > publication publishing sequences, an ERROR occurs on refresh. > > > > == > > > > Publisher: > > -- > > > > test_pub=# create publication pub1 for all sequences; > > > > Subscriber: > > --- > > > > test_sub=# create subscription sub1 connection 'dbname=test_pub' > > publication pub1; > > > > test_sub=# create subscription sub2 connection 'dbname=test_pub' > > publication pub1; > > > > test_sub=# alter subscription sub1 refresh publication sequences; > > 2024-08-12 15:04:04.947 AEST [7306] LOG: sequence "public.seq1" of > > subscription "sub1" set to INIT state > > 2024-08-12 15:04:04.947 AEST [7306] STATEMENT: alter subscription > > sub1 refresh publication sequences; > > 2024-08-12 15:04:04.947 AEST [7306] LOG: sequence "public.seq1" of > > subscription "sub1" set to INIT state > > 2024-08-12 15:04:04.947 AEST [7306] STATEMENT: alter subscription > > sub1 refresh publication sequences; > > 2024-08-12 15:04:04.947 AEST [7306] ERROR: tuple already updated by self > > 2024-08-12 15:04:04.947 AEST [7306] STATEMENT: alter subscription > > sub1 refresh publication sequences; > > ERROR: tuple already updated by self > > > > test_sub=# alter subscription sub2 refresh publication sequences; > > 2024-08-12 15:04:30.427 AEST [7306] LOG: sequence "public.seq1" of > > subscription "sub2" set to INIT state > > 2024-08-12 15:04:30.427 AEST [7306] STATEMENT: alter subscription > > sub2 refresh publication sequences; > > 2024-08-12 15:04:30.427 AEST [7306] LOG: sequence "public.seq1" of > > subscription "sub2" set to INIT state > > 2024-08-12 15:04:30.427 AEST [7306] STATEMENT: alter subscription > > sub2 refresh publication sequences; > > 2024-08-12 15:04:30.427 AEST [7306] ERROR: tuple already updated by self > > 2024-08-12 15:04:30.427 AEST [7306] STATEMENT: alter subscription > > sub2 
refresh publication sequences; > > ERROR: tuple already updated by self > > This issue is fixed in the v20240812 version attached at [1]. > [1] - > https://www.postgresql.org/message-id/CALDaNm3hS58W0RTbgsMTk-YvXwt956uabA%3DkYfLGUs3uRNC2Qg%40mail.gmail.com > Yes, I confirmed it is now fixed. Thanks! == Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication of sequences
Hi Vignesh, Here are my review comments for the latest v20240812* patchset: patch v20240812-0001. No comments. patch v20240812-0002. Fixed docs. LGTM patch v20240812-0003. This is new refactoring. See below. patch v20240812-0004. (was 0003). See below. patch v20240812-0005. (was 0004). No comments. // patch v20240812-0003. 3.1. GENERAL Hmm. I am guessing this was provided as a separate patch to aid review by showing that existing functions are moved? OTOH you can't really judge this patch properly without already knowing details of what will come next in the sequencesync. i.e. As a *standalone* patch without the sequencesync.c, the refactoring doesn't make much sense. Maybe it is OK later to combine patches 0003 and 0004. Alternatively, keep this patch separated but give greater emphasis in the comment header to say this patch only exists separately in order to help the review. == Commit message 3.2. Reorganized tablesync code to generate a syncutils file which will help in sequence synchronization worker code. ~ "generate" ?? == src/backend/replication/logical/syncutils.c 3.3. "common code" ?? FYI - There are multiple code comments mentioning "common code..." which, in the absence of the sequencesync worker (which comes in the next patch), have nothing "common" about them at all. Fixing them and then fixing them again in the next patch might cause unnecessary code churn, but OTOH they aren't correct as-is either. I have left them alone for now. ~ 3.4. function names With the re-shuffling that this patch does, and changing several from static to not-static, should the function names remain as they are? They look random to me. - finish_sync_worker(void) - invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue) - FetchTableStates(bool *started_tx) - process_syncing_tables(XLogRecPtr current_lsn) I think using a consistent naming convention would be better. e.g. 
SyncFinishWorker SyncInvalidateTableStates SyncFetchTableStates SyncProcessTables ~~~ nit - file header comment == src/backend/replication/logical/tablesync.c 3.5. -static void +void process_syncing_tables_for_sync(XLogRecPtr current_lsn) -static void +void process_syncing_tables_for_apply(XLogRecPtr current_lsn) Since these functions are no longer static, should those function names be changed to use the CamelCase convention for non-static API? // patch v20240812-0004. == src/backend/replication/logical/syncutils.c nit - file header comment (made same as patch 0003) ~ FetchRelationStates: nit - IIUC sequence states are only INIT -> READY. So the comments in this function don't need to specifically talk about sequence INIT state. == src/backend/utils/misc/guc_tables.c 4.1. {"max_sync_workers_per_subscription", PGC_SIGHUP, REPLICATION_SUBSCRIBERS, - gettext_noop("Maximum number of table synchronization workers per subscription."), + gettext_noop("Maximum number of relation synchronization workers per subscription."), NULL, }, I was wondering if "relation synchronization workers" is meaningful to the user because that seems like new terminology. Maybe it should say "... of table + sequence synchronization workers..." == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/replication/logical/syncutils.c b/src/backend/replication/logical/syncutils.c index 4e39836..4bbc481 100644 --- a/src/backend/replication/logical/syncutils.c +++ b/src/backend/replication/logical/syncutils.c @@ -1,5 +1,5 @@ /*- - * sequencesync.c + * syncutils.c * PostgreSQL logical replication: common synchronization code * * Copyright (c) 2024, PostgreSQL Global Development Group @@ -8,8 +8,8 @@ * src/backend/replication/logical/syncutils.c * * NOTES - * This file contains common code for synchronization of tables that will be - * help apply worker and table synchronization worker. 
+ * This file contains code common to table synchronization workers, and + * the sequence synchronization worker. *- */ diff --git a/src/backend/replication/logical/syncutils.c b/src/backend/replication/logical/syncutils.c index ed353f1..1702be9 100644 --- a/src/backend/replication/logical/syncutils.c +++ b/src/backend/replication/logical/syncutils.c @@ -8,9 +8,8 @@ * src/backend/replication/logical/syncutils.c * * NOTES - * This file contains common code for synchronization of tables and - * sequences that will be help apply worker, table synchronization worker - *and sequence synchronization. + * This file contains code common to table synchronization workers, and + *the sequence synchronization worker. *
Re: Logical Replication of sequences
Hi Vignesh, I found that when 2 subscriptions are both subscribing to a publication publishing sequences, an ERROR occurs on refresh. == Publisher: -- test_pub=# create publication pub1 for all sequences; Subscriber: --- test_sub=# create subscription sub1 connection 'dbname=test_pub' publication pub1; test_sub=# create subscription sub2 connection 'dbname=test_pub' publication pub1; test_sub=# alter subscription sub1 refresh publication sequences; 2024-08-12 15:04:04.947 AEST [7306] LOG: sequence "public.seq1" of subscription "sub1" set to INIT state 2024-08-12 15:04:04.947 AEST [7306] STATEMENT: alter subscription sub1 refresh publication sequences; 2024-08-12 15:04:04.947 AEST [7306] LOG: sequence "public.seq1" of subscription "sub1" set to INIT state 2024-08-12 15:04:04.947 AEST [7306] STATEMENT: alter subscription sub1 refresh publication sequences; 2024-08-12 15:04:04.947 AEST [7306] ERROR: tuple already updated by self 2024-08-12 15:04:04.947 AEST [7306] STATEMENT: alter subscription sub1 refresh publication sequences; ERROR: tuple already updated by self test_sub=# alter subscription sub2 refresh publication sequences; 2024-08-12 15:04:30.427 AEST [7306] LOG: sequence "public.seq1" of subscription "sub2" set to INIT state 2024-08-12 15:04:30.427 AEST [7306] STATEMENT: alter subscription sub2 refresh publication sequences; 2024-08-12 15:04:30.427 AEST [7306] LOG: sequence "public.seq1" of subscription "sub2" set to INIT state 2024-08-12 15:04:30.427 AEST [7306] STATEMENT: alter subscription sub2 refresh publication sequences; 2024-08-12 15:04:30.427 AEST [7306] ERROR: tuple already updated by self 2024-08-12 15:04:30.427 AEST [7306] STATEMENT: alter subscription sub2 refresh publication sequences; ERROR: tuple already updated by self == Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication of sequences
Hi Vignesh, I noticed it is not currently possible (there is no syntax for it) to ALTER an existing publication so that it will publish SEQUENCES. Isn't that a limitation? Why? For example, why should users be prevented from changing a FOR ALL TABLES publication into a FOR ALL TABLES, SEQUENCES one? Similarly, there are other combinations that are not possible:
- DROP ALL SEQUENCES from a publication that is FOR ALL TABLES, SEQUENCES
- DROP ALL TABLES from a publication that is FOR ALL TABLES, SEQUENCES
- ADD ALL TABLES to a publication that is FOR ALL SEQUENCES
... == Kind Regards, Peter Smith. Fujitsu Australia
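For concreteness, these are the kinds of commands that currently have no supported spelling. The syntax below is hypothetical, shown only to illustrate the gap; it is not accepted by the patchset:

```sql
-- Hypothetical syntax, not accepted by the current patchset:
ALTER PUBLICATION pub_all_tables ADD ALL SEQUENCES;  -- FOR ALL TABLES -> FOR ALL TABLES, SEQUENCES
ALTER PUBLICATION pub_both DROP ALL SEQUENCES;       -- FOR ALL TABLES, SEQUENCES -> FOR ALL TABLES
ALTER PUBLICATION pub_all_seqs ADD ALL TABLES;       -- FOR ALL SEQUENCES -> FOR ALL TABLES, SEQUENCES
```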
Re: Logical Replication of sequences
Hi Vignesh, v20240809-0001. No comments. v20240809-0002. See below. v20240809-0003. See below. v20240809-0004. No comments. // Here are my review comments for patch v20240809-0002. nit - Tweak wording in new docs example, because a publication only publishes the sequences; it doesn't "synchronize" anything. // Here are my review comments for patch v20240809-0003. fetch_sequence_list: nit - move comment nit - minor rewording for parameter WARNING message == .../replication/logical/sequencesync.c src/backend/replication/logical/tablesync.c 1. Currently the placement of the 'sequence_states_not_ready' list declaration seems backwards. IMO it makes more sense for the declaration to be in sequencesync.c, and the extern in the tablesync.c. (please also see review comment #3 below which might affect this too). ~~~ 2. static bool -FetchTableStates(bool *started_tx) +FetchTableStates(void) { - static bool has_subrels = false; - - *started_tx = false; + static bool has_subtables = false; + bool started_tx = false; Maybe give the explanation why 'has_subtables' is declared static here. ~~~ 3. I am not sure that it was an improvement to move the process_syncing_sequences_for_apply() function into the sequencesync.c. Calling the sequence code from the tablesync code still looks strange. OTOH, I see why you don't want to leave it in tablesync.c. Perhaps it would be better to refactor/move all following functions back to the (apply) worker.c instead: - process_syncing_relations - process_syncing_sequences_for_apply(void) - process_syncing_tables_for_apply(void) Actually, now that there are 2 kinds of 'sync' workers, maybe you should introduce a new module (e.g. 'commonsync.c' or 'syncworker.c', ...), where you can put functions such as process_syncing_relations() plus any other code common to both tablesync and sequencesync. That might make more sense than having one call to the other. == Kind Regards, Peter Smith. 
Fujitsu Australia diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml index 92758c7..64214ba 100644 --- a/doc/src/sgml/ref/create_publication.sgml +++ b/doc/src/sgml/ref/create_publication.sgml @@ -427,8 +427,8 @@ CREATE PUBLICATION all_sequences FOR ALL SEQUENCES; - Create a publication that publishes all changes in all tables and - synchronizes all sequences: + Create a publication that publishes all changes in all tables, and + all sequences for synchronization: CREATE PUBLICATION all_tables_sequences FOR ALL TABLES, SEQUENCES; diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 2ac63c7..c595873 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -2546,6 +2546,7 @@ fetch_sequence_list(WalReceiverConn *wrconn, List *publications) seqform = (Form_pg_sequence) GETSTRUCT(tup); + /* Build a list of sequences that don't match in publisher and subscriber */ if (seqform->seqtypid != seqtypid || seqform->seqmin != seqmin || seqform->seqmax != seqmax || seqform->seqstart != seqstart || seqform->seqincrement != seqincrement || @@ -2556,7 +2557,6 @@ fetch_sequence_list(WalReceiverConn *wrconn, List *publications) else appendStringInfoString(warning_sequences, ", "); - /* Add the sequences that don't match in publisher and subscriber */ appendStringInfo(warning_sequences, "\"%s.%s\"", get_namespace_name(get_rel_namespace(relid)), get_rel_name(relid)); @@ -2575,7 +2575,7 @@ fetch_sequence_list(WalReceiverConn *wrconn, List *publications) */ ereport(WARNING, errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), - errmsg("Sequence parameter in remote and local is not same for %s", + errmsg("Parameters differ for remote and local sequences %s", warning_sequences->data), errhint("Alter/Re-create the sequence using the same parameter as in remote.")); pfree(warning_sequences); diff --git a/src/test/subscription/t/034_sequences.pl 
b/src/test/subscription/t/034_sequences.pl index 7cc8c8c..52453cc 100644 --- a/src/test/subscription/t/034_sequences.pl +++ b/src/test/subscription/t/034_sequences.pl @@ -177,7 +177,7 @@ $node_subscriber->safe_psql( ALTER SUBSCRIPTION regress_seq_sub REFRESH PUBLICATION SEQUENCES"); like( $stder
Re: Logical Replication of sequences
Hi Vignesh, here are my review comments for the sequences docs patch v20240808-0004. == doc/src/sgml/logical-replication.sgml The new section content looked good. Just some nitpicks including: - renamed the section "Replicating Sequences" - added missing mention about how to publish sequences - rearranged the subscription commands into a more readable list - some sect2 titles were very long; I shortened them. - added markup for the sequence definition advice - other minor rewording and typo fixes ~ 1. IMO the "Caveats" section can be removed. - the advice to avoid changing the sequence definition is already given earlier in the "Sequence Definition Mismatches" section - the limitation of "incremental synchronization" is already stated in the logical replication "Limitations" section - (FYI, I removed it already in my nitpicks attachment) == doc/src/sgml/ref/alter_subscription.sgml nitpick - I reversed the paragraphs to keep the references in a natural order. == Kind Regards, Peter Smith. Fujitsu Australia On Fri, Aug 9, 2024 at 1:52 AM vignesh C wrote: > > On Thu, 8 Aug 2024 at 08:30, Peter Smith wrote: > > > > Hi Vignesh, Here are my v20240807-0003 review comments. > > > > 2a. > > The paragraph starts by saying "Sequence data is not replicated.". It > > seems wrong now. Doesn't that need rewording or removing? > > Changed it to incremental sequence changes. > > > ~ > > > > 2b. > > Should the info "If, however, some kind of switchover or failover..." > > be mentioned in the "Logical Replication Failover" section [2], > > instead of here? > > I think mentioning this here is appropriate. The other section focuses > more on how logical replication can proceed with a new primary. Once > the logical replication setup is complete, sequences can be refreshed > at any time. > > Rest of the comments are fixed, the attached v20240808 version patch > has the changes for the same. 
> > Regards, > Vignesh diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml index d0fdcde..bc2aacc 100644 --- a/doc/src/sgml/logical-replication.sgml +++ b/doc/src/sgml/logical-replication.sgml @@ -1571,49 +1571,88 @@ test_sub=# SELECT * FROM t1 ORDER BY id; - Sequences + Replicating Sequences - Sequences can be synchronized between a publisher and a subscriber using - CREATE SUBSCRIPTION - to initially synchronize sequences, - ALTER SUBSCRIPTION ... REFRESH PUBLICATION to - synchronize any newly added sequences and - ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES - to re-synchronize all sequences. A new sequence synchronization worker will - be started to synchronize the sequences after executing the above commands - and will exit once the sequences are synchronized. + To replicate sequences from a publisher to a subscriber, first publish the + sequence using + CREATE PUBLICATION ... FOR ALL SEQUENCES. - Sequence synchronization worker will be used from - + At the subscriber side: + + + + use CREATE SUBSCRIPTION + to initially synchronize the published sequences. + + + + + use + ALTER SUBSCRIPTION ... REFRESH PUBLICATION + to synchronize any newly added sequences. + + + + + use + ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES + to re-synchronize all sequences. + + + + + + + A new sequence synchronization worker will be started to synchronize the + sequences after executing any of the above subscriber commands, and will + will exit once the sequences are synchronized. + + + The ability to launch a sequence synchronization worker will be limited by + the max_sync_workers_per_subscription configuration. - - Differences in Sequence Definitions Between Publisher and Subscriber + + Sequence Definition Mismatches + + + If there are differences in sequence definitions between the publisher and + subscriber, a WARNING is logged. 
+ + -If there are differences in sequence definitions between the publisher and -subscriber, a WARNING is logged. To resolve this, use +To resolve this, use ALTER SEQUENCE to align the subscriber's sequence parameters with those of the publisher. Subsequently, execute ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES. -It is advisable not to change sequence definitions on either the publisher -or the subscriber until synchronization is complete and the + + +Changes to sequence definitions during the execution of + +ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES +may not be detected, potentially leading to inconsistent values
Re: Logical Replication of sequences
Hi Vignesh, I reviewed the latest v20240808-0003 patch. Attached are my minor change suggestions. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c index 4e2f960..a77e810 100644 --- a/src/backend/catalog/pg_subscription.c +++ b/src/backend/catalog/pg_subscription.c @@ -517,9 +517,9 @@ HasSubscriptionTables(Oid subid) * * all_states: * If getting tables, if all_states is true get all tables, otherwise - * only get tables that have not reached 'READY' state. + * only get tables that have not reached READY state. * If getting sequences, if all_states is true get all sequences, - * otherwise only get sequences that are in 'init' state. + * otherwise only get sequences that are in INIT state. * * The returned list is palloc'ed in the current memory context. */ diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index bb6aa8e..2833379 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -868,10 +868,12 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, * Update the subscription to refresh both the publication and the publication * objects associated with the subscription. * - * If 'copy_data' parameter is true, the function will set the state - * to "init"; otherwise, it will set the state to "ready". + * Parameters: * - * When 'validate_publications' is provided with a publication list, the + * If 'copy_data' is true, the function will set the state to INIT; otherwise, + * it will set the state to READY. + * + * If 'validate_publications' is provided with a publication list, the * function checks that the specified publications exist on the publisher. 
* * If 'refresh_tables' is true, update the subscription by adding or removing @@ -882,9 +884,22 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, * sequences that have been added or removed since the last subscription * creation or publication refresh. * - * If 'resync_all_sequences' is true, mark all objects with "init" state - * for re-synchronization; otherwise, only update the newly added tables and - * sequences based on the copy_data parameter. + * Note, this is a common function for handling different REFRESH commands + * according to the parameter 'resync_all_sequences' + * + * 1. ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES + *(when parameter resync_all_sequences is true) + * + *The function will mark all sequences with INIT state. + *Assert copy_data is true. + *Assert refresh_tables is false. + *Assert refresh_sequences is true. + * + * 2. ALTER SUBSCRIPTION ... REFRESH PUBLICATION [WITH (copy_data=true|false)] + *(when parameter resync_all_sequences is false) + * + *The function will update only the newly added tables and/or sequences + *based on the copy_data parameter. */ static void AlterSubscription_refresh(Subscription *sub, bool copy_data, @@ -910,11 +925,10 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data, WalReceiverConn *wrconn; boolmust_use_password; - /* resync_all_sequences cannot be specified with refresh_tables */ - Assert(!(resync_all_sequences && refresh_tables)); - - /* resync_all_sequences cannot be specified with copy_data as false */ - Assert(!(resync_all_sequences && !copy_data)); +#ifdef USE_ASSERT_CHECKING + if (resync_all_sequences) + Assert(copy_data && !refresh_tables && refresh_sequences); +#endif /* Load the library providing us libpq calls. */ load_file("libpqwalreceiver", false);
Re: Pgoutput not capturing the generated columns
Hi Shubham, I think the v25-0001 patch only half-fixes the problems reported in my v24-0001 review. ~ Background (from the commit message): This commit enables support for the 'include_generated_columns' option in logical replication, allowing the transmission of generated column information and data alongside regular table changes. ~ The broken TAP test scenario in question is replicating from a "not-generated" column to a "generated" column. As the generated column is not on the publishing side, IMO the 'include_generated_columns' option should have zero effect here. In other words, I expect this TAP test for 'include_generated_columns = true' case should also be failing, as I wrote already yesterday: +# FIXME +# Since there is no generated column on the publishing side this should give +# the same result as the previous test. -- e.g. something like: +# ERROR: logical replication target relation "public.tab_nogen_to_gen" is missing +# replicated column: "b" == Kind Regards, Peter Smith. Fujitsu Australia
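To make the broken scenario concrete, here is a minimal sketch of the setup being discussed. Only the table name comes from the TAP test; the generated-column expression is invented for illustration:

```sql
-- Publisher: column "b" is a plain (not generated) column.
CREATE TABLE tab_nogen_to_gen (a int, b int);

-- Subscriber: the same column is GENERATED, so it cannot receive
-- replicated data. Regardless of the include_generated_columns option,
-- applying changes should fail with something like:
--   ERROR: logical replication target relation "public.tab_nogen_to_gen"
--          is missing replicated column: "b"
CREATE TABLE tab_nogen_to_gen (a int, b int GENERATED ALWAYS AS (a + 1) STORED);
```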
Re: Logical Replication of sequences
On Thu, Aug 8, 2024 at 1:55 PM Amit Kapila wrote: > > On Wed, Aug 7, 2024 at 10:12 AM Peter Smith wrote: > > > > This is mostly a repeat of my previous mail from a while ago [1] but > > includes some corrections, answers, and more examples. I'm going to > > try to persuade one last time because the current patch is becoming > > stable, so I wanted to revisit this syntax proposal before it gets too > > late to change anything. > > > > If there is some problem with the proposed idea please let me know > > because I can see only the advantages and no disadvantages of doing it > > this way. > > > > ~~~ > > > > The current patchset offers two forms of subscription refresh: > > 1. ALTER SUBSCRIPTION name REFRESH PUBLICATION [ WITH ( refresh_option > > [= value] [, ... ] ) ] > > 2. ALTER SUBSCRIPTION name REFRESH PUBLICATION SEQUENCES > > > > Since 'copy_data' is the only supported refresh_option, really it is more > > like: > > 1. ALTER SUBSCRIPTION name REFRESH PUBLICATION [ WITH ( copy_data [= > > true|false] ) ] > > 2. ALTER SUBSCRIPTION name REFRESH PUBLICATION SEQUENCES > > > > ~~~ > > > > I proposed previously that instead of having 2 commands for refreshing > > subscriptions we should have a single refresh command: > > > > ALTER SUBSCRIPTION name REFRESH PUBLICATION [TABLES|SEQUENCES] [ WITH > > ( copy_data [= true|false] ) ] > > > > Why? > > > > - IMO it is less confusing than having 2 commands that both refresh > > sequences in slightly different ways > > > > - It is more flexible because apart from refreshing everything, a user > > can choose to refresh only tables or only sequences if desired; IMO > > more flexibility is always good. > > > > - There is no loss of functionality from the current implementation > > AFAICT. You can still say "ALTER SUBSCRIPTION sub REFRESH PUBLICATION > > SEQUENCES" exactly the same as the patchset allows. 
> > > > ~~~ > > > > So, to continue this proposal, let the meaning of 'copy_data' for > > SEQUENCES be as follows: > > > > - when copy_data == false: it means don't copy data (i.e. don't > > synchronize anything). Add/remove sequences from pg_subscriber_rel as > > needed. > > > > - when copy_data == true: it means to copy data (i.e. synchronize) for > > all sequences. Add/remove sequences from pg_subscriber_rel as needed) > > > > I find overloading the copy_data option more confusing than adding a > new variant for REFRESH. To make it clear, we can even think of > extending the command as ALTER SUBSCRIPTION name REFRESH PUBLICATION > ALL SEQUENCES or something like that. I don't know where there is a > need or not but one can imagine extending it as ALTER SUBSCRIPTION > name REFRESH PUBLICATION SEQUENCES [, , ..]. > This will allow to selectively refresh the sequences. > But, I haven't invented a new overloading for "copy_data" option (meaning "synchronize") for sequences. The current patchset already interprets copy_data exactly this way. For example, below are patch 0003 results: ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION WITH (copy_data=false) - this will add/remove new sequences in pg_subscription_rel, but it will *not* synchronize the new sequence ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION WITH (copy_data=true) - this will add/remove new sequences in pg_subscription_rel, and it *will* synchronize the new sequence ~ I only proposed that copy_data should apply to *all* sequences, not just new ones. == Kind Regards. Peter Smith. Fujitsu Australia.
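Restating the proposal as commands, for clarity (this unified syntax is what is being proposed in this thread, not what the current patchset implements):

```sql
-- Proposed single-command forms (illustrative only):
ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;             -- refresh both tables and sequences
ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION TABLES;      -- refresh only tables
ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION SEQUENCES
    WITH (copy_data = true);                             -- re-synchronize all sequences
```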
Re: Logical Replication of sequences
Hi Vignesh, Here are my v20240807-0003 review comments. == 1. GENERAL DOCS. IMO the replication of SEQUENCES is a big enough topic that it deserves to have its own section in the docs chapter 31 [1]. Some of the create/alter subscription docs content would stay where it is in, but a new chapter would just tie everything together better. It could also serve as a better place to describe the other sequence replication content like: (a) getting a WARNING for mismatched sequences and how to handle it. (b) how can the user know when a subscription refresh is required to (re-)synchronise sequences (c) pub/sub examples == doc/src/sgml/logical-replication.sgml 2. Restrictions Sequence data is not replicated. The data in serial or identity columns backed by sequences will of course be replicated as part of the table, but the sequence itself would still show the start value on the subscriber. If the subscriber is used as a read-only database, then this should typically not be a problem. If, however, some kind of switchover or failover to the subscriber database is intended, then the sequences would need to be updated to the latest values, either by executing ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES or by copying the current data from the publisher (perhaps using pg_dump) or by determining a sufficiently high value from the tables themselves. ~ 2a. The paragraph starts by saying "Sequence data is not replicated.". It seems wrong now. Doesn't that need rewording or removing? ~ 2b. Should the info "If, however, some kind of switchover or failover..." be mentioned in the "Logical Replication Failover" section [2], instead of here? == doc/src/sgml/ref/alter_subscription.sgml 3. Sequence values may occasionally become out of sync due to updates in the publisher. To verify this, compare the pg_subscription_rel.srsublsn on the subscriber with the page_lsn obtained from the pg_sequence_state for the sequence on the publisher. 
If the sequence is still using prefetched values, the page_lsn will not be updated. In such cases, you will need to directly compare the sequences and execute REFRESH PUBLICATION SEQUENCES if required.

~

3a. This whole paragraph may be better put in the new chapter that was suggested earlier in review comment #1.

~

3b. Is it only "Occasionally"? I expected subscriber-side sequences could become stale quite often.

~

3c. Is this advice very useful? It's saying if the LSN is different then the sequence is out of date, but if the LSN is not different then you cannot tell. Why not ignore the LSN altogether and just advise the user to directly compare the sequences in the first place?

==

Also, there are more minor suggestions in the attached nitpicks diff.

==

[1] https://www.postgresql.org/docs/current/logical-replication.html
[2] file:///usr/local/pg_oss/share/doc/postgresql/html/logical-replication-failover.html

Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 6b320b1..bb6aa8e 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -726,7 +726,7 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 	recordDependencyOnOwner(SubscriptionRelationId, subid, owner);

 	/*
-	 * XXX todo: If the subscription is for a sequence-only publication,
+	 * XXX: If the subscription is for a sequence-only publication,
 	 * creating this origin is unnecessary at this point. It can be created
 	 * later during the ALTER SUBSCRIPTION ... REFRESH command, if the
 	 * publication is updated to include tables or tables in schemas.
@@ -756,7 +756,7 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 	PG_TRY();
 	{
-		bool		hastables = false;
+		bool		has_tables;
 		List	   *relations;
 		char		table_state;
@@ -771,16 +771,14 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 		table_state = opts.copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY;

 		/*
-		 * Get the table list from publisher and build local table status
-		 * info.
+		 * Build local relation status info. Relations are for both tables and
+		 * sequences from the publisher.
 		 */
 		relations = fetch_table_list(wrconn, publications);
-		if (relations != NIL)
-			hastables = true;
-
-		/* Include the sequence list from publisher. */
+		has_tables = relations != NIL;
Re: Pgoutput not capturing the generated columns
Hi Shubham, Here are my review comments for patch v24-0001 I think the TAP tests have incorrect expected results for the nogen-to-gen case. Whereas the HEAD code will cause "ERROR" for this test scenario, patch 0001 does not. IMO the behaviour should be unchanged for this scenario which has no generated column on the publisher side. So it seems this is a bug in patch 0001. FYI, I have included "FIXME" comments in the attached top-up diff patch to show which test cases I think are expecting wrong results. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/test/subscription/t/011_generated.pl b/src/test/subscription/t/011_generated.pl index 13499a1..a9430f7 100644 --- a/src/test/subscription/t/011_generated.pl +++ b/src/test/subscription/t/011_generated.pl @@ -478,10 +478,11 @@ $node_publisher->safe_psql('postgres', # regress_sub1_combo_nogen_to_gen: (include_generated_columns = false) # -# XXX -# The test below shows that current PG17 behavior does not give an error, -# But this conflicts with the copy_data=true behavior so it might be a PG17 bug. -# Needs more study. +# FIXME +# I think the following expected result is wrong. IIUC it should give +# the same error as HEAD -- e.g. something like: +# ERROR: logical replication target relation "public.tab_nogen_to_gen" is missing +# replicated column: "b" $node_publisher->wait_for_catchup('regress_sub1_combo_nogen_to_gen_nocopy'); $result = $node_subscriber->safe_psql('postgres', @@ -495,9 +496,11 @@ is( $result, qq(4|88 # When copy_data=false, no COPY error occurs. # The col 'b' is not replicated; the subscriber-side generated value is inserted. # -# XXX -# It is correct for this to give the same result as above, but it needs more -# study to determine if the above result was actually correct, or a PG17 bug. +# FIXME +# Since there is no generated column on the publishing side this should give +# the same result as the previous test. -- e.g. 
something like: +# ERROR: logical replication target relation "public.tab_nogen_to_gen" is missing +# replicated column: "b" $node_publisher->wait_for_catchup('regress_sub2_combo_nogen_to_gen'); $result = $node_subscriber2->safe_psql('postgres',
Re: Logical Replication of sequences
t copy_data) ALTER SUBSCRIPTION sub REFRESH PUBLICATION - same as ex8 == [1] https://www.postgresql.org/message-id/CAHut%2BPuFH1OCj-P1UKoRQE2X4-0zMG%2BN1V7jdn%3DtOQV4RNbAbw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication of sequences
ENCES_SYNC_PER_BATCH several times so I changed the wording of one of them ~~~ fetch_remote_sequence_data: nitpick - all other params have the same name as sequence members, so change the parameter name /lsn/page_lsn/ ~ copy_sequence: nitpick - rename var /seq_lsn/seq_page_lsn/ == src/backend/replication/logical/tablesync.c 6. process_syncing_sequences_for_apply + * If a sequencesync worker is running already, there is no need to start a new + * one; the existing sequencesync worker will synchronize all the sequences. If + * there are still any sequences to be synced after the sequencesync worker + * exited, then a new sequencesync worker can be started in the next iteration. + * To prevent starting the sequencesync worker at a high frequency after a + * failure, we store its last failure time. We start the sync worker for the + * same relation after waiting at least wal_retrieve_retry_interval. Why is it talking about "We start the sync worker for the same relation ...". The sequencesync_failuretime is per sync worker, not per relation. And, I don't see any 'same relation' check in the code. == src/include/catalog/pg_subscription_rel.h GetSubscriptionRelations: nitpick - changed parameter name /all_relations/all_states/ == src/test/subscription/t/034_sequences.pl nitpick - add some ## comments to highlight the main test parts to make it easier to read. nitpick - fix typo /syned/synced/ 7. More test cases? IIUC you can also get a sequence mismatch warning during "ALTER ... REFRESH PUBLICATION", and "CREATE SUBSCRIPTION". So, should those be tested also? == Kind Regards, Peter Smith. 
Fujitsu Australia diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 16c427e..5c66797 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -8109,7 +8109,7 @@ SCRAM-SHA-256$<iteration count>:&l This catalog only contains tables and sequences known to the subscription - after running either + after running CREATE SUBSCRIPTION or ALTER SUBSCRIPTION ... REFRESH PUBLICATION or diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index eb7d544..f280019 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -200,6 +200,12 @@ ALTER SUBSCRIPTION name RENAME TO < ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES + See for recommendations on how + to handle any warnings about differences in the sequence definition + between the publisher and the subscriber, which might occur when + copy_data = true. + + See for details of how copy_data = true can interact with the origin @@ -211,11 +217,6 @@ ALTER SUBSCRIPTION name RENAME TO < parameter of CREATE SUBSCRIPTION for details about copying pre-existing data in binary format. - - See the copy_data - on how to handle the warnings regarding the difference in sequence - definition between the publisher and the subscriber. - @@ -230,12 +231,12 @@ ALTER SUBSCRIPTION name RENAME TO < sequence data with the publisher. Unlike ALTER SUBSCRIPTION ... REFRESH PUBLICATION which only synchronizes newly added sequences, REFRESH PUBLICATION SEQUENCES - will re-synchronize the sequence data for all subscribed sequences. The - sequence definition can differ between the publisher and the subscriber, - this is detected and a WARNING is logged to the user, but the warning is - only an indication of a potential problem; it is recommended to alter the - sequence to keep the sequence option same as the publisher and execute - the command again. + will re-synchronize the sequence data for all subscribed sequences. 
+ + + See for recommendations on how + to handle any warnings about differences in the sequence definition + between the publisher and the subscriber. diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index de3bdb8..e28ed96 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -264,12 +264,10 @@ CREATE SUBSCRIPTION subscription_nameorigin parameter. - The sequence definition can differ between the publisher and the - subscriber, this is detected and a WARNING is logged to the user, but - the warning is only an indication of a potential problem; it is - recommended to alter the sequence to keep the sequence option same as - the publisher and execute - ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES. + See for recommendations
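[Editorial aside: the restart-throttling behaviour discussed in review comment #6 above (store the sequencesync worker's last failure time and wait at least wal_retrieve_retry_interval before launching another worker) can be sketched as a toy Python model. The class name and the 5-second default are assumptions made here for illustration, not the patch's code.]

```python
# wal_retrieve_retry_interval defaults to 5s in PostgreSQL (assumption noted above)
RETRY_INTERVAL_MS = 5000


class SequenceSyncLauncher:
    """Rate-limit sequencesync worker restarts after a failure (illustrative model)."""

    def __init__(self):
        self.last_failure_ms = None  # None = no recorded failure yet

    def record_failure(self, now_ms: int) -> None:
        # Called when the sequencesync worker exits abnormally.
        self.last_failure_ms = now_ms

    def can_start(self, now_ms: int) -> bool:
        # A new worker may start immediately if there was no failure,
        # otherwise only after the retry interval has elapsed.
        if self.last_failure_ms is None:
            return True
        return now_ms - self.last_failure_ms >= RETRY_INTERVAL_MS


launcher = SequenceSyncLauncher()
assert launcher.can_start(1000)        # no prior failure: start immediately
launcher.record_failure(1000)          # sequencesync worker just failed
assert not launcher.can_start(2000)    # too soon: avoids a restart storm
assert launcher.can_start(6000)        # interval elapsed: retry allowed
```

Note this throttle is per worker, not per relation, which is the point of the review comment questioning the "same relation" wording.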
Re: Pgoutput not capturing the generated columns
(PID 11095) exited with exit code 1 2024-08-05 13:25:24.917 AEST [11225] LOG: logical replication apply worker for subscription "sub1_nocopy" has started 2024-08-05 13:25:24.926 AEST [11225] ERROR: logical replication target relation "public.tab_nogen_to_gen" is missing replicated column: "b" 2024-08-05 13:25:24.926 AEST [11225] CONTEXT: processing remote data for replication origin "pg_16390" during message type "INSERT" in transaction 742, finished at 0/1967BB0 2024-08-05 13:25:24.927 AEST [20039] LOG: background worker "logical replication apply worker" (PID 11225) exited with exit code 1 ... == [1] https://www.postgresql.org/message-id/CAHut%2BPvtT8fKOfvVYr4vANx_fr92vedas%2BZRbQxvMC097rks6w%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Hi Shubham. Here are some more review comments for v23-0001.

==

src/test/subscription/t/011_generated.pl

nitpick - renamed /regress_pub/regress_pub_tab1/ and /regress_sub1/regress_sub1_tab1/
nitpick - typo /inital data /initial data/
nitpick - typo /snode_subscriber2/node_subscriber2/
nitpick - tweak the combo initial sync comments and messages
nitpick - /#Cleanup/# cleanup/
nitpick - tweak all the combo normal replication comments
nitpick - removed blank line at the end

~~~

1. Refactor tab_gen_to_missing initial sync tests.

I moved the tab_gen_to_missing initial sync for node_subscriber2 to be back where all the other initial sync tests are done. See the nitpicks patch file.

~~~

2. Refactor tab_nogen_to_gen initial sync tests

I moved all the tab_nogen_to_gen initial sync tests back to where the other initial sync tests are done. See the nitpicks patch file.

~~~

3. Added another test case:

Because the (current PG17) nogen-to-gen initial sync test case (with copy_data=true) gives an ERROR, I have added another combination to cover normal replication (e.g. using copy_data=false). See the nitpicks patch file.

(This has exposed an inconsistency which IMO might be a PG17 bug. I have included TAP test comments about this, and plan to post a separate thread for it later.)

~

4. GUCs

Moving and adding more CREATE SUBSCRIPTION commands exceeded some default GUCs, so extra configuration was needed. See the nitpick patch file.

==

Kind Regards,
Peter Smith.
Fujitsu Australia diff --git a/src/test/subscription/t/011_generated.pl b/src/test/subscription/t/011_generated.pl index 0b596b7..2be06c6 100644 --- a/src/test/subscription/t/011_generated.pl +++ b/src/test/subscription/t/011_generated.pl @@ -12,16 +12,25 @@ use Test::More; my $node_publisher = PostgreSQL::Test::Cluster->new('publisher'); $node_publisher->init(allows_streaming => 'logical'); +$node_publisher->append_conf('postgresql.conf', + "max_wal_senders = 20 +max_replication_slots = 20"); $node_publisher->start; # All subscribers on this node will use parameter include_generated_columns = false my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber'); $node_subscriber->init; +$node_subscriber->append_conf('postgresql.conf', + "max_logical_replication_workers = 20 +max_worker_processes = 20"); $node_subscriber->start; # All subscribers on this node will use parameter include_generated_columns = true my $node_subscriber2 = PostgreSQL::Test::Cluster->new('subscriber2'); $node_subscriber2->init; +$node_subscriber2->append_conf('postgresql.conf', + "max_logical_replication_workers = 20 +max_worker_processes = 20"); $node_subscriber2->start; my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres'; @@ -139,7 +148,7 @@ $node_publisher->safe_psql('postgres', # pub_combo_gen_to_missing is not included in pub_combo, because some tests give errors. $node_publisher->safe_psql('postgres', - "CREATE PUBLICATION regress_pub FOR TABLE tab1"); + "CREATE PUBLICATION regress_pub_tab1 FOR TABLE tab1"); $node_publisher->safe_psql('postgres', "CREATE PUBLICATION regress_pub_combo FOR TABLE tab_gen_to_gen, tab_gen_to_nogen, tab_missing_to_gen" ); @@ -157,10 +166,10 @@ $node_publisher->safe_psql('postgres', # # Note that all subscriptions created on node_subscriber2 use copy_data = false, # because copy_data = true with include_generated_columns is not yet supported. -# For this reason, the expected inital data on snode_subscriber2 is always empty. 
+# For this reason, the expected inital data on node_subscriber2 is always empty. $node_subscriber->safe_psql('postgres', - "CREATE SUBSCRIPTION regress_sub1 CONNECTION '$publisher_connstr' PUBLICATION regress_pub" + "CREATE SUBSCRIPTION regress_sub1_tab1 CONNECTION '$publisher_connstr' PUBLICATION regress_pub_tab1" ); $node_subscriber->safe_psql('postgres', "CREATE SUBSCRIPTION regress_sub1_combo CONNECTION '$publisher_connstr' PUBLICATION regress_pub_combo" @@ -168,11 +177,18 @@ $node_subscriber->safe_psql('postgres', $node_subscriber->safe_psql('postgres', "CREATE SUBSCRIPTION regress_sub1_combo_gen_to_missing CONNECTION '$publisher_connstr' PUBLICATION regress_pub_combo_gen_to_missing" ); +# Note, regress_sub1_combo_nogen_to_gen is not created here due to expected errors. See later. $node_subscriber2->safe_psql('postgres', "CREATE SUBSCRIPTION regress_sub2_combo CONNECTION '$publisher_connstr' PUBLICATION regress
Re: Pgoutput not capturing the generated columns
Hi, Here are my review comments for patch v22-0001. All comments now are only for the TAP test.

==

src/test/subscription/t/011_generated.pl

1. I added all new code for the missing combination test case "gen-to-missing". See nitpicks diff.
- create a separate publication for this "tab_gen_to_missing" table because the test gives subscription errors.
- for the initial data
- for the replicated data

~~~

2. I added sub1 and sub2 subscriptions for every combo test (previously some were absent). See nitpicks diff.

~~~

3. There was a missing test case for the nogen-to-gen combination, and after experimenting with this I am getting a bit suspicious. Currently, it seems that if a COPY is attempted then the error would be like this:

2024-08-01 17:16:45.110 AEST [18942] ERROR: column "b" is a generated column
2024-08-01 17:16:45.110 AEST [18942] DETAIL: Generated columns cannot be used in COPY.

OTOH, if a COPY is not attempted (e.g. copy_data = false) then patch 0001 allows replication to happen. And the generated value of the subscriber "b" takes precedence. I have included these tests in the nitpicks diff of patch 0001.

Those results weren't exactly what I was expecting. That is why it is so important to include *every* test combination in these TAP tests -- because unless we know how it works today, we won't know if we are accidentally breaking the current behaviour with the other (0002, 0003) patches.

Please experiment in patches 0001 and 0002 using tab_nogen_to_gen more to make sure the (new?) patch errors make sense and don't overstep by giving ERRORs when they should not.

Also, many other smaller issues/changes were done:

~~~

Creating tables:
nitpick - rearranged to keep all combo test SQLs in a consistent order throughout this file
1/ gen-to-gen
2/ gen-to-nogen
3/ gen-to-missing
4/ missing-to-gen
5/ nogen-to-gen
nitpick - fixed the wrong comment for CREATE TABLE tab_nogen_to_gen.
nitpick - tweaked some CREATE TABLE comments for consistency.
nitpick - in the v22 patch many of the generated col 'b' use different computations for every test. It makes it unnecessarily difficult to read/review the expected results. So, I've made them all the same. Now the computation is "a * 2" on the publisher side, and "a * 22" on the subscriber side.

~~~

Creating Publications and Subscriptions:
nitpick - added comment for all the CREATE PUBLICATION
nitpick - added comment for all the CREATE SUBSCRIPTION
nitpick - I moved the note about copy_data = false to where all the node_subscriber2 subscriptions are created. Also, don't explicitly refer to "patch 000" in the comment, because that will not make any sense after getting pushed.
nitpick - I changed many subscriber names to consistently use "sub1" or "sub2" within the name (this is the visual cue of which node_subscriber they are on). e.g. /regress_sub_combo2/regress_sub2_combo/

~~~

Initial Sync tests:
nitpick - not sure if it is possible to do the initial data tests for "nogen_to_gen" in the normal place. For now, it is just replaced by a comment.
NOTE - Maybe this should be refactored later to put all the initial data checks in one place. I'll think about this point more in the next review.

~~~

nitpick - Changed cleanup to drop subscriptions before publications.
nitpick - remove the unnecessary blank line at the end.

==

Please see the attached diffs patch (apply it atop patch 0001) which includes all the nitpick changes mentioned above.

~~

BTW, for a quicker turnaround and less churning, please consider just posting the v23-0001 by itself instead of waiting to rebase all the subsequent patches. When 0001 settles down some more then rebase the others.

~~

Also, please run the indentation tool over this code ASAP.

==

Kind Regards,
Peter Smith.
Fujitsu Australia diff --git a/src/test/subscription/t/011_generated.pl b/src/test/subscription/t/011_generated.pl index 05b83f6..504714a 100644 --- a/src/test/subscription/t/011_generated.pl +++ b/src/test/subscription/t/011_generated.pl @@ -34,59 +34,60 @@ $node_subscriber->safe_psql('postgres', "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 22) STORED, c int)" ); +# tab_gen_to_gen: # publisher-side has generated col 'b'. # subscriber-side has generated col 'b', with different computation. $node_publisher->safe_psql('postgres', - "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a + 10) STORED)"); + "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a * 2) STORED)"); $node_subscriber->safe_psql('postgres', - "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a + 20) STORED)"); + "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS A
Re: Logical Replication of sequences
Hi Vignesh,

I noticed that when replicating sequences (using the latest patches 0730_2*) the subscriber-side checks the *existence* of the sequence, but apparently it is not checking other sequence attributes.

For example, consider:

Publisher: "CREATE SEQUENCE s1 START 1 INCREMENT 2;" should be a sequence of only odd numbers.
Subscriber: "CREATE SEQUENCE s1 START 2 INCREMENT 2;" should be a sequence of only even numbers.

Because the names match, currently the patch allows replication of the s1 sequence. I think that might lead to unexpected results on the subscriber. IMO it might be safer to report an ERROR unless the sequences match properly (i.e. not just a name check).

Below is a demonstration of the problem:

==
Publisher:
==

(publisher sequence is odd numbers)

test_pub=# create sequence s1 start 1 increment 2;
CREATE SEQUENCE
test_pub=# select * from nextval('s1');
 nextval
---------
       1
(1 row)

test_pub=# select * from nextval('s1');
 nextval
---------
       3
(1 row)

test_pub=# select * from nextval('s1');
 nextval
---------
       5
(1 row)

test_pub=# CREATE PUBLICATION pub1 FOR ALL SEQUENCES;
CREATE PUBLICATION
test_pub=#

==
Subscriber:
==

(subscriber sequence is even numbers)

test_sub=# create sequence s1 start 2 increment 2;
CREATE SEQUENCE
test_sub=# SELECT * FROM nextval('s1');
 nextval
---------
       2
(1 row)

test_sub=# SELECT * FROM nextval('s1');
 nextval
---------
       4
(1 row)

test_sub=# SELECT * FROM nextval('s1');
 nextval
---------
       6
(1 row)

test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=test_pub' PUBLICATION pub1;
2024-08-01 08:43:04.198 AEST [24325] WARNING: subscriptions created by regression test cases should have names starting with "regress_"
WARNING: subscriptions created by regression test cases should have names starting with "regress_"
NOTICE: created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2024-08-01 08:43:04.294 AEST [26240] LOG: logical replication apply worker for subscription "sub1" has started
2024-08-01 08:43:04.309 AEST [26244] LOG: logical replication sequence synchronization worker for subscription "sub1" has started
2024-08-01 08:43:04.323 AEST [26244] LOG: logical replication synchronization for subscription "sub1", sequence "s1" has finished
2024-08-01 08:43:04.323 AEST [26244] LOG: logical replication sequence synchronization worker for subscription "sub1" has finished

(after the CREATE SUBSCRIPTION we are getting replicated odd values from the publisher, even though the subscriber side sequence was supposed to be even numbers)

test_sub=# SELECT * FROM nextval('s1');
 nextval
---------
       7
(1 row)

test_sub=# SELECT * FROM nextval('s1');
 nextval
---------
       9
(1 row)

test_sub=# SELECT * FROM nextval('s1');
 nextval
---------
      11
(1 row)

(Looking at the description you would expect odd values for this sequence to be impossible)

test_sub=# \dS+ s1
                            Sequence "public.s1"
  Type  | Start | Minimum |       Maximum       | Increment | Cycles? | Cache
--------+-------+---------+---------------------+-----------+---------+-------
 bigint |     2 |       1 | 9223372036854775807 |         2 | no      |     1

==

Kind Regards,
Peter Smith.
Fujitsu Australia
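[Editorial aside: the arithmetic behind the demonstration above can be captured in a toy model. This is an illustration only, not PostgreSQL's sequence code.]

```python
class ToySequence:
    """Minimal nextval model: state is just last_value and increment."""

    def __init__(self, start: int, increment: int):
        # offset so the first nextval() returns start
        self.last_value = start - increment
        self.increment = increment

    def nextval(self) -> int:
        self.last_value += self.increment
        return self.last_value


pub = ToySequence(start=1, increment=2)  # odd numbers: 1, 3, 5, ...
sub = ToySequence(start=2, increment=2)  # even numbers: 2, 4, 6, ...

assert [pub.nextval() for _ in range(3)] == [1, 3, 5]

# A name-only match lets synchronization blindly copy the publisher's position:
sub.last_value = pub.last_value

# The "even" subscriber sequence now produces odd numbers: 7, 9, 11, ...
assert [sub.nextval() for _ in range(3)] == [7, 9, 11]
```

Copying last_value across sequences whose definitions differ silently breaks the subscriber sequence's own parity guarantee, which is why an attribute check (not just a name check) seems safer.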
Re: Logical Replication of sequences
Hi Vignesh,

I have a question about the subscriber-side behaviour of currval().

==

AFAIK it is normal for currval() to give an error if nextval() has not yet been called [1]. For example:

test_pub=# create sequence s1;
CREATE SEQUENCE
test_pub=# select * from currval('s1');
2024-08-01 07:42:48.619 AEST [24131] ERROR: currval of sequence "s1" is not yet defined in this session
2024-08-01 07:42:48.619 AEST [24131] STATEMENT: select * from currval('s1');
ERROR: currval of sequence "s1" is not yet defined in this session
test_pub=# select * from nextval('s1');
 nextval
---------
       1
(1 row)

test_pub=# select * from currval('s1');
 currval
---------
       1
(1 row)

test_pub=#

~~~

OTOH, I was hoping to be able to use currval() at the subscriber-side to see the current sequence value after issuing ALTER .. REFRESH PUBLICATION SEQUENCES. Unfortunately, it has the same behaviour where currval() cannot be used without nextval(). But, on the subscriber, you probably never want to do an explicit nextval() independently of the publisher.

Is this currently a bug, or maybe a quirk that should be documented?

For example:

Publisher
==

test_pub=# create sequence s1;
CREATE SEQUENCE
test_pub=# CREATE PUBLICATION pub1 FOR ALL SEQUENCES;
CREATE PUBLICATION
test_pub=# select * from nextval('s1');
 nextval
---------
       1
(1 row)

test_pub=# select * from nextval('s1');
 nextval
---------
       2
(1 row)

test_pub=# select * from nextval('s1');
 nextval
---------
       3
(1 row)

test_pub=#

Subscriber
==

(Notice currval() always gives an error unless nextval() is used prior).
test_sub=# create sequence s1; CREATE SEQUENCE test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=test_pub' PUBLICATION pub1; 2024-08-01 07:51:06.955 AEST [24325] WARNING: subscriptions created by regression test cases should have names starting with "regress_" WARNING: subscriptions created by regression test cases should have names starting with "regress_" NOTICE: created replication slot "sub1" on publisher CREATE SUBSCRIPTION test_sub=# 2024-08-01 07:51:07.023 AEST [4211] LOG: logical replication apply worker for subscription "sub1" has started 2024-08-01 07:51:07.037 AEST [4213] LOG: logical replication sequence synchronization worker for subscription "sub1" has started 2024-08-01 07:51:07.063 AEST [4213] LOG: logical replication synchronization for subscription "sub1", sequence "s1" has finished 2024-08-01 07:51:07.063 AEST [4213] LOG: logical replication sequence synchronization worker for subscription "sub1" has finished test_sub=# SELECT * FROM currval('s1'); 2024-08-01 07:51:19.688 AEST [24325] ERROR: currval of sequence "s1" is not yet defined in this session 2024-08-01 07:51:19.688 AEST [24325] STATEMENT: SELECT * FROM currval('s1'); ERROR: currval of sequence "s1" is not yet defined in this session test_sub=# ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION SEQUENCES; ALTER SUBSCRIPTION test_sub=# 2024-08-01 07:51:35.298 AEST [4993] LOG: logical replication sequence synchronization worker for subscription "sub1" has started test_sub=# 2024-08-01 07:51:35.321 AEST [4993] LOG: logical replication synchronization for subscription "sub1", sequence "s1" has finished 2024-08-01 07:51:35.321 AEST [4993] LOG: logical replication sequence synchronization worker for subscription "sub1" has finished test_sub=# test_sub=# SELECT * FROM currval('s1'); 2024-08-01 07:51:41.438 AEST [24325] ERROR: currval of sequence "s1" is not yet defined in this session 2024-08-01 07:51:41.438 AEST [24325] STATEMENT: SELECT * FROM currval('s1'); ERROR: currval of sequence "s1" is 
not yet defined in this session
test_sub=#
test_sub=# SELECT * FROM nextval('s1');
 nextval
---------
       4
(1 row)

test_sub=# SELECT * FROM currval('s1');
 currval
---------
       4
(1 row)

==

[1] https://www.postgresql.org/docs/current/functions-sequence.html

Kind Regards,
Peter Smith.
Fujitsu Australia.
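[Editorial aside: the per-session behaviour being questioned can be sketched as a toy model of session-local currval state. This is an illustration, not PostgreSQL's implementation; the class and method names are invented here.]

```python
class ServerSequence:
    """Shared on-disk sequence state."""

    def __init__(self, value: int = 0):
        self.value = value


class Session:
    """currval is per-session: defined only after nextval runs in that session."""

    def __init__(self, server: ServerSequence):
        self.server = server
        self.currval_value = None  # None models "not yet defined in this session"

    def nextval(self) -> int:
        self.server.value += 1
        self.currval_value = self.server.value
        return self.currval_value

    def currval(self) -> int:
        if self.currval_value is None:
            raise RuntimeError("currval of sequence is not yet defined in this session")
        return self.currval_value


seq = ServerSequence()
seq.value = 3          # a sequencesync worker advanced the on-disk value to 3
sess = Session(seq)

try:                   # the synced value alone does not define currval
    sess.currval()
    raise AssertionError("expected an error")
except RuntimeError:
    pass

assert sess.nextval() == 4   # matches the demo: nextval continues from 3
assert sess.currval() == 4
```

The model shows why synchronization that only advances the shared value can never make currval() usable: the session-local state is untouched until that session itself calls nextval().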
Re: Logical Replication of sequences
E_INIT, +InvalidXLogRecPtr); (This is a continuation of my doubts regarding 'all_relations' in the previous review comment #4 above) Here are some more questions about it: ~ 5a. Why is this an 'else' of the !bsearch? It needs more explanation what this case means. ~ 5b. Along with more description, it might be better to reverse the !bsearch condition, so this ('else') code is not so distantly separated from the condition. ~ 5c. Saying "only supported for sequences" seems strange: e.g. what would it even mean to "re-synchronize" tables? They would all have to be truncated first -- so if re-sync for tables has no meaning maybe the parameter is misnamed and should just be 'resync_all_sequences' or similar? In any case, an Assert here might be good. == src/backend/replication/logical/launcher.c logicalrep_worker_find: nitpick - I feel the function comment "We are only interested in..." is now redundant since you are passing the exact worker type you want. nitpick - I added an Assert for the types you are expecting to look for nitpick - The comment "Search for attached worker..." is stale now because there are more search criteria nitpick - IMO the "Skip parallel apply workers." code is no longer needed now that you are matching the worker type. ~~~ 6. logicalrep_worker_launch * - must be valid worker type * - tablesync workers are only ones to have relid * - parallel apply worker is the only kind of subworker + * - sequencesync workers will not have relid */ Assert(wtype != WORKERTYPE_UNKNOWN); Assert(is_tablesync_worker == OidIsValid(relid)); Assert(is_parallel_apply_worker == (subworker_dsm != DSM_HANDLE_INVALID)); + Assert(!is_sequencesync_worker || !OidIsValid(relid)); On further reflection, is that added comment and added Assert even needed? I think they can be removed because saying "tablesync workers are only ones to have relid" seems to already cover what we needed to say/assert. ~~~ logicalrep_worker_stop: nitpick - /type/wtype/ for readability ~~~ 7. 
/*
 * Count the number of registered (not necessarily running) sync workers
 * for a subscription.
 */
int
logicalrep_sync_worker_count(Oid subid)

~

I thought this function should count the sequencesync worker as well.

==

.../replication/logical/sequencesync.c

fetch_remote_sequence_data:
nitpick - tweaked function comment
nitpick - /value/last_value/ for readability

~

8.
+ *lsn = DatumGetInt64(slot_getattr(slot, 4, &isnull));
+ Assert(!isnull);

Should that be DatumGetUInt64?

~~~

copy_sequence:
nitpick - tweak function header.
nitpick - renamed the sequence vars for consistency, and declared them all together.

==

src/backend/replication/logical/tablesync.c

9.
void
invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
{
- table_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
+ relation_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
}

I assume you changed the 'table_states_validity' name because this is no longer exclusively for tables. So, should the function name also be similarly changed?

~~~

process_syncing_sequences_for_apply:
nitpick - tweaked the function comment
nitpick - cannot just say "if there is not one already."; a sequence sync worker might not even be needed.
nitpick - added blank line for readability

~

10.
+ if (syncworker)
+ {
+ /* Now safe to release the LWLock */
+ LWLockRelease(LogicalRepWorkerLock);
+ break;
+ }
+ else
+ {

This 'else' can be removed if you wish to pull back all the indentation.

~~~

11.
process_syncing_tables(XLogRecPtr current_lsn)

Is the function name still OK given that it is now also syncing for sequences?

~~~

FetchTableStates:
nitpick - Reworded some of the function comment
nitpick - Function comment is stale because it is still referring to the function parameter which this patch removed.
nitpick - tweak a comment

==

src/include/commands/sequence.h

12.
+#define SEQ_LOG_CNT_INVALID (-1)

See a previous review comment (#2 above) where I wondered why not use value 0 for this.

~~~

13.
extern void SequenceChangePersistence(Oid relid, char newrelpersistence); extern void DeleteSequenceTuple(Oid relid); extern void ResetSequence(Oid seq_relid); +extern void do_setval(Oid relid, int64 next, bool iscalled, int64 logcnt); extern void ResetSequenceCaches(void); do_setval() was an OK function name when it was static, but as an exposed API it seems like a terrible name. IMO rename it to something like 'SetSequence' to match the other API functions nearby. ~ nitpick - same change to the parameter names as suggested for the implementation. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 22b2a93..16c427e 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -8110,9 +8110,10 @@
Re: Pgoutput not capturing the generated columns
Thanks for the patch updates. Here are my review comments for v21-0001.

I think this patch is mostly OK now except there are still some comments about the TAP test.

==

Commit Message

0.
Using Create Subscription:
CREATE SUBSCRIPTION sub2_gen_to_gen CONNECTION '$publisher_connstr' PUBLICATION pub1 WITH (include_generated_columns = true, copy_data = false)"

If you are going to give an example, I think a gen-to-nogen example would be a better choice. That's because the gen-to-gen (as you have here) is not going to replicate anything due to the subscriber-side column being generated.

==

src/test/subscription/t/011_generated.pl

1.
+$node_subscriber2->safe_psql('postgres',
+ "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 22) STORED, c int)"
+);

The subscriber2 node was intended only for all the tables where we need include_generated_columns to be true. Mostly that is the combination tests (tab_gen_to_nogen, tab_nogen_to_gen, etc). OTOH, table 'tab1' already existed. I don't think we need to bother subscribing to tab1 from subscriber2 because every combination is already covered by the combination tests. Let's leave this one alone.

~~~

2.
Huh? Where is the "tab_nogen_to_gen" combination test that I sent to you off-list?

~~~

3.
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_order (c int GENERATED ALWAYS AS (a * 22) STORED, a int, b int)"
+);

Maybe you can test 'tab_order' on both subscriber nodes but I think it is overkill. IMO it is enough to test it on subscriber2.

~~~

4.
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_alter (a int, b int, c int GENERATED ALWAYS AS (a * 22) STORED)"
+);

Ditto above. Maybe you can test 'tab_alter' on both subscriber nodes but I think it is overkill. IMO it is enough to test it on subscriber2.

~~~

5.
Don't forget to add initial data for the missing nogen_to_gen table/test.

~~~

6.
$node_publisher->safe_psql('postgres', - "CREATE PUBLICATION pub1 FOR ALL TABLES"); + "CREATE PUBLICATION pub1 FOR TABLE tab1, tab_gen_to_gen, tab_gen_to_nogen, tab_gen_to_missing, tab_missing_to_gen, tab_order"); + $node_subscriber->safe_psql('postgres', "CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub1" ); It is not a bad idea to reduce the number of publications as you have done, but IMO jamming all the tables into 1 publication is too much because it makes it less understandable instead of simpler. How about this: - leave the 'pub1' just for 'tab1'. - have a 'pub_combo' for publication all the gen_to_nogen, nogen_to_gen etc combination tests. - and a 'pub_misc' for any other misc tables like tab_order. ~~~ 7. +# # Wait for initial sync of all subscriptions +# I think you should write a note here that you have deliberately set copy_data = false because COPY and include_generated_columns are not allowed at the same time for patch 0001. And that is why all expected results on subscriber2 will be empty. Also, say this limitation will be changed in patch 0002. ~~~ (I didn't yet check 011_generated.pl file results beyond this point... I'll wait for v22-0001 to review further) == Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication of sequences
Hi Vignesh, There are still pending changes from my previous review of the 0720-0003 patch [1], but here are some new review comments for your latest patch v20240525-0003. == doc/src/sgml/catalogs.sgml nitpick - fix plurals and tweak the description. ~~~ 1. - This catalog only contains tables known to the subscription after running - either CREATE SUBSCRIPTION or - ALTER SUBSCRIPTION ... REFRESH + This catalog only contains tables and sequences known to the subscription + after running either + CREATE SUBSCRIPTION + or ALTER SUBSCRIPTION ... REFRESH PUBLICATION. Shouldn't this mention "REFRESH PUBLICATION SEQUENCES" too? == src/backend/commands/sequence.c SetSequenceLastValue: nitpick - maybe change: /log_cnt/new_log_cnt/ for consistency with the other parameter, and to emphasise the old log_cnt is overwritten == src/backend/replication/logical/sequencesync.c 2. +/* + * fetch_remote_sequence_data + * + * Fetch sequence data (current state) from the remote node, including + * the latest sequence value from the publisher and the Page LSN for the + * sequence. + */ +static int64 +fetch_remote_sequence_data(WalReceiverConn *conn, Oid remoteid, +int64 *log_cnt, XLogRecPtr *lsn) 2a. Now you are also returning the 'log_cnt' but that is not mentioned by the function comment. ~ 2b. Is it better to name these returned by-ref ptrs like 'ret_log_cnt', and 'ret_lsn' to emphasise they are output variables? YMMV. ~~~ 3. + /* Process the sequence. */ + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); + while (tuplestore_gettupleslot(res->tuplestore, true, false, slot)) This will have one-and-only-one tuple for the discovered sequence, won't it? So, why is this a while loop? == src/include/commands/sequence.h nitpick - maybe change: /log_cnt/new_log_cnt/ (same as earlier in this post) == src/test/subscription/t/034_sequences.pl 4. Q. Should we be suspicious that log_cnt changes from '32' to '31', or is there a valid explanation? 
It smells like some calculation is off-by-one, but without debugging I can't tell if it is right or wrong. == Please also see the attached diffs patch, which implements the nitpicks mentioned above. == [1] 0720-0003 review - https://www.postgresql.org/message-id/CAHut%2BPsfsfzyBrmo8E43qFMp9_bmen2tuCsNYN8sX%3Dfa86SdfA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 19d04b1..dcd0b98 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -8102,8 +8102,8 @@ SCRAM-SHA-256$<iteration count>:&l - The catalog pg_subscription_rel contains the - state for each replicated tables and sequences in each subscription. This + The catalog pg_subscription_rel stores the + state of each replicated table and sequence for each subscription. This is a many-to-many mapping. diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c index a3e7c79..f292fbc 100644 --- a/src/backend/commands/sequence.c +++ b/src/backend/commands/sequence.c @@ -342,7 +342,7 @@ ResetSequence(Oid seq_relid) * logical replication. 
*/ void -SetSequenceLastValue(Oid seq_relid, int64 new_last_value, int64 log_cnt) +SetSequenceLastValue(Oid seq_relid, int64 new_last_value, int64 new_log_cnt) { SeqTable elm; Relation seqrel; @@ -370,7 +370,7 @@ SetSequenceLastValue(Oid seq_relid, int64 new_last_value, int64 log_cnt) seq->last_value = new_last_value; seq->is_called = true; - seq->log_cnt = log_cnt; + seq->log_cnt = new_log_cnt; MarkBufferDirty(buf); diff --git a/src/include/commands/sequence.h b/src/include/commands/sequence.h index a302890..4c6aee0 100644 --- a/src/include/commands/sequence.h +++ b/src/include/commands/sequence.h @@ -60,7 +60,7 @@ extern ObjectAddress AlterSequence(ParseState *pstate, AlterSeqStmt *stmt); extern void SequenceChangePersistence(Oid relid, char newrelpersistence); extern void DeleteSequenceTuple(Oid relid); extern void ResetSequence(Oid seq_relid); -extern void SetSequenceLastValue(Oid seq_relid, int64 new_last_value, int64 log_cnt); +extern void SetSequenceLastValue(Oid seq_relid, int64 new_last_value, int64 new_log_cnt); extern void ResetSequenceCaches(void); extern void seq_redo(XLogReaderState *record);
Re: Logical Replication of sequences
Here are some review comments for latest patch v20240725-0002 == doc/src/sgml/ref/create_publication.sgml nitpick - tweak to the description of the example. == src/backend/parser/gram.y preprocess_pub_all_objtype_list: nitpick - typo "allbjects_list" nitpick - reword function header nitpick - /alltables/all_tables/ nitpick - /allsequences/all_sequences/ nitpick - I think code is safe as-is because makeNode internally does palloc0, but OTOH adding Assert would be nicer just to remove any doubts. == src/bin/psql/describe.c 1. + /* Print any publications */ + if (pset.sversion >= 18) + { + int tuples = 0; No need to assign value 0 here, because this will be unconditionally assigned before use anyway. 2. describePublications has_pubviaroot = (pset.sversion >= 13); + has_pubsequence = (pset.sversion >= 18000); That's a bug! Should be 18, not 18000. == And, please see the attached diffs patch, which implements the nitpicks mentioned above. ====== Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml index 7dcfe37..783874f 100644 --- a/doc/src/sgml/ref/create_publication.sgml +++ b/doc/src/sgml/ref/create_publication.sgml @@ -420,7 +420,7 @@ CREATE PUBLICATION users_filtered FOR TABLE users (user_id, firstname); - Create a publication that synchronizes all the sequences: + Create a publication that publishes all sequences for synchronization: CREATE PUBLICATION all_sequences FOR ALL SEQUENCES; diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 585f61e..9b3cad1 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -215,9 +215,9 @@ static void processCASbits(int cas_bits, int location, const char *constrType, static PartitionStrategy parsePartitionStrategy(char *strategy); static void preprocess_pubobj_list(List *pubobjspec_list, core_yyscan_t yyscanner); -static void preprocess_pub_all_objtype_list(List *allbjects_list, - bool *alltables, - bool 
*allsequences, +static void preprocess_pub_all_objtype_list(List *all_objects_list, + bool *all_tables, + bool *all_sequences, core_yyscan_t yyscanner); static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); @@ -19440,39 +19440,42 @@ parsePartitionStrategy(char *strategy) } /* - * Process all_objects_list to check if the options have been specified more than - * once and set alltables/allsequences. + * Process all_objects_list to set all_tables/all_sequences. + * Also, checks if the pub_object_type has been specified more than once. */ static void -preprocess_pub_all_objtype_list(List *all_objects_list, bool *alltables, - bool *allsequences, core_yyscan_t yyscanner) +preprocess_pub_all_objtype_list(List *all_objects_list, bool *all_tables, + bool *all_sequences, core_yyscan_t yyscanner) { if (!all_objects_list) return; + Assert(all_tables && *all_tables == false); + Assert(all_sequences && *all_sequences == false); + foreach_ptr(PublicationAllObjSpec, obj, all_objects_list) { if (obj->pubobjtype == PUBLICATION_ALL_TABLES) { - if (*alltables) + if (*all_tables) ereport(ERROR, errcode(ERRCODE_SYNTAX_ERROR), errmsg("invalid publication object list"), errdetail("TABLES can be specified only once."), parser_errposition(obj->location)); - *alltables = true; + *all_tables = true; } else if (obj->pubobjtype == PUBLICATION_ALL_SEQUENCES) { - if (*allsequences) + if (*all_sequences) ereport(ERROR, errcode(ERRCODE_SYNTAX_ERROR), errmsg("invalid publication object list"),
Re: Logical Replication of sequences
; maths seems unnecessarily tricky. Can't we just increment the cur_seq? before this calculation? ~ nitpick - simplify the comment about batching nitpick - added a comment to the commit == src/backend/replication/logical/tablesync.c finish_sync_worker: nitpick - added an Assert so the if/else is less risky. nitpick - modify the comment about failure time when it is a clean exit ~~~ 15. process_syncing_sequences_for_apply + /* We need up-to-date sync state info for subscription sequences here. */ + FetchTableStates(&started_tx, SUB_REL_KIND_ALL); Should that say SUB_REL_KIND_SEQUENCE? ~ 16. + /* + * If there are free sync worker slot(s), start a new sequence + * sync worker, and break from the loop. + */ + if (nsyncworkers < max_sync_workers_per_subscription) Should this "if" have some "else" code to log a warning if we have run out of free workers? Otherwise, how will the user know that the system may need tuning? ~~~ 17. FetchTableStates /* Fetch all non-ready tables. */ - rstates = GetSubscriptionRelations(MySubscription->oid, true); + rstates = GetSubscriptionRelations(MySubscription->oid, rel_type, true); This feels risky. IMO there needs to be some prior Assert about the rel_type. For example, if it happened to be SUB_REL_KIND_SEQUENCE then this function code doesn't seem to make sense. ~~~ == src/backend/replication/logical/worker.c 18. SetupApplyOrSyncWorker + + if (isSequenceSyncWorker(MyLogicalRepWorker)) + before_shmem_exit(logicalrep_seqsyncworker_failuretime, (Datum) 0); Probably that should be using macro am_sequencesync_worker(), right? == src/include/catalog/pg_subscription_rel.h 19. +typedef enum +{ + SUB_REL_KIND_TABLE, + SUB_REL_KIND_SEQUENCE, + SUB_REL_KIND_ALL, +} SubscriptionRelKind; + I was not sure how helpful this is; it might not be needed. e.g. see review comment for GetSubscriptionRelations ~~~ 20. 
+extern List *GetSubscriptionRelations(Oid subid, SubscriptionRelKind reltype, + bool not_ready); There is a mismatch with the ‘not_ready’ parameter name here and in the function implementation == src/test/subscription/t/034_sequences.pl nitpick - removed a blank line == 99. Please also see the attached diffs patch which implements all the nitpicks mentioned above. == [1] syntax - https://www.postgresql.org/message-id/CAHut%2BPuFH1OCj-P1UKoRQE2X4-0zMG%2BN1V7jdn%3DtOQV4RNbAbw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c index 5610c07..04d322a 100644 --- a/src/backend/catalog/pg_subscription.c +++ b/src/backend/catalog/pg_subscription.c @@ -494,14 +494,18 @@ HasSubscriptionRelations(Oid subid) /* * Get the relations for the subscription. * - * If rel_type is SUB_REL_KIND_SEQUENCE, get only the sequences. If rel_type is - * SUB_REL_KIND_TABLE, get only the tables. If rel_type is SUB_REL_KIND_ALL, - * get both tables and sequences. + * rel_type: + * If SUB_REL_KIND_SEQUENCE, return only the sequences. + * If SUB_REL_KIND_TABLE, return only the tables. + * If SUB_REL_KIND_ALL, return both tables and sequences. + * + * not_all_relations: * If not_all_relations is true for SUB_REL_KIND_TABLE and SUB_REL_KIND_ALL, * return only the relations that are not in a ready state, otherwise return all * the relations of the subscription. If not_all_relations is true for * SUB_REL_KIND_SEQUENCE, return only the sequences that are in init state, * otherwise return all the sequences of the subscription. + * * The returned list is palloc'ed in the current memory context. 
*/ List * diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index d23901a..2f9ff8b 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -879,11 +879,13 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, * Update the subscription to refresh both the publication and the publication * objects associated with the subscription. * - * If the copy_data parameter is true, the function will set the state - * to "init"; otherwise, it will set the state to "ready". When the - * validate_publications is provided with a publication list, the function - * checks that the specified publications exist on the publisher. If - * refresh_all_sequences is true, it will mark all sequences with "init" state + * If 'copy_data' parameter is true, the function will set the state + * to "init"; otherwise, it will set the state to "ready". + * + * When 'validate_publications' is provided with a publication list, the function + * checks that the specified publications exist on the publisher. + * + * If 'refresh_all_sequences' is true, it will mark all sequences with "init" state * for re-synchronization; oth
Re: Logical Replication of sequences
Hi, here are some review comments for patch v20240720-0003. This review is a WIP. This post is only about the docs (*.sgml) of patch 0003. == doc/src/sgml/ref/alter_subscription.sgml 1. REFRESH PUBLICATION and copy_data nitpicks: - IMO the "synchronize the sequence data" info was misleading because synchronization should only occur when copy_data=true. - I also felt it was strange to mention pg_subscription_rel for sequences, but not for tables. I modified this part too. - Then I moved the information about re/synchronization of sequences into the "copy_data" part. - And added another link to ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES Anyway, in summary, I have updated this page quite a lot according to my understanding. Please take a look at the attached nitpick for my suggestions. nitpick - /The supported options are:/The only supported option is:/ ~~~ 2. REFRESH PUBLICATION SEQUENCES nitpick - tweaked the wording nitpick - typo /syncronizes/synchronizes/ == 3. catalogs.sgml IMO something is missing in Section "1.55. pg_subscription_rel". Currently, this page only talks of relations/tables, but I think it should mention "sequences" here too, particularly since now we are linking to here from ALTER SUBSCRIPTION when talking about sequences. == 99. Please see the attached diffs patch which implements any nitpicks mentioned above. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index ba8c2b1..666d9b0 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -153,7 +153,7 @@ ALTER SUBSCRIPTION name RENAME TO < REFRESH PUBLICATION - Fetch missing table information from publisher. This will start + Fetch missing table information from the publisher. 
This will start replication of tables that were added to the subscribed-to publications since CREATE SUBSCRIPTION or @@ -161,29 +161,26 @@ ALTER SUBSCRIPTION name RENAME TO < - It also fetches the missing sequence information from the publisher and - synchronize the sequence data for newly added sequences with the - publisher. This will start synchronizing of sequences that were added to - the subscribed-to publications since - CREATE SUBSCRIPTION or the last invocation of - REFRESH PUBLICATION. Additionally, it will remove any - sequences that are no longer part of the publication from the - pg_subscription_rel - system catalog. Sequences that have already been synchronized will not be - re-synchronized. + Also, fetch missing sequence information from the publisher. + + + + The system catalog pg_subscription_rel + is updated to record all tables and sequences known to the subscription, + that are still part of the publication. refresh_option specifies additional options for the - refresh operation. The supported options are: + refresh operation. The only supported option is: copy_data (boolean) - Specifies whether to copy pre-existing data for tables and sequences - in the publications that are being subscribed to when the replication + Specifies whether to copy pre-existing data for tables and synchronize + sequences in the publications that are being subscribed to when the replication starts. The default is true. @@ -191,6 +188,11 @@ ALTER SUBSCRIPTION name RENAME TO < filter WHERE clause has since been modified. + Previously subscribed sequences are not re-synchronized. To do that, + see + ALTER SUBSCRIPTION ... 
REFRESH PUBLICATION SEQUENCES + + See for details of how copy_data = true can interact with the origin @@ -212,11 +214,11 @@ ALTER SUBSCRIPTION name RENAME TO < REFRESH PUBLICATION SEQUENCES - Fetch missing sequences information from publisher and re-synchronize the + Fetch missing sequence information from the publisher, then re-synchronize sequence data with the publisher. Unlike ALTER SUBSCRIPTION ... REFRESH PUBLICATION which - only syncronizes the newly added sequences, this option will also - re-synchronize the sequence data for sequences that were previously added. + only synchronizes newly added sequences, REFRESH PUBLICATION SEQUENCES + will re-synchronize the sequence data for all subscribed sequences.
Re: Logical Replication of sequences
Here are some review comments for patch v20240720-0002. == 1. Commit message: 1a. The commit message is stale. It is still referring to functions and views that have been moved to patch 0003. 1b. "ALL SEQUENCES" is not a command. It is a clause of the CREATE PUBLICATION command. == doc/src/sgml/ref/create_publication.sgml nitpick - publication name in the example /allsequences/all_sequences/ == src/bin/psql/describe.c 2. describeOneTableDetails Although it's not the fault of this patch, this patch propagates the confusion of 'result' versus 'res'. Basically, I did not understand the need for the variable 'result'. There is already a "PGresult *res", and unless I am mistaken we can just keep re-using that instead of introducing a 2nd variable having almost the same name and purpose. ~ nitpick - comment case nitpick - rearrange comment == src/test/regress/expected/publication.out (see publication.sql) == src/test/regress/sql/publication.sql nitpick - tweak comment == Kind Regards, Peter Smith.
Fujitsu Australia diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml index c9c1b92..7dcfe37 100644 --- a/doc/src/sgml/ref/create_publication.sgml +++ b/doc/src/sgml/ref/create_publication.sgml @@ -422,7 +422,7 @@ CREATE PUBLICATION users_filtered FOR TABLE users (user_id, firstname); Create a publication that synchronizes all the sequences: -CREATE PUBLICATION allsequences FOR ALL SEQUENCES; +CREATE PUBLICATION all_sequences FOR ALL SEQUENCES; diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 0f3f86b..a92af54 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -1851,7 +1851,7 @@ describeOneTableDetails(const char *schemaname, } PQclear(result); - /* print any publications */ + /* Print any publications */ if (pset.sversion >= 18) { int tuples = 0; @@ -1867,8 +1867,8 @@ describeOneTableDetails(const char *schemaname, if (!result) goto error_return; - tuples = PQntuples(result); /* Might be an empty set - that's ok */ + tuples = PQntuples(result); if (tuples > 0) { printTableAddFooter(&cont, _("Publications:")); diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out index 3ea2224..6c573a1 100644 --- a/src/test/regress/expected/publication.out +++ b/src/test/regress/expected/publication.out @@ -259,7 +259,7 @@ Publications: SET client_min_messages = 'ERROR'; CREATE PUBLICATION regress_pub_forallsequences2 FOR ALL SEQUENCES; RESET client_min_messages; --- Check describe sequence lists both the publications +-- check that describe sequence lists all publications the sequence belongs to \d+ pub_test.regress_pub_seq1 Sequence "pub_test.regress_pub_seq1" Type | Start | Minimum | Maximum | Increment | Cycles? 
| Cache diff --git a/src/test/regress/sql/publication.sql b/src/test/regress/sql/publication.sql index 8d553ed..ac77fe4 100644 --- a/src/test/regress/sql/publication.sql +++ b/src/test/regress/sql/publication.sql @@ -134,7 +134,7 @@ SET client_min_messages = 'ERROR'; CREATE PUBLICATION regress_pub_forallsequences2 FOR ALL SEQUENCES; RESET client_min_messages; --- Check describe sequence lists both the publications +-- check that describe sequence lists all publications the sequence belongs to \d+ pub_test.regress_pub_seq1 --- FOR ALL specifying both TABLES and SEQUENCES
Re: Pgoutput not capturing the generated columns
On Fri, Jul 19, 2024 at 4:01 PM Shlok Kyal wrote: > > On Thu, 18 Jul 2024 at 13:55, Peter Smith wrote: > > > > Hi, here are some review comments for v19-0002 > > == > > src/test/subscription/t/004_sync.pl > > > > 1. > > This new test is not related to generated columns. IIRC, this is just > > some test that we discovered missing during review of this thread. As > > such, I think this change can be posted/patched separately from this > > thread. > > > I have removed the test for this thread. > > I have also addressed the remaining comments for v19-0002 patch. Hi, I have no more review comments for patch v20-0002 at this time. I saw that the above test was removed from this thread as suggested, but I could not find that any new thread was started to propose this valuable missing test. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi, Patch v22-0001 LGTM apart from the following nitpicks == src/sgml/ref/alter_subscription.sgml nitpick - /one needs to/you need to/ == src/backend/commands/subscriptioncmds.c CheckAlterSubOption: nitpick - "ideally we could have..." doesn't make sense because the code uses a more consistent/simpler way. So the other option was not ideal after all. AlterSubscription nitpick - typo /syncronization/synchronization/ nitpick - plural fix == Kind Regards, Peter Smith. Fujitsu Australia. diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index cbba1ee..6af6d0d 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -272,7 +272,7 @@ ALTER SUBSCRIPTION name RENAME TO < logical replication worker corresponding to a particular subscription have the following pattern: pg_gid_%u_%u (parameters: subscription oid, remote transaction id xid). - To resolve such transactions manually, one needs to roll back all + To resolve such transactions manually, you need to roll back all the prepared transactions with corresponding subscription IDs in their names. Applications can check pg_prepared_xacts diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 27560f1..b21f5c0 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -1090,9 +1090,9 @@ CheckAlterSubOption(Subscription *sub, const char *option, * publisher cannot be modified if the slot is currently acquired by the * existing walsender. * -* Note that the two_phase is enabled (aka changed from 'false' to 'true') -* on the publisher by the existing walsender so, ideally, we can allow -* that even when a subscription is enabled. But we kept this restriction +* Note that two_phase is enabled (aka changed from 'false' to 'true') +* on the publisher by the existing walsender, so we could have allowed +* that even when the subscription is enabled. 
But we kept this restriction * for the sake of consistency and simplicity. */ if (sub->enabled) @@ -1281,7 +1281,7 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, * We need to update both the slot and the subscription * for two_phase option. We can enable the two_phase * option for a slot only once the initial data -* syncronization is done. This is to avoid missing some +* synchronization is done. This is to avoid missing some * data as explained in comments atop worker.c. */ update_two_phase = !opts.twophase; @@ -1306,9 +1306,9 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, * * Ensure workers have already been exited to avoid * getting prepared transactions while we are disabling -* two_phase option. Otherwise, the changes of already -* prepared transactions can be replicated again along -* with its corresponding commit leading to duplicate data +* two_phase option. Otherwise, the changes of an already +* prepared transaction can be replicated again along +* with its corresponding commit, leading to duplicate data * or errors. */ if (logicalrep_workers_find(subid, true, true))
Re: walsender.c fileheader comment
Hi, Thank you for taking the time to look at this and reply. > > I did look at this, and while the explanation in the current comment may > seem a bit confusing, I'm not sure the suggested changes improve the > situation very much. > > This suggests the two comments somehow disagree, but it does not say in > what exactly, so perhaps I just missed it :-( > > ISTM there's a bit of confusion what is meant by "stopping" state - you > seem to be interpreting it as a general concept, where the walsender is > requested to stop (through the signal), and starts doing stuff to exit. > But the comments actually talk about WalSnd->state, where "stopping" > means it needs to be set to WALSNDSTATE_STOPPING. Yes, I interpreted the "stopping" state as meaning the point when the boolean flag 'got_STOPPING' is assigned true. > > And we only ever switch to that state in two places - in WalSndPhysical > and exec_replication_command. And that does not happen in regular > logical replication (which is what "logical replication is in progress" > refers to) - if you have a walsender just replicating DML, it will never > see the WALSNDSTATE_STOPPING state. It will simply do the cleanup while > still in WALSNDSTATE_STREAMING state, and then just exit. > > So from this point of view, the suggestion is actually wrong. OK. > > To conclude, I think this probably makes the comments more confusing. If > we want to make it clearer, I'd probably start by clarifying what the > "stopping" state means. Also, it's a bit surprising we may not actually > go through the "stopping" state during shutdown. > I agree. My interpretation of the (ambiguous) "stopping" state led me to believe the comment was quite wrong. So, this thread was only intended as a trivial comment fix in passing but clearly there is more to this than I anticipated. 
I would be happy if someone with more knowledge about the WALSNDSTATE_STOPPING versus got_STOPPING could disambiguate the file header comment, but that's not me, so I have withdrawn this from the Commitfest. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Thu, Jul 18, 2024 at 9:42 PM Amit Kapila wrote: > ... > I agree and have done that in the attached. I have made some > additional changes: (a) removed the unrelated change of two_phase in > protocol.sgml, (b) tried to make the two_phase change before failover > option wherever it makes sense to keep the code consistent, (c) > changed/added comments and doc changes at various places. > > I'll continue my review and testing of the patch but I thought of > sharing what I have done till now. > Here are some minor comments for patch v21 == You wrote "tried to make the two_phase change before failover option wherever it makes sense to keep the code consistent". But, still failover is coded first in lots of places: - libpqrcv_alter_slot - ReplicationSlotAlter - AlterReplicationSlot etc. Q. Why not change those ones? == src/backend/access/transam/twophase.c IsTwoPhaseTransactionGidForSubid: nitpick - nicer to rename the temporary gid variable: /gid_generated/gid_tmp/ == src/backend/commands/subscriptioncmds.c CheckAlterSubOption: nitpick - function comment period/plural. nitpick - typo /Samilar/Similar/ == src/include/replication/slot.h 1. -extern void ReplicationSlotAlter(const char *name, bool failover); +extern void ReplicationSlotAlter(const char *name, bool *failover, + bool *two_phase); Use const? == 99. Please see attached diffs implementing the nitpicks mentioned above == Kind Regards, Peter Smith. 
Fujitsu Australia diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index 90df32f..da97836 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -2712,7 +2712,7 @@ IsTwoPhaseTransactionGidForSubid(Oid subid, char *gid) int ret; Oid subid_from_gid; TransactionId xid_from_gid; - char gid_generated[GIDSIZE]; + char gid_tmp[GIDSIZE]; /* Extract the subid and xid from the given GID */ ret = sscanf(gid, "pg_gid_%u_%u", &subid_from_gid, &xid_from_gid); @@ -2729,10 +2729,9 @@ IsTwoPhaseTransactionGidForSubid(Oid subid, char *gid) * the given GID and check whether the temporary GID and the given GID * match. */ - TwoPhaseTransactionGid(subid, xid_from_gid, gid_generated, - sizeof(gid_generated)); + TwoPhaseTransactionGid(subid, xid_from_gid, gid_tmp, sizeof(gid_tmp)); - return strcmp(gid, gid_generated) == 0; + return strcmp(gid, gid_tmp) == 0; } /* diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 2e9f329..5f11235 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -1071,7 +1071,7 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data, } /* - * Common checks for altering failover and two_phase option + * Common checks for altering failover and two_phase options. */ static void CheckAlterSubOption(Subscription *sub, const char *option, @@ -1337,8 +1337,8 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, if (IsSet(opts.specified_opts, SUBOPT_FAILOVER)) { /* -* Samilar to two_phase, we need to update the failover -* option for the slot and the subscription. +* Similar to the two_phase case above, we need to update +* the failover option for the slot and the subscription. */ update_failover = true;
Re: Pgoutput not capturing the generated columns
Hi, here are some review comments for patch v19-0003 == src/backend/catalog/pg_publication.c 1. /* * Translate a list of column names to an array of attribute numbers * and a Bitmapset with them; verify that each attribute is appropriate * to have in a publication column list (no system or generated attributes, * no duplicates). Additional checks with replica identity are done later; * see pub_collist_contains_invalid_column. * * Note that the attribute numbers are *not* offset by * FirstLowInvalidHeapAttributeNumber; system columns are forbidden so this * is okay. */ static void publication_translate_columns(Relation targetrel, List *columns, int *natts, AttrNumber **attrs) ~ I thought the above comment ought to change: /or generated attributes/or virtual generated attributes/ IIRC this was already addressed back in v16, but somehow that fix has been lost (???). == src/backend/replication/logical/tablesync.c fetch_remote_table_info: nitpick - missing end space in this comment /* TODO: use ATTRIBUTE_GENERATED_VIRTUAL*/ == 2. (in patch v19-0001) +# tab3: +# publisher-side tab3 has generated col 'b'. +# subscriber-side tab3 has generated col 'b', using a different computation. (here, in patch v19-0003) # tab3: -# publisher-side tab3 has generated col 'b'. -# subscriber-side tab3 has generated col 'b', using a different computation. +# publisher-side tab3 has stored generated col 'b' but +# subscriber-side tab3 has DIFFERENT COMPUTATION stored generated col 'b'. It has become difficult to review these TAP tests, particularly when different patches are modifying the same comment. e.g. I post suggestions to modify comments for patch 0001. Those get addressed OK, only to vanish in subsequent patches, as has happened in the above example. Really this patch 0003 was only supposed to add the word "stored", not revert the entire comment to something from an earlier version. Please take care that all comment changes are carried forward correctly from one patch to the next. 
== Kind Regards, Peter Smith. Fujitsu Australia.
Re: Pgoutput not capturing the generated columns

Hi, here are some review comments for v19-0002

==
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:
nitpick - tweak function comment
nitpick - tweak other comments

~~~

fetch_remote_table_info:
nitpick - add space after "if"
nitpick - removed a comment because the logic is self-evident from the variable name

==
src/test/subscription/t/004_sync.pl

1.
This new test is not related to generated columns. IIRC, this is just
some test that we discovered was missing during review of this thread.
As such, I think this change can be posted/patched separately from this
thread.

==
src/test/subscription/t/011_generated.pl

nitpick - change some comment wording to be more consistent with patch 0001.

==
99.
Please see the nitpicks diff attachment which implements any nitpicks
mentioned above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 935be7f..2e90d42 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -693,8 +693,8 @@ process_syncing_tables(XLogRecPtr current_lsn)
 }
 
 /*
- * Create list of columns for COPY based on logical relation mapping. Do not
- * include generated columns of the subscription table in the column list.
+ * Create list of columns for COPY based on logical relation mapping.
+ * Exclude columns that are subscription table generated columns.
  */
 static List *
 make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool *remotegenlist)
@@ -707,7 +707,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool *remotegenlist)
 	localgenlist = palloc0(rel->remoterel.natts * sizeof(bool));
 
 	/*
-	 * This loop checks for generated columns on subscription table.
+	 * This loop checks for generated columns of the subscription table.
 	 */
 	for (int i = 0; i < desc->natts; i++)
 	{
@@ -1010,15 +1010,12 @@ fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist_res,
 					 " WHERE a.attnum > 0::pg_catalog.int2"
 					 "   AND NOT a.attisdropped", lrel->remoteid);
 
-	if(server_version >= 12)
+	if (server_version >= 12)
 	{
 		bool		gencols_allowed = server_version >= 18 && MySubscription->includegencols;
 
-		if(!gencols_allowed)
-		{
-			/* Replication of generated cols is not supported. */
+		if (!gencols_allowed)
 			appendStringInfo(&cmd, " AND a.attgenerated = ''");
-		}
 	}
 
 	appendStringInfo(&cmd,
diff --git a/src/test/subscription/t/011_generated.pl b/src/test/subscription/t/011_generated.pl
index 1814628..4537c6c 100644
--- a/src/test/subscription/t/011_generated.pl
+++ b/src/test/subscription/t/011_generated.pl
@@ -46,9 +46,9 @@ $node_subscriber->safe_psql('postgres',
 	"CREATE TABLE tab3 (a int, b int GENERATED ALWAYS AS (a + 20) STORED)");
 
 # tab4:
-# publisher-side tab4 has generated cols 'b' and 'c' but
-# subscriber-side tab4 has non-generated col 'b', and generated-col 'c'
-# where columns on publisher/subscriber are in a different order
+# Publisher-side tab4 has generated cols 'b' and 'c'.
+# Subscriber-side tab4 has non-generated col 'b', and generated-col 'c'.
+# Columns on publisher/subscriber are in a different order.
 $node_publisher->safe_psql('postgres',
 	"CREATE TABLE tab4 (a int, b int GENERATED ALWAYS AS (a * 2) STORED, c int GENERATED ALWAYS AS (a * 2) STORED)"
 );
@@ -57,14 +57,14 @@ $node_subscriber->safe_psql('postgres',
 );
 
 # tab5:
-# publisher-side tab5 has non-generated col 'b' but
-# subscriber-side tab5 has generated col 'b'
+# Publisher-side tab5 has non-generated col 'b'.
+# Subscriber-side tab5 has generated col 'b'.
 $node_publisher->safe_psql('postgres', "CREATE TABLE tab5 (a int, b int)");
 $node_subscriber->safe_psql('postgres',
 	"CREATE TABLE tab5 (a int, b int GENERATED ALWAYS AS (a * 22) STORED)");
 
 # tab6:
-# tables for testing ALTER SUBSCRIPTION ... REFRESH PUBLICATION
+# Tables for testing ALTER SUBSCRIPTION ... REFRESH PUBLICATION
 $node_publisher->safe_psql('postgres',
 	"CREATE TABLE tab6 (a int, b int GENERATED ALWAYS AS (a * 2) STORED, c int GENERATED ALWAYS AS (a * 2) STORED)"
 );
@@ -73,8 +73,8 @@ $node_subscriber->safe_psql('postgres',
 );
 
 # tab7:
-# publisher-side tab7 has generated col 'b' but
-# subscriber-side tab7 do not have col 'b'
+# Publisher-side tab7
Re: Pgoutput not capturing the generated columns

Hi Shubham, here are my review comments for patch v19-0001.

==
src/backend/replication/pgoutput/pgoutput.c

1.
	/*
	 * Columns included in the publication, or NULL if all columns are
	 * included implicitly. Note that the attnums in this bitmap are not
	 * publication and include_generated_columns option: other reasons should
	 * be checked at user side. Note that the attnums in this bitmap are not
	 * shifted by FirstLowInvalidHeapAttributeNumber.
	 */
	Bitmapset  *columns;

You replied [1] "The attached Patches contain all the suggested
changes." but, as I previously commented [2, #1], since this patch
causes no change to the interpretation of the 'columns' BMS, I expected
this comment would be unchanged (i.e. the same as the HEAD code). But
this fix was missed in v19-0001.

OTOH, if you do think there was a reason to change the comment, then
the above is still not good, because the "are not publication and
include_generated_columns option" wording doesn't make sense.

==
src/test/subscription/t/011_generated.pl

Observation -- I added (in the nitpicks diff) some more comments for
'tab1' (to make all comments consistent with the new tests added). But
when I was doing that I observed that the tab1 and tab3 test scenarios
are very similar. It seems only the subscription parameter is not
specified (so the 'include_generated_cols' default will be tested).
IIRC the default for that parameter is "false", so tab1 is not really
testing that properly -- e.g. I thought maybe to test the default
parameter it would be better for the subscriber-side 'b' to be
non-generated? But doing that would make 'tab1' the same as 'tab2'.
Anyway, something seems amiss -- it seems either something is not
tested or is duplicate-tested. Please revisit what the tab1 test
intention was and make sure we are doing the right thing for it...

==
99.
The attached nitpicks diff patch has some tweaked comments.

==
[1] https://www.postgresql.org/message-id/CAHv8Rj%2BR0cj%3Dz1bTMAgQKQWx1EKvkMEnV9QsHGvOqTdnLUQi1A%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPtVfrbx0jb42LCmS%3D-LcMTtWxm%2BvhaoArkjg7Z0mvuXbg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.

diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
index 50c5911..e066426 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -98,8 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
 								 * slots) in the upstream database are enabled
 								 * to be synchronized to the standbys. */
 
-	bool		subincludegencols;	/* True if generated columns must be
-									 * published */
+	bool		subincludegencols;	/* True if generated columns should
+									 * be published */
 
 #ifdef CATALOG_VARLEN			/* variable-length fields start here */
 	/* Connection string to the publisher */
diff --git a/src/test/subscription/t/011_generated.pl b/src/test/subscription/t/011_generated.pl
index fe32987..d13d0a0 100644
--- a/src/test/subscription/t/011_generated.pl
+++ b/src/test/subscription/t/011_generated.pl
@@ -20,24 +20,28 @@ $node_subscriber->start;
 
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
 
+#
+# tab1:
+# Publisher-side tab1 has generated col 'b'.
+# Subscriber-side tab1 has generated col 'b', using a different computation,
+# and also an additional column 'c'.
 $node_publisher->safe_psql('postgres',
 	"CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED)"
 );
-
 $node_subscriber->safe_psql('postgres',
 	"CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 22) STORED, c int)"
 );
 
 # tab2:
-# publisher-side tab2 has generated col 'b'.
-# subscriber-side tab2 has non-generated col 'b'.
+# Publisher-side tab2 has generated col 'b'.
+# Subscriber-side tab2 has non-generated col 'b'.
 $node_publisher->safe_psql('postgres',
 	"CREATE TABLE tab2 (a int, b int GENERATED ALWAYS AS (a * 2) STORED)");
 $node_subscriber->safe_psql('postgres', "CREATE TABLE tab2 (a int, b int)");
 
 # tab3:
-# publisher-side tab3 has generated col 'b'.
-# subscriber-side tab3 has generated col 'b', using a different computation.
+# Publisher-side tab3 has generated col 'b'.
+# Subscriber-side tab3 has generated col 'b', using a different computation.
 $
Re: Slow catchup of 2PC (twophase) transactions on replica in LR

Hi Kuroda-San, here are some review comments for patch v19-1

==
doc/src/sgml/ref/alter_subscription.sgml

The previous patches have common failover/two_phase code checking for
"Do not allow changing the option if the subscription is enabled", but
it seems the docs mentioned that only for "two_phase" and not for
"failover". I'm not 100% sure mentioning the disabled requirement was
necessary at all, but certainly it should be all-or-nothing, not just
saying it for one of the parameters. Anyway, I chose to add the
missing info. Please see the attached nitpicks diff.

==
Kind Regards,
Peter Smith.
Fujitsu Australia.

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 58db97f..e0ce83a 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -256,10 +256,15 @@ ALTER SUBSCRIPTION name RENAME TO <
-       The two_phase
-       parameter can only be altered when the subscription is disabled.
-       When altering the parameter from true
-       to false, the backend process checks for any incomplete
+       The failover
+       and two_phase
+       parameters can only be altered when the subscription is disabled.
+      </para>
+
+      <para>
+       When altering two_phase
+       from true to false,
+       the backend process checks for any incomplete
        prepared transactions done by the logical replication worker (from
        when two_phase parameter was still true)
        and, if any are found, an error is reported. If this happens, you can