Re: Add contrib/pg_logicalsnapinspect

2024-09-22 Thread Peter Smith
My review comments for v8-0001

==
contrib/pg_logicalinspect/pg_logicalinspect.c

1.
+/*
+ * Lookup table for SnapBuildState.
+ */
+
+#define SNAPBUILD_STATE_INCR 1
+
+static const char *const SnapBuildStateDesc[] = {
+ [SNAPBUILD_START + SNAPBUILD_STATE_INCR] = "start",
+ [SNAPBUILD_BUILDING_SNAPSHOT + SNAPBUILD_STATE_INCR] = "building",
+ [SNAPBUILD_FULL_SNAPSHOT + SNAPBUILD_STATE_INCR] = "full",
+ [SNAPBUILD_CONSISTENT + SNAPBUILD_STATE_INCR] = "consistent",
+};
+
+/*

nit - the SNAPBUILD_STATE_INCR made this code appear more complicated
than it is. Please take a look at the attachment for an alternative
implementation which includes an explanatory comment. YMMV. Feel free
to ignore it.

==
src/include/replication/snapbuild.h

2.
+ * Please keep SnapBuildStateDesc[] (located in the pg_logicalinspect module)
+ * updated should a change needs to be done in SnapBuildState.

nit - "...should a change needs to be done" -- the word "needs" is
incorrect here.

How about:
"...if a change needs to be made to SnapBuildState."
"...if a change is made to SnapBuildState."
"...if SnapBuildState is changed."

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/contrib/pg_logicalinspect/pg_logicalinspect.c b/contrib/pg_logicalinspect/pg_logicalinspect.c
index 100b82f..2419df1 100644
--- a/contrib/pg_logicalinspect/pg_logicalinspect.c
+++ b/contrib/pg_logicalinspect/pg_logicalinspect.c
@@ -24,19 +24,6 @@ PG_FUNCTION_INFO_V1(pg_get_logical_snapshot_meta);
 PG_FUNCTION_INFO_V1(pg_get_logical_snapshot_info);
 
 /*
- * Lookup table for SnapBuildState.
- */
-
-#define SNAPBUILD_STATE_INCR 1
-
-static const char *const SnapBuildStateDesc[] = {
-   [SNAPBUILD_START + SNAPBUILD_STATE_INCR] = "start",
-   [SNAPBUILD_BUILDING_SNAPSHOT + SNAPBUILD_STATE_INCR] = "building",
-   [SNAPBUILD_FULL_SNAPSHOT + SNAPBUILD_STATE_INCR] = "full",
-   [SNAPBUILD_CONSISTENT + SNAPBUILD_STATE_INCR] = "consistent",
-};
-
-/*
  * NOTE: For any code change or issue fix here, it is highly recommended to
  * give a thought about doing the same in SnapBuildRestore() as well.
  */
@@ -186,6 +173,16 @@ Datum
 pg_get_logical_snapshot_info(PG_FUNCTION_ARGS)
 {
 #define PG_GET_LOGICAL_SNAPSHOT_INFO_COLS 14
+   /*
+* Lookup table for SnapBuildState. The lookup index is offset by 1
+* because the consecutive SnapBuildState enum values start at -1.
+*/
+   static const char *const SnapBuildStateDesc[] = {
+   [1 + SNAPBUILD_START] = "start",
+   [1 + SNAPBUILD_BUILDING_SNAPSHOT] = "building",
+   [1 + SNAPBUILD_FULL_SNAPSHOT] = "full",
+   [1 + SNAPBUILD_CONSISTENT] = "consistent",
+   };
SnapBuildOnDisk ondisk;
XLogRecPtr  lsn;
HeapTuple   tuple;
@@ -209,8 +206,7 @@ pg_get_logical_snapshot_info(PG_FUNCTION_ARGS)
 
memset(nulls, 0, sizeof(nulls));
 
-   values[i++] = CStringGetTextDatum(SnapBuildStateDesc[ondisk.builder.state +
-                                                         SNAPBUILD_STATE_INCR]);
+   values[i++] = CStringGetTextDatum(SnapBuildStateDesc[ondisk.builder.state + 1]);
values[i++] = TransactionIdGetDatum(ondisk.builder.xmin);
values[i++] = TransactionIdGetDatum(ondisk.builder.xmax);
values[i++] = LSNGetDatum(ondisk.builder.start_decoding_at);
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 78df2d1..e844a89 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -17,7 +17,7 @@
 
 /*
  * Please keep SnapBuildStateDesc[] (located in the pg_logicalinspect module)
- * updated should a change needs to be done in SnapBuildState.
+ * updated if a change needs to be made to SnapBuildState.
  */
 typedef enum
 {


Re: Pgoutput not capturing the generated columns

2024-09-22 Thread Peter Smith
On Fri, Sep 20, 2024 at 2:26 PM Amit Kapila  wrote:
>
> On Fri, Sep 20, 2024 at 4:16 AM Peter Smith  wrote:
> >
> > On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila  
> > > wrote:
> > > >
> > > >
> > > > Users can use a publication like "create publication pub1 for table
> > > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > > but not for t2. They can specify the generated column name in the
> > > > column list of t1 in that case even though the rest of the tables
> > > > won't publish generated columns.
> > >
> > > Agreed.
> > >
> > > I think that users can use the publish_generated_column option when
> > > they want to publish all generated columns, instead of specifying all
> > > the columns in the column list. It's another advantage of this option
> > > that it will also include the future generated columns.
> > >
> >
> > OK. Let me give some examples below to help understand this idea.
> >
> > Please correct me if these are incorrect.
> >
> > Examples, when publish_generated_columns=true:
> >
> > CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> > (publish_generated_columns=true)
> > t1 -> publishes a, b, gen2 (e.g. what column list says)
> > t2 -> publishes c, d + ALSO gen1, gen2
> >
> > CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH 
> > (publish_generated_columns=true)
> > t1 -> publishes a, b + ALSO gen1, gen2
> > t2 -> publishes gen1 (e.g. what column list says)
> >
>
> These two could be controversial because one could expect that if
> "publish_generated_columns=true" then publish generated columns
> irrespective of whether they are mentioned in column_list. I am of the
> opinion that column_list should take priority and the results should be as
> mentioned by you but let us see if anyone thinks otherwise.
>
> >
> > ==
> >
> > The idea LGTM, although now the parameter name
> > ('publish_generated_columns') seems a bit misleading since sometimes
> > generated columns get published "irrespective of the option".
> >
> > So, I think the original parameter name 'include_generated_columns'
> > might be better here because IMO "include" seems more like "add them
> > if they are not already specified", which is exactly what this idea is
> > doing.
> >
>
> I still prefer 'publish_generated_columns' because it matches with
> other publication option names. One can also deduce from
> 'include_generated_columns' that add all the generated columns even
> when some of them are specified in column_list.
>

Fair point. Anyway, to avoid surprises it will be important for the
precedence rules to be documented clearly (probably with some
examples).

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-09-19 Thread Peter Smith
On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada  wrote:
>
> On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila  wrote:
> >
...
> > I think that the column list should take priority and we should
> > publish the generated column if it is mentioned in the column list irrespective of
> > the option.
>
> Agreed.
>
> >
...
> >
> > Users can use a publication like "create publication pub1 for table
> > t1(c1, c2), t2;" where they want t1's generated column to be published
> > but not for t2. They can specify the generated column name in the
> > column list of t1 in that case even though the rest of the tables
> > won't publish generated columns.
>
> Agreed.
>
> I think that users can use the publish_generated_column option when
> they want to publish all generated columns, instead of specifying all
> the columns in the column list. It's another advantage of this option
> that it will also include the future generated columns.
>

OK. Let me give some examples below to help understand this idea.

Please correct me if these are incorrect.

==

Assuming these tables:

t1(a,b,gen1,gen2)
t2(c,d,gen1,gen2)

Examples, when publish_generated_columns=false:

CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
(publish_generated_columns=false)
t1 -> publishes a, b, gen2 (e.g. what column list says)
t2 -> publishes c, d

CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes gen1 (e.g. what column list says)

CREATE PUBLICATION pub1 FOR t1, t2 WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes c, d

CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes c, d

~~

Examples, when publish_generated_columns=true:

CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
(publish_generated_columns=true)
t1 -> publishes a, b, gen2 (e.g. what column list says)
t2 -> publishes c, d + ALSO gen1, gen2

CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes gen1 (e.g. what column list says)

CREATE PUBLICATION pub1 FOR t1, t2 WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes c, d + ALSO gen1, gen2

CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes c, d + ALSO gen1, gen2

==

The idea LGTM, although now the parameter name
('publish_generated_columns') seems a bit misleading since sometimes
generated columns get published "irrespective of the option".

So, I think the original parameter name 'include_generated_columns'
might be better here because IMO "include" seems more like "add them
if they are not already specified", which is exactly what this idea is
doing.

Thoughts?

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Add contrib/pg_logicalsnapinspect

2024-09-18 Thread Peter Smith
Thanks for the updated patch.

Here are a few more trivial comments for the patch v7-0001.

==

1.
Should the extension descriptions all be identical?

I noticed small variations:

contrib/pg_logicalinspect/Makefile
+PGFILEDESC = "pg_logicalinspect - functions to inspect logical
decoding components"

contrib/pg_logicalinspect/meson.build
+'--FILEDESC', 'pg_logicalinspect - functions to inspect contents
of logical snapshots',])

contrib/pg_logicalinspect/pg_logicalinspect.control
+comment = 'functions to inspect logical decoding components'

==
.../expected/logical_inspect.out

2.
+step s1_get_logical_snapshot_info: SELECT
(pg_get_logical_snapshot_info(f.name::pg_lsn)).state,(pg_get_logical_snapshot_info(f.name::pg_lsn)).catchange_count,array_length((pg_get_logical_snapshot_info(f.name::pg_lsn)).catchange_xip,1),(pg_get_logical_snapshot_info(f.name::pg_lsn)).committed_count,array_length((pg_get_logical_snapshot_info(f.name::pg_lsn)).committed_xip,1)
FROM (SELECT replace(replace(name,'.snap',''),'-','/') AS name FROM
pg_ls_logicalsnapdir()) AS f ORDER BY 2;
+state|catchange_count|array_length|committed_count|array_length
+-+---++---+
+2|  0||  2|   2
+2|  2|   2|  0|
+(2 rows)
+

2a.
Would it be better to rearrange those columns so 'committed' stuff
comes before 'catchange' stuff, to make this table order consistent
with the structure/code?

~

2b.
Maybe those 2 'array_length' columns could have aliases to uniquely
identify them?
e.g. 'catchange_array_length' and 'committed_array_length'.
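
For example, the spec query could use aliases something like this (just
a sketch, reusing the pg_get_logical_snapshot_info column names shown in
the expected output above):

-- same query shape, but with unique aliases for the two array_length columns
SELECT (s.i).state,
       (s.i).catchange_count,
       array_length((s.i).catchange_xip, 1) AS catchange_array_length,
       (s.i).committed_count,
       array_length((s.i).committed_xip, 1) AS committed_array_length
FROM (SELECT pg_get_logical_snapshot_info(f.name::pg_lsn) AS i
      FROM (SELECT replace(replace(name, '.snap', ''), '-', '/') AS name
            FROM pg_ls_logicalsnapdir()) AS f) AS s
ORDER BY 2;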

==
contrib/pg_logicalinspect/pg_logicalinspect.c

3.
+/*
+ * Validate the logical snapshot file.
+ */
+static void
+ValidateAndRestoreSnapshotFile(XLogRecPtr lsn, SnapBuildOnDisk *ondisk,
+const char *path)

Since the name was updated then should the function comment also be
updated to include something like the SnapBuildRestoreContents
function comment? e.g. "Validate the logical snapshot file, and read
the contents of the serialized snapshot to 'ondisk'."

~~~

pg_get_logical_snapshot_info:

4.
nit - Add/remove some blank lines to help visually associate the array
counts with their arrays.

==
.../specs/logical_inspect.spec

5.
+setup
+{
+DROP TABLE IF EXISTS tbl1;
+CREATE TABLE tbl1 (val1 integer, val2 integer);
+ CREATE EXTENSION pg_logicalinspect;
+}
+
+teardown
+{
+DROP TABLE tbl1;
+SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+ DROP EXTENSION pg_logicalinspect;
+}

Different indentation for the CREATE/DROP EXTENSION?

==

The attached file shows the whitespace nit (#4)

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/contrib/pg_logicalinspect/pg_logicalinspect.c b/contrib/pg_logicalinspect/pg_logicalinspect.c
index 185f36a..308c653 100644
--- a/contrib/pg_logicalinspect/pg_logicalinspect.c
+++ b/contrib/pg_logicalinspect/pg_logicalinspect.c
@@ -205,8 +205,8 @@ pg_get_logical_snapshot_info(PG_FUNCTION_ARGS)
values[i++] = BoolGetDatum(ondisk.builder.in_slot_creation);
values[i++] = LSNGetDatum(ondisk.builder.last_serialized_snapshot);
values[i++] = TransactionIdGetDatum(ondisk.builder.next_phase_at);
-   values[i++] = Int64GetDatum(ondisk.builder.committed.xcnt);
 
+   values[i++] = Int64GetDatum(ondisk.builder.committed.xcnt);
if (ondisk.builder.committed.xcnt > 0)
{
Datum  *arrayelems;
@@ -223,7 +223,6 @@ pg_get_logical_snapshot_info(PG_FUNCTION_ARGS)
nulls[i++] = true;
 
values[i++] = Int64GetDatum(ondisk.builder.catchange.xcnt);
-
if (ondisk.builder.catchange.xcnt > 0)
{
Datum  *arrayelems;


Re: Add contrib/pg_logicalsnapinspect

2024-09-18 Thread Peter Smith
HI, here are some mostly minor review comments for the patch v5-0001.

==
Commit message

1.
Do you think you should also name the new functions here?

==
contrib/pg_logicalinspect/pg_logicalinspect.c

2.
Regarding the question about static function declarations:

Shveta wrote: I was somehow under the impression that this is the way
in the postgres i.e. not add redundant declarations.  Will be good to
know what others think on this.

FWIW, my understanding is the convention is just to be consistent with
whatever the module currently does. If it declares static functions,
then declare them all (redundant or not). If it doesn't declare static
functions, then don't add one. But, in the current case, since this is
a new module, I guess it is entirely up to you whatever you want to
do.

~~~

3.
+/*
+ * NOTE: For any code change or issue fix here, it is highly recommended to
+ * give a thought about doing the same in SnapBuildRestore() as well.
+ */
+

nit - I think this NOTE should be part of this module's header
comment. (e.g. like the tablesync.c NOTES)

~~~

ValidateSnapshotFile:

4.
+ValidateSnapshotFile(XLogRecPtr lsn, SnapBuildOnDisk *ondisk, const char *path)
+{
+ int fd;
+ Size sz;

nit - The 'sz' is overwritten a few times. I think declaring it at each
scope where it is used would be tidier.

~~~

5.
+ fsync_fname(path, false);
+ fsync_fname(PG_LOGICAL_SNAPSHOTS_DIR, true);
+
+

nit - remove some excessive blank lines

~~~

6.
+ /* read statically sized portion of snapshot */
+ SnapBuildRestoreContents(fd, (char *) ondisk,
SnapBuildOnDiskConstantSize, path);

Should that say "fixed size portion"?

~~~

pg_get_logical_snapshot_info:

7.
+ if (ondisk.builder.committed.xcnt > 0)
+ {
+ Datum*arrayelems;
+ int narrayelems;
+
+ arrayelems = (Datum *) palloc(ondisk.builder.committed.xcnt * sizeof(Datum));
+ narrayelems = 0;
+
+ for (narrayelems = 0; narrayelems < ondisk.builder.committed.xcnt;
narrayelems++)
+ arrayelems[narrayelems] = Int64GetDatum((int64)
ondisk.builder.committed.xip[narrayelems]);

nit - Why the double assignment of narrayelems = 0? It is simpler to
assign zero at the declaration and remove the other two assignments.

~~~

8.
+ if (ondisk.builder.catchange.xcnt > 0)
+ {
+ Datum*arrayelems;
+ int narrayelems;
+
+ arrayelems = (Datum *) palloc(ondisk.builder.catchange.xcnt * sizeof(Datum));
+ narrayelems = 0;
+
+ for (narrayelems = 0; narrayelems < ondisk.builder.catchange.xcnt;
narrayelems++)
+ arrayelems[narrayelems] = Int64GetDatum((int64)
ondisk.builder.catchange.xip[narrayelems]);

nit - ditto previous comment

==
doc/src/sgml/pglogicalinspect.sgml

9.
+ 
+  The pg_logicalinspect module provides SQL functions
+  that allow you to inspect the contents of logical decoding components. It
+  allows to inspect serialized logical snapshots of a running
+  PostgreSQL database cluster, which is useful
+  for debugging or educational purposes.
+ 

nit - /It allows to inspect/It allows the inspection of/

~~~

10.
+  example:

nit - /example:/For example:/ (this is in a couple of places)

==
src/include/replication/snapbuild_internal.h

11.
+#ifndef INTERNAL_SNAPBUILD_H
+#define INTERNAL_SNAPBUILD_H

Shouldn't these be SNAPBUILD_INTERNAL_H to match the filename?

~~~

12.
The contents of snapbuild.c that got moved into
snapbuild_internal.h also got shuffled around a bit.

e.g. originally the typedef struct SnapBuildOnDisk:

+/*
+ * We store current state of struct SnapBuild on disk in the following manner:
+ *
+ * struct SnapBuildOnDisk;
+ * TransactionId * committed.xcnt; (*not xcnt_space*)
+ * TransactionId * catchange.xcnt;
+ *
+ */
+typedef struct SnapBuildOnDisk

was directly beneath the comment:
-/* ---
- * Snapshot serialization support
- * ---
- */
-

The macros were also defined immediately after the SnapBuildOnDisk
fields they referred to.

Wasn't that original ordering better than how it is now ordered in
snapbuild_internal.h?

==

Please also see the attachment, which implements some of those nits
mentioned above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/contrib/pg_logicalinspect/pg_logicalinspect.c b/contrib/pg_logicalinspect/pg_logicalinspect.c
index dc9041a..2111202 100644
--- a/contrib/pg_logicalinspect/pg_logicalinspect.c
+++ b/contrib/pg_logicalinspect/pg_logicalinspect.c
@@ -1,13 +1,17 @@
 /*-
  *
  * pg_logicalinspect.c
- *   Functions to inspect contents of PostgreSQL logical snapshots
+ * Functions to inspect contents of PostgreSQL logical snapshots
  *
  * Copyright (c) 2024, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
- *   contrib/pg_logicalinspect/pg_logicalinspect.c
+ * contrib/pg_logicalinspect/pg_logicalinspect.c
  *
+ *
+ * NOTES
+ * For any co

Re: Pgoutput not capturing the generated columns

2024-09-17 Thread Peter Smith
Hi, here are my review comments for patch v31-0002.

==

1. General.

IMO patches 0001 and 0002 should be merged when next posted. IIUC the
reason for the split was only because there were 2 different authors
but that seems to be not relevant anymore.

==
Commit message

2.
When 'copy_data' is true, during the initial sync, the data is replicated from
the publisher to the subscriber using the COPY command. The normal COPY
command does not copy generated columns, so when 'publish_generated_columns'
is true, we need to copy using the syntax:
'COPY (SELECT column_name FROM table_name) TO STDOUT'.

~

2a.
Should clarify that 'copy_data' is a SUBSCRIPTION parameter.

2b.
Should clarify that 'publish_generated_columns' is a PUBLICATION parameter.
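
For example (hypothetical table names, only to illustrate the syntax
difference the commit message is describing):

-- hypothetical publisher table:
--   CREATE TABLE t1 (a int, b int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
-- normal table sync COPY (no generated column data):
COPY public.t1 (a, b) TO STDOUT;
-- query form needed when generated column data must also be copied:
COPY (SELECT a, b, gen1 FROM public.t1) TO STDOUT;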

==
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:

3.
- for (i = 0; i < rel->remoterel.natts; i++)
+ desc = RelationGetDescr(rel->localrel);
+ localgenlist = palloc0(rel->remoterel.natts * sizeof(bool));

Each time I review this code I am tricked into thinking it is wrong to
use rel->remoterel.natts here for the localgenlist. AFAICT the code is
actually fine because you do not store *all* the subscriber gencols in
'localgenlist' -- you only store those with matching names on the
publisher table. It might be good if you could add an explanatory
comment about that to prevent any future doubts.

~~~

4.
+ if (!remotegenlist[remote_attnum])
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication target relation \"%s.%s\" has a generated column \"%s\" "
+ "but corresponding column on source relation is not a generated column",
+ rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));

This error message has lots of good information. OTOH, I think when
copy_data=false the error would report the subscriber column just as
"missing", which is maybe less helpful. Perhaps that other
copy_data=false "missing" case can be improved to share the same error
message that you have here.

~~~

fetch_remote_table_info:

5.
IIUC, this logic needs to be more sophisticated to handle the case
that was being discussed earlier with Sawada-san [1]. e.g. when the
same table has gencols but there are multiple subscribed publications
where the 'publish_generated_columns' parameter differs.

Also, you'll need test cases for this scenario, because it is too
difficult to judge correctness just by visual inspection of the code.



6.
nit - Change 'hasgencolpub' to 'has_pub_with_pubgencols' for
readability, and initialize it to 'false' to make it easy to use
later.

~~~

7.
- * Get column lists for each relation.
+ * Get column lists for each relation and check if any of the publication
+ * has generated column option.

and

+ /* Check if any of the publication has generated column option */
+ if (server_version >= 18)

nit - tweak the comments to name the publication parameter properly.

~~~

8.
foreach(lc, MySubscription->publications)
{
    if (foreach_current_index(lc) > 0)
        appendStringInfoString(&pub_names, ", ");
    appendStringInfoString(&pub_names, quote_literal_cstr(strVal(lfirst(lc))));
}

I know this is existing code, but shouldn't all this be done by using
the purpose-built function 'get_publications_str'

~~~

9.
+ ereport(ERROR,
+ errcode(ERRCODE_CONNECTION_FAILURE),
+ errmsg("could not fetch gencolumns information from publication list: %s",
+pub_names.data));

and

+ errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("failed to fetch tuple for gencols from publication list: %s",
+pub_names.data));

nit - /gencolumns information/generated column publication
information/ to make the errmsg more human-readable

~~~

10.
+ bool gencols_allowed = server_version >= 18 && hasgencolpub;
+
+ if (!gencols_allowed)
+ appendStringInfo(&cmd, " AND a.attgenerated = ''");

Can the 'gencols_allowed' var be removed, and the condition just be
replaced with if (!has_pub_with_pubgencols)? It seems equivalent
unless I am mistaken.

==

Please refer to the attachment which implements some of the nits
mentioned above.

==
[1] 
https://www.postgresql.org/message-id/CAD21AoBun9crSWaxteMqyu8A_zme2ppa2uJvLJSJC2E3DJxQVA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 723c44c..6d17984 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -850,7 +850,7 @@ fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist_res,
 	Oid			qualRow[] = {TEXTOID};
 	bool		isnull;
 	bool

Re: Pgoutput not capturing the generated columns

2024-09-17 Thread Peter Smith
Review comments for v31-0001.

(I tried to give only new comments, but there might be some overlap
with comments I previously made for v30-0001)

==
src/backend/catalog/pg_publication.c

1.
+
+ if (publish_generated_columns_given)
+ {
+ values[Anum_pg_publication_pubgencolumns - 1] =
BoolGetDatum(publish_generated_columns);
+ replaces[Anum_pg_publication_pubgencolumns - 1] = true;
+ }

nit - unnecessary whitespace above here.

==
src/backend/replication/pgoutput/pgoutput.c

2. prepare_all_columns_bms

+ /* Iterate the cols until generated columns are found. */
+ cols = bms_add_member(cols, i + 1);

How does the comment relate to the statement that follows it?

~~~

3.
+ * Skip generated column if pubgencolumns option was not
+ * specified.

nit - /pubgencolumns option/publish_generated_columns parameter/

==
src/bin/pg_dump/pg_dump.c

4.
getPublications:

nit - /i_pub_gencolumns/i_pubgencols/ (it's the same information but simpler)

==
src/bin/pg_dump/pg_dump.h

5.
+ bool pubgencolumns;
 } PublicationInfo;

nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)

==
vsrc/bin/psql/describe.c

6.
  bool has_pubviaroot;
+ bool has_pubgencol;

nit - /has_pubgencol/has_pubgencols/ (plural consistency)

==
src/include/catalog/pg_publication.h

7.
+ /* true if generated columns data should be published */
+ bool pubgencolumns;
 } FormData_pg_publication;

nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)

~~~

8.
+ bool pubgencolumns;
  PublicationActions pubactions;
 } Publication;

nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)

==
src/test/regress/sql/publication.sql

9.
+-- Test the publication with or without 'PUBLISH_GENERATED_COLUMNS' parameter
+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION pub1 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=1);
+\dRp+ pub1
+
+CREATE PUBLICATION pub2 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=0);
+\dRp+ pub2

9a.
nit - Use lowercase for the parameters.

~

9b.
nit - Fix the comment to say what the test is actually doing:
"Test the publication 'publish_generated_columns' parameter enabled or disabled"

==
src/test/subscription/t/031_column_list.pl

10.
Later I think you should add another test here to cover the scenario
that I was discussing with Sawada-San -- e.g. when there are 2
publications for the same table subscribed by just 1 subscription but
having different values of the 'publish_generated_columns' for the
publications.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 2e7804e..cca54bc 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -515,8 +515,8 @@ CREATE TABLE people (
 
  
   Generated columns may be skipped during logical replication according to 
the
-  CREATE PUBLICATION option
-  
+  CREATE PUBLICATION parameter
+  
   publish_generated_columns.
  
 
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index e133dc3..cd20bd4 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -223,7 +223,7 @@ CREATE PUBLICATION name
 

 
-   
+   
 publish_generated_columns 
(boolean)
 
  
@@ -231,14 +231,6 @@ CREATE PUBLICATION name
   associated with the publication should be replicated.
   The default is false.
  
- 
-  This option is only available for replicating generated column data 
from the publisher
-  to a regular, non-generated column in the subscriber.
- 
- 
- This parameter can only be set true if 
copy_data is
- set to false.
- 
 

 
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 272b6a1..7ebb851 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -999,7 +999,7 @@ GetPublication(Oid pubid)
pub->pubactions.pubdelete = pubform->pubdelete;
pub->pubactions.pubtruncate = pubform->pubtruncate;
pub->pubviaroot = pubform->pubviaroot;
-   pub->pubgencolumns = pubform->pubgencolumns;
+   pub->pubgencols = pubform->pubgencols;
 
ReleaseSysCache(tup);
 
@@ -1211,7 +1211,7 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
if (att->attisdropped)
continue;
 
-   if (att->attgenerated && !pub->pubgencolumns)
+   if (att->attgenerated && !pub->pubgencols)
continue;
 
attnums[nattnums++] = att->attnum;
diff --git a/src/backend/commands/public

Re: Pgoutput not capturing the generated columns

2024-09-17 Thread Peter Smith
Here are some review comments for v31-0001 (for the docs only)

There may be some overlap here with some comments already made for
v30-0001 which are not yet addressed in v31-0001.

==
Commit message

1.
When introducing the 'publish_generated_columns' parameter, you must
also say this is a PUBLICATION parameter.

~~~

2.
With this enhancement, users can now include the 'include_generated_columns'
option when querying logical replication slots using either the pgoutput
plugin or the test_decoding plugin. This option, when set to 'true' or '1',
instructs the replication system to include generated column information
and data in the replication stream.

~

The above is stale information because it still refers to the old name
'include_generated_columns', and to test_decoding which was already
removed in this patch.

==
doc/src/sgml/ddl.sgml

3.
+  Generated columns may be skipped during logical replication
according to the
+  CREATE PUBLICATION option
+  
+  publish_generated_columns.

3a.
nit - The linkend is based on the old name instead of the new name.

3b.
nit - Better to call this a parameter instead of an option because
that is what the CREATE PUBLICATION docs call it.

==
doc/src/sgml/protocol.sgml

4.
+
+ publish_generated_columns
+  
+   
+Boolean option to enable generated columns. This option controls
+whether generated columns should be included in the string
+representation of tuples during logical decoding in PostgreSQL.
+   
+  
+
+

Is this even needed anymore? Now that the implementation is using a
PUBLICATION parameter, isn't everything determined just by that
parameter? I don't see the reason why a protocol change is needed
anymore. And, if there is no protocol change needed, then this
documentation change is also not needed.



5.
  
-  Next, the following message part appears for each column included in
-  the publication (except generated columns):
+  Next, the following message parts appear for each column included in
+  the publication (generated columns are excluded unless the parameter
+  
+  publish_generated_columns specifies otherwise):
  

Like the previous comment above, I think everything is now determined
by the PUBLICATION parameter. So maybe this should just be referring
to that instead.

==
doc/src/sgml/ref/create_publication.sgml

6.
+   
+publish_generated_columns
(boolean)
+

nit - the ID is based on the old parameter name.

~

7.
+ 
+  This option is only available for replicating generated
column data from the publisher
+  to a regular, non-generated column in the subscriber.
+ 

IMO remove this paragraph. I really don't think you should be
mentioning the subscriber here at all. AFAIK this parameter is only
for determining if the generated column will be published or not. What
happens at the other end (e.g. logic whether it gets ignored or not by
the subscriber) is more like a matrix of behaviours that could be
documented in the "Logical Replication" section. But not here.

(I removed this in my nitpicks attachment)

~~~

8.
+ 
+ This parameter can only be set true if
copy_data is
+ set to false.
+ 

IMO remove this paragraph too. The user can create a PUBLICATION
before a SUBSCRIPTION even exists so to say it "can only be set..." is
not correct. Sure, your patch 0001 does not support the COPY of
generated columns but if you want to document that then it should be
documented in the CREATE SUBSCRIBER docs. But not here.

(I removed this in my nitpicks attachment)

TBH, it would be better if patches 0001 and 0002 were merged then you
can avoid all this. IIUC they were only separate in the first place
because 2 different people wrote them. It is not making reviews easier
with them split.

==

Please see the attachment which implements some of the nits above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 2e7804e..cca54bc 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -515,8 +515,8 @@ CREATE TABLE people (
 
  
   Generated columns may be skipped during logical replication according to 
the
-  CREATE PUBLICATION option
-  
+  CREATE PUBLICATION parameter
+  
   publish_generated_columns.
  
 
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index e133dc3..cd20bd4 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -223,7 +223,7 @@ CREATE PUBLICATION name
 

 
-   
+   
 publish_generated_columns 
(boolean)
 
  
@@ -231,14 +231,6 @@ CREATE PUBLICATION name
   associated with the publication should 

Re: Pgoutput not capturing the generated columns

2024-09-16 Thread Peter Smith
On Tue, Sep 17, 2024 at 4:15 PM Masahiko Sawada  wrote:
>
> On Mon, Sep 16, 2024 at 8:09 PM Peter Smith  wrote:
> >
> > I thought that the option "publish_generated_columns" is more related
> > to "column lists" than "row filters".
> >
> > Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'.
> >
>
> > And
> > PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > is equivalent to
> > PUBLICATION pub2 FOR TABLE t1(a,b,c);
>
> This makes sense to me as it preserves the current behavior.
>
> > Then:
> > PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > is equivalent to
> > PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2);
>
> This also makes sense. It would also include future generated columns.
>
> > So, I would expect this to fail because the SUBSCRIPTION docs say
> > "Subscriptions having several publications in which the same table has
> > been published with different column lists are not supported."
>
> So I agree that it would raise an error if users subscribe to both
> pub1 and pub2.
>
> And looking back at your examples,
>
> > > > e.g.1
> > > > -
> > > > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = 
> > > > true);
> > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = 
> > > > false);
> > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > > > -
> > > >
> > > > e.g.2
> > > > -
> > > > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns 
> > > > = true);
> > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = 
> > > > false);
> > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > > > -
>
> Both examples would not be supported.
>
> > > >
> > > > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> > > > several publications in which the same table has been published with
> > > > different column lists are not supported."
> > > >
> > > > Perhaps the user is supposed to deduce that the example above would
> > > > work OK if table 't1' has no generated cols. OTOH, if it did have
> > > > generated cols then the PUBLICATION column lists must be different and
> > > > therefore it is "not supported" (??).
> > >
> > > With the patch, how should this feature work when users specify a
> > > generated column to the column list and set publish_generated_column =
> > > false, in the first place? raise an error (as we do today)? or always
> > > send NULL?
> >
> > For this scenario, I suggested (see [1] #3) that the code could give a
> > WARNING. As I wrote up-thread: This combination doesn't seem
> > like something a user would do intentionally, so just silently
> > ignoring it (which the current patch does) is likely going to give
> > someone unexpected results/grief.
>
> It gives a WARNING, and then publishes the specified generated column
> data (even if publish_generated_column = false)? If so, it would mean
> that specifying the generated column to the column list means to
> publish its data regardless of the publish_generated_column parameter
> value.
>

No. I meant only that it can give the WARNING to tell the user "Hey,
there is a conflict here because you said publish_generated_columns=false,
but you also specified gencols in the column list".

But it is always the "publish_generated_columns" option that determines
the final publishing behaviour. So if it says
publish_generated_columns=false then it would NOT publish generated
columns even if there are gencols in the column list. I think this
makes sense because when there is no column list specified that
implicitly means "all columns" and the table might have some gencols,
but still 'publish_generated_columns' is what determines the
behaviour.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-09-16 Thread Peter Smith
On Tue, Sep 17, 2024 at 7:02 AM Masahiko Sawada  wrote:
>
> On Wed, Sep 11, 2024 at 10:30 PM Peter Smith  wrote:
> >
> > Because this feature is now being implemented as a PUBLICATION option,
> > there is another scenario that might need consideration; I am thinking
> > about where the same table is published by multiple PUBLICATIONS (with
> > different option settings) that are subscribed by a single
> > SUBSCRIPTION.
> >
> > e.g.1
> > -
> > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = 
> > true);
> > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = 
> > false);
> > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > -
> >
> > e.g.2
> > -
> > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = 
> > true);
> > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = 
> > false);
> > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > -
> >
> > Do you know if this case is supported? If yes, then which publication
> > option value wins?
>
> I would expect these option values are processed with OR. That is, we
> publish changes of the generated columns if at least one publication
> sets publish_generated_columns to true. It seems to me that we treat
> multiple row filters in the same way.
>

I thought that the option "publish_generated_columns" is more related
to "column lists" than "row filters".

Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'.

Then:
PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
is equivalent to
PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2);

And
PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
is equivalent to
PUBLICATION pub2 FOR TABLE t1(a,b,c);

So, I would expect this to fail because the SUBSCRIPTION docs say
"Subscriptions having several publications in which the same table has
been published with different column lists are not supported."

~~

Here's another example:
PUBLICATION pub3 FOR TABLE t1(a,b);
PUBLICATION pub4 FOR TABLE t1(c);

Won't it be strange (e.g. difficult to explain) why pub1 and pub2
table column lists are allowed to be combined in one subscription, but
pub3 and pub4 in one subscription are not supported due to the
different column lists?

> >
> > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> > several publications in which the same table has been published with
> > different column lists are not supported."
> >
> > Perhaps the user is supposed to deduce that the example above would
> > work OK if table 't1' has no generated cols. OTOH, if it did have
> > generated cols then the PUBLICATION column lists must be different and
> > therefore it is "not supported" (??).
>
> With the patch, how should this feature work when users specify a
> generated column to the column list and set publish_generated_column =
> false, in the first place? raise an error (as we do today)? or always
> send NULL?

For this scenario, I suggested (see [1] #3) that the code could give a
WARNING. As I wrote up-thread: This combination doesn't seem
like something a user would do intentionally, so just silently
ignoring it (which the current patch does) is likely going to give
someone unexpected results/grief.

==
[1] 
https://www.postgresql.org/message-id/CAHut%2BPuaitgE4tu3nfaR%3DPCQEKjB%3DmpDtZ1aWkbwb%3DJZE8YvqQ%40mail.gmail.com

Kind Regards,
Peter Smith
Fujitsu Australia




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-09-16 Thread Peter Smith
Here are a few comments for the patch v46-0001.

==
src/backend/replication/slot.c

1. ReportSlotInvalidation

On Mon, Sep 16, 2024 at 8:01 PM Bharath Rupireddy
 wrote:
>
> On Mon, Sep 9, 2024 at 1:11 PM Peter Smith  wrote:
> > 3. ReportSlotInvalidation
> >
> > I didn't understand why there was a hint for:
> > "You might need to increase \"%s\".", "max_slot_wal_keep_size"
> >
> > Why aren't these similar cases consistent?
>
> It looks misleading and not very useful. What happens if the removed
> WAL (that's needed for the slot) is put back into pg_wal somehow (by
> manually copying from archive or by some tool/script)? Can the slot
> invalidated due to wal_removed start sending WAL to its clients?
>
> > But you don't have an equivalent hint for timeout invalidation:
> > "You might need to increase \"%s\".", "replication_slot_inactive_timeout"
>
> I removed this per review comments upthread.

IIUC the errors are quite similar, so my previous review comment was
mostly about the unexpected inconsistency of why one of them has a
hint and the other one does not. I don't have a strong opinion about
whether they should both *have* or *not have* hints, so long as they
are treated the same.

If you think the current code hint is not useful then maybe we need a
new thread to address that existing issue. For example, maybe it
should be removed or reworded.

~~~

2. InvalidatePossiblyObsoleteSlot:

+ case RS_INVAL_INACTIVE_TIMEOUT:
+
+ if (!SlotInactiveTimeoutCheckAllowed(s))
+ break;
+
+ /*
+ * Check if the slot needs to be invalidated due to
+ * replication_slot_inactive_timeout GUC.
+ */
+ if (TimestampDifferenceExceeds(s->inactive_since, now,
+replication_slot_inactive_timeout * 1000))

nit - it might be tidier to avoid multiple breaks by just combining
these conditions. See the nitpick attachment.

~~~

3.
  * - RS_INVAL_WAL_LEVEL: is logical
+ * - RS_INVAL_INACTIVE_TIMEOUT: inactive timeout occurs

nit - use comment wording "inactive slot timeout has occurred", to
make it identical to the comment in slot.h

==
src/test/recovery/t/050_invalidate_slots.pl

4.
+# Despite inactive timeout being set, the synced slot won't get invalidated on
+# its own on the standby. So, we must not see invalidation message in server
+# log.
+$standby1->safe_psql('postgres', "CHECKPOINT");
+ok( !$standby1->log_contains(
+ "invalidating obsolete replication slot \"sync_slot1\"",
+ $logstart),
+ 'check that synced slot sync_slot1 has not been invalidated on standby'
+);
+

It seems kind of brittle to check the logs for something that is NOT
there, because any change to the message will make this accidentally
pass. Apart from that, it might anyway be more efficient just to check
pg_replication_slots again to make sure the 'invalidation_reason'
remains NULL.
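
e.g. Something like this (just a sketch; 'sync_slot1' is the slot name
already used by the test):

-- expect the synced slot to still not be invalidated
SELECT invalidation_reason IS NULL AS ok
FROM pg_replication_slots
WHERE slot_name = 'sync_slot1';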

==

Please see the attachment which implements some of the nit changes
mentioned above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 851120e..0076e4b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1716,15 +1716,12 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 					invalidation_cause = cause;
 					break;
 				case RS_INVAL_INACTIVE_TIMEOUT:
-
-					if (!SlotInactiveTimeoutCheckAllowed(s))
-						break;
-
 					/*
 					 * Check if the slot needs to be invalidated due to
 					 * replication_slot_inactive_timeout GUC.
 					 */
-					if (TimestampDifferenceExceeds(s->inactive_since, now,
+					if (SlotInactiveTimeoutCheckAllowed(s) &&
+						TimestampDifferenceExceeds(s->inactive_since, now,
 												   replication_slot_inactive_timeout * 1000))
 					{
 						invalidation_cause = cause;
@@ -1894,7 +1891,7 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
  * - RS_INVAL_HORIZON: requires a snapshot <= the given horizon in the given
  *   db; dboid may be InvalidOid for shared relations
  * - RS_INVAL_WAL_LEVEL: is logical
- * - RS_INVAL_INACTIVE_TIMEOUT: inactive timeout occurs
+ * - RS_INVAL_INACTIVE_TIMEOUT: inactive slot timeout has occurred

Re: Pgoutput not capturing the generated columns

2024-09-16 Thread Peter Smith
> > >
> > > > > I think we can create a publication for a single table, so what we can
> > > > > do with this feature can be done also by the idea you described below.
> > > > >
> > > > > > Yet another idea is to keep this as a publication option
> > > > > > (include_generated_columns or publish_generated_columns) similar to
> > > > > > "publish_via_partition_root". Normally, "publish_via_partition_root"
> > > > > > is used when tables on either side have different partitions
> > > > > > hierarchies which is somewhat the case here.
> > > > >
> > > > > It sounds more useful to me.
> > > > >
> > > >
> > > > Fair enough. Let's see if anyone else has any preference among the
> > > > proposed methods or can think of a better way.
> > >
> > > I have fixed the current issue. I have added the option
> > > 'publish_generated_columns' to the publisher side and created the new
> > > test cases accordingly.
> > > The attached patches contain the desired changes.
> > >
> >
> > Thank you for updating the patches. I have some comments:
> >
> > Do we really need to add this option to test_decoding? I think it
> > would be good if this improves the test coverage. Otherwise, I'm not
> > sure we need this part. If we want to add it, I think it would be
> > better to have it in a separate patch.
> >
>
> I have removed the option from the test_decoding file.
>
> > ---
> > + 
> > +  If the publisher-side column is also a generated column
> > then this option
> > +  has no effect; the publisher column will be filled as normal 
> > with the
> > +  publisher-side computed or default data.
> > + 
> >
> > I don't understand this description. Why does this option have no
> > effect if the publisher-side column is a generated column?
> >
>
> The documentation was incorrect. Currently, replicating from a
> publisher table with a generated column to a subscriber table with a
> generated column will result in an error. This has now been updated.
>
> > ---
> > + 
> > + This parameter can only be set true if
> > copy_data is
> > + set to false.
> > + 
> >
> > If I understand this patch correctly, it doesn't disallow to set
> > copy_data to true when the publish_generated_columns option is
> > specified. But do we want to disallow it? I think it would be more
> > useful and understandable if we allow to use both
> > publish_generated_columns (publisher option) and copy_data (subscriber
> > option) at the same time.
> >
>
> Support for tablesync with generated columns was not included in the
> initial patch, and this was reflected in the documentation. The
> functionality for syncing generated column data has been introduced
> with the 0002 patch.
>

Since nothing was said otherwise, I assumed my v30-0001 comments were
addressed in v31, but the new code seems to have quite a few of my
suggested changes missing. If you haven't addressed my review comments
for patch 0001 yet, please say so. OTOH, please give reasons for any
rejected comments.

> The attached v31 patches contain the changes for the same. I won't be
> posting the test patch for now. I will share it once this patch has
> been stabilized.

How can the patch become "stabilized" without associated tests to
verify the behaviour is not broken? e.g. I can write a stable function
that says 2+2=5.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-09-11 Thread Peter Smith
Because this feature is now being implemented as a PUBLICATION option,
there is another scenario that might need consideration; I am thinking
about where the same table is published by multiple PUBLICATIONS (with
different option settings) that are subscribed by a single
SUBSCRIPTION.

e.g.1
-
CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
-

e.g.2
-
CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
-

Do you know if this case is supported? If yes, then which publication
option value wins?

The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
several publications in which the same table has been published with
different column lists are not supported."

Perhaps the user is supposed to deduce that the example above would
work OK if table 't1' has no generated cols. OTOH, if it did have
generated cols then the PUBLICATION column lists must be different and
therefore it is "not supported" (??).

I have not tried this to see what happens, but even if it behaves as
expected, there should probably be some comments/docs/tests for this
scenario to clarify it for the user.

Notice that "publish_via_partition_root" has a similar conundrum, but
in that case, the behaviour is documented in the CREATE PUBLICATION
docs [2]. So, maybe  "publish_generated_columns" should be documented
a bit like that.

==
[1] https://www.postgresql.org/docs/devel/sql-createsubscription.html
[2] https://www.postgresql.org/docs/devel/sql-createpublication.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Remove shadowed declaration warnings

2024-09-11 Thread Peter Smith
o’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]
pg_createsubscriber.c:841:46: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]
pg_createsubscriber.c:961:47: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]
pg_createsubscriber.c:1104:41: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]
pg_createsubscriber.c:1142:41: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]
pg_createsubscriber.c:1182:45: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]
pg_createsubscriber.c:1242:54: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]
pg_createsubscriber.c:1272:56: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]
pg_createsubscriber.c:1314:70: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]
pg_createsubscriber.c:1363:60: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]
pg_createsubscriber.c:1553:57: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]
pg_createsubscriber.c:1627:55: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]
pg_createsubscriber.c:1681:64: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]
pg_createsubscriber.c:1739:69: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]
pg_createsubscriber.c:1830:64: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow]
pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow]

==
Kind Regards,
Peter Smith.
Fujitsu Australia
controldata_utils.c:52:29: warning: declaration of ‘DataDir’ shadows a global declaration [-Wshadow]
../../src/include/miscadmin.h:172:26: warning: shadowed declaration is here [-Wshadow]
controldata_utils.c:189:32: warning: declaration of ‘DataDir’ shadows a global declaration [-Wshadow]
../../src/include/miscadmin.h:172:26: warning: shadowed declaration is here [-Wshadow]
brin.c:685:16: warning: declaration of ‘tmp’ shadows a previous local [-Wshadow]
brin.c:579:11: warning: shadowed declaration is here [-Wshadow]
gistbuild.c:1159:23: warning: declaration of ‘splitinfo’ shadows a previous local [-Wshadow]
gistbuild.c:1059:11: warning: shadowed declaration is here [-Wshadow]
xlogdesc.c:40:26: warning: declaration of ‘wal_level’ shadows a global declaration [-Wshadow]
../../../../src/include/access/xlog.h:96:24: warning: shadowed declaration is here [-Wshadow]
xlogdesc.c:165:9: warning: declaration of ‘wal_level’ shadows a global declaration [-Wshadow]
../../../../src/include/access/xlog.h:96:24: warning: shadowed declaration is here [-Wshadow]
xlogreader.c:106:24: warning: declaration of ‘wal_segment_size’ shadows a global declaration [-Wshadow]
../../../../src/include/access/xlog.h:37:24: warning: shadowed declaration is here [-Wshadow]
xlogrecovery.c:1210:13: warning: declaration of ‘backupEndRequired’ shadows a global declaration [-Wshadow]
xlogrecovery.c:284:13: warning: shadowed declaration is here [-Wshadow]
xlogrecovery.c:1920:33: warning: declaration of ‘xlogreader’ shadows a global declaration [-Wshadow]
xlogrecovery.c:189:25: warning: shadowed declaration is here [-Wshadow]
xlogrecovery.c:3144:28: warning: declaration of ‘xlogprefetcher’ shadows a global declaration [-Wshadow]
xlogrecovery.c:192:24: warning: shadowed declaration is here [-Wshadow]
xlogrecovery.c:3148:19: warning: declaration of ‘xlogreader’ shadows a global declaration [-Wshadow]
xlogrecovery.c:189:25: warning: shadowed declaration is here [-Wshadow]
xlogrecovery.c:3311:31: warning: declaration of ‘xlogreader’ shadows a global declaration [-Wshadow]
xlogrecovery.c:189:25: warning: shadowed declaration is here [-Wshadow]
xlogrecovery.c:4062:38: warning: declaration of ‘xlogprefetcher’ shadows a global declaration [

Re: Pgoutput not capturing the generated columns

2024-09-11 Thread Peter Smith
Hi Shubham,

Here are my general comments about the v30-0002 TAP test patch.

==

1.
As mentioned in a previous post [1] there are still several references
to the old 'include_generated_columns' option remaining in this patch.
They need replacing.

~~~

2.
+# Furthermore, all combinations are tested for publish_generated_columns=false
+# (see subscription sub1 of database 'postgres'), and
+# publish_generated_columns=true (see subscription sub2 of database
+# 'test_igc_true').

Those 'see subscription' notes and 'test_igc_true' are from the old
implementation. Those need fixing. BTW, 'test_pgc_true' is a better
name for the database now that the option name is changed.

In the previous implementation, the TAP test environment was:
- a common publication pub, on the 'postgres' database
- a subscription sub1 with option include_generated_columns=false, on
the 'postgres' database
- a subscription sub2 with option include_generated_columns=true, on
the 'test_igc_true' database

Now it is like:
- a publication pub1, on the 'postgres' database, with option
publish_generated_columns=false
- a publication pub2, on the 'postgres' database, with option
publish_generated_columns=true
- a subscription sub1, on the 'postgres' database for publication pub1
- a subscription sub2, on the 'test_pgc_true' database for publication pub2

It would be good to document that above convention because knowing how
the naming/numbering works makes it a lot easier to read the
subsequent test cases. Of course, it is really important to
name/number everything consistently otherwise these tests become hard
to follow.  AFAICT it is mostly OK, but the generated -> generated
publication should be called 'regress_pub2_gen_to_gen'

~~~

3.
+# Create table.
+$node_publisher->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE tab_gen_to_nogen (a int, b int GENERATED ALWAYS AS (a *
2) STORED);
+ INSERT INTO tab_gen_to_nogen (a) VALUES (1), (2), (3);
+));
+
+# Create publication with publish_generated_columns=false.
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION regress_pub1_gen_to_nogen FOR TABLE
tab_gen_to_nogen WITH (publish_generated_columns = false)"
+);
+
+# Create table and subscription with copy_data=true.
+$node_subscriber->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE tab_gen_to_nogen (a int, b int);
+ CREATE SUBSCRIPTION regress_sub1_gen_to_nogen CONNECTION
'$publisher_connstr' PUBLICATION regress_pub1_gen_to_nogen WITH
(copy_data = true);
+));
+
+# Create publication with publish_generated_columns=true.
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION regress_pub2_gen_to_nogen FOR TABLE
tab_gen_to_nogen WITH (publish_generated_columns = true)"
+);
+

The code can be restructured to be simpler. Both publications are
always created on the 'postgres' database at the publisher node, so
let's just create them at the same time as creating the publisher
table. It also makes readability much better, e.g.

# Create table, and publications
$node_publisher->safe_psql(
'postgres', qq(
CREATE TABLE tab_gen_to_nogen (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
INSERT INTO tab_gen_to_nogen (a) VALUES (1), (2), (3);
CREATE PUBLICATION regress_pub1_gen_to_nogen FOR TABLE
tab_gen_to_nogen WITH (publish_generated_columns = false);
CREATE PUBLICATION regress_pub2_gen_to_nogen FOR TABLE
tab_gen_to_nogen WITH (publish_generated_columns = true);
));

AFAICT this same simplification can be repeated multiple times in this TAP file.

~~

Similarly, it would be neater to combine DROP PUBLICATION's together too.
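
For example, since DROP PUBLICATION accepts a comma-separated list of
names, the two publications created above could be dropped in a single
statement (publication names taken from the patch):

DROP PUBLICATION regress_pub1_gen_to_nogen, regress_pub2_gen_to_nogen;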

~~~

4.
Hopefully, the generated column 'copy_data' can be implemented again
soon for subscriptions, and then the initial sync tests here can be
properly implemented instead of the placeholders currently in patch
0002.

==
[1] 
https://www.postgresql.org/message-id/CAHut%2BPuDJToG%3DV-ogTi9_6fnhhn2S0%2BsVRGPynhcf9mEh0Q%3DLA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-09-10 Thread Peter Smith
Here are a some more review comments for patch v30-0001.

==
src/sgml/ref/create_publication.sgml

1.
+ 
+  If the publisher-side column is also a generated column
then this option
+  has no effect; the publisher column will be filled as normal with the
+  publisher-side computed or default data.
+ 

It should say "subscriber-side"; not "publisher-side". The same was
already reported by Sawada-San [1].

~~~

2.
+ 
+ This parameter can only be set true if
copy_data is
+ set to false.
+ 

IMO this limitation should be addressed by patch 0001 like it was
already done in the previous patches (e.g. v22-0002). I think
Sawada-san suggested the same [1].

Anyway, 'copy_data' is not a PUBLICATION option, so the fact it is
mentioned like this without any reference to the SUBSCRIPTION seems
like a cut/paste error from the previous implementation.

==
src/backend/catalog/pg_publication.c

3. pub_collist_validate
- if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
- ereport(ERROR,
- errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
- errmsg("cannot use generated column \"%s\" in publication column list",
-colname));
-

Instead of just removing this ERROR entirely here, I thought it would
be more user-friendly to give a WARNING if the PUBLICATION's explicit
column list includes generated cols when the option
"publish_generated_columns" is false. This combination doesn't seem
like something a user would do intentionally, so just silently
ignoring it (like the current patch does) is likely going to give
someone unexpected results/grief.
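
For example, a minimal sketch of the combination in question, assuming
the option from this patch and made-up table/publication names; the
suggestion is that the second statement would emit a WARNING instead of
silently ignoring the generated column:

CREATE TABLE tab1 (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
CREATE PUBLICATION pub1 FOR TABLE tab1 (a, b)
    WITH (publish_generated_columns = false);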

==
src/backend/replication/logical/proto.c

4. logicalrep_write_tuple, and logicalrep_write_attrs:

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

Why aren't you also checking the new PUBLICATION option here and
skipping all gencols if the "publish_generated_columns" option is
false? Or is the BMS of pgoutput_column_list_init handling this case?
Maybe there should be an Assert for this?

==
src/backend/replication/pgoutput/pgoutput.c

5. send_relation_and_attrs

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

Same question as #4.

~~~

6. prepare_all_columns_bms and pgoutput_column_list_init

+ if (att->attgenerated && !pub->pubgencolumns)
+ cols = bms_del_member(cols, i + 1);

IIUC, the algorithm seems overly tricky, filling the BMS with all
columns and then straight away conditionally removing the generated
columns. Can't it be refactored to assign all the correct columns
up-front, to avoid calling bms_del_member()?

==
src/bin/pg_dump/pg_dump.c

7. getPublications

IIUC, there is a lot of missing SQL code here (for all the older
versions) that should be saying "false AS pubgencolumns".
e.g. compare the SQL with how "false AS pubviaroot" is used.
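
To illustrate the pattern meant here (a sketch only, not pg_dump's
actual query text; "pubgencolumns" is the column name suggested above):
for server versions that predate the new option, the query would just
hard-code the default, e.g.

SELECT p.oid, p.pubname, false AS pubgencolumns
  FROM pg_publication p;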

==
src/bin/pg_dump/t/002_pg_dump.pl

8. Missing tests?

I expected to see a pg_dump test for this new PUBLICATION option.

==
src/test/regress/sql/publication.sql

9. Missing tests?

How about adding another test case that checks this new option must be
"Boolean"?

~~~

10. Missing tests?

--- error: generated column "d" can't be in list
+-- ok: generated columns can be in the list too
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
+ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;

(see my earlier comment #3)

IMO there should be another test case for a WARNING here if the user
attempts to include generated column 'd' in an explicit PUBLICATION
column list while "publish_generated_columns" is false.

==
[1]  
https://www.postgresql.org/message-id/CAD21AoA-tdTz0G-vri8KM2TXeFU8RCDsOpBXUBCgwkfokF7%3DjA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Disallow altering invalidated replication slots

2024-09-10 Thread Peter Smith
On Wed, Sep 11, 2024 at 3:54 AM Bharath Rupireddy
 wrote:
>
> Hi,
>
> Thanks for reviewing.
>
> On Tue, Sep 10, 2024 at 8:40 AM Peter Smith  wrote:
> >
> > Commit message
> >
> > 1.
> > ALTER_REPLICATION_SLOT on invalidated replication slots is unnecessary
> > as there is no way...
> >
> > suggestion:
> > ALTER_REPLICATION_SLOT for invalid replication slots should not be
> > allowed because there is no way...
>
> Modified.
>
> > ==
> > 2. Missing docs update
> >
> > Should this docs page [1] be updated to say ALTER_REPLICATION_SLOT is
> > not allowed for invalid slots?
>
> Haven't noticed for other ERROR cases in the docs, e.g. slots being
> synced, temporary slots. Not sure if it's worth adding every ERROR
> case to the docs.
>

OK.

> > ==
> > src/backend/replication/slot.c
> >
> > 3.
> > + if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE)
> > + ereport(ERROR,
> > + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > + errmsg("cannot alter replication slot \"%s\"", name),
> > + errdetail("This replication slot was invalidated due to \"%s\".",
> > +   SlotInvalidationCauses[MyReplicationSlot->data.invalidated]));
> > +
> >
> > I thought including the reason "invalid" (e.g. "cannot alter invalid
> > replication slot \"%s\"") in the message might be better, but OTOH I
> > see the patch message is the same as an existing one. Maybe see what
> > others think.
>
> Changed.
>
> > ==
> > src/test/recovery/t/035_standby_logical_decoding.pl
> >
> > 3.
> > There is already a comment about this test:
> > ##
> > # Recovery conflict: Invalidate conflicting slots, including in-use slots
> > # Scenario 1: hot_standby_feedback off and vacuum FULL
> > #
> > # In passing, ensure that replication slot stats are not removed when the
> > # active slot is invalidated.
> > ##
> >
> > IMO we should update that "In passing..." sentence to something like:
> >
> > In passing, ensure that replication slot stats are not removed when
> > the active slot is invalidated, and check that an error occurs when
> > attempting to alter the invalid slot.
>
> Added. But, keeping it closer to the test case doesn't hurt.
>
> Please find the attached v2 patch also having Shveta's review comments
> addressed.
>

The v2 patch looks OK to me.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-09-10 Thread Peter Smith
IIUC, previously there was a subscriber side option
'include_generated_columns', but now since v30* there is a publisher
side option 'publish_generated_columns'.

Fair enough, but in the v30* patches I can still see remnants of the
old name 'include_generated_columns' all over the place:
- in the commit message
- in the code (struct field names, param names etc)
- in the comments
- in the docs

If the decision is to call the new PUBLICATION option
'publish_generated_columns', then can't we please use that one name
*everywhere* -- e.g. replace all cases where any old name is still
lurking?

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: GUC names in messages

2024-09-10 Thread Peter Smith
On Wed, Sep 4, 2024 at 3:54 PM Michael Paquier  wrote:
>
...
> 0001 and 0004 have been applied with these tweaks.  I am still not
> sure about the changes for DateStyle and IntervalStyle in 0002 and
> 0003.  Perhaps others have an opinion that could drive to a consensus.
>

Thanks for pushing the patches 0001 and 0004.

I have rebased the two remaining patches. See v12 attached.

==
Kind Regards,
Peter Smith.
Fujitsu Australia


v12-0002-GUC-names-fix-case-datestyle.patch
Description: Binary data


v12-0001-GUC-names-fix-case-intervalstyle.patch
Description: Binary data


Re: Disallow altering invalidated replication slots

2024-09-09 Thread Peter Smith
Hi, here are some review comments for patch v1.

==
Commit message

1.
ALTER_REPLICATION_SLOT on invalidated replication slots is unnecessary
as there is no way...

suggestion:
ALTER_REPLICATION_SLOT for invalid replication slots should not be
allowed because there is no way...

==
2. Missing docs update

Should this docs page [1] be updated to say ALTER_REPLICATION_SLOT is
not allowed for invalid slots?

==
src/backend/replication/slot.c

3.
+ if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE)
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot alter replication slot \"%s\"", name),
+ errdetail("This replication slot was invalidated due to \"%s\".",
+   SlotInvalidationCauses[MyReplicationSlot->data.invalidated]));
+

I thought including the reason "invalid" (e.g. "cannot alter invalid
replication slot \"%s\"") in the message might be better, but OTOH I
see the patch message is the same as an existing one. Maybe see what
others think.

==
src/test/recovery/t/035_standby_logical_decoding.pl

3.
There is already a comment about this test:
##
# Recovery conflict: Invalidate conflicting slots, including in-use slots
# Scenario 1: hot_standby_feedback off and vacuum FULL
#
# In passing, ensure that replication slot stats are not removed when the
# active slot is invalidated.
##

IMO we should update that "In passing..." sentence to something like:

In passing, ensure that replication slot stats are not removed when
the active slot is invalidated, and check that an error occurs when
attempting to alter the invalid slot.

==
[1] docs - https://www.postgresql.org/docs/devel/protocol-replication.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-09-09 Thread Peter Smith
~~~

16.
+{
+ my ($node, $slot, $offset, $inactive_timeout) = @_;
+ my $name = $node->name;

The variable $name seems too vague. How about $node_name?

~~~

17.
+ # Wait for invalidation reason to be set
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+ WHERE slot_name = '$slot' AND
+ invalidation_reason = 'inactive_timeout';
+ ])
+   or die
+   "Timed out while waiting for invalidation reason of slot $slot to
be set on node $name";

17a.
nit - /# Wait for invalidation reason to be set/# Check that the
invalidation reason is 'inactive_timeout'/

IIUC, the 'trigger_slot_invalidation' function has already invalidated
the slot at this point, so we are not really "Waiting..."; we are
"Checking..." that the reason was correctly set.

~

17b.
I think this code fragment maybe would be better put inside the
'trigger_slot_invalidation' function. (I've done this in the nitpicks
attachment)

~~~

18.
+ # Check that invalidated slot cannot be acquired
+ my ($result, $stdout, $stderr);
+
+ ($result, $stdout, $stderr) = $node->psql(
+ 'postgres', qq[
+ SELECT pg_replication_slot_advance('$slot', '0/1');
+ ]);

18a.
s/Check that invalidated slot/Check that an invalidated slot/

~

18b.
nit - Remove some blank lines, because the comment applies to all below it.

==
sub trigger_slot_invalidation

19.
+# Trigger slot invalidation and confirm it in server log
+sub trigger_slot_invalidation

nit - s/confirm it in server log/confirm it in the server log/

~

20.
+{
+ my ($node, $slot, $offset, $inactive_timeout) = @_;
+ my $name = $node->name;
+ my $invalidated = 0;

(same as the other subroutine)
nit - The variable $name seems too vague. How about $node_name?

==

Please refer to the attached nitpicks top-up patch which implements
most of the above nits.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/test/recovery/t/050_invalidate_slots.pl 
b/src/test/recovery/t/050_invalidate_slots.pl
index 669d6cc..34b46d5 100644
--- a/src/test/recovery/t/050_invalidate_slots.pl
+++ b/src/test/recovery/t/050_invalidate_slots.pl
@@ -12,10 +12,10 @@ use Time::HiRes qw(usleep);
 # =
 # Testcase start
 #
-# Invalidate streaming standby slot and logical failover slot on primary due to
-# inactive timeout. Also, check logical failover slot synced to standby from
-# primary doesn't invalidate on its own, but gets the invalidated state from 
the
-# primary.
+# Invalidate a streaming standby slot and logical failover slot on the primary
+# due to inactive timeout. Also, check that a logical failover slot synced to
+# the standby from the primary doesn't invalidate on its own, but gets the
+# invalidated state from the primary.
 
 # Initialize primary
 my $primary = PostgreSQL::Test::Cluster->new('primary');
@@ -45,7 +45,7 @@ primary_slot_name = 'sb_slot1'
 primary_conninfo = '$connstr dbname=postgres'
 ));
 
-# Create sync slot on primary
+# Create sync slot on the primary
 $primary->psql('postgres',
q{SELECT pg_create_logical_replication_slot('sync_slot1', 
'test_decoding', false, false, true);}
 );
@@ -57,13 +57,13 @@ $primary->safe_psql(
 
 $standby1->start;
 
-# Wait until standby has replayed enough data
+# Wait until the standby has replayed enough data
 $primary->wait_for_catchup($standby1);
 
-# Sync primary slot to standby
+# Sync the primary slots to the standby
 $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
 
-# Confirm that logical failover slot is created on standby
+# Confirm that the logical failover slot is created on the standby
 is( $standby1->safe_psql(
'postgres',
q{SELECT count(*) = 1 FROM pg_replication_slots
@@ -73,24 +73,24 @@ is( $standby1->safe_psql(
'logical slot sync_slot1 has synced as true on standby');
 
 my $logstart = -s $primary->logfile;
-my $inactive_timeout = 1;
 
-# Set timeout so that next checkpoint will invalidate inactive slot
+# Set timeout GUC so that that next checkpoint will invalidate inactive slots
+my $inactive_timeout = 1;
 $primary->safe_psql(
'postgres', qq[
 ALTER SYSTEM SET replication_slot_inactive_timeout TO 
'${inactive_timeout}s';
 ]);
 $primary->reload;
 
-# Check for logical failover slot to become inactive on primary. Note that
+# Wait for logical failover slot to become inactive on the primary. Note that
 # nobody has acquired slot yet, so it must get invalidated due to
 # inactive timeout.
-check_for_slot_invalidation($primary, 'sync_slot1', $logstart,
+wait_for_slot_invalidation($primary, 'sync_slot1', $logstart,
$ina

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-09-09 Thread Peter Smith
Hi, here are some review comments for v45-0001 (excluding the test code)

==
doc/src/sgml/config.sgml

1.
+Note that the inactive timeout invalidation mechanism is not
+applicable for slots on the standby server that are being synced
+from primary server (i.e., standby slots having

nit - /from primary server/from the primary server/

==
src/backend/replication/slot.c

2. ReplicationSlotAcquire

+ errmsg("can no longer get changes from replication slot \"%s\"",
+ NameStr(s->data.name)),
+ errdetail("This slot has been invalidated because it was inactive
for longer than the amount of time specified by \"%s\".",
+"replication_slot_inactive_timeout.")));

nit - "replication_slot_inactive_timeout." - should be no period
inside that GUC name literal

~~~

3. ReportSlotInvalidation

I didn't understand why there was a hint for:
"You might need to increase \"%s\".", "max_slot_wal_keep_size"

But you don't have an equivalent hint for timeout invalidation:
"You might need to increase \"%s\".", "replication_slot_inactive_timeout"

Why aren't these similar cases consistent?

~~~

4. RestoreSlotFromDisk

+ /* Use the same inactive_since time for all the slots. */
+ if (now == 0)
+ now = GetCurrentTimestamp();
+

Is the deferred assignment really necessary? Why not just
unconditionally assign the 'now' just before the for-loop? Or even at
the declaration? e.g. The 'replication_slot_inactive_timeout' is
measured in seconds so I don't think 'inactive_since' being wrong by a
millisecond here will make any difference.

==
src/include/replication/slot.h

5. ReplicationSlotSetInactiveSince

+/*
+ * Set slot's inactive_since property unless it was previously invalidated due
+ * to inactive timeout.
+ */
+static inline void
+ReplicationSlotSetInactiveSince(ReplicationSlot *s, TimestampTz *now,
+ bool acquire_lock)
+{
+ if (acquire_lock)
+ SpinLockAcquire(&s->mutex);
+
+ if (s->data.invalidated != RS_INVAL_INACTIVE_TIMEOUT)
+ s->inactive_since = *now;
+
+ if (acquire_lock)
+ SpinLockRelease(&s->mutex);
+}

Is the logic correct? What if the slot was already invalid due to some
reason other than RS_INVAL_INACTIVE_TIMEOUT? Is an Assert needed?

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 27b2285..97b4fb5 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4582,7 +4582,7 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" 
"%p"'  # Windows

 Note that the inactive timeout invalidation mechanism is not
 applicable for slots on the standby server that are being synced
-from primary server (i.e., standby slots having
+from the primary server (i.e., standby slots having
 pg_replication_slots.synced
 value true).
 Synced slots are always considered to be inactive because they don't
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index d92b92b..8cc67b4 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -640,7 +640,7 @@ retry:
 errmsg("can no longer get changes from 
replication slot \"%s\"",
NameStr(s->data.name)),
 errdetail("This slot has been invalidated 
because it was inactive for longer than the amount of time specified by 
\"%s\".",
-  
"replication_slot_inactive_timeout.")));
+  
"replication_slot_inactive_timeout")));
}
 
/*


Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description

2024-09-08 Thread Peter Smith
On Mon, Sep 9, 2024 at 12:20 PM David G. Johnston
 wrote:
>
>
>
> On Sun, Sep 8, 2024, 18:55 Peter Smith  wrote:
>>
>> Saying "The time..." is fine, but the suggestions given seem backwards to me:
>> - The time this slot was inactivated
>> - The time when the slot became inactive.
>> - The time when the slot was deactivated.
>>
>> e.g. It is not like light switch. So, there is no moment when the
>> active slot "became inactive" or "was deactivated".
>
>
> While this is plausible the existing wording and the name of the field 
> definitely fail to convey this.
>
>>
>> Rather, the 'inactive_since' timestamp field is simply:
>> - The time the slot was last active.
>> - The last time the slot was active.
>
>
> I see your point but that wording is also quite confusing when an active slot 
> returns null for this field.
>
> At this point I'm confused enough to need whatever wording is taken to be 
> supported by someone explaining the code that interacts with this field.
>

Me too. I created this thread primarily to get the description changed
to clarify that this field represents a moment in time, rather than a
duration. So I will be happy with any wording that addresses that.

> I suppose I'm expecting something like: The time the last activity finished, 
> or null if an activity is in-progress.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description

2024-09-08 Thread Peter Smith
Saying "The time..." is fine, but the suggestions given seem backwards to me:
- The time this slot was inactivated
- The time when the slot became inactive.
- The time when the slot was deactivated.

e.g. It is not like a light switch. So, there is no moment when the
active slot "became inactive" or "was deactivated".

Rather, the 'inactive_since' timestamp field is simply:
- The time the slot was last active.
- The last time the slot was active.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description

2024-09-03 Thread Peter Smith
On Tue, Sep 3, 2024 at 4:12 PM shveta malik  wrote:
>
...
> Shall we make the change in code-comment as well:
>
> typedef struct ReplicationSlot
> {
> ...
> /* The time since the slot has become inactive */
> TimestampTz inactive_since;
> }
>

Hi Shveta,

Yes, I think so. I hadn't bothered to include this in the v1 patch
only because the docs are user-facing, but this code comment isn't.
However, now that you've mentioned it, I made the same change here
also. Thanks. See patch v2.

==
Kind Regards,
Peter Smith.
Fujitsu Australia


v2-0001-fix-description-for-inactive_since.patch
Description: Binary data


Re: Collect statistics about conflicts in logical replication

2024-09-03 Thread Peter Smith
On Tue, Sep 3, 2024 at 9:23 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Tuesday, September 3, 2024 7:12 PM Amit Kapila  
> wrote:
> >
> > Testing the stats for all types of conflicts is not required for this patch
> > especially because they increase the timings by 3-4s. We can add tests for 
> > one
> > or two types of conflicts.
> >
...
>
> Thanks for the comments. I have addressed the comments and adjusted the tests.
> In the V6 patch, Only insert_exists and delete_missing are tested.
>
> I confirmed that it only increased the testing time by 1 second on my machine.
>
> Best Regards,
> Hou zj

It seems a pity to throw away perfectly good test cases just because
they increase how long the suite takes to run.

This seems like yet another example of where we could have made good
use of the 'PG_TEST_EXTRA' environment variable. I have been trying to
propose adding "subscription" support for this in another thread [1].
By using this variable to make some tests conditional, we could have
the best of both worlds. e.g.
- retain all tests, but
- by default, only run a subset of those tests (to keep default test
execution time low).

I hope that if the idea to use PG_TEST_EXTRA for "subscription" tests
gets accepted then later we can revisit this, and put all the removed
extra test cases back in again.

==
[1] 
https://www.postgresql.org/message-id/CAHut%2BPsgtnr5BgcqYwD3PSf-AsUtVDE_j799AaZeAjJvE6HGtA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: GUC names in messages

2024-09-03 Thread Peter Smith
On Tue, Sep 3, 2024 at 4:35 PM Michael Paquier  wrote:
>
> On Tue, Sep 03, 2024 at 12:00:19PM +1000, Peter Smith wrote:
> > Here is the rebased patch set v10*. Everything is the same as before
> > except now there are only 7 patches instead of 8. The previous v9-0001
> > ("bool") patch no longer exists because those changes are now already
> > present in HEAD.
> >
> > I hope these might be pushed soon to avoid further rebasing.
>
> 0001~0004 could just be merged, they're the same thing, for different
> GUC types.  The consensus mentioned in 17974ec25946 makes that clear.
>
> 0007 is a good thing for translators, indeed..  I'll see about doing
> something here, at least.
> --
> Michael

Hi Michael, thanks for your interest.

I have merged the patches 0001-0004 as suggested. Please see v11 attachments.

==
Kind Regards,
Peter Smith.
Fujitsu Australia


v11-0001-Add-quotes-for-GUCs.patch
Description: Binary data


v11-0003-GUC-names-fix-case-datestyle.patch
Description: Binary data


v11-0004-GUC-names-make-common-translatable-messages.patch
Description: Binary data


v11-0002-GUC-names-fix-case-intervalstyle.patch
Description: Binary data


Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description

2024-09-03 Thread Peter Smith
On Tue, Sep 3, 2024 at 4:35 PM Bertrand Drouvot
 wrote:
>
> Hi,
>
> On Tue, Sep 03, 2024 at 10:43:14AM +0530, Amit Kapila wrote:
> > On Mon, Sep 2, 2024 at 9:14 AM shveta malik  wrote:
> > >
> > > On Mon, Sep 2, 2024 at 5:47 AM Peter Smith  wrote:
> > > > 
> > > >
> > > > To summarize, the current description wrongly describes the field as a
> > > > time duration:
> > > > "The time since the slot has become inactive."
> > > >
> > > > I suggest replacing it with:
> > > > "The slot has been inactive since this time."
> > > >
> > >
> > > +1 for the change. If I had read the document without knowing about
> > > the patch, I too would have interpreted it as a duration.
> > >
> >
> > The suggested change looks good to me as well. I'll wait for a day or
> > two before pushing to see if anyone thinks otherwise.
>
> I'm not 100% convinced the current wording is confusing because:
>
> - the field type is described as a "timestamptz".
> - there is no duration unit in the wording (if we were to describe a duration,
> we would probably add an unit to it, like "The time (in s)...").
>

Hmm. I assure you it is confusing because in English "The time since"
implies duration, and that makes the sentence contrary to the
timestamptz field type. Indeed, I cited Chat-GPT's interpretation
above specifically so that people would not just take this as my
opinion.

> That said, if we want to highlight that this is not a duration, what about?
>
> "The time (not duration) since the slot has become inactive."
>

There is no need to "highlight" anything. To avoid confusion we only
need to say what we mean.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-09-02 Thread Peter Smith
idation here, because it is the other subroutine doing the
waiting for the invalidation message in the logs. Instead, here I
think you are just confirming the 'invalidation_reason' got set
correctly. The comment should say what it is really doing.

==
sub check_for_slot_invalidation_in_server_log

10.
+# Check for invalidation of slot in server log
+sub check_for_slot_invalidation_in_server_log
+{

I think the main function of this subroutine is the CHECKPOINT and the
waiting for the server log to say invalidation happened. It is doing a
loop of a) CHECKPOINT then b) inspecting the server log for the slot
invalidation, and c) waiting for a bit. Repeat 10 times.

A comment describing the logic for this subroutine would be helpful.

The most important side-effect of this function is the CHECKPOINT
because without that nothing will ever get invalidated due to
inactivity, but this key point is not obvious from the subroutine
name.

IMO it would be better to name this differently to reflect what it is
really doing:
e.g. "CHECKPOINT_and_wait_for_slot_invalidation_in_server_log"

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: GUC names in messages

2024-09-02 Thread Peter Smith
Hi.

The cfbot was reporting my patches needed to be rebased.

Here is the rebased patch set v10*. Everything is the same as before
except now there are only 7 patches instead of 8. The previous v9-0001
("bool") patch no longer exists because those changes are now already
present in HEAD.

I hope these might be pushed soon to avoid further rebasing.

==
Kind Regards,
Peter Smith.
Fujitsu Australia


v10-0001-Add-quotes-for-GUCs-int.patch
Description: Binary data


v10-0004-Add-quotes-for-GUCs-enum.patch
Description: Binary data


v10-0003-Add-quotes-for-GUCs-string.patch
Description: Binary data


v10-0002-Add-quotes-for-GUCs-real.patch
Description: Binary data


v10-0005-GUC-names-fix-case-intervalstyle.patch
Description: Binary data


v10-0006-GUC-names-fix-case-datestyle.patch
Description: Binary data


v10-0007-GUC-names-make-common-translatable-messages.patch
Description: Binary data


Re: Collect statistics about conflicts in logical replication

2024-09-02 Thread Peter Smith
On Mon, Sep 2, 2024 at 1:28 PM shveta malik  wrote:
>
> On Mon, Sep 2, 2024 at 4:20 AM Peter Smith  wrote:
> >
> > On Fri, Aug 30, 2024 at 4:24 PM shveta malik  wrote:
> > >
> > > On Fri, Aug 30, 2024 at 10:53 AM Peter Smith  
> > > wrote:
> > > >
> > ...
> > > > 2. Arrange all the counts into an intuitive/natural order
> > > >
> > > > There is an intuitive/natural ordering for these counts. For example,
> > > > the 'confl_*' count fields are in the order insert -> update ->
> > > > delete, which LGTM.
> > > >
> > > > Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not
> > > > in a good order.
> > > >
> > > > IMO it makes more sense if everything is ordered as:
> > > > 'sync_error_count', then 'apply_error_count', then all the 'confl_*'
> > > > counts.
> > > >
> > > > This comment applies to lots of places, e.g.:
> > > > - docs (doc/src/sgml/monitoring.sgml)
> > > > - function pg_stat_get_subscription_stats (pg_proc.dat)
> > > > - view pg_stat_subscription_stats (src/backend/catalog/system_views.sql)
> > > > - TAP test SELECTs (test/subscription/t/026_stats.pl)
> > > >
> > > > As all those places are already impacted by this patch, I think it
> > > > would be good if (in passing) we (if possible) also swapped the
> > > > sync/apply counts so everything  is ordered intuitively top-to-bottom
> > > > or left-to-right.
> > >
> > > Not sure about this though. It does not seem to belong to the current 
> > > patch.
> > >
> >
> > Fair enough. But, besides being inappropriate to include in the
> > current patch, do you think the suggestion to reorder them made sense?
> > If it has some merit, then I will propose it again as a separate
> > thread.
> >
>
>  Yes, I think it makes sense. With respect to internal code, it might
> be still okay as is, but when it comes to pg_stat_subscription_stats,
> I think it is better if user finds it in below order:
>  subid | subname | sync_error_count | apply_error_count | confl_*
>
>  rather than the existing one:
>  subid | subname | apply_error_count | sync_error_count | confl_*
>

Hi Shveta, Thanks. FYI - I created a new thread for this here [1].

==
[1] 
https://www.postgresql.org/message-id/CAHut+PvbOw90wgGF4aV1HyYtX=6pjWc+pn8_fep7L=alxwx...@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




pg_stats_subscription_stats order of the '*_count' columns

2024-09-02 Thread Peter Smith
Hi,

While reviewing another thread I was looking at the view
'pg_stats_subscription_stats' view. In particular, I was looking at
the order of the "*_count" columns of that view.

IMO there is an intuitive/natural ordering for the logical replication
(LR) operations being counted. For example, LR "initial tablesync"
always comes before LR "apply".

I propose that the columns of the view should also be in this same
intuitive order: Specifically, "sync_error_count" should come before
"apply_error_count" (left-to-right in view, top-to-bottom in docs).

Currently, they are not arranged that way.
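
To illustrate the difference (a sketch only, listing just the columns
the view has in HEAD today):

-- current column order:
SELECT subid, subname, apply_error_count, sync_error_count, stats_reset
  FROM pg_stat_subscription_stats;
-- proposed column order:
SELECT subid, subname, sync_error_count, apply_error_count, stats_reset
  FROM pg_stat_subscription_stats;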

The view today has only 2 count columns in HEAD, so this proposal
seems trivial, but there is another patch [2] soon to be pushed, which
will add more conflict count columns. As the number of columns
increases, IMO it becomes more important that each column is where you
would intuitively expect to find it.

~

Changes would be needed in several places:
- docs (doc/src/sgml/monitoring.sgml)
- function pg_stat_get_subscription_stats (pg_proc.dat)
- view pg_stat_subscription_stats (src/backend/catalog/system_views.sql)
- TAP test SELECTs (test/subscription/t/026_stats.pl)

~

Thoughts?

==
[1] docs - 
https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-SUBSCRIPTION-STATS
[2] stats for conflicts -
https://www.postgresql.org/message-id/flat/OS0PR01MB57160A07BD575773045FC214948F2%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-09-02 Thread Peter Smith
test/recovery/t/050_invalidate_slots.pl

~~~

Please refer to the attached file which implements some of the nits
mentioned above.

==
[1] v43 review -
https://www.postgresql.org/message-id/CAHut%2BPuFzCHPCiZbpoQX59kgZbebuWT0gR0O7rOe4t_sdYu%3DOA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 970b496..0537714 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4564,8 +4564,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" 
"%p"'  # Windows
   
   

-Invalidates replication slots that are inactive for longer than
-specified amount of time. If this value is specified without units,
+Invalidate replication slots that are inactive for longer than this
+amount of time. If this value is specified without units,
 it is taken as seconds. A value of zero (which is default) disables
 the timeout mechanism. This parameter can only be set in
 the postgresql.conf file or on the server
@@ -4573,11 +4573,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" 
"%p"'  # Windows

 

-This invalidation check happens either when the slot is acquired
-for use or during checkpoint. The time since the slot has become
-inactive is known from its
-inactive_since value using which the
-timeout is measured.
+Slot invalidation due to inactivity timeout occurs during checkpoint.
+The duration of slot inactivity is calculated using the slot's
+inactive_since field value.

 

@@ -4585,9 +4583,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" 
"%p"'  # Windows
 applicable for slots on the standby server that are being synced
 from primary server (i.e., standby slots having
 synced field true).
-Because such synced slots are typically considered not active
-(for them to be later considered as inactive) as they don't perform
-logical decoding to produce the changes.
+Synced slots are always considered to be inactive because they don't
+perform logical decoding to produce changes.

   
  
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index acc0370..bb06592 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -551,12 +551,11 @@ ReplicationSlotName(int index, Name name)
  * An error is raised if nowait is true and the slot is currently in use. If
  * nowait is false, we sleep until the slot is released by the owning process.
  *
- * An error is raised if check_for_invalidation is true and the slot has been
+ * An error is raised if error_if_invalid is true and the slot has been
  * invalidated previously.
  */
 void
-ReplicationSlotAcquire(const char *name, bool nowait,
-  bool check_for_invalidation)
+ReplicationSlotAcquire(const char *name, bool nowait,  bool error_if_invalid)
 {
ReplicationSlot *s;
int active_pid;
@@ -635,11 +634,10 @@ retry:
MyReplicationSlot = s;
 
/*
-* Error out if the slot has been invalidated previously. Because 
there's
-* no use in acquiring the invalidated slot.
+* An error is raised if error_if_invalid is true and the slot has been
+* invalidated previously.
 */
-   if (check_for_invalidation &&
-   s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT)
+   if (error_if_invalid && s->data.invalidated == 
RS_INVAL_INACTIVE_TIMEOUT)
{
Assert(s->inactive_since > 0);
ereport(ERROR,
@@ -1565,6 +1563,7 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause 
cause,
 _("You might 
need to increase \"%s\"."), "max_slot_wal_keep_size");
break;
}
+
case RS_INVAL_HORIZON:
appendStringInfo(&err_detail, _("The slot conflicted 
with xid horizon %u."),
 
snapshotConflictHorizon);
@@ -1573,6 +1572,7 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause 
cause,
case RS_INVAL_WAL_LEVEL:
appendStringInfoString(&err_detail, _("Logical decoding 
on standby requires \"wal_level\" >= \"logical\" on the primary server."));
break;
+
case RS_INVAL_INACTIVE_TIMEOUT:
Assert(inactive_since > 0);
appendStringInfo(&err_detail,
@@ -1584,6 +15

DOCS - pg_replication_slot . Fix the 'inactive_since' description

2024-09-01 Thread Peter Smith
Hi hackers. While reviewing another thread I had cause to look at the
docs for the pg_replication_slot 'inactive_since' field [1] for the
first time.

I was confused by the description, which is as follows:

inactive_since timestamptz
The time since the slot has become inactive. NULL if the slot is
currently being used.


Then I found the github history for the patch [2], and the
accompanying long thread discussion [3] about the renaming of that
field. I have no intention to re-open that can-of-worms, but OTOH I
feel the first sentence of the field description is wrong and needs
fixing.

Specifically, IMO describing something as "The time since..." means
some amount of elapsed time since some occurrence, but that is not the
correct description for this timestamp field.

This is not just a case of me being pedantic. For example, here is
what Chat-GPT had to say:

I asked:
What does "The time since the slot has become inactive." mean?

ChatGPT said:
"The time since the slot has become inactive" refers to the duration
that has passed from the moment a specific slot (likely a database
replication slot or a similar entity) stopped being active. In other
words, it measures how much time has elapsed since the slot
transitioned from an active state to an inactive state.

For example, if a slot became inactive 2 hours ago, "the time since
the slot has become inactive" would be 2 hours.


To summarize, the current description wrongly describes the field as a
time duration:
"The time since the slot has become inactive."

I suggest replacing it with:
"The slot has been inactive since this time."

The attached patch makes this suggested change.

==
[1] docs - https://www.postgresql.org/docs/devel/view-pg-replication-slots.html
[2] thread - 
https://www.postgresql.org/message-id/ca+tgmob_ta-t2ty8qrkhbgnnlrf4zycwhghgfsuuofraedw...@mail.gmail.com
[3] push - 
https://github.com/postgres/postgres/commit/6d49c8d4b4f4a20eb5b4c501d78cf894fa13c0ea

Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-fix-description-for-inactive_since.patch
Description: Binary data


Re: Collect statistics about conflicts in logical replication

2024-09-01 Thread Peter Smith
On Fri, Aug 30, 2024 at 4:24 PM shveta malik  wrote:
>
> On Fri, Aug 30, 2024 at 10:53 AM Peter Smith  wrote:
> >
...
> > 2. Arrange all the counts into an intuitive/natural order
> >
> > There is an intuitive/natural ordering for these counts. For example,
> > the 'confl_*' count fields are in the order insert -> update ->
> > delete, which LGTM.
> >
> > Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not
> > in a good order.
> >
> > IMO it makes more sense if everything is ordered as:
> > 'sync_error_count', then 'apply_error_count', then all the 'confl_*'
> > counts.
> >
> > This comment applies to lots of places, e.g.:
> > - docs (doc/src/sgml/monitoring.sgml)
> > - function pg_stat_get_subscription_stats (pg_proc.dat)
> > - view pg_stat_subscription_stats (src/backend/catalog/system_views.sql)
> > - TAP test SELECTs (test/subscription/t/026_stats.pl)
> >
> > As all those places are already impacted by this patch, I think it
> > would be good if (in passing) we (if possible) also swapped the
> > sync/apply counts so everything  is ordered intuitively top-to-bottom
> > or left-to-right.
>
> Not sure about this though. It does not seem to belong to the current patch.
>

Fair enough. But, besides being inappropriate to include in the
current patch, do you think the suggestion to reorder them made sense?
If it has some merit, then I will propose it again as a separate
thread.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Collect statistics about conflicts in logical replication

2024-09-01 Thread Peter Smith
On Fri, Aug 30, 2024 at 6:36 PM shveta malik  wrote:
>
> On Fri, Aug 30, 2024 at 12:15 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> >
> > Here is V5 patch which addressed above and Shveta's[1] comments.
> >
>
> The patch looks good to me.
>

Patch v5 LGTM.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Collect statistics about conflicts in logical replication

2024-08-29 Thread Peter Smith
Hi Hou-San. Here are my review comments for v4-0001.

==

1. Add links in the docs

IMO it would be good for all these confl_* descriptions (in
doc/src/sgml/monitoring.sgml) to include links back to where each of
those conflict types was defined [1].

Indeed, when links are included to the original conflict type
information then I think you should remove mentioning
"track_commit_timestamp":
+   counted only when the
+   track_commit_timestamp
+   option is enabled on the subscriber.

It should be obvious that you cannot count a conflict if the conflict
does not happen, but I don't think we should scatter/duplicate those
rules in different places saying when certain conflicts can/can't
happen -- we should just link everywhere back to the original
description for those rules.

~~~

2. Arrange all the counts into an intuitive/natural order

There is an intuitive/natural ordering for these counts. For example,
the 'confl_*' count fields are in the order insert -> update ->
delete, which LGTM.

Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not
in a good order.

IMO it makes more sense if everything is ordered as:
'sync_error_count', then 'apply_error_count', then all the 'confl_*'
counts.

This comment applies to lots of places, e.g.:
- docs (doc/src/sgml/monitoring.sgml)
- function pg_stat_get_subscription_stats (pg_proc.dat)
- view pg_stat_subscription_stats (src/backend/catalog/system_views.sql)
- TAP test SELECTs (test/subscription/t/026_stats.pl)

As all those places are already impacted by this patch, I think it
would be good if (in passing) we (if possible) also swapped the
sync/apply counts so everything  is ordered intuitively top-to-bottom
or left-to-right.

==
[1] 
https://www.postgresql.org/docs/devel/logical-replication-conflicts.html#LOGICAL-REPLICATION-CONFLICTS

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-08-29 Thread Peter Smith
sign
now = GetCurrentTimestamp(); here?

~

11.
+ * Note that we don't invalidate synced slots because,
+ * they are typically considered not active as they don't
+ * perform logical decoding to produce the changes.

nit - tweaked punctuation

~

12.
+ * If the slot can be acquired, do so or if the slot is already ours,
+ * then mark it invalidated.  Otherwise we'll signal the owning
+ * process, below, and retry.

nit - tidied this comment. Suggestion:
If the slot can be acquired, do so and mark it as invalidated. If the
slot is already ours, mark it as invalidated. Otherwise, we'll signal
the owning process below and retry.

~

13.
+ if (active_pid == 0 ||
+ (MyReplicationSlot != NULL &&
+ MyReplicationSlot == s &&
+ active_pid == MyProcPid))

You are already checking MyReplicationSlot == s here, so that extra
check for MyReplicationSlot != NULL is redundant, isn't it?

~~~

14. CheckPointReplicationSlots

 /*
- * Flush all replication slots to disk.
+ * Flush all replication slots to disk. Also, invalidate slots during
+ * non-shutdown checkpoint.
  *
  * It is convenient to flush dirty replication slots at the time of checkpoint.
  * Additionally, in case of a shutdown checkpoint, we also identify the slots

nit - /Also, invalidate slots/Also, invalidate obsolete slots/

==
src/backend/utils/misc/guc_tables.c

15.
+ {"replication_slot_inactive_timeout", PGC_SIGHUP, REPLICATION_SENDING,
+ gettext_noop("Sets the amount of time to wait before invalidating an "
+ "inactive replication slot."),

nit - that is maybe a bit misleading because IIUC there is no real
"waiting" happening anywhere. Suggest:
Sets the amount of time a replication slot can remain inactive before
it will be invalidated.

==

Please take a look at the attached top-up patches. These include
changes for many of the nits above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7009350..c96ae53 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -671,9 +671,10 @@ retry:

(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("can no longer get changes from 
replication slot \"%s\"",
NameStr(s->data.name)),
-errdetail("This slot has been 
invalidated because it was inactive since %s for more than %d seconds specified 
by \"replication_slot_inactive_timeout\".",
+errdetail("The slot became invalid 
because it was inactive since %s, which is more than %d seconds ago.",
   
timestamptz_to_str(s->inactive_since),
-  
replication_slot_inactive_timeout)));
+  
replication_slot_inactive_timeout),
+errhint("You might need to increase 
\"%s\".", "replication_slot_inactive_timeout")));
}
}
 
@@ -1738,9 +1739,9 @@ 
InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 * is disabled or slot is currently 
being used or the slot
 * on standby is currently being synced 
from the primary.
 *
-* Note that we don't invalidate synced 
slots because,
-* they are typically considered not 
active as they don't
-* perform logical decoding to produce 
the changes.
+* Note that we don't invalidate synced 
slots because
+* they are typically considered not 
active, as they don't
+* perform logical decoding to produce 
changes.
 */
if (replication_slot_inactive_timeout 
== 0 ||
s->inactive_since == 0 ||
@@ -1789,9 +1790,9 @@ 
InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
active_pid = s->active_pid;
 
/*
-* If the slot can be acquired, do so or if the slot is already 
ours,
-* then mark it invalidated.  Otherwise we'll signal the owning
-* process, below, and retry.
+* If the slot can be acquired, do so and mark it as 
invalidated.
+* If the slot is already ours, mark it as inv

Re: Collect statistics about conflicts in logical replication

2024-08-28 Thread Peter Smith
Hi Hou-San.

I tried an experiment where I deliberately violated a primary key
during initial table synchronization.

For example:

test_sub=# create table t1(a int primary key);
CREATE TABLE

test_sub=# insert into t1 values(1);
INSERT 0 1

test_sub=# create subscription sub1 connection 'dbname=test_pub'
publication pub1 with (enabled=false);
2024-08-29 09:53:21.172 AEST [24186] WARNING:  subscriptions created
by regression test cases should have names starting with "regress_"
WARNING:  subscriptions created by regression test cases should have
names starting with "regress_"
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION

test_sub=# select * from pg_stat_subscription_stats;
 subid | subname | apply_error_count | sync_error_count |
insert_exists_count | update_differ_count | update_exists_count |
update_missing_count | de
lete_differ_count | delete_missing_count | stats_reset
---+-+---+--+-+-+-+--+---
--+--+-
 16390 | sub1| 0 |0 |
 0 |   0 |   0 |
 0 |
0 |0 |
(1 row)

test_sub=# alter subscription sub1 enable;
ALTER SUBSCRIPTION

test_sub=# 2024-08-29 09:53:57.245 AEST [4345] LOG:  logical
replication apply worker for subscription "sub1" has started
2024-08-29 09:53:57.258 AEST [4347] LOG:  logical replication table
synchronization worker for subscription "sub1", table "t1" has started
2024-08-29 09:53:57.311 AEST [4347] ERROR:  duplicate key value
violates unique constraint "t1_pkey"
2024-08-29 09:53:57.311 AEST [4347] DETAIL:  Key (a)=(1) already exists.
2024-08-29 09:53:57.311 AEST [4347] CONTEXT:  COPY t1, line 1
2024-08-29 09:53:57.312 AEST [23501] LOG:  background worker "logical
replication tablesync worker" (PID 4347) exited with exit code 1
2024-08-29 09:54:02.385 AEST [4501] LOG:  logical replication table
synchronization worker for subscription "sub1", table "t1" has started
2024-08-29 09:54:02.462 AEST [4501] ERROR:  duplicate key value
violates unique constraint "t1_pkey"
2024-08-29 09:54:02.462 AEST [4501] DETAIL:  Key (a)=(1) already exists.
2024-08-29 09:54:02.462 AEST [4501] CONTEXT:  COPY t1, line 1
2024-08-29 09:54:02.463 AEST [23501] LOG:  background worker "logical
replication tablesync worker" (PID 4501) exited with exit code 1
2024-08-29 09:54:07.512 AEST [4654] LOG:  logical replication table
synchronization worker for subscription "sub1", table "t1" has started
2024-08-29 09:54:07.580 AEST [4654] ERROR:  duplicate key value
violates unique constraint "t1_pkey"
2024-08-29 09:54:07.580 AEST [4654] DETAIL:  Key (a)=(1) already exists.
2024-08-29 09:54:07.580 AEST [4654] CONTEXT:  COPY t1, line 1
...

test_sub=# alter subscription sub1 disable;'
ALTER SUBSCRIPTION
2024-08-29 09:55:10.329 AEST [4345] LOG:  logical replication worker
for subscription "sub1" will stop because the subscription was
disabled

test_sub=# select * from pg_stat_subscription_stats;
 subid | subname | apply_error_count | sync_error_count |
insert_exists_count | update_differ_count | update_exists_count |
update_missing_count | de
lete_differ_count | delete_missing_count | stats_reset
---+-+---+--+-+-+-+--+---
--+--+-
 16390 | sub1| 0 |   15 |
 0 |   0 |   0 |
 0 |
0 |0 |
(1 row)

~~~

Notice how after a while there were multiple (15) 'sync_error_count' recorded.

According to the docs: 'insert_exists' happens when "Inserting a row
that violates a NOT DEFERRABLE unique constraint.".  So why are there
not the same number of 'insert_exists_count' recorded in
pg_stat_subscription_stats?

The 'insert_exists' is either not happening or is not being counted
during table synchronization. Either way, it was not what I was
expecting. If it is not a bug, maybe the docs need to explain more
about the rules for 'insert_exists' during the initial table sync.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Collect statistics about conflicts in logical replication

2024-08-28 Thread Peter Smith
On Wed, Aug 28, 2024 at 9:19 PM Amit Kapila  wrote:
>
> On Wed, Aug 28, 2024 at 11:43 AM shveta malik  wrote:
> >
> > Thanks for the patch. Just thinking out loud, since we have names like
> > 'apply_error_count', 'sync_error_count' which tells that they are
> > actually error-count, will it be better to have something similar in
> > conflict-count cases, like 'insert_exists_conflict_count',
> > 'delete_missing_conflict_count' and so on. Thoughts?
> >
>
> It would be better to have conflict in the names but OTOH it will make
> the names a bit longer. The other alternatives could be (a)
> insert_exists_confl_count, etc. (b) confl_insert_exists_count, etc.
> (c) confl_insert_exists, etc. These are based on the column names in
> the existing view pg_stat_database_conflicts [1]. The (c) looks better
> than other options but it will make the conflict-related columns
> different from error-related columns.
>
> Yet another option is to have a different view like
> pg_stat_subscription_conflicts but that sounds like going too far.
>
> [1] - 
> https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-DATABASE-CONFLICTS-VIEW

Option (c) looked good to me.

Removing the suffix "_count" is OK. For example, try searching all of
Chapter 27 ("The Cumulative Statistics System") [1] for columns
described as "Number of ..." and you will find that a "_count" suffix
is used only rarely.

Adding the prefix "confl_" is OK. As mentioned, there is a precedent
for this. See "pg_stat_database_conflicts" [2].

Mixing column names where some have and some do not have "_count"
suffixes may not be ideal, but I see no problem because there are
precedents for that too. E.g. see "pg_stat_replication_slots" [3], and
"pg_stat_all_tables" [4].

==
[1] https://www.postgresql.org/docs/devel/monitoring-stats.html
[2] 
https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-DATABASE-CONFLICTS-VIEW
[3] 
https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-SLOTS-VIEW
[4] 
https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Conflict detection and logging in logical replication

2024-08-27 Thread Peter Smith
On Wed, Aug 28, 2024 at 3:53 PM shveta malik  wrote:
>
> On Wed, Aug 28, 2024 at 9:44 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > > > +1 on 'update_origin_differs' instead of 'update_origins_differ' as
> > > > the former is somewhat similar to other conflict names 'insert_exists'
> > > > and 'update_exists'.
> > >
> > > Since we reached a consensus on this, I am attaching a small patch to 
> > > rename
> > > as suggested.
> >
> > Sorry, I attached the wrong patch. Here is correct one.
> >
>
> LGTM.
>

LGTM.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Collect statistics about conflicts in logical replication

2024-08-26 Thread Peter Smith
On Mon, Aug 26, 2024 at 10:13 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Monday, August 26, 2024 3:30 PM Peter Smith  wrote:
> >
> > ==
> > src/include/replication/conflict.h
> >
> > nit - defined 'NUM_CONFLICT_TYPES' inside the enum (I think this way is
> > often used in other PG source enums)
>
> I think we have recently tended to avoid doing that, as it has been commented
> that this style is somewhat deceptive and can cause confusion. See a previous
> similar comment[1]. The current style follows the other existing examples 
> like:
>
> #define IOOBJECT_NUM_TYPES (IOOBJECT_TEMP_RELATION + 1)
> #define IOCONTEXT_NUM_TYPES (IOCONTEXT_VACUUM + 1)
> #define IOOP_NUM_TYPES (IOOP_WRITEBACK + 1)
> #define BACKEND_NUM_TYPES (B_LOGGER + 1)

OK.

>
>
> > ==
> > src/test/subscription/t/026_stats.pl
> >
> > 1.
> > + # Delete data from the test table on the publisher. This delete
> > + operation # should be skipped on the subscriber since the table is already
> > empty.
> > + $node_publisher->safe_psql($db, qq(DELETE FROM $table_name;));
> > +
> > + # Wait for the subscriber to report tuple missing conflict.
> > + $node_subscriber->poll_query_until(
> > + $db,
> > + qq[
> > + SELECT update_missing_count > 0 AND delete_missing_count > 0 FROM
> > + pg_stat_subscription_stats WHERE subname = '$sub_name'
> > + ])
> > +   or die
> > +   qq(Timed out while waiting for tuple missing conflict for
> > subscription '$sub_name');
> >
> > Can you write a comment to explain why the replicated DELETE is
> > expected to increment both the 'update_missing_count' and the
> > 'delete_missing_count'?
>
> I think the comments several lines above the wait explained the reason[2]. I
> slightly modified the comments to make it clear.
>

1.
Right, but it still was not obvious to me what caused the
'update_missing_count'. On further study, I see it was a hangover from
the earlier UPDATE test case which was still stuck in an ERROR loop
attempting to do the update operation. e.g. before, it was giving the
expected 'update_exists' conflicts, but after the subscriber table
TRUNCATE the update conflict changed to give an 'update_missing'
conflict instead. I've updated the comment to reflect my
understanding. Please have a look to see if you agree.



2.
I separated the tests for 'update_missing' and 'delete_missing',
putting the update_missing test *before* the DELETE. I felt the
expected results were much clearer when each test did just one thing.
Please have a look to see if you agree.

~~~

3.
+# Enable track_commit_timestamp to detect origin-differ conflicts in logical
+# replication. Reduce wal_retrieve_retry_interval to 1ms to accelerate the
+# restart of the logical replication worker after encountering a conflict.
+$node_subscriber->append_conf(
+ 'postgresql.conf', q{
+track_commit_timestamp = on
+wal_retrieve_retry_interval = 1ms
+});

Later, after CDR resolvers are implemented, it might be good to
revisit these conflict test cases and re-write them to use some
conflict resolvers like 'skip'. Then the subscriber won't give ERRORs
and restart apply workers all the time behind the scenes, so you won't
need the above configuration for accelerating the worker restarts. In
other words, running these tests might be more efficient if you can
avoid restarting workers all the time.

I suggest putting an XXX comment here as a reminder that these tests
should be revisited to make use of conflict resolvers in the future.

~~~

nit - not caused by this patch, but other comment inconsistencies
about "stats_reset timestamp" can be fixed in passing too.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/test/subscription/t/026_stats.pl 
b/src/test/subscription/t/026_stats.pl
index 0df31a6..d9589f0 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -134,24 +134,37 @@ sub create_sub_pub_w_errors
  or die
  qq(Timed out while waiting for update_exists conflict for 
subscription '$sub_name');
 
-   # Truncate the test table to ensure that the conflicting update 
operations
-   # are skipped, allowing the test to continue.
-   $node_subscriber->safe_psql($db, qq(TRUNCATE $table_name));
+# Truncate the subscriber side test table. Now that the table is empty,
+# the update conflict ('update_existing') ERRORs will stop happening. A
+# single update conflict 'update_missing' will be reported, but the update
+# will be skipped on the subscriber, allowing the test to continue.
+$node_subscriber->safe_psql($db, qq(TRUNCATE $table

Re: Conflict detection and logging in logical replication

2024-08-26 Thread Peter Smith
On Mon, Aug 26, 2024 at 7:52 PM Amit Kapila  wrote:
>
> On Thu, Aug 22, 2024 at 2:21 PM Amit Kapila  wrote:
> >
> > On Thu, Aug 22, 2024 at 1:33 PM Peter Smith  wrote:
> > >
> > > Do you think the documentation for the 'column_value' parameter of the
> > > conflict logging should say that the displayed value might be
> > > truncated?
> > >
> >
> > I updated the patch to mention this and pushed it.
> >
>
> Peter Smith mentioned to me off-list that the names of conflict types
> 'update_differ' and 'delete_differ' are not intuitive as compared to
> all other conflict types like insert_exists, update_missing, etc. The
> other alternative that comes to mind for those conflicts is to name
> them as 'update_origin_differ'/''delete_origin_differ'.
>

For things to "differ" there must be more than one of them. The plural of
origin is origins.

e.g. 'update_origins_differ'/'delete_origins_differ'.

OTOH, you could say "differs" instead of differ:

e.g. 'update_origin_differs'/'delete_origin_differs'.
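FWIW, whichever names are chosen will also flow through to the
per-conflict counters being added to the statistics view, so an
intuitive name pays off there too. e.g. (column names as per the
conflict-statistics patch in the related thread):

SELECT subname, update_differ_count, delete_differ_count
FROM pg_stat_subscription_stats;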

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Collect statistics about conflicts in logical replication

2024-08-26 Thread Peter Smith
Hi Hou-san. Here are some review comments for your patch v1-0001.

==
doc/src/sgml/logical-replication.sgml

nit - added a comma.

==
doc/src/sgml/monitoring.sgml

nit - use  for 'apply_error_count'.
nit - added a period when there are multiple sentences.
nit - adjusted field descriptions using Chat-GPT clarification suggestions

==
src/include/pgstat.h

nit - change the param to 'type' -- i.e. the same as the implementation calls it

==
src/include/replication/conflict.h

nit - defined 'NUM_CONFLICT_TYPES' inside the enum (I think this way
is often used in other PG source enums)

==
src/test/subscription/t/026_stats.pl

1.
+ # Delete data from the test table on the publisher. This delete operation
+ # should be skipped on the subscriber since the table is already empty.
+ $node_publisher->safe_psql($db, qq(DELETE FROM $table_name;));
+
+ # Wait for the subscriber to report tuple missing conflict.
+ $node_subscriber->poll_query_until(
+ $db,
+ qq[
+ SELECT update_missing_count > 0 AND delete_missing_count > 0
+ FROM pg_stat_subscription_stats
+ WHERE subname = '$sub_name'
+ ])
+   or die
+   qq(Timed out while waiting for tuple missing conflict for
subscription '$sub_name');

Can you write a comment to explain why the replicated DELETE is
expected to increment both the 'update_missing_count' and the
'delete_missing_count'?

~
nit - update several "Apply and Sync errors..." comments that did not
mention conflicts
nit - tweak comments wording for update_differ and delete_differ
nit - /both > 0/> 0/
nit - /both 0/0/

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/logical-replication.sgml 
b/doc/src/sgml/logical-replication.sgml
index f3e3641..f682369 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -1585,7 +1585,7 @@ test_sub=# SELECT * FROM t1 ORDER BY id;
   
 
   
-   Additional logging is triggered and the conflict statistics are collected 
(displayed in the
+   Additional logging is triggered, and the conflict statistics are collected 
(displayed in the
pg_stat_subscription_stats
 view)
in the following conflict cases:

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ea36d46..ac3c773 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2159,7 +2159,7 @@ description | Waiting for a newly initialized WAL file to 
reach durable storage
   
Number of times an error occurred while applying changes. Note that any
conflict resulting in an apply error will be counted in both
-   apply_error_count and the corresponding conflict count.
+   apply_error_count and the corresponding conflict 
count.
   
  
 
@@ -2179,8 +2179,8 @@ description | Waiting for a newly initialized WAL file to 
reach durable storage
   
   
Number of times a row insertion violated a
-   NOT DEFERRABLE unique constraint while applying
-   changes
+   NOT DEFERRABLE unique constraint during the
+   application of changes
   
  
 
@@ -2189,11 +2189,11 @@ description | Waiting for a newly initialized WAL file 
to reach durable storage
update_differ_count bigint
   
   
-   Number of times an update was performed on a row that was previously
-   modified by another source while applying changes. This conflict is
+   Number of times an update was applied to a row that had been previously
+   modified by another source during the application of changes. This 
conflict is
counted only when the
track_commit_timestamp
-   option is enabled on the subscriber
+   option is enabled on the subscriber.
   
  
 
@@ -2202,9 +2202,9 @@ description | Waiting for a newly initialized WAL file to 
reach durable storage
update_exists_count bigint
   
   
-   Number of times that the updated value of a row violated a
-   NOT DEFERRABLE unique constraint while applying
-   changes
+   Number of times that an updated row value violated a
+   NOT DEFERRABLE unique constraint during the
+   application of changes
   
  
 
@@ -2213,8 +2213,8 @@ description | Waiting for a newly initialized WAL file to 
reach durable storage
update_missing_count bigint
   
   
-   Number of times that the tuple to be updated was not found while 
applying
-   changes
+   Number of times the tuple to be updated was not found during the
+   application of changes
   
  
 
@@ -2223,11 +2223,11 @@ description | Waiting for a newly initialized WAL file 
to reach durable storage
delete_differ_count bigint
   
   
-   Number of times a delete was performed on a row that was previously
-   modified by another source while applying changes. This conflict is
-   counted only when 

Re: Conflict Detection and Resolution

2024-08-25 Thread Peter Smith
On Thu, Aug 22, 2024 at 8:15 PM shveta malik  wrote:
>
> On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond  wrote:
> >
> > The patches have been rebased on the latest pgHead following the merge
> > of the conflict detection patch [1].
>
> Thanks for working on patches.
>
> Summarizing the issues which need some suggestions/thoughts.
>
> 1)
> For subscription based resolvers, currently the syntax implemented is:
>
> 1a)
> CREATE SUBSCRIPTION 
> CONNECTION  PUBLICATION 
> CONFLICT RESOLVER
> (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3,...);
>
> 1b)
> ALTER SUBSCRIPTION  CONFLICT RESOLVER
> (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3,...);
>
> Earlier the syntax suggested in [1] was:
> CREATE SUBSCRIPTION  CONNECTION  PUBLICATION 
> CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
> CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';
>
> I think the currently implemented syntax  is good as it has less
> repetition, unless others think otherwise.
>
> ~~
>
> 2)
> For subscription based resolvers, do we need a RESET command to reset
> resolvers to default? Any one of below or both?
>
> 2a) reset all at once:
>  ALTER SUBSCRIPTION  RESET CONFLICT RESOLVERS
>
> 2b) reset one at a time:
>  ALTER SUBSCRIPTION  RESET CONFLICT RESOLVER for 'conflict_type';
>
> The issue I see here is, to implement 1a and 1b, we have introduced
> the  'RESOLVER' keyword. If we want to implement 2a, we will have to
> introduce the 'RESOLVERS' keyword as well. But we can come up with
> some alternative syntax if we plan to implement these. Thoughts?
>

Hi Shveta,

I felt it would be better to keep the syntax similar to the existing
INSERT ... ON CONFLICT [1].

I'd suggest a syntax like this:

... ON CONFLICT ['conflict_type'] DO { 'conflict_action' | DEFAULT }

~~~

e.g.

To configure conflict resolvers for the SUBSCRIPTION:

CREATE SUBSCRIPTION subname CONNECTION coninfo PUBLICATION pubname
ON CONFLICT 'conflict_type1' DO 'conflict_action1',
ON CONFLICT 'conflict_type2' DO 'conflict_action2';

Likewise, for ALTER:

ALTER SUBSCRIPTION 
ON CONFLICT 'conflict_type1' DO 'conflict_action1',
ON CONFLICT 'conflict_type2' DO 'conflict_action2';

To RESET all at once:

ALTER SUBSCRIPTION 
ON CONFLICT DO DEFAULT;

And, to RESET one at a time:

ALTER SUBSCRIPTION 
ON CONFLICT 'conflict_type1' DO DEFAULT;

~~~

Although your list format "('conflict_type1' = 'conflict_action1',
'conflict_type2' = 'conflict_action2')"  is clear and without
repetition, I predict this terse style could end up being troublesome
because it does not offer much flexibility for whatever the future
might hold for CDR.

e.g. ability to handle the conflict with a user-defined resolver
e.g. ability to handle the conflict conditionally (e.g. with a WHERE clause...)
e.g. ability to handle all conflicts with a common resolver
etc.
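For example, such extensions could later be expressed without changing
the shape of the command. (This is purely hypothetical syntax, shown
only to illustrate the flexibility argument; none of these resolver
names or clauses are being proposed here.)

ALTER SUBSCRIPTION sub1
ON CONFLICT 'update_differ' DO my_custom_resolver,
ON CONFLICT 'delete_missing' DO 'skip' WHERE ...,
ON CONFLICT DO 'error';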



Advantages of my suggestion:
- Close to existing SQL syntax
- No loss of clarity by removing the word "RESOLVER"
- No requirement for new keyword/s
- The commands now read more like English
- Offers more flexibility for any unknown future requirements
- The setup (via create subscription) and the alter/reset all look the same.

==
[1] https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-08-23 Thread Peter Smith
Hi Shubham,

I have reviewed v28* and posted updated v29 versions of patches 0001 and 0002.

If you are OK with these changes, the next task would be to pg_indent
them, then rebase the remaining patches (0003 etc.) and include those
with the next patchset version.

//

Patch v29-0001 changes:

nit - Fixed typo in comments.
nit - Removed an unnecessary format change for the unchanged
send_relation_and_attrs declaration.

//

Patch v29-0002 changes:

1.
Made fixes to address Vignesh's review comments [1].

2.
Added the missing test cases for tab_gen_to_gen, and tab_alter.

3.
Multiple other modifications include:
nit - Renamed the test database /test/test_igc_true/ because 'test'
was too vague.
nit - This patch does not need to change most of the existing 'tab1'
test. So we should not be reformating the existing test code for no
reason.
nit - I added a summary comment to describe the test combinations
nit - The "Testcase end" comments were unnecessary and prone to error,
so I removed them.
nit - Change comments /incremental sync/incremental replication/
nit - Added XXX notes about copy_data=false. These are reminders for
to change code in later TAP patches
nit - Rearranged test steps so the publisher does not do incremental
INSERT until all initial sync tests are done
nit - Added initial sync tests even if copy_data=false. This is for
completeness - these will be handled in a later TAP patch
nit - The table names are self-explanatory, so some of the test
"messages" were simplified

==
[1] 
https://www.postgresql.org/message-id/CALDaNm31LZQfeR8Vv1qNCOREGffvZbgGDrTp%3D3h%3DEHiHTEO2pQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia


v29-0001-Enable-support-for-include_generated_columns-opt.patch
Description: Binary data


v29-0002-Tap-tests-for-generated-columns.patch
Description: Binary data


Re: Conflict detection and logging in logical replication

2024-08-22 Thread Peter Smith
Hi Hou-san.

I was experimenting with some conflict logging and found that large
column values are truncated in the log DETAIL.

E.g. Below I have a table where I inserted a 3000-character text value
'bigbigbig...'

Then I caused a replication conflict.

test_sub=# delete fr2024-08-22 17:50:17.181 AEST [14901] LOG:  logical
replication apply worker for subscription "sub1" has started
2024-08-22 17:50:17.193 AEST [14901] ERROR:  conflict detected on
relation "public.t1": conflict=insert_exists
2024-08-22 17:50:17.193 AEST [14901] DETAIL:  Key already exists in
unique index "t1_pkey", modified in transaction 780.
Key (a)=(k3); existing local tuple (k3,
bigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigb...);
remote tuple (k3, this will clash).
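(For reference, here is a minimal way to reproduce this. The table
definition below is only inferred from the Key/tuple values in the
DETAIL above; it is not taken from any actual test.)

CREATE TABLE t1 (a text PRIMARY KEY, b text);
-- subscriber-side row with a ~3000-character value in column "b"
INSERT INTO t1 VALUES ('k3', repeat('big', 1000));
-- a publisher-side INSERT of ('k3', 'this will clash') then causes the
-- insert_exists conflict whose DETAIL truncates the long value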

~

Do you think the documentation for the 'column_value' parameter of the
conflict logging should say that the displayed value might be
truncated?

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: CREATE SUBSCRIPTION - add missing test case

2024-08-21 Thread Peter Smith
On Thu, Aug 22, 2024 at 8:54 AM Peter Smith  wrote:
>
> On Wed, Aug 21, 2024 at 8:48 PM Amit Kapila  wrote:
> >
> > On Fri, Aug 16, 2024 at 9:45 AM vignesh C  wrote:
> > >
> > > On Thu, 15 Aug 2024 at 12:55, Peter Smith  wrote:
> > > >
> > > > Hi Hackers,
> > > >
> > > > While reviewing another logical replication thread [1], I found an
> > > > ERROR scenario that seems to be untested.
> > > >
> > > > TEST CASE: Attempt CREATE SUBSCRIPTION where the subscriber table is
> > > > missing some expected column(s).
> > > >
> > > > Attached is a patch to add the missing test for this error message.
> > >
> > > I agree currently there is no test to hit this code.
> > >
> >
> > I also don't see a test for this error condition. However, it is not
> > clear to me how important is it to cover this error code path. This
> > code has existed for a long time and I didn't notice any bugs related
> > to this. There is a possibility that in the future we might break
> > something because of a lack of this test but not sure if we want to
> > cover every code path via tests as each additional test also has some
> > cost. OTOH, If others think it is important or a good idea to have
> > this test then I don't have any objection to the same.
>
> Yes, AFAIK there were no bugs related to this; The test was proposed
> to prevent accidental future bugs.
>
> BACKGROUND
>
> Another pending feature thread (replication of generated columns) [1]
> required many test combinations to confirm all the different expected
> results which are otherwise easily accidentally broken without
> noticing. This *current* thread test shares one of the same error
> messages, which is how it was discovered missing in the first place.
>
> ~~~
>
> PROPOSAL
>
> I think this is not the first time a logical replication test has been
> questioned due mostly to concern about creeping "costs".
>
> How about we create a new test file and put test cases like this one
> into it, guarded by code like the below using PG_TEST_EXTRA [2]?
>
> Doing it this way we can have better code coverage and higher
> confidence when we want it, but zero test cost overheads when we don't
> want it.
>
> e.g.
>
> src/test/subscription/t/101_extra.pl:
>
> if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bsubscription\b/)
> {
> plan skip_all =>
>   'Due to execution costs these tests are skipped unless subscription
> is enabled in PG_TEST_EXTRA';
> }
>
> # Add tests here...
>

To help strengthen the above proposal, here are a couple of examples
where TAP tests already use this strategy to avoid tests for various
reasons.

[1] Avoids some test because of cost
# WAL consistency checking is resource intensive so require opt-in with the
# PG_TEST_EXTRA environment variable.
if (   $ENV{PG_TEST_EXTRA}
&& $ENV{PG_TEST_EXTRA} =~ m/\bwal_consistency_checking\b/)
{
$node_primary->append_conf('postgresql.conf',
'wal_consistency_checking = all');
}

[2] Avoids some tests because of safety
if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bload_balance\b/)
{
plan skip_all =>
  'Potentially unsafe test load_balance not enabled in PG_TEST_EXTRA';
}

==
[1] 
https://github.com/postgres/postgres/blob/master/src/test/recovery/t/027_stream_regress.pl
[2] 
https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/t/004_load_balance_dns.pl

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Conflict detection and logging in logical replication

2024-08-21 Thread Peter Smith
Hi Hou-san. Here is my review of the v20-0001 docs patch.

1. Restructure into sections

> I think that's a good idea. But I preferred to do that in a separate
> patch(maybe a third patch after the first and second are RFC), because AFAICS
> we would need to adjust some existing docs which falls outside the scope of
> the current patch.

OK. I thought deferring it would only make extra work/churn, given you
already know up-front that everything will require restructuring later
anyway.

~~~

2. Synopsis

2.1 synopsis wrapping.

> I thought about this, but wrapping the sentence would cause the words
> to be displayed in different lines after compiling. I think that's 
> inconsistent
> with the real log which display the tuples in one line.

IMO the readability of the format is the most important objective for
the documentation. And, as you told Shveta, there is already a real
example where people can see the newlines if they want to.

nit - Anyway, FYI there are newline rendering problems here in v20.
Removed newlines to make all these optional parts appear on the same
line.

2.2 other stuff

nit - Add underscore to /detailed explanation/detailed_explanation/,
to make it more obvious this is a replacement parameter

nit - Added newline after  for readability of the SGML file.

~~~

3. Case of literals

It's not apparent to me why the optional "Key" part should be
uppercase in the LOG but other (equally important?) literals of other
parts like "replica identity" are not.

It seems inconsistent.

~~~

4. LOG parts

nit - IMO the "schema.tablename" and the "conflict_type" deserve to
have separate listitems.

nit - The "conflict_type" should have  markup.

~~~

5. DETAIL parts

nit - added newline above this  for readability of the SGML.

nit - Add underscore to detailed_explanation, and rearrange wording to
name the parameter up-front same as the other bullets do.

nit - change the case /key/Key/ to match the synopsis.

~~~

6.
+
+ The replica identity section includes the replica
+ identity key values that used to search for the existing
local tuple to
+ be updated or deleted. This may include the full tuple value
if the local
+ relation is marked with REPLICA IDENTITY FULL.
+

It might be good to also provide a link for that REPLICA IDENTITY
FULL. (I did this already in the attachment as an example)

~~~

7. Replacement parameters - column_name, column_value

I've included these for completeness. I think it is useful.

BTW, the column names sometimes seem to be optional, but I did not know
the rules. It should be documented what determines whether these names
are shown or not.

~~~

Please see the attachment which implements most of the items mentioned above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/logical-replication.sgml 
b/doc/src/sgml/logical-replication.sgml
index 3df791a..a3a0eae 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -1670,11 +1670,10 @@ test_sub=# SELECT * FROM t1 ORDER BY id;
The log format for logical replication conflicts is as follows:
 
 LOG:  conflict detected on relation 
"schemaname.tablename": 
conflict=conflict_type
-DETAIL:  detailed explanation.
-
-Key (column_name, 
...)=(column_value, ...)
-; existing local tuple 
(column_name, 
...)=(column_value, 
...); remote tuple 
(column_name, 
...)=(column_value, 
...); replica identity 
{(column_name, 
...)=(column_value, ...) | full 
(column_value, ...)}.
+DETAIL:  detailed_explanation.
+Key (column_name, 
...)=(column_value, ...); 
existing local tuple 
(column_name, 
...)=(column_value, 
...); remote tuple 
(column_name, 
...)=(column_value, 
...); replica identity 
{(column_name, 
...)=(column_value, ...) | full 
(column_value, ...)}.
 
+
The log provides the following information:

 
@@ -1683,28 +1682,34 @@ DETAIL:  detailed 
explanation.

 
  
- The name of the local relation involved in the conflict and the 
conflict
- type (e.g., insert_exists,
- update_exists).
+ 
schemaname.tablename
+ identifies the local relation involved in the conflict.
+ 
+
+
+ 
+ conflict_type is the type of conflict that 
occurred
+ (e.g., insert_exists, 
update_exists).
  
 

   
 
+
 
  DETAIL
   
   

 
- The origin, transaction ID, and commit timestamp of the transaction 
that
- modified the existing local tuple, if available, are included in the
- detailed explanation.
+ detailed_explanation 
includes
+ the origin, transaction ID, and commit timestamp of the transaction 
that
+ modified the existing local tuple, if available.
 


 
- The key secti

Re: CREATE SUBSCRIPTION - add missing test case

2024-08-21 Thread Peter Smith
On Wed, Aug 21, 2024 at 8:48 PM Amit Kapila  wrote:
>
> On Fri, Aug 16, 2024 at 9:45 AM vignesh C  wrote:
> >
> > On Thu, 15 Aug 2024 at 12:55, Peter Smith  wrote:
> > >
> > > Hi Hackers,
> > >
> > > While reviewing another logical replication thread [1], I found an
> > > ERROR scenario that seems to be untested.
> > >
> > > TEST CASE: Attempt CREATE SUBSCRIPTION where the subscriber table is
> > > missing some expected column(s).
> > >
> > > Attached is a patch to add the missing test for this error message.
> >
> > I agree currently there is no test to hit this code.
> >
>
> I also don't see a test for this error condition. However, it is not
> clear to me how important is it to cover this error code path. This
> code has existed for a long time and I didn't notice any bugs related
> to this. There is a possibility that in the future we might break
> something because of a lack of this test but not sure if we want to
> cover every code path via tests as each additional test also has some
> cost. OTOH, If others think it is important or a good idea to have
> this test then I don't have any objection to the same.

Yes, AFAIK there were no bugs related to this; The test was proposed
to prevent accidental future bugs.

BACKGROUND

Another pending feature thread (replication of generated columns) [1]
required many test combinations to confirm all the different expected
results which are otherwise easily accidentally broken without
noticing. This *current* thread test shares one of the same error
messages, which is how it was discovered missing in the first place.

~~~

PROPOSAL

I think this is not the first time a logical replication test has been
questioned due mostly to concern about creeping "costs".

How about we create a new test file and put test cases like this one
into it, guarded by code like the below using PG_TEST_EXTRA [2]?

Doing it this way we can have better code coverage and higher
confidence when we want it, but zero test cost overheads when we don't
want it.

e.g.

src/test/subscription/t/101_extra.pl:

if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bsubscription\b/)
{
plan skip_all =>
  'Due to execution costs these tests are skipped unless subscription
is enabled in PG_TEST_EXTRA';
}

# Add tests here...

==
[1] 
https://www.postgresql.org/message-id/flat/B80D17B2-2C8E-4C7D-87F2-E5B4BE3C069E%40gmail.com
[2] https://www.postgresql.org/docs/devel/regress-run.html#REGRESS-ADDITIONAL

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Conflict detection and logging in logical replication

2024-08-20 Thread Peter Smith
Here are some review comments for the v19-0001 docs patch.

The content seemed reasonable, but IMO it should be presented quite differently.



1. Use sub-sections

I expect this logical replication "Conflicts" section is going to
evolve into something much bigger. Surely, it's not going to be one
humongous page of details, so it will be a section with lots of
subsections like all the others in Chapter 29.

IMO, you should be writing the docs in that kind of structure from the
beginning.

For example, I'm thinking something like below (this is just an
example - surely lots more subsections will be needed for this topic):

29.6 Conflicts
29.6.1. Conflict types
29.6.2. Logging format
29.6.3. Examples

Specifically, this v19-0001 patch information should be put into a
subsection like the 29.6.2 shown above.

~~~

2. Markup

+
+LOG:  conflict detected on relation "schemaname.tablename":
conflict=conflict_type
+DETAIL:  detailed explaination.
+Key (column_name, ...)=(column_name, ...);
existing local tuple (column_name,
...)=(column_name, ...); remote tuple (column_name,
...)=(column_name, ...); replica identity
(column_name, ...)=(column_name, ...).
+

IMO this should be using markup more like the SQL syntax references.
- e.g. I suggest  instead of 
- e.g. I suggest all the substitution parameters (e.g. detailed
explanation, conflict_type, column_name, ...) in the log should use
 and use those markups again later in
these docs instead of 

~

nit - typo /explaination/explanation/

~

nit - The amount of scrolling needed makes this LOG format too hard to
see. Try to wrap it better so it can fit without being so wide.

~~~

3. Restructure the list

+   
+

I suggest restructuring all this to use a nested list like:

LOG
- conflict_type
DETAIL
- detailed_explanation
- key
- existing_local_tuple
- remote_tuple
- replica_identity

Doing this means you can remove a great deal of the unnecessary junk
words like "of the first sentence in the DETAIL", and "sentence of the
DETAIL line" etc. The result will be much less text but much simpler
text too.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical Replication of sequences

2024-08-20 Thread Peter Smith
Hi Vignesh, Here are my only review comments for the latest patch set.

v20240820-0003.

nit - missing period for comment in FetchRelationStates
nit - typo in function name 'ProcessSyncingTablesFoSync'

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/replication/logical/syncutils.c 
b/src/backend/replication/logical/syncutils.c
index b8f9300..705a330 100644
--- a/src/backend/replication/logical/syncutils.c
+++ b/src/backend/replication/logical/syncutils.c
@@ -96,7 +96,7 @@ SyncProcessRelations(XLogRecPtr current_lsn)
break;
 
case WORKERTYPE_TABLESYNC:
-   ProcessSyncingTablesFoSync(current_lsn);
+   ProcessSyncingTablesForSync(current_lsn);
break;
 
case WORKERTYPE_APPLY:
@@ -143,7 +143,7 @@ FetchRelationStates(bool *started_tx)
*started_tx = true;
}
 
-   /* Fetch tables that are in non-ready state */
+   /* Fetch tables that are in non-ready state. */
rstates = GetSubscriptionRelations(MySubscription->oid, true);
 
/* Allocate the tracking info in a permanent memory context. */
diff --git a/src/backend/replication/logical/tablesync.c 
b/src/backend/replication/logical/tablesync.c
index ad92b84..c753f45 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -237,7 +237,7 @@ wait_for_worker_state_change(char expected_state)
  * SYNCDONE and finish.
  */
 void
-ProcessSyncingTablesFoSync(XLogRecPtr current_lsn)
+ProcessSyncingTablesForSync(XLogRecPtr current_lsn)
 {
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
 
diff --git a/src/include/replication/worker_internal.h 
b/src/include/replication/worker_internal.h
index 24f74ab..6504b70 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -264,7 +264,7 @@ extern void UpdateTwoPhaseState(Oid suboid, char new_state);
 
 extern bool FetchRelationStates(bool *started_tx);
 extern bool WaitForRelationStateChange(Oid relid, char expected_state);
-extern void ProcessSyncingTablesFoSync(XLogRecPtr current_lsn);
+extern void ProcessSyncingTablesForSync(XLogRecPtr current_lsn);
 extern void ProcessSyncingTablesForApply(XLogRecPtr current_lsn);
 extern void SyncProcessRelations(XLogRecPtr current_lsn);
 extern void SyncInvalidateRelationStates(Datum arg, int cacheid,


Re: GUC names in messages

2024-08-20 Thread Peter Smith
CFBot reported some failures, so I have attached the rebased patch set v9*.

I'm hopeful the majority of these might be pushed to avoid more rebasing...

==
Kind Regards,
Peter Smith.
Fujitsu Australia


v9-0001-Add-quotes-for-GUCs-bool.patch
Description: Binary data


v9-0005-Add-quotes-for-GUCs-enum.patch
Description: Binary data


v9-0003-Add-quotes-for-GUCs-real.patch
Description: Binary data


v9-0004-Add-quotes-for-GUCs-string.patch
Description: Binary data


v9-0002-Add-quotes-for-GUCs-int.patch
Description: Binary data


v9-0008-GUC-names-make-common-translatable-messages.patch
Description: Binary data


v9-0006-GUC-names-fix-case-intervalstyle.patch
Description: Binary data


v9-0007-GUC-names-fix-case-datestyle.patch
Description: Binary data


Re: PG docs - Sequence CYCLE clause

2024-08-19 Thread Peter Smith
On Tue, Aug 20, 2024 at 10:18 AM Bruce Momjian  wrote:
>
> Great, patch applied to master.
>

Thanks for pushing!

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: CREATE SUBSCRIPTION - add missing test case

2024-08-19 Thread Peter Smith
On Fri, Aug 16, 2024 at 2:15 PM vignesh C  wrote:
>

Thanks for the review.

>
> I agree currently there is no test to hit this code. I'm not sure if
> this is the correct location for the test, should it be included in
> the 008_diff_schema.pl file?

Yes, that is a better home for this test. Done as suggested in
attached patch v2.

==
Kind Regards,
Peter Smith.
Fujitsu Australia


v2-0001-Add-missing-test-case.patch
Description: Binary data


Re: Logical Replication of sequences

2024-08-19 Thread Peter Smith
Hi Vignesh, Here are my review comments for the latest patchset

v20240819-0001. No changes. No comments.
v20240819-0002. No changes. No comments.
v20240819-0003. See below.
v20240819-0004. See below.
v20240819-0005. No changes. No comments.

///

PATCH v20240819-0003

==
src/backend/replication/logical/syncutils.c

3.1.
+typedef enum
+{
+ SYNC_RELATION_STATE_NEEDS_REBUILD,
+ SYNC_RELATION_STATE_REBUILD_STARTED,
+ SYNC_RELATION_STATE_VALID,
+} SyncingRelationsState;
+
+static SyncingRelationsState relation_states_validity =
SYNC_RELATION_STATE_NEEDS_REBUILD;

There is some muddle of singular/plural names here. The
typedef/values/var should all match:

e.g. It could be like:
SYNC_RELATION_STATE_xxx --> SYNC_RELATION_STATES_xxx
SyncingRelationsState --> SyncRelationStates

But, a more radical change might be better.

typedef enum
{
RELATION_STATES_SYNC_NEEDED,
RELATION_STATES_SYNC_STARTED,
RELATION_STATES_SYNCED,
} SyncRelationStates;

~~~

3.2. GENERAL refactoring

I don't think all of the functions moved into syncutils.c truly belong there.

This new module was introduced to be for common/util functions for
tablesync and sequencesync, but with each patchset, it has been
sucking in more and more functions that maybe do not quite belong
here.

For example, AFAIK these below have logic that is *solely* for TABLES
(not for SEQUENCES). Perhaps it was convenient to dump them here
because they are statically called, but I felt they still logically
belong in tablesync.c:
- process_syncing_tables_for_sync(XLogRecPtr current_lsn)
- process_syncing_tables_for_apply(XLogRecPtr current_lsn)
- AllTablesyncsReady(void)

~~~

3.3.
+static bool
+FetchRelationStates(bool *started_tx)
+{

If this function can remain static then the name should change to be
like fetch_table_states, right?

==
src/include/replication/worker_internal.h

3.4.
+extern bool wait_for_relation_state_change(Oid relid, char expected_state);

If this previously static function will be exposed now (it may not
need to be if some other functions are returned to tablesync.c) then the
function name should also be changed, right?



PATCH v20240819-0004

==
src/backend/replication/logical/syncutils.c

4.1 GENERAL refactoring

(this is similar to review comment #3.2 above)

Functions like below have logic that is *solely* for SEQUENCES (not
for TABLES). I felt they logically belong in sequencesync.c, not here.
- process_syncing_sequences_for_apply(void)

~~~

FetchRelationStates:
nit - the comment change about "not-READY tables" (instead of
relations) should already be in patch 0003.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-08-19 Thread Peter Smith
Hi Shubham, here are my review comments for the TAP tests patch v27-0002

==
Commit message

Tap tests for 'include-generated-columns'

~

But it's more than that: these are the TAP tests for all combinations
of replication related to generated columns, i.e. both with and without
the 'include_generated_columns' option enabled.
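For instance, the two sides of that combination look like this at the
SQL level (the option name is taken from this patch series; the exact
spelling/placement may still change):

CREATE SUBSCRIPTION sub_gen CONNECTION 'dbname=test_pub' PUBLICATION pub_gen
WITH (include_generated_columns = true);

CREATE SUBSCRIPTION sub_nogen CONNECTION 'dbname=test_pub' PUBLICATION pub_gen;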

==
src/test/subscription/t/011_generated.pl

I was mistaken, thinking that the v27-0002 had already been refactored
according to Vignesh's last review but it is not done yet, so I am not
going to post detailed review comments until the restructuring is
completed.

~

OTOH, there are some problems I felt have crept into v26-0001 (TAP
test is same as v27-0002), so maybe try to also take care of them (see
below) in v28-0002.

In no particular order:

* I felt it is almost useless now to have the "combo"
("regress_pub_combo") publication. It used to have many tables when
you first created it, but with every version posted it is publishing
less and less, so now there are only 2 tables in it. Better to have a
specific publication for each table now and forget about "combos".

* The "TEST tab_gen_to_gen initial sync" seems to be not even checking
the table data. Why not? e.g. Even if you expect no data, you should
test for it.

* The "TEST tab_gen_to_gen replication" seems to be not even checking
the table data. Why not?

* Multiple XXX comments like "... it needs more study to determine if
the above result was actually correct, or a PG17 bug..." should be
removed. AFAIK we should well understand the expected results for all
combinations by now.

* The "TEST tab_order replication" is now getting an error. Now, that
may be the correct error for this situation, but in that case I think
the test is no longer testing what it was intended to test (i.e. that
column order does not matter). Probably the table definition needs
adjusting to make sure we are testing what we want to test, and not
just making some random scenario "PASS".

* The expected empty result of the "# TEST tab_alter" test also seems
unhelpful. It might be related to the previous bullet.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-08-18 Thread Peter Smith
Hi, Here are my review comments for v27-0001.

==
contrib/test_decoding/expected/generated_columns.out
contrib/test_decoding/sql/generated_columns.sql

+-- By default, 'include-generated-columns' is enabled, so the values
for the generated column 'b' will be replicated even if it is not
explicitly specified.

nit - The "default" is only like this for "test_decoding" (e.g., the
CREATE SUBSCRIPTION option is the opposite), so let's make the comment
clearer about that.
nit - Use sentence case in the comments.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/contrib/test_decoding/expected/generated_columns.out 
b/contrib/test_decoding/expected/generated_columns.out
index 48f900f..9b03f6d 100644
--- a/contrib/test_decoding/expected/generated_columns.out
+++ b/contrib/test_decoding/expected/generated_columns.out
@@ -1,13 +1,14 @@
--- test decoding of generated columns
+-- Test decoding of generated columns.
 SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 
'test_decoding');
  ?column? 
 --
  init
 (1 row)
 
--- column b' is a generated column
+-- Column b' is a generated column.
 CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
--- By default, 'include-generated-columns' is enabled, so the values for the 
generated column 'b' will be replicated even if it is not explicitly specified.
+-- For 'test_decoding' the parameter 'include-generated-columns' is enabled by 
default,
+-- so the values for the generated column 'b' will be replicated even if it is 
not explicitly specified.
 INSERT INTO gencoltable (a) VALUES (1);
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
'include-xids', '0', 'skip-empty-xacts', '1');
 data 
@@ -17,7 +18,7 @@ SELECT data FROM 
pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
  COMMIT
 (3 rows)
 
--- when 'include-generated-columns' is enabled, the values of the generated 
column 'b' will be replicated.
+-- When 'include-generated-columns' is enabled, the values of the generated 
column 'b' will be replicated.
 INSERT INTO gencoltable (a) VALUES (2);
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1');
 data 
@@ -27,7 +28,7 @@ SELECT data FROM 
pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
  COMMIT
 (3 rows)
 
--- when 'include-generated-columns' is disabled, the values of the generated 
column 'b' will not be replicated.
+-- When 'include-generated-columns' is disabled, the values of the generated 
column 'b' will not be replicated.
 INSERT INTO gencoltable (a) VALUES (3);
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '0');
   data  
@@ -37,7 +38,7 @@ SELECT data FROM 
pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
  COMMIT
 (3 rows)
 
--- with REPLICA IDENTITY = FULL, to show old-key data includes generated 
columns data for updates.
+-- When REPLICA IDENTITY = FULL, show old-key data includes generated columns 
data for updates.
 ALTER TABLE gencoltable REPLICA IDENTITY FULL;
 UPDATE gencoltable SET a = 10;
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1');
diff --git a/contrib/test_decoding/sql/generated_columns.sql 
b/contrib/test_decoding/sql/generated_columns.sql
index fb156c2..7b455a1 100644
--- a/contrib/test_decoding/sql/generated_columns.sql
+++ b/contrib/test_decoding/sql/generated_columns.sql
@@ -1,23 +1,24 @@
--- test decoding of generated columns
+-- Test decoding of generated columns.
 
 SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 
'test_decoding');
 
--- column b' is a generated column
+-- Column b' is a generated column.
 CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
 
--- By default, 'include-generated-columns' is enabled, so the values for the 
generated column 'b' will be replicated even if it is not explicitly specified.
+-- For 'test_decoding' the parameter 'include-generated-columns' is 

Re: Logical Replication of sequences

2024-08-18 Thread Peter Smith
Here are my review comments for the latest patchset

v20240817-0001. No changes. No comments.
v20240817-0002. No changes. No comments.
v20240817-0003. See below.
v20240817-0004. See below.
v20240817-0005. No changes. No comments.

//

v20240817-0003 and 0004.

(This is a repeat of the same comment as in previous reviews, but lots
more functions seem affected now)

IIUC, the LR code tries to follow function naming conventions (e.g.
CamelCase/snake_case for exposed/static functions respectively),
intended to make the code more readable. But, this only works if the
conventions are followed.

Now, patches 0003 and 0004 are shuffling more and more functions
between modules while changing them from static to non-static (or vice
versa). So, the function name conventions are being violated many
times. IMO these functions ought to be renamed according to their new
modifiers to avoid the confusion caused by ignoring the name
conventions.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical Replication of sequences

2024-08-15 Thread Peter Smith
Hi Vignesh. I looked at the latest v20240815* patch set.

I have only the following few comments for patch v20240815-0004, below.

==
Commit message.

Please see the attachment for some suggested updates.

==
src/backend/commands/subscriptioncmds.c

CreateSubscription:
nit - fix wording in one of the XXX comments

==
.../replication/logical/sequencesync.c

report_mismatched_sequences:
nit - param name /warning_sequences/mismatched_seqs/

append_mismatched_sequences:
nit - param name /warning_sequences/mismatched_seqs/

LogicalRepSyncSequences:
nit - var name /warning_sequences/mismatched_seqs/

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index 534d385..e799a41 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -800,9 +800,9 @@ CreateSubscription(ParseState *pstate, 
CreateSubscriptionStmt *stmt,
 * export it.
 *
 * XXX: If the subscription is for a sequence-only 
publication,
-* creating this slot. It can be created later during 
the ALTER
-* SUBSCRIPTION ... REFRESH command, if the publication 
is updated
-* to include tables or tables in schema.
+* creating this slot is unnecessary. It can be created 
later
+* during the ALTER SUBSCRIPTION ... REFRESH command, 
if the
+* publication is updated to include tables or tables 
in schema.
 */
if (opts.create_slot)
{
diff --git a/src/backend/replication/logical/sequencesync.c 
b/src/backend/replication/logical/sequencesync.c
index 1aa5ab8..934646f 100644
--- a/src/backend/replication/logical/sequencesync.c
+++ b/src/backend/replication/logical/sequencesync.c
@@ -264,17 +264,17 @@ copy_sequence(WalReceiverConn *conn, Relation rel,
  * Report any sequence mismatches as a single warning log.
  */
 static void
-report_mismatched_sequences(StringInfo warning_sequences)
+report_mismatched_sequences(StringInfo mismatched_seqs)
 {
-   if (warning_sequences->len)
+   if (mismatched_seqs->len)
{
ereport(WARNING,

errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("parameters differ for the remote and 
local sequences (%s) for subscription \"%s\"",
-  warning_sequences->data, 
MySubscription->name),
+  mismatched_seqs->data, 
MySubscription->name),
errhint("Alter/Re-create local sequences to 
have the same parameters as the remote sequences."));
 
-   resetStringInfo(warning_sequences);
+   resetStringInfo(mismatched_seqs);
}
 }
 
@@ -282,15 +282,15 @@ report_mismatched_sequences(StringInfo warning_sequences)
  * append_mismatched_sequences
  *
  * Appends details of sequences that have discrepancies between the publisher
- * and subscriber to the warning_sequences string.
+ * and subscriber to the mismatched_seqs string.
  */
 static void
-append_mismatched_sequences(StringInfo warning_sequences, Relation seqrel)
+append_mismatched_sequences(StringInfo mismatched_seqs, Relation seqrel)
 {
-   if (warning_sequences->len)
-   appendStringInfoString(warning_sequences, ", ");
+   if (mismatched_seqs->len)
+   appendStringInfoString(mismatched_seqs, ", ");
 
-   appendStringInfo(warning_sequences, "\"%s.%s\"",
+   appendStringInfo(mismatched_seqs, "\"%s.%s\"",
 
get_namespace_name(RelationGetNamespace(seqrel)),
 RelationGetRelationName(seqrel));
 }
@@ -314,7 +314,7 @@ LogicalRepSyncSequences(void)
boolstart_txn = true;
Oid subid = MyLogicalRepWorker->subid;
MemoryContext oldctx;
-   StringInfo  warning_sequences = makeStringInfo();
+   StringInfo  mismatched_seqs = makeStringInfo();
 
 /*
  * Synchronizing each sequence individually incurs overhead from starting
@@ -425,15 +425,15 @@ LogicalRepSyncSequences(void)
PG_CATCH();
{
if (sequence_mismatch)
-   append_mismatched_sequences(warning_sequences, 
sequence_rel);
+   append_mismatched_sequences(mismatched_seqs, 
sequence_rel);
 
-   report_mismatched_sequences(warning_sequences);
+   report_mismatched_sequences(mismatched_seqs);
 

Re: Pgoutput not capturing the generated columns

2024-08-15 Thread Peter Smith
On Tue, Jul 23, 2024 at 9:23 AM Peter Smith  wrote:
>
> On Fri, Jul 19, 2024 at 4:01 PM Shlok Kyal  wrote:
> >
> > On Thu, 18 Jul 2024 at 13:55, Peter Smith  wrote:
> > >
> > > Hi, here are some review comments for v19-0002
> > > ==
> > > src/test/subscription/t/004_sync.pl
> > >
> > > 1.
> > > This new test is not related to generated columns. IIRC, this is just
> > > some test that we discovered missing during review of this thread. As
> > > such, I think this change can be posted/patched separately from this
> > > thread.
> > >
> > I have removed the test for this thread.
> >
> > I have also addressed the remaining comments for v19-0002 patch.
>
> Hi, I have no more review comments for patch v20-0002 at this time.
>
> I saw that the above test was removed from this thread as suggested,
> but I could not find that any new thread was started to propose this
> valuable missing test.
>

I still did not find any new thread for adding the missing test case,
so I started one myself [1].

==
[1] 
https://www.postgresql.org/message-id/CAHut+PtX8P0EGhsk9p=hqguhrzxecszanxsmkovyilx-ejd...@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




CREATE SUBSCRIPTION - add missing test case

2024-08-15 Thread Peter Smith
Hi Hackers,

While reviewing another logical replication thread [1], I found an
ERROR scenario that seems to be untested.

TEST CASE: Attempt CREATE SUBSCRIPTION where the subscriber table is
missing some expected column(s).

Attached is a patch to add the missing test for this error message.
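For clarity, the scenario looks like this (the names below are only for
illustration, not the ones used in the patch):

-- publisher
CREATE TABLE tab1 (a int PRIMARY KEY, b int);
CREATE PUBLICATION pub1 FOR TABLE tab1;

-- subscriber: same table, but column "b" is missing
CREATE TABLE tab1 (a int PRIMARY KEY);
CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=test_pub' PUBLICATION pub1;
-- the worker then reports the ERROR about the missing column, which is
-- what the new test checks for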

==
[1] 
https://www.postgresql.org/message-id/CAHut%2BPt5FqV7J9GnnWFRNW_Z1KOMMAZXNTRcRNdtFrfMBz_GLA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Add-missing-test-case.patch
Description: Binary data


Re: Logical Replication of sequences

2024-08-14 Thread Peter Smith
Hi Vignesh, I have reviewed your latest patchset:

v20240814-0001. No comments
v20240814-0002. No comments
v20240814-0003. No comments
v20240814-0004. See below
v20240814-0005. No comments

//

v20240814-0004.

==
src/backend/commands/subscriptioncmds.c

CreateSubscription:
nit - XXX comments

AlterSubscription_refresh:
nit - unnecessary parens in ereport

AlterSubscription:
nit - unnecessary parens in ereport

fetch_sequence_list:
nit - unnecessary parens in ereport

==
.../replication/logical/sequencesync.c

1. fetch_remote_sequence_data

+ * Returns:
+ * - TRUE if there are discrepancies between the sequence parameters in
+ *   the publisher and subscriber.
+ * - FALSE if the parameters match.
+ */
+static bool
+fetch_remote_sequence_data(WalReceiverConn *conn, Oid relid, Oid remoteid,
+char *nspname, char *relname, int64 *log_cnt,
+bool *is_called, XLogRecPtr *page_lsn,
+int64 *last_value)

IMO it is more natural to return TRUE for good results and FALSE for
bad ones. (FYI, I have implemented this reversal in the nitpicks
attachment).

~

nit - swapped columns seqmin and seqmax in the SQL to fetch them in
the natural order
nit - unnecessary parens in ereport

~~~

copy_sequence:
nit - update function comment to document the output parameter
nit - Assert that *sequence_mismatch is false on entry to this function
nit - tweak wrapping and add \n in the SQL
nit - unnecessary parens in ereport

report_sequence_mismatch:
nit - modify function comment
nit - function name changed
/report_sequence_mismatch/report_mismatched_sequences/ (now plural,
and more like the other one)

append_mismatched_sequences:
nit - param name /rel/seqrel/

~~~

2. LogicalRepSyncSequences:
+ Relation sequence_rel;
+ XLogRecPtr sequence_lsn;
+ bool sequence_mismatch;

The 'sequence_mismatch' variable must be initialized to false, otherwise
we cannot trust that it gets assigned.

~

LogicalRepSyncSequences:
nit - unnecessary parens in ereport
nit - move the for-loop variable declaration
nit - remove a blank line

process_syncing_sequences_for_apply:
nit - variable declaration indent

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index 9fff288..22115bd 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -726,10 +726,10 @@ CreateSubscription(ParseState *pstate, 
CreateSubscriptionStmt *stmt,
recordDependencyOnOwner(SubscriptionRelationId, subid, owner);
 
/*
-* XXX: If the subscription is for a sequence-only publication,
-* creating this origin is unnecessary at this point. It can be created
-* later during the ALTER SUBSCRIPTION ... REFRESH command, if the
-* publication is updated to include tables or tables in schemas.
+* XXX: If the subscription is for a sequence-only publication, creating
+* this origin is unnecessary. It can be created later during the ALTER
+* SUBSCRIPTION ... REFRESH command, if the publication is updated to
+* include tables or tables in schemas.
 */
ReplicationOriginNameForLogicalRep(subid, InvalidOid, originname, 
sizeof(originname));
replorigin_create(originname);
@@ -800,9 +800,9 @@ CreateSubscription(ParseState *pstate, 
CreateSubscriptionStmt *stmt,
 * export it.
 *
 * XXX: If the subscription is for a sequence-only 
publication,
-* creating this slot is not necessary at the moment. 
It can be
-* created during the ALTER SUBSCRIPTION ... REFRESH 
command if the
-* publication is updated to include tables or tables 
in schema.
+* creating this slot. It can be created later during 
the ALTER
+* SUBSCRIPTION ... REFRESH command, if the publication 
is updated
+* to include tables or tables in schema.
 */
if (opts.create_slot)
{
@@ -1021,9 +1021,9 @@ AlterSubscription_refresh(Subscription *sub, bool 
copy_data,

copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,

InvalidXLogRecPtr, true);
ereport(DEBUG1,
-   (errmsg_internal("%s \"%s.%s\" 
added to subscription \"%s\"",
-   
 relkind == RELKIND_SEQUENCE ? "sequence" : "table",
-   
 rv->schemaname, rv->relname, sub->name)));
+  

Re: Logical Replication of sequences

2024-08-13 Thread Peter Smith
Hi Vignesh, Here are my review comments for the latest patchset:

Patch v20240813-0001. No comments
Patch v20240813-0002. No comments
Patch v20240813-0003. No comments
Patch v20240813-0004. See below
Patch v20240813-0005. No comments

//

Patch v20240813-0004

==
src/backend/catalog/pg_subscription.

GetSubscriptionRelations:
nit - modify a condition for readability

==
src/backend/commands/subscriptioncmds.c

fetch_sequence_list:
nit - changed the WARNING message. /parameters differ
between.../parameters differ for.../ (FYI, Chat-GPT agrees that 2nd
way is more correct)
nit - other minor changes to the message and hint

==
.../replication/logical/sequencesync.c

1. LogicalRepSyncSequences

+ ereport(DEBUG1,
+ errmsg("logical replication synchronization for subscription \"%s\",
sequence \"%s\" has finished",
+get_subscription_name(subid, false), get_rel_name(done_seq->relid)));

DEBUG logs should use errmsg_internal. (Also fixed in the nitpicks attachment.)

~

nit - minor change to the log message counting the batched sequences

~~~

process_syncing_sequences_for_apply:
nit - /sequence sync worker/sequencesync worker/

==
src/backend/utils/misc/guc_tables.c

nit - /max workers/maximum number of workers/ (for consistency because
all other GUCs are verbose like this; nothing just says "max".)

==
src/test/subscription/t/034_sequences.pl

nit - adjust the expected WARNING message (which was modified above)

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/catalog/pg_subscription.c 
b/src/backend/catalog/pg_subscription.c
index d938e57..af2bfe1 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -565,9 +565,11 @@ GetSubscriptionRelations(Oid subid, bool get_tables, bool 
get_sequences,
relkind = get_rel_relkind(subrel->srrelid);
 
/* Skip sequences if they were not requested */
-   if ((relkind == RELKIND_SEQUENCE && !get_sequences) ||
-   /* Skip tables if they were not requested */
-   (relkind != RELKIND_SEQUENCE && !get_tables))
+   if (relkind == RELKIND_SEQUENCE && !get_sequences)
+   continue;
+
+   /* Skip tables if they were not requested */
+   if (relkind != RELKIND_SEQUENCE && !get_tables)
continue;
 
relstate = (SubscriptionRelState *) 
palloc(sizeof(SubscriptionRelState));
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index 4166210..7d0be40 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -2577,9 +2577,9 @@ fetch_sequence_list(WalReceiverConn *wrconn, char 
*subname, List *publications)
 */
ereport(WARNING,

errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-   errmsg("parameters differ between the remote 
and local sequences %s for subscription \"%s\"",
+   errmsg("parameters differ for the remote and 
local sequences (%s) for subscription \"%s\"",
   warning_sequences->data, subname),
-   errhint("Alter/Re-create the sequence using the 
same parameter as in remote."));
+   errhint("Alter/Re-create local sequences to 
have the same parameters as the remote sequences."));
pfree(warning_sequences);
}
 
diff --git a/src/backend/replication/logical/sequencesync.c 
b/src/backend/replication/logical/sequencesync.c
index aaedba9..e2e0421 100644
--- a/src/backend/replication/logical/sequencesync.c
+++ b/src/backend/replication/logical/sequencesync.c
@@ -342,12 +342,12 @@ LogicalRepSyncSequences(void)
done_seq = (SubscriptionRelState *) 
lfirst(list_nth_cell(sequences_not_synced, i));
 
ereport(DEBUG1,
-   errmsg("logical replication 
synchronization for subscription \"%s\", sequence \"%s\" has finished",
+   errmsg_internal("logical 
replication synchronization for subscription \"%s\", sequence \"%s\" has 
finished",
   
get_subscription_name(subid, false), get_rel_name(done_seq->relid)));
}
 
ereport(LOG,
-   errmsg("logical replication 
synchronized %d sequences of %d sequences for subscription \"%s\" ",
+   errmsg("logical replication 
synch

Re: Logical Replication of sequences

2024-08-13 Thread Peter Smith
On Tue, Aug 13, 2024 at 10:00 PM vignesh C  wrote:
>
> On Tue, 13 Aug 2024 at 09:19, Peter Smith  wrote:
> >
> > 3.1. GENERAL
> >
> > Hmm. I am guessing this was provided as a separate patch to aid review
> > by showing that existing functions are moved? OTOH you can't really
> > judge this patch properly without already knowing details of what will
> > come next in the sequencesync. i.e. As a *standalone* patch without
> > the sequencesync.c the refactoring doesn't make much sense.
> >
> > Maybe it is OK later to combine patches 0003 and 0004. Alternatively,
> > keep this patch separated but give greater emphasis in the comment
> > header to say this patch only exists separately in order to help the
> > review.
>
> I have kept this patch only to show that this patch as such has no
> code changes. If we move this to the next patch it will be difficult
> for reviewers to know which is new code and which is old code. During
> commit we can merge this with the next one. I felt it is better to add
> it in the commit message instead of comment header so updated the
> commit message.
>

Yes, I wrote "comment header" but it was a typo; I meant "commit
header". What you did looks good now. Thanks.

> > ~
> >
> > 3.4. function names
> >
> > With the re-shuffling that this patch does, and changing several from
> > static to not-static, should the function names remain as they are?
> > They look random to me.
> > - finish_sync_worker(void)
> > - invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
> > - FetchTableStates(bool *started_tx)
> > - process_syncing_tables(XLogRecPtr current_lsn)
> >
> > I think using a consistent naming convention would be better. e.g.
> > SyncFinishWorker
> > SyncInvalidateTableStates
> > SyncFetchTableStates
> > SyncProcessTables
>
> One advantage with keeping the existing names the same wherever
> possible will help while merging the changes to back-branches. So I'm
> not making this change.
>

According to my understanding, the logical replication code tries to
maintain name conventions for static functions (snake_case) and for
non-static functions (CamelCase) as an aid for code readability. I
think we should either do our best to abide by those conventions, or
we might as well just forget them and have a naming free-for-all.
Since the new syncutils.c module is being introduced by this patch, my
guess is that any future merging to back-branches will be affected
regardless. IMO this is an ideal opportunity to try to nudge the
function names in the right direction. YMMV.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical Replication of sequences

2024-08-13 Thread Peter Smith
" has
finished
2024-08-13 16:41:47.436 AEST [11735] LOG:  logical replication
synchronization for subscription "sub3", sequence "seq_0566" has
finished
2024-08-13 16:41:47.436 AEST [11735] LOG:  logical replication
synchronization for subscription "sub3", sequence "seq_0568" has
finished
2024-08-13 16:41:47.436 AEST [11735] LOG:  logical replication
synchronization for subscription "sub3", sequence "seq_0569" has
finished
2024-08-13 16:41:47.436 AEST [11735] LOG:  logical replication
synchronization for subscription "sub3", sequence "seq_0570" has
finished
2024-08-13 16:41:47.436 AEST [11735] LOG:  logical replication
synchronization for subscription "sub3", sequence "seq_0571" has
finished
2024-08-13 16:41:47.436 AEST [11735] LOG:  logical replication
synchronization for subscription "sub3", sequence "seq_0582" has
finished
...

Is there a way to refresh sequences in a more natural (e.g.
alphabetical) order to make these logs more readable?
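
(Just sketching the idea, not the patch's actual query: if the remote
sequence list were fetched with an ORDER BY on schema/name, the sync --
and therefore these log messages -- would proceed alphabetically.)

-- illustrative only; relkind 'S' means sequence
SELECT n.nspname, c.relname
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind = 'S'
 ORDER BY n.nspname, c.relname;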

==
Kind Regards,
Peter Smith.
Fujitsu Australia




PG docs - Sequence CYCLE clause

2024-08-12 Thread Peter Smith
Hi hackers.

While reviewing another thread I had cause to be looking more
carefully at the SEQUENCE documentation.

I found it curious that, unlike other clauses, the syntax of the CYCLE
clause was not displayed on a line by itself.

I have changed that, and at the same time I have moved the CYCLE
syntax (plus its description) to be adjacent to MINVALUE/MAXVALUE,
which IMO is where it naturally belongs.
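
Roughly, the synopsis would then look something like this (illustrative
layout only; the exact wording is in the attached patch):

CREATE [ { TEMPORARY | TEMP } | UNLOGGED ] SEQUENCE [ IF NOT EXISTS ] name
    [ AS data_type ]
    [ INCREMENT [ BY ] increment ]
    [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ]
    [ [ NO ] CYCLE ]
    [ START [ WITH ] start ] [ CACHE cache ]
    [ OWNED BY { table_name.column_name | NONE } ]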

Please see the attached patch.

Thoughts?

==
Kind Regards,
Peter Smith.
Fujitsu Australia


docs_sequence_cycle.diff
Description: Binary data


Re: Logical Replication of sequences

2024-08-12 Thread Peter Smith
On Mon, Aug 12, 2024 at 11:07 PM vignesh C  wrote:
>
> On Mon, 12 Aug 2024 at 10:40, Peter Smith  wrote:
> >
> > Hi Vignesh,
> >
> > I found that when 2 subscriptions are both subscribing to a
> > publication publishing sequences, an ERROR occurs on refresh.
> >
> > ==
> >
> > Publisher:
> > --
> >
> > test_pub=# create publication pub1 for all sequences;
> >
> > Subscriber:
> > ---
> >
> > test_sub=# create subscription sub1 connection 'dbname=test_pub'
> > publication pub1;
> >
> > test_sub=# create subscription sub2 connection 'dbname=test_pub'
> > publication pub1;
> >
> > test_sub=# alter subscription sub1 refresh publication sequences;
> > 2024-08-12 15:04:04.947 AEST [7306] LOG:  sequence "public.seq1" of
> > subscription "sub1" set to INIT state
> > 2024-08-12 15:04:04.947 AEST [7306] STATEMENT:  alter subscription
> > sub1 refresh publication sequences;
> > 2024-08-12 15:04:04.947 AEST [7306] LOG:  sequence "public.seq1" of
> > subscription "sub1" set to INIT state
> > 2024-08-12 15:04:04.947 AEST [7306] STATEMENT:  alter subscription
> > sub1 refresh publication sequences;
> > 2024-08-12 15:04:04.947 AEST [7306] ERROR:  tuple already updated by self
> > 2024-08-12 15:04:04.947 AEST [7306] STATEMENT:  alter subscription
> > sub1 refresh publication sequences;
> > ERROR:  tuple already updated by self
> >
> > test_sub=# alter subscription sub2 refresh publication sequences;
> > 2024-08-12 15:04:30.427 AEST [7306] LOG:  sequence "public.seq1" of
> > subscription "sub2" set to INIT state
> > 2024-08-12 15:04:30.427 AEST [7306] STATEMENT:  alter subscription
> > sub2 refresh publication sequences;
> > 2024-08-12 15:04:30.427 AEST [7306] LOG:  sequence "public.seq1" of
> > subscription "sub2" set to INIT state
> > 2024-08-12 15:04:30.427 AEST [7306] STATEMENT:  alter subscription
> > sub2 refresh publication sequences;
> > 2024-08-12 15:04:30.427 AEST [7306] ERROR:  tuple already updated by self
> > 2024-08-12 15:04:30.427 AEST [7306] STATEMENT:  alter subscription
> > sub2 refresh publication sequences;
> > ERROR:  tuple already updated by self
>
> This issue is fixed in the v20240812 version attached at [1].
> [1] - 
> https://www.postgresql.org/message-id/CALDaNm3hS58W0RTbgsMTk-YvXwt956uabA%3DkYfLGUs3uRNC2Qg%40mail.gmail.com
>

Yes, I confirmed it is now fixed. Thanks!

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical Replication of sequences

2024-08-12 Thread Peter Smith
Hi Vignesh, Here are my review comments for latest v20240812* patchset:

patch v20240812-0001. No comments.
patch v20240812-0002. Fixed docs. LGTM
patch v20240812-0003. This is new refactoring. See below.
patch v20240812-0004. (was 0003). See below.
patch v20240812-0005. (was 0004). No comments.

//

patch v20240812-0003.

3.1. GENERAL

Hmm. I am guessing this was provided as a separate patch to aid review
by showing that existing functions are moved? OTOH you can't really
judge this patch properly without already knowing details of what will
come next in the sequencesync. i.e. As a *standalone* patch without
the sequencesync.c the refactoring doesn't make much sense.

Maybe it is OK later to combine patches 0003 and 0004. Alternatively,
keep this patch separated but give greater emphasis in the comment
header to say this patch only exists separately in order to help the
review.

==
Commit message

3.2.
Reorganized tablesync code to generate a syncutils file which will
help in sequence synchronization worker code.

~

"generate" ??

==
src/backend/replication/logical/syncutils.c

3.3. "common code" ??

FYI - There are multiple code comments mentioning "common code..."
which, in the absence of the sequencesync worker (which comes in the
next patch), have nothing "common" about them at all. Fixing them and
then fixing them again in the next patch might cause unnecessary code
churn, but OTOH they aren't correct as-is either. I have left them
alone for now.

~

3.4. function names

With the re-shuffling that this patch does, and changing several from
static to not-static, should the function names remain as they are?
They look random to me.
- finish_sync_worker(void)
- invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
- FetchTableStates(bool *started_tx)
- process_syncing_tables(XLogRecPtr current_lsn)

I think using a consistent naming convention would be better. e.g.
SyncFinishWorker
SyncInvalidateTableStates
SyncFetchTableStates
SyncProcessTables

~~~

nit - file header comment

==
src/backend/replication/logical/tablesync.c

3.5.
-static void
+void
 process_syncing_tables_for_sync(XLogRecPtr current_lsn)

-static void
+void
 process_syncing_tables_for_apply(XLogRecPtr current_lsn)

Since these functions are no longer static should those function names
be changed to use the CamelCase convention for non-static API?

//

patch v20240812-0004.

==
src/backend/replication/logical/syncutils.c

nit - file header comment (made same as patch 0003)

~

FetchRelationStates:
nit - IIUC sequence states are only INIT -> READY. So the comments in
this function don't need to specifically talk about sequence INIT
state.

==
src/backend/utils/misc/guc_tables.c

4.1.
  {"max_sync_workers_per_subscription",
  PGC_SIGHUP,
  REPLICATION_SUBSCRIBERS,
- gettext_noop("Maximum number of table synchronization workers per
subscription."),
+ gettext_noop("Maximum number of relation synchronization workers per
subscription."),
  NULL,
  },

I was wondering if "relation synchronization workers" is meaningful to
the user because that seems like new terminology.
Maybe it should say "... of table + sequence synchronization workers..."

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/replication/logical/syncutils.c 
b/src/backend/replication/logical/syncutils.c
index 4e39836..4bbc481 100644
--- a/src/backend/replication/logical/syncutils.c
+++ b/src/backend/replication/logical/syncutils.c
@@ -1,5 +1,5 @@
 /*-
- * sequencesync.c
+ * syncutils.c
  *   PostgreSQL logical replication: common synchronization code
  *
  * Copyright (c) 2024, PostgreSQL Global Development Group
@@ -8,8 +8,8 @@
  *   src/backend/replication/logical/syncutils.c
  *
  * NOTES
- *   This file contains common code for synchronization of tables that 
will be
- *   help apply worker and table synchronization worker.
+ *   This file contains code common to table synchronization workers, and
+ *   the sequence synchronization worker.
  *-
  */
 
diff --git a/src/backend/replication/logical/syncutils.c 
b/src/backend/replication/logical/syncutils.c
index ed353f1..1702be9 100644
--- a/src/backend/replication/logical/syncutils.c
+++ b/src/backend/replication/logical/syncutils.c
@@ -8,9 +8,8 @@
  *   src/backend/replication/logical/syncutils.c
  *
  * NOTES
- *   This file contains common code for synchronization of tables and
- *   sequences that will be help apply worker, table synchronization worker
- *and sequence synchronization.
+ *   This file contains code common to table synchronization workers, and
+ *the sequence synchronization worker.
  *

Re: Logical Replication of sequences

2024-08-11 Thread Peter Smith
Hi Vignesh,

I found that when 2 subscriptions are both subscribing to a
publication publishing sequences, an ERROR occurs on refresh.

==

Publisher:
--

test_pub=# create publication pub1 for all sequences;

Subscriber:
---

test_sub=# create subscription sub1 connection 'dbname=test_pub'
publication pub1;

test_sub=# create subscription sub2 connection 'dbname=test_pub'
publication pub1;

test_sub=# alter subscription sub1 refresh publication sequences;
2024-08-12 15:04:04.947 AEST [7306] LOG:  sequence "public.seq1" of
subscription "sub1" set to INIT state
2024-08-12 15:04:04.947 AEST [7306] STATEMENT:  alter subscription
sub1 refresh publication sequences;
2024-08-12 15:04:04.947 AEST [7306] LOG:  sequence "public.seq1" of
subscription "sub1" set to INIT state
2024-08-12 15:04:04.947 AEST [7306] STATEMENT:  alter subscription
sub1 refresh publication sequences;
2024-08-12 15:04:04.947 AEST [7306] ERROR:  tuple already updated by self
2024-08-12 15:04:04.947 AEST [7306] STATEMENT:  alter subscription
sub1 refresh publication sequences;
ERROR:  tuple already updated by self

test_sub=# alter subscription sub2 refresh publication sequences;
2024-08-12 15:04:30.427 AEST [7306] LOG:  sequence "public.seq1" of
subscription "sub2" set to INIT state
2024-08-12 15:04:30.427 AEST [7306] STATEMENT:  alter subscription
sub2 refresh publication sequences;
2024-08-12 15:04:30.427 AEST [7306] LOG:  sequence "public.seq1" of
subscription "sub2" set to INIT state
2024-08-12 15:04:30.427 AEST [7306] STATEMENT:  alter subscription
sub2 refresh publication sequences;
2024-08-12 15:04:30.427 AEST [7306] ERROR:  tuple already updated by self
2024-08-12 15:04:30.427 AEST [7306] STATEMENT:  alter subscription
sub2 refresh publication sequences;
ERROR:  tuple already updated by self

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical Replication of sequences

2024-08-11 Thread Peter Smith
Hi Vignesh,

I noticed it is not currently possible (there is no syntax way to do
it) to ALTER an existing publication so that it will publish
SEQUENCES.

Isn't that a limitation? Why?

For example, why should users be prevented from changing a FOR ALL
TABLES publication into a FOR ALL TABLES, SEQUENCES one?

Similarly, there are other combinations that are not possible (sketched below):
DROP ALL SEQUENCES from a publication that is FOR ALL TABLES, SEQUENCES
DROP ALL TABLES from a publication that is FOR ALL TABLES, SEQUENCES
ADD ALL TABLES to a publication that is FOR ALL SEQUENCES
...
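
To illustrate, these are the kinds of commands a user might expect to
work (hypothetical syntax; none of them are accepted by the current
patchset):

ALTER PUBLICATION pub_all_tables ADD ALL SEQUENCES;
ALTER PUBLICATION pub_all_tables_sequences DROP ALL SEQUENCES;
ALTER PUBLICATION pub_all_tables_sequences DROP ALL TABLES;
ALTER PUBLICATION pub_all_sequences ADD ALL TABLES;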

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical Replication of sequences

2024-08-11 Thread Peter Smith
Hi Vignesh,

v20240809-0001. No comments.
v20240809-0002. See below.
v20240809-0003. See below.
v20240809-0004. No comments.

//

Here are my review comments for patch v20240809-0002.

nit - Tweak wording in new docs example, because a publication only
publishes the sequences; it doesn't "synchronize" anything.

//

Here are my review comments for patch v20240809-0003.

fetch_sequence_list:
nit - move comment
nit - minor rewording for parameter WARNING message

==
.../replication/logical/sequencesync.c
src/backend/replication/logical/tablesync.c

1.
Currently, the placement of the 'sequence_states_not_ready' list
declaration seems backwards. IMO it makes more sense for the declaration
to be in sequencesync.c, and the extern in tablesync.c. (please also see
review comment #3 below which might affect this too).
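
In other words, something like this arrangement (only a sketch; the
initializer shown is my assumption):

/* sequencesync.c: define the list here */
List  *sequence_states_not_ready = NIL;

/* tablesync.c (or a shared header): just declare it */
extern List *sequence_states_not_ready;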

~~~

2.
 static bool
-FetchTableStates(bool *started_tx)
+FetchTableStates(void)
 {
- static bool has_subrels = false;
-
- *started_tx = false;
+ static bool has_subtables = false;
+ bool started_tx = false;

Maybe give the explanation why 'has_subtables' is declared static here.
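
e.g. perhaps something along these lines (my wording, and the stated
reason is only my guess at why it is static):

/*
 * Does the subscription have any tables? Cached in a static variable so
 * the answer survives across calls and only needs to be refreshed when
 * the relation state cache is rebuilt.
 */
static bool has_subtables = false;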

~~~

3.
I am not sure that it was an improvement to move the
process_syncing_sequences_for_apply() function into the
sequencesync.c. Calling the sequence code from the tablesync code
still looks strange. OTOH, I see why you don't want to leave it in
tablesync.c.

Perhaps it would be better to refactor/move all following functions
back to the (apply) worker.c instead:
- process_syncing_relations
- process_syncing_sequences_for_apply(void)
- process_syncing_tables_for_apply(void)

Actually, now that there are 2 kinds of 'sync' workers, maybe you
should introduce a new module (e.g. 'commonsync.c' or
'syncworker.c' ...), where you can put functions such as
process_syncing_relations() plus any other code common to both
tablesync and sequencesync. That might make more sense than having one
call to the other.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/ref/create_publication.sgml 
b/doc/src/sgml/ref/create_publication.sgml
index 92758c7..64214ba 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -427,8 +427,8 @@ CREATE PUBLICATION all_sequences FOR ALL SEQUENCES;
   
 
   
-   Create a publication that publishes all changes in all tables and
-   synchronizes all sequences:
+   Create a publication that publishes all changes in all tables, and
+   all sequences for synchronization:
 
 CREATE PUBLICATION all_tables_sequences FOR ALL TABLES, SEQUENCES;
 
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index 2ac63c7..c595873 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -2546,6 +2546,7 @@ fetch_sequence_list(WalReceiverConn *wrconn, List 
*publications)
 
seqform = (Form_pg_sequence) GETSTRUCT(tup);
 
+   /* Build a list of sequences that don't match in publisher and 
subscriber */
if (seqform->seqtypid != seqtypid || seqform->seqmin != seqmin 
||
seqform->seqmax != seqmax || seqform->seqstart != 
seqstart ||
seqform->seqincrement != seqincrement ||
@@ -2556,7 +2557,6 @@ fetch_sequence_list(WalReceiverConn *wrconn, List 
*publications)
else
appendStringInfoString(warning_sequences, ", ");
 
-   /* Add the sequences that don't match in publisher and 
subscriber */
appendStringInfo(warning_sequences, "\"%s.%s\"",
 
get_namespace_name(get_rel_namespace(relid)),
 get_rel_name(relid));
@@ -2575,7 +2575,7 @@ fetch_sequence_list(WalReceiverConn *wrconn, List 
*publications)
 */
ereport(WARNING,

errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-   errmsg("Sequence parameter in remote and local 
is not same for %s",
+   errmsg("Parameters differ for remote and local 
sequences %s",
warning_sequences->data),
errhint("Alter/Re-create the sequence using the 
same parameter as in remote."));
pfree(warning_sequences);
diff --git a/src/test/subscription/t/034_sequences.pl 
b/src/test/subscription/t/034_sequences.pl
index 7cc8c8c..52453cc 100644
--- a/src/test/subscription/t/034_sequences.pl
+++ b/src/test/subscription/t/034_sequences.pl
@@ -177,7 +177,7 @@ $node_subscriber->safe_psql(
 ALTER SUBSCRIPTION regress_seq_sub REFRESH PUBLICATION SEQUENCES");
 like(
$stder

Re: Logical Replication of sequences

2024-08-09 Thread Peter Smith
Hi Vignesh, here are my review comments for the sequences docs patch
v20240808-0004.

==
doc/src/sgml/logical-replication.sgml

The new section content looked good.

Just some nitpicks including:
- renamed the section "Replicating Sequences"
- added missing mention about how to publish sequences
- rearranged the subscription commands into a more readable list
- some sect2 titles were very long; I shortened them.
- added  markup for the sequence definition advice
- other minor rewording and typo fixes

~

1.
IMO the "Caveats" section can be removed.
- the advice to avoid changing the sequence definition is already
given earlier in the "Sequence Definition Mismatches" section
- the limitation of "incremental synchronization" is already stated in
the logical replication "Limitations" section
- (FYI, I removed it already in my nitpicks attachment)

==
doc/src/sgml/ref/alter_subscription.sgml

nitpick - I reversed the paragraphs to keep the references in a natural order.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

On Fri, Aug 9, 2024 at 1:52 AM vignesh C  wrote:
>
> On Thu, 8 Aug 2024 at 08:30, Peter Smith  wrote:
> >
> > Hi Vignesh, Here are my v20240807-0003 review comments.
> >
> > 2a.
> > The paragraph starts by saying "Sequence data is not replicated.". It
> > seems wrong now. Doesn't that need rewording or removing?
>
> Changed it to incremental sequence changes.
>
> > ~
> >
> > 2b.
> > Should the info "If, however, some kind of switchover or failover..."
> > be mentioned in the "Logical Replication Failover" section [2],
> > instead of here?
>
> I think mentioning this here is appropriate. The other section focuses
> more on how logical replication can proceed with a new primary. Once
> the logical replication setup is complete, sequences can be refreshed
> at any time.
>
> Rest of the comments  are fixed,  the attached v20240808 version patch
> has the changes for the same.
>
> Regards,
> Vignesh
diff --git a/doc/src/sgml/logical-replication.sgml 
b/doc/src/sgml/logical-replication.sgml
index d0fdcde..bc2aacc 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -1571,49 +1571,88 @@ test_sub=# SELECT * FROM t1 ORDER BY id;
  
 
  
-  Sequences
+  Replicating Sequences
 
   
-   Sequences can be synchronized between a publisher and a subscriber using
-   CREATE 
SUBSCRIPTION
-   to initially synchronize sequences, 
-   ALTER SUBSCRIPTION ... REFRESH PUBLICATION to
-   synchronize any newly added sequences and 
-   ALTER SUBSCRIPTION ... REFRESH PUBLICATION 
SEQUENCES
-   to re-synchronize all sequences. A new sequence synchronization worker will
-   be started to synchronize the sequences after executing the above commands
-   and will exit once the sequences are synchronized.
+   To replicate sequences from a publisher to a subscriber, first publish the
+   sequence using 
+   CREATE PUBLICATION ... FOR ALL SEQUENCES.
   
 
   
-   Sequence synchronization worker will be used from
-   
+   At the subscriber side:
+   
+
+ 
+  use CREATE 
SUBSCRIPTION
+  to initially synchronize the published sequences.
+ 
+
+
+ 
+  use 
+  ALTER SUBSCRIPTION ... REFRESH PUBLICATION
+  to synchronize any newly added sequences.
+ 
+
+
+ 
+  use 
+  ALTER SUBSCRIPTION ... REFRESH PUBLICATION 
SEQUENCES
+  to re-synchronize all sequences.
+ 
+
+   
+  
+
+  
+   A new sequence synchronization worker will be started to synchronize the
+   sequences after executing any of the above subscriber commands, and will
+   exit once the sequences are synchronized.
+  
+  
+   The ability to launch a sequence synchronization worker will be limited by
+   the 
max_sync_workers_per_subscription
configuration.
   
 
-  
-   Differences in Sequence Definitions Between Publisher and 
Subscriber
+  
+   Sequence Definition Mismatches
+   
+
+ If there are differences in sequence definitions between the publisher and
+ subscriber, a WARNING is logged.
+
+   

-If there are differences in sequence definitions between the publisher and
-subscriber, a WARNING is logged. To resolve this, use
+To resolve this, use
 ALTER SEQUENCE
 to align the subscriber's sequence parameters with those of the publisher.
 Subsequently, execute 
 ALTER SUBSCRIPTION ... REFRESH PUBLICATION 
SEQUENCES.
-It is advisable not to change sequence definitions on either the publisher
-or the subscriber until synchronization is complete and the
+   
+   
+Changes to sequence definitions during the execution of
+
+ALTER SUBSCRIPTION ... REFRESH PUBLICATION 
SEQUENCES
+may not be detected, potentially leading to inconsistent values

Re: Logical Replication of sequences

2024-08-08 Thread Peter Smith
Hi Vignesh, I reviewed the latest v20240808-0003 patch.

Attached are my minor change suggestions.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/catalog/pg_subscription.c 
b/src/backend/catalog/pg_subscription.c
index 4e2f960..a77e810 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -517,9 +517,9 @@ HasSubscriptionTables(Oid subid)
  *
  * all_states:
  * If getting tables, if all_states is true get all tables, otherwise
- * only get tables that have not reached 'READY' state.
+ * only get tables that have not reached READY state.
  * If getting sequences, if all_states is true get all sequences,
- * otherwise only get sequences that are in 'init' state.
+ * otherwise only get sequences that are in INIT state.
  *
  * The returned list is palloc'ed in the current memory context.
  */
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index bb6aa8e..2833379 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -868,10 +868,12 @@ CreateSubscription(ParseState *pstate, 
CreateSubscriptionStmt *stmt,
  * Update the subscription to refresh both the publication and the publication
  * objects associated with the subscription.
  *
- * If 'copy_data' parameter is true, the function will set the state
- * to "init"; otherwise, it will set the state to "ready".
+ * Parameters:
  *
- * When 'validate_publications' is provided with a publication list, the
+ * If 'copy_data' is true, the function will set the state to INIT; otherwise,
+ * it will set the state to READY.
+ *
+ * If 'validate_publications' is provided with a publication list, the
  * function checks that the specified publications exist on the publisher.
  *
  * If 'refresh_tables' is true, update the subscription by adding or removing
@@ -882,9 +884,22 @@ CreateSubscription(ParseState *pstate, 
CreateSubscriptionStmt *stmt,
  * sequences that have been added or removed since the last subscription
  * creation or publication refresh.
  *
- * If 'resync_all_sequences' is true, mark all objects with "init" state
- * for re-synchronization; otherwise, only update the newly added tables and
- * sequences based on the copy_data parameter.
+ * Note, this is a common function for handling different REFRESH commands
+ * according to the parameter 'resync_all_sequences'
+ *
+ * 1. ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES
+ *(when parameter resync_all_sequences is true)
+ *
+ *The function will mark all sequences with INIT state.
+ *Assert copy_data is true.
+ *Assert refresh_tables is false.
+ *Assert refresh_sequences is true.
+ *
+ * 2. ALTER SUBSCRIPTION ... REFRESH PUBLICATION [WITH (copy_data=true|false)]
+ *(when parameter resync_all_sequences is false)
+ *
+ *The function will update only the newly added tables and/or sequences
+ *based on the copy_data parameter.
  */
 static void
 AlterSubscription_refresh(Subscription *sub, bool copy_data,
@@ -910,11 +925,10 @@ AlterSubscription_refresh(Subscription *sub, bool 
copy_data,
WalReceiverConn *wrconn;
boolmust_use_password;
 
-   /* resync_all_sequences cannot be specified with refresh_tables */
-   Assert(!(resync_all_sequences && refresh_tables));
-
-   /* resync_all_sequences cannot be specified with copy_data as false */
-   Assert(!(resync_all_sequences && !copy_data));
+#ifdef USE_ASSERT_CHECKING
+   if (resync_all_sequences)
+   Assert(copy_data && !refresh_tables && refresh_sequences);
+#endif
 
/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);


Re: Pgoutput not capturing the generated columns

2024-08-08 Thread Peter Smith
Hi Shubham,

I think the v25-0001 patch only half-fixes the problems reported in my
v24-0001 review.

~

Background (from the commit message):
This commit enables support for the 'include_generated_columns' option
in logical replication, allowing the transmission of generated column
information and data alongside regular table changes.

~

The broken TAP test scenario in question is replicating from a
"not-generated" column to a "generated" column. As the generated
column is not on the publishing side, IMO the
'include_generated_columns' option should have zero effect here.

In other words, I expect this TAP test for the 'include_generated_columns
= true' case should also fail, as I already wrote yesterday:

+# FIXME
+# Since there is no generated column on the publishing side this should give
+# the same result as the previous test. -- e.g. something like:
+# ERROR:  logical replication target relation
"public.tab_nogen_to_gen" is missing
+# replicated column: "b"

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical Replication of sequences

2024-08-07 Thread Peter Smith
On Thu, Aug 8, 2024 at 1:55 PM Amit Kapila  wrote:
>
> On Wed, Aug 7, 2024 at 10:12 AM Peter Smith  wrote:
> >
> > This is mostly a repeat of my previous mail from a while ago [1] but
> > includes some corrections, answers, and more examples. I'm going to
> > try to persuade one last time because the current patch is becoming
> > stable, so I wanted to revisit this syntax proposal before it gets too
> > late to change anything.
> >
> > If there is some problem with the proposed idea please let me know
> > because I can see only the advantages and no disadvantages of doing it
> > this way.
> >
> > ~~~
> >
> > The current patchset offers two forms of subscription refresh:
> > 1. ALTER SUBSCRIPTION name REFRESH PUBLICATION [ WITH ( refresh_option
> > [= value] [, ... ] ) ]
> > 2. ALTER SUBSCRIPTION name REFRESH PUBLICATION SEQUENCES
> >
> > Since 'copy_data' is the only supported refresh_option, really it is more 
> > like:
> > 1. ALTER SUBSCRIPTION name REFRESH PUBLICATION [ WITH ( copy_data [=
> > true|false] ) ]
> > 2. ALTER SUBSCRIPTION name REFRESH PUBLICATION SEQUENCES
> >
> > ~~~
> >
> > I proposed previously that instead of having 2 commands for refreshing
> > subscriptions we should have a single refresh command:
> >
> > ALTER SUBSCRIPTION name REFRESH PUBLICATION [TABLES|SEQUENCES] [ WITH
> > ( copy_data [= true|false] ) ]
> >
> > Why?
> >
> > - IMO it is less confusing than having 2 commands that both refresh
> > sequences in slightly different ways
> >
> > - It is more flexible because apart from refreshing everything, a user
> > can choose to refresh only tables or only sequences if desired; IMO
> > more flexibility is always good.
> >
> > - There is no loss of functionality from the current implementation
> > AFAICT. You can still say "ALTER SUBSCRIPTION sub REFRESH PUBLICATION
> > SEQUENCES" exactly the same as the patchset allows.
> >
> > ~~~
> >
> > So, to continue this proposal, let the meaning of 'copy_data' for
> > SEQUENCES be as follows:
> >
> > - when copy_data == false: it means don't copy data (i.e. don't
> > synchronize anything). Add/remove sequences from pg_subscriber_rel as
> > needed.
> >
> > - when copy_data == true: it means to copy data (i.e. synchronize) for
> > all sequences. Add/remove sequences from pg_subscriber_rel as needed)
> >
>
> I find overloading the copy_data option more confusing than adding a
> new variant for REFRESH. To make it clear, we can even think of
> extending the command as ALTER SUBSCRIPTION name REFRESH PUBLICATION
> ALL SEQUENCES or something like that.  I don't know where there is a
> need or not but one can imagine extending it as ALTER SUBSCRIPTION
> name REFRESH PUBLICATION SEQUENCES [, , ..].
> This will allow to selectively refresh the sequences.
>

But, I haven't invented a new overloading for "copy_data" option
(meaning "synchronize") for sequences. The current patchset already
interprets copy_data exactly this way.

For example, below are patch 0003 results:

ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION WITH (copy_data=false)
- this will add/remove new sequences in pg_subscription_rel, but it
will *not* synchronize the new sequence

ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION WITH (copy_data=true)
- this will add/remove new sequences in pg_subscription_rel, and it
*will* synchronize the new sequence

~

I only proposed that copy_data should apply to *all* sequences, not
just new ones.
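
So the difference is really only one of scope (a sketch; the second
command uses the proposed syntax, not the current patchset's):

-- current patchset: copy_data affects only the newly added sequences
ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION WITH (copy_data = true);

-- proposed (illustrative): copy_data = true re-synchronizes *all*
-- subscribed sequences, not just the new ones
ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION SEQUENCES WITH (copy_data = true);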

==
Kind Regards.
Peter Smith.
Fujitsu Australia.




Re: Logical Replication of sequences

2024-08-07 Thread Peter Smith
Hi Vignesh, Here are my v20240807-0003 review comments.

==
1. GENERAL DOCS.

IMO the replication of SEQUENCES is a big enough topic that it
deserves to have its own section in the docs chapter 31 [1].

Some of the create/alter subscription docs content would stay where it
is in, but a new chapter would just tie everything together better. It
could also serve as a better place to describe the other sequence
replication content like:
(a) getting a WARNING for mismatched sequences and how to handle it.
(b) how can the user know when a subscription refresh is required to
(re-)synchronise sequences
(c) pub/sub examples

==
doc/src/sgml/logical-replication.sgml

2. Restrictions

Sequence data is not replicated. The data in serial or identity
columns backed by sequences will of course be replicated as part of
the table, but the sequence itself would still show the start value on
the subscriber. If the subscriber is used as a read-only database,
then this should typically not be a problem. If, however, some kind of
switchover or failover to the subscriber database is intended, then
the sequences would need to be updated to the latest values, either by
executing ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES or by
copying the current data from the publisher (perhaps using pg_dump) or
by determining a sufficiently high value from the tables themselves.

~

2a.
The paragraph starts by saying "Sequence data is not replicated.". It
seems wrong now. Doesn't that need rewording or removing?

~

2b.
Should the info "If, however, some kind of switchover or failover..."
be mentioned in the "Logical Replication Failover" section [2],
instead of here?

==
doc/src/sgml/ref/alter_subscription.sgml

3.
Sequence values may occasionally become out of sync due to updates in
the publisher. To verify this, compare the
pg_subscription_rel.srsublsn on the subscriber with the page_lsn
obtained from the pg_sequence_state for the sequence on the publisher.
If the sequence is still using prefetched values, the page_lsn will
not be updated. In such cases, you will need to directly compare the
sequences and execute REFRESH PUBLICATION SEQUENCES if required.

~

3a.
This whole paragraph may be better put in the new chapter that was
suggested earlier in review comment #1.

~

3b.
Is it only "Occasionally"? I expected subscriber-side sequences could
become stale quite often.

~

3c.
Is this advice very useful? It's saying if the LSN is different then
the sequence is out of date, but if the LSN is not different then you
cannot tell. Why not ignore LSN altogether and just advise the user to
directly compare the sequences in the first place?
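
FWIW, the subscriber-side half of that comparison is easy enough to see
(illustrative query; the publisher-side pg_sequence_state() half comes
from the patch and is not shown):

SELECT srrelid::regclass AS sequence, srsublsn
  FROM pg_subscription_rel
 WHERE srrelid = 's1'::regclass;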

==

Also, there are more minor suggestions in the attached nitpicks diff.

==
[1] https://www.postgresql.org/docs/current/logical-replication.html
[2] 
file:///usr/local/pg_oss/share/doc/postgresql/html/logical-replication-failover.html

Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index 6b320b1..bb6aa8e 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -726,7 +726,7 @@ CreateSubscription(ParseState *pstate, 
CreateSubscriptionStmt *stmt,
recordDependencyOnOwner(SubscriptionRelationId, subid, owner);
 
/*
-* XXX todo: If the subscription is for a sequence-only publication,
+* XXX: If the subscription is for a sequence-only publication,
 * creating this origin is unnecessary at this point. It can be created
 * later during the ALTER SUBSCRIPTION ... REFRESH command, if the
 * publication is updated to include tables or tables in schemas.
@@ -756,7 +756,7 @@ CreateSubscription(ParseState *pstate, 
CreateSubscriptionStmt *stmt,
 
PG_TRY();
{
-   bool   hastables = false;
+   boolhas_tables;
List   *relations;
chartable_state;
 
@@ -771,16 +771,14 @@ CreateSubscription(ParseState *pstate, 
CreateSubscriptionStmt *stmt,
table_state = opts.copy_data ? SUBREL_STATE_INIT : 
SUBREL_STATE_READY;
 
/*
-* Get the table list from publisher and build local 
table status
-* info.
+* Build local relation status info. Relations are for 
both tables and
+* sequences from the publisher.
 */
relations = fetch_table_list(wrconn, publications);
-   if (relations != NIL)
-   hastables = true;
-
-   /* Include the sequence list from publisher. */
+   has_tables = relations != NIL;
   

Re: Pgoutput not capturing the generated columns

2024-08-07 Thread Peter Smith
Hi Shubham,

Here are my review comments for patch v24-0001

I think the TAP tests have incorrect expected results for the nogen-to-gen case.

Whereas the HEAD code will cause "ERROR" for this test scenario, patch
0001 does not. IMO the behaviour should be unchanged for this scenario
which has no generated column on the publisher side. So it seems this
is a bug in patch 0001.

FYI, I have included "FIXME" comments in the attached top-up diff
patch to show which test cases I think are expecting wrong results.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/test/subscription/t/011_generated.pl 
b/src/test/subscription/t/011_generated.pl
index 13499a1..a9430f7 100644
--- a/src/test/subscription/t/011_generated.pl
+++ b/src/test/subscription/t/011_generated.pl
@@ -478,10 +478,11 @@ $node_publisher->safe_psql('postgres',
 
 # regress_sub1_combo_nogen_to_gen: (include_generated_columns = false)
 #
-# XXX
-# The test below shows that current PG17 behavior does not give an error,
-# But this conflicts with the copy_data=true behavior so it might be a PG17 
bug.
-# Needs more study.
+# FIXME
+# I think the following expected result is wrong. IIUC it should give
+# the same error as HEAD -- e.g. something like:
+# ERROR:  logical replication target relation "public.tab_nogen_to_gen" is 
missing
+# replicated column: "b"
 $node_publisher->wait_for_catchup('regress_sub1_combo_nogen_to_gen_nocopy');
 $result =
   $node_subscriber->safe_psql('postgres',
@@ -495,9 +496,11 @@ is( $result, qq(4|88
 # When copy_data=false, no COPY error occurs.
 # The col 'b' is not replicated; the subscriber-side generated value is 
inserted.
 #
-# XXX
-# It is correct for this to give the same result as above, but it needs more
-# study to determine if the above result was actually correct, or a PG17 bug.
+# FIXME
+# Since there is no generated column on the publishing side this should give
+# the same result as the previous test. -- e.g. something like:
+# ERROR:  logical replication target relation "public.tab_nogen_to_gen" is 
missing
+# replicated column: "b"
 $node_publisher->wait_for_catchup('regress_sub2_combo_nogen_to_gen');
 $result =
   $node_subscriber2->safe_psql('postgres',


Re: Logical Replication of sequences

2024-08-06 Thread Peter Smith
t copy_data)
ALTER SUBSCRIPTION sub REFRESH PUBLICATION
- same as ex8

==
[1] 
https://www.postgresql.org/message-id/CAHut%2BPuFH1OCj-P1UKoRQE2X4-0zMG%2BN1V7jdn%3DtOQV4RNbAbw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical Replication of sequences

2024-08-06 Thread Peter Smith
ENCES_SYNC_PER_BATCH several times so I
changed the wording of one of them

~~~

fetch_remote_sequence_data:
nitpick - all other params have the same name as sequence members, so
change the parameter name /lsn/page_lsn/

~

copy_sequence:
nitpick - rename var /seq_lsn/seq_page_lsn/

==
src/backend/replication/logical/tablesync.c

6. process_syncing_sequences_for_apply

+ * If a sequencesync worker is running already, there is no need to start a new
+ * one; the existing sequencesync worker will synchronize all the sequences. If
+ * there are still any sequences to be synced after the sequencesync worker
+ * exited, then a new sequencesync worker can be started in the next iteration.
+ * To prevent starting the sequencesync worker at a high frequency after a
+ * failure, we store its last failure time. We start the sync worker for the
+ * same relation after waiting at least wal_retrieve_retry_interval.

Why is it talking about "We start the sync worker for the same
relation ...". The sequencesync_failuretime is per sync worker, not
per relation. And, I don't see any 'same relation' check in the code.

==
src/include/catalog/pg_subscription_rel.h

GetSubscriptionRelations:
nitpick - changed parameter name /all_relations/all_states/

==
src/test/subscription/t/034_sequences.pl

nitpick - add some ## comments to highlight the main test
parts to make it easier to read.
nitpick - fix typo /syned/synced/

7. More test cases?
IIUC you can also get a sequence mismatch warning during "ALTER ...
REFRESH PUBLICATION", and "CREATE SUBSCRIPTION". So, should those be
tested also?
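
e.g. scenarios roughly like these (assuming a mismatched sequence
already exists on both sides; object names are mine):

-- expect the WARNING at CREATE SUBSCRIPTION time
CREATE SUBSCRIPTION regress_seq_sub2 CONNECTION 'dbname=test_pub'
    PUBLICATION regress_seq_pub;

-- expect the WARNING again when refreshing after another mismatched
-- sequence is published
ALTER SUBSCRIPTION regress_seq_sub2 REFRESH PUBLICATION;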

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 16c427e..5c66797 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8109,7 +8109,7 @@ SCRAM-SHA-256$<iteration 
count>:&l
 
   
This catalog only contains tables and sequences known to the subscription
-   after running either
+   after running
CREATE 
SUBSCRIPTION or
   
ALTER SUBSCRIPTION ... REFRESH PUBLICATION or
diff --git a/doc/src/sgml/ref/alter_subscription.sgml 
b/doc/src/sgml/ref/alter_subscription.sgml
index eb7d544..f280019 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -200,6 +200,12 @@ ALTER SUBSCRIPTION name RENAME TO <
   ALTER SUBSCRIPTION ... REFRESH PUBLICATION 
SEQUENCES
  
  
+  See  for 
recommendations on how
+  to handle any warnings about differences in the sequence definition
+  between the publisher and the subscriber, which might occur when
+  copy_data = true.
+ 
+ 
   See  for details of
   how copy_data = true can interact with the
   origin
@@ -211,11 +217,6 @@ ALTER SUBSCRIPTION name RENAME TO <
   parameter of CREATE SUBSCRIPTION for details about
   copying pre-existing data in binary format.
  
- 
-  See the copy_data
-  on how to handle the warnings regarding the difference in sequence
-  definition between the publisher and the subscriber.
- 
 

   
@@ -230,12 +231,12 @@ ALTER SUBSCRIPTION name RENAME TO <
   sequence data with the publisher. Unlike 
   ALTER SUBSCRIPTION ... REFRESH PUBLICATION 
which
   only synchronizes newly added sequences, REFRESH PUBLICATION 
SEQUENCES
-  will re-synchronize the sequence data for all subscribed sequences. The
-  sequence definition can differ between the publisher and the subscriber,
-  this is detected and a WARNING is logged to the user, but the warning is
-  only an indication of a potential problem; it is recommended to alter the
-  sequence to keep the sequence option same as the publisher and execute
-  the command again.
+  will re-synchronize the sequence data for all subscribed sequences.
+ 
+ 
+  See  for recommendations 
on how
+  to handle any warnings about differences in the sequence definition
+  between the publisher and the subscriber.
  
 

diff --git a/doc/src/sgml/ref/create_subscription.sgml 
b/doc/src/sgml/ref/create_subscription.sgml
index de3bdb8..e28ed96 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -264,12 +264,10 @@ CREATE SUBSCRIPTION subscription_nameorigin parameter.
  
  
-  The sequence definition can differ between the publisher and the
-  subscriber, this is detected and a WARNING is logged to the user, but
-  the warning is only an indication of a potential problem; it is
-  recommended to alter the sequence to keep the sequence option same as
-  the publisher and execute 
-  ALTER SUBSCRIPTION ... REFRESH PUBLICATION 
SEQUENCES.
+  See  for 
recommendations

Re: Pgoutput not capturing the generated columns

2024-08-04 Thread Peter Smith
(PID 11095) exited with exit code 1
2024-08-05 13:25:24.917 AEST [11225] LOG:  logical replication apply
worker for subscription "sub1_nocopy" has started
2024-08-05 13:25:24.926 AEST [11225] ERROR:  logical replication
target relation "public.tab_nogen_to_gen" is missing replicated
column: "b"
2024-08-05 13:25:24.926 AEST [11225] CONTEXT:  processing remote data
for replication origin "pg_16390" during message type "INSERT" in
transaction 742, finished at 0/1967BB0
2024-08-05 13:25:24.927 AEST [20039] LOG:  background worker "logical
replication apply worker" (PID 11225) exited with exit code 1
...

==
[1] 
https://www.postgresql.org/message-id/CAHut%2BPvtT8fKOfvVYr4vANx_fr92vedas%2BZRbQxvMC097rks6w%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-08-04 Thread Peter Smith
Hi Shubham.

Here are some more review comments for the v23-0001.

==
src/test/subscription/t/011_generated.pl

nitpick - renamed /regress_pub/regress_pub_tab1/ and
/regress_sub1/regress_sub1_tab1/
nitpick - typo /inital data /initial data/
nitpick - typo /snode_subscriber2/node_subscriber2/
nitpick - tweak the combo initial sync comments and messages
nitpick - /#Cleanup/# cleanup/
nitpick - tweak all the combo normal replication comments
nitpick - removed blank line at the end

~~~

1. Refactor tab_gen_to_missing initial sync tests.

I moved the tab_gen_to_missing initial sync for node_subscriber2 to be
back where all the other initial sync tests are done.
See the nitpicks patch file.

~~~

2. Refactor tab_nogen_to_gen initial sync tests

I moved all the tab_nogen_to_gen initial sync tests back to where the
other initial sync tests are done.
See the nitpicks patch file.

~~~

3. Added another test case:

Because the (current PG17) nogen-to-gen initial sync test case (with
copy_data=true) gives an ERROR, I have added another combination to
cover normal replication (e.g. using copy_data=false).
See the nitpicks patch file.

(This has exposed an inconsistency which IMO might be a PG17 bug. I
have included TAP test comments about this, and plan to post a
separate thread for it later).

~

4. GUC

Moving and adding more CREATE SUBSCRIPTIONs exceeded some default GUC limits,
so extra configuration was needed.
See the nitpick patch file.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/test/subscription/t/011_generated.pl 
b/src/test/subscription/t/011_generated.pl
index 0b596b7..2be06c6 100644
--- a/src/test/subscription/t/011_generated.pl
+++ b/src/test/subscription/t/011_generated.pl
@@ -12,16 +12,25 @@ use Test::More;
 
 my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
 $node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf',
+   "max_wal_senders = 20
+max_replication_slots = 20");
 $node_publisher->start;
 
 # All subscribers on this node will use parameter include_generated_columns = 
false
 my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
 $node_subscriber->init;
+$node_subscriber->append_conf('postgresql.conf',
+   "max_logical_replication_workers = 20
+max_worker_processes = 20");
 $node_subscriber->start;
 
 # All subscribers on this node will use parameter include_generated_columns = 
true
 my $node_subscriber2 = PostgreSQL::Test::Cluster->new('subscriber2');
 $node_subscriber2->init;
+$node_subscriber2->append_conf('postgresql.conf',
+   "max_logical_replication_workers = 20
+max_worker_processes = 20");
 $node_subscriber2->start;
 
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
@@ -139,7 +148,7 @@ $node_publisher->safe_psql('postgres',
 # pub_combo_gen_to_missing is not included in pub_combo, because some tests 
give errors.
 
 $node_publisher->safe_psql('postgres',
-   "CREATE PUBLICATION regress_pub FOR TABLE tab1");
+   "CREATE PUBLICATION regress_pub_tab1 FOR TABLE tab1");
 $node_publisher->safe_psql('postgres',
"CREATE PUBLICATION regress_pub_combo FOR TABLE tab_gen_to_gen, 
tab_gen_to_nogen, tab_missing_to_gen"
 );
@@ -157,10 +166,10 @@ $node_publisher->safe_psql('postgres',
 #
 # Note that all subscriptions created on node_subscriber2 use copy_data = 
false,
 # because copy_data = true with include_generated_columns is not yet supported.
-# For this reason, the expected inital data on snode_subscriber2 is always 
empty.
+# For this reason, the expected inital data on node_subscriber2 is always 
empty.
 
 $node_subscriber->safe_psql('postgres',
-   "CREATE SUBSCRIPTION regress_sub1 CONNECTION '$publisher_connstr' 
PUBLICATION regress_pub"
+   "CREATE SUBSCRIPTION regress_sub1_tab1 CONNECTION '$publisher_connstr' 
PUBLICATION regress_pub_tab1"
 );
 $node_subscriber->safe_psql('postgres',
"CREATE SUBSCRIPTION regress_sub1_combo CONNECTION '$publisher_connstr' 
PUBLICATION regress_pub_combo"
@@ -168,11 +177,18 @@ $node_subscriber->safe_psql('postgres',
 $node_subscriber->safe_psql('postgres',
"CREATE SUBSCRIPTION regress_sub1_combo_gen_to_missing CONNECTION 
'$publisher_connstr' PUBLICATION regress_pub_combo_gen_to_missing"
 );
+# Note, regress_sub1_combo_nogen_to_gen is not created here due to expected 
errors. See later.
 
 $node_subscriber2->safe_psql('postgres',
"CREATE SUBSCRIPTION regress_sub2_combo CONNECTION '$publisher_connstr' 
PUBLICATION regress

Re: Pgoutput not capturing the generated columns

2024-08-01 Thread Peter Smith
Hi, Here are my review comments for patch v22-0001

All comments now are only for the TAP test.

==
src/test/subscription/t/011_generated.pl

1. I added all new code for the missing combination test case
"gen-to-missing". See nitpicks diff.
- create a separate publication for this "tab_gen_to_missing" table
because the test gives subscription errors.
- for the initial data
- for the replicated data

~~~

2. I added sub1 and sub2 subscriptions for every combo test
(previously some were absent). See nitpicks diff.

~~~

3. There was a missing test case for nogen-to-gen combination, and
after experimenting with this I am getting a bit suspicious.

Currently, it seems that if a COPY is attempted then the error would
be like this:
2024-08-01 17:16:45.110 AEST [18942] ERROR:  column "b" is a generated column
2024-08-01 17:16:45.110 AEST [18942] DETAIL:  Generated columns cannot
be used in COPY.

OTOH, if a COPY is not attempted (e.g. copy_data = false) then patch
0001 allows replication to happen. And the generated value of the
subscriber "b" takes precedence.

I have included these tests in the nitpicks diff of patch 0001.

Those results weren't exactly what I was expecting.  That is why it is
so important to include *every* test combination in these TAP tests --
because unless we know how it works today, we won't know if we are
accidentally breaking the current behaviour with the other (0002,
0003) patches.

Please experiment in patches 0001 and 0002 using tab_nogen_to_gen more
to make sure the (new?) patch errors make sense and don't overstep by
giving ERRORs when they should not.



Also, many other smaller issues/changes were done:

~~~

Creating tables:

nitpick - rearranged to keep all combo test SQLs in a consistent order
throughout this file
1/ gen-to-gen
2/ gen-to-nogen
3/ gen-to-missing
4/ missing-to-gen
5/ nogen-to-gen

nitpick - fixed the wrong comment for CREATE TABLE tab_nogen_to_gen.

nitpick - tweaked some CREATE TABLE comments for consistency.

nitpick - in the v22 patch many of the generated col 'b' use different
computations for every test. It makes it unnecessarily difficult to
read/review the expected results. So, I've made them all the same. Now
computation is "a * 2" on the publisher side, and "a * 22" on the
subscriber side.

~~~

Creating Publications and Subscriptions:


nitpick - added comment for all the CREATE PUBLICATION

nitpick - added comment for all the CREATE SUBSCRIPTION

nitpick - I moved the note about copy_data = false to where all the
node_subscriber2 subscriptions are created. Also, don't explicitly
refer to "patch 000" in the comment, because that will not make any
sense after getting pushed.

nitpick - I changed many subscriber names to consistently use "sub1"
or "sub2" within the name (this is the visual cue of which
node_subscriber they are on). e.g.
/regress_sub_combo2/regress_sub2_combo/

~~~

Initial Sync tests:

nitpick - not sure if it is possible to do the initial data tests for
"nogen_to_gen" in the normal place. For now, it is just replaced by a
comment.
NOTE - Maybe this should be refactored later to put all the initial
data checks in one place. I'll think about this point more in the next
review.

~~~

nitpick - Changed cleanup so that subscriptions are dropped before publications.

nitpick - remove the unnecessary blank line at the end.

==

Please see the attached diffs patch (apply it atop patch 0001) which
includes all the nipick changes mentioned above.

~~

BTW, for a quicker turnaround and less churning, please consider just
posting the v23-0001 by itself instead of waiting to rebase all the
subsequent patches. When 0001 settles down some more then rebase the
others.

~~

Also, please run the indentation tool over this code ASAP.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/test/subscription/t/011_generated.pl 
b/src/test/subscription/t/011_generated.pl
index 05b83f6..504714a 100644
--- a/src/test/subscription/t/011_generated.pl
+++ b/src/test/subscription/t/011_generated.pl
@@ -34,59 +34,60 @@ $node_subscriber->safe_psql('postgres',
"CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 
22) STORED, c int)"
 );
 
+# tab_gen_to_gen:
 # publisher-side has generated col 'b'.
 # subscriber-side has generated col 'b', with different computation.
 $node_publisher->safe_psql('postgres',
-   "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a + 10) 
STORED)");
+   "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a * 2) 
STORED)");
 $node_subscriber->safe_psql('postgres',
-   "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a + 20) 
STORED)");
+   "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS A

Re: Logical Replication of sequences

2024-07-31 Thread Peter Smith
Hi Vignesh,

I noticed that when replicating sequences (using the latest patches
0730_2*)  the subscriber-side checks the *existence* of the sequence,
but apparently it is not checking other sequence attributes.

For example, consider:

Publisher: "CREATE SEQUENCE s1 START 1 INCREMENT 2;" should be a
sequence of only odd numbers.
Subscriber: "CREATE SEQUENCE s1 START 2 INCREMENT 2;" should be a
sequence of only even numbers.

Because the names match, currently the patch allows replication of the
s1 sequence. I think that might lead to unexpected results on the
subscriber. IMO it might be safer to report ERROR unless the sequences
match properly (i.e. not just a name check).

Below is a demonstration of the problem:

==
Publisher:
==

(publisher sequence is odd numbers)

test_pub=# create sequence s1 start 1 increment 2;
CREATE SEQUENCE
test_pub=# select * from nextval('s1');
 nextval
-
   1
(1 row)

test_pub=# select * from nextval('s1');
 nextval
-
   3
(1 row)

test_pub=# select * from nextval('s1');
 nextval
-
   5
(1 row)

test_pub=# CREATE PUBLICATION pub1 FOR ALL SEQUENCES;
CREATE PUBLICATION
test_pub=#

==
Subscriber:
==

(subscriber sequence is even numbers)

test_sub=# create sequence s1 start 2 increment 2;
CREATE SEQUENCE
test_sub=# SELECT * FROM nextval('s1');
 nextval
-
   2
(1 row)

test_sub=# SELECT * FROM nextval('s1');
 nextval
-
   4
(1 row)

test_sub=# SELECT * FROM nextval('s1');
 nextval
-
   6
(1 row)

test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=test_pub'
PUBLICATION pub1;
2024-08-01 08:43:04.198 AEST [24325] WARNING:  subscriptions created
by regression test cases should have names starting with "regress_"
WARNING:  subscriptions created by regression test cases should have
names starting with "regress_"
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2024-08-01 08:43:04.294 AEST [26240] LOG:  logical
replication apply worker for subscription "sub1" has started
2024-08-01 08:43:04.309 AEST [26244] LOG:  logical replication
sequence synchronization worker for subscription "sub1" has started
2024-08-01 08:43:04.323 AEST [26244] LOG:  logical replication
synchronization for subscription "sub1", sequence "s1" has finished
2024-08-01 08:43:04.323 AEST [26244] LOG:  logical replication
sequence synchronization worker for subscription "sub1" has finished

(after the CREATE SUBSCRIPTION we are getting replicated odd values
from the publisher, even though the subscriber side sequence was
supposed to be even numbers)

test_sub=# SELECT * FROM nextval('s1');
 nextval
-
   7
(1 row)

test_sub=# SELECT * FROM nextval('s1');
 nextval
-
   9
(1 row)

test_sub=# SELECT * FROM nextval('s1');
 nextval
-
  11
(1 row)

(Looking at the description you would expect odd values for this
sequence to be impossible)

test_sub=# \dS+ s1
 Sequence "public.s1"
  Type  | Start | Minimum |   Maximum   | Increment | Cycles? | Cache
+---+-+-+---+-+---
 bigint | 2 |   1 | 9223372036854775807 | 2 | no  | 1

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical Replication of sequences

2024-07-31 Thread Peter Smith
Hi Vignesh,

I have a question about the subscriber-side behaviour of currval().

==

AFAIK it is normal for currval() to give an error if nextval() has not
yet been called [1].

For example.
test_pub=# create sequence s1;
CREATE SEQUENCE
test_pub=# select * from currval('s1');
2024-08-01 07:42:48.619 AEST [24131] ERROR:  currval of sequence "s1"
is not yet defined in this session
2024-08-01 07:42:48.619 AEST [24131] STATEMENT:  select * from currval('s1');
ERROR:  currval of sequence "s1" is not yet defined in this session
test_pub=# select * from nextval('s1');
 nextval
-
   1
(1 row)

test_pub=# select * from currval('s1');
 currval
-
   1
(1 row)

test_pub=#

~~~

OTOH, I was hoping to be able to use currval() at the subscriber-side
to see the current sequence value after issuing ALTER .. REFRESH
PUBLICATION SEQUENCES.

Unfortunately, it has the same behaviour where currval() cannot be
used without nextval(). But, on the subscriber, you probably never
want to do an explicit nextval() independently of the publisher.

Is this currently a bug, or maybe a quirk that should be documented?

For example:

Publisher
==

test_pub=# create sequence s1;
CREATE SEQUENCE
test_pub=# CREATE PUBLICATION pub1 FOR ALL SEQUENCES;
CREATE PUBLICATION
test_pub=# select * from nextval('s1');
 nextval
-
   1
(1 row)

test_pub=# select * from nextval('s1');
 nextval
-
   2
(1 row)

test_pub=# select * from nextval('s1');
 nextval
-
   3
(1 row)

test_pub=#

Subscriber
==

(Notice currval() always gives an error unless nextval() is used prior).

test_sub=# create sequence s1;
CREATE SEQUENCE
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=test_pub'
PUBLICATION pub1;
2024-08-01 07:51:06.955 AEST [24325] WARNING:  subscriptions created
by regression test cases should have names starting with "regress_"
WARNING:  subscriptions created by regression test cases should have
names starting with "regress_"
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2024-08-01 07:51:07.023 AEST [4211] LOG:  logical
replication apply worker for subscription "sub1" has started
2024-08-01 07:51:07.037 AEST [4213] LOG:  logical replication sequence
synchronization worker for subscription "sub1" has started
2024-08-01 07:51:07.063 AEST [4213] LOG:  logical replication
synchronization for subscription "sub1", sequence "s1" has finished
2024-08-01 07:51:07.063 AEST [4213] LOG:  logical replication sequence
synchronization worker for subscription "sub1" has finished

test_sub=# SELECT * FROM currval('s1');
2024-08-01 07:51:19.688 AEST [24325] ERROR:  currval of sequence "s1"
is not yet defined in this session
2024-08-01 07:51:19.688 AEST [24325] STATEMENT:  SELECT * FROM currval('s1');
ERROR:  currval of sequence "s1" is not yet defined in this session
test_sub=# ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION SEQUENCES;
ALTER SUBSCRIPTION
test_sub=# 2024-08-01 07:51:35.298 AEST [4993] LOG:  logical
replication sequence synchronization worker for subscription "sub1"
has started

test_sub=# 2024-08-01 07:51:35.321 AEST [4993] LOG:  logical
replication synchronization for subscription "sub1", sequence "s1" has
finished
2024-08-01 07:51:35.321 AEST [4993] LOG:  logical replication sequence
synchronization worker for subscription "sub1" has finished

test_sub=#
test_sub=# SELECT * FROM currval('s1');
2024-08-01 07:51:41.438 AEST [24325] ERROR:  currval of sequence "s1"
is not yet defined in this session
2024-08-01 07:51:41.438 AEST [24325] STATEMENT:  SELECT * FROM currval('s1');
ERROR:  currval of sequence "s1" is not yet defined in this session
test_sub=#
test_sub=# SELECT * FROM nextval('s1');
 nextval
-
   4
(1 row)

test_sub=# SELECT * FROM currval('s1');
 currval
-
   4
(1 row)

test_sub=#

==
[1] https://www.postgresql.org/docs/current/functions-sequence.html

Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Logical Replication of sequences

2024-07-31 Thread Peter Smith
E_INIT,
+InvalidXLogRecPtr);

(This is a continuation of my doubts regarding 'all_relations' in the
previous review comment #4 above)

Here are some more questions about it:

~

5a. Why is this an 'else' of the !bsearch? It needs more explanation of
what this case means.

~

5b. Along with more description, it might be better to reverse the
!bsearch condition, so this ('else') code is not so distantly
separated from the condition.

~

5c. Saying "only supported for sequences" seems strange: e.g. what
would it even mean to "re-synchronize" tables? They would all have to
be truncated first -- so if re-sync for tables has no meaning maybe
the parameter is misnamed and should just be 'resync_all_sequences' or
similar? In any case, an Assert here might be good.

==
src/backend/replication/logical/launcher.c

logicalrep_worker_find:

nitpick - I feel the function comment "We are only interested in..."
is now redundant since you are passing the exact worker type you want.
nitpick - I added an Assert for the types you are expecting to look for
nitpick - The comment "Search for attached worker..." is stale now
because there are more search criteria
nitpick - IMO the "Skip parallel apply workers." code is no longer
needed now that you are matching the worker type.
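
For reference, I mean an Assert along these lines (the exact set of
expected worker types, and the parameter name 'wtype', are only my guesses
here; adjust to suit whatever the patch actually passes):

  Assert(wtype == WORKERTYPE_APPLY ||
         wtype == WORKERTYPE_TABLESYNC ||
         wtype == WORKERTYPE_SEQUENCESYNC);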

~~~

6. logicalrep_worker_launch

  * - must be valid worker type
  * - tablesync workers are only ones to have relid
  * - parallel apply worker is the only kind of subworker
+ * - sequencesync workers will not have relid
  */
  Assert(wtype != WORKERTYPE_UNKNOWN);
  Assert(is_tablesync_worker == OidIsValid(relid));
  Assert(is_parallel_apply_worker == (subworker_dsm != DSM_HANDLE_INVALID));
+ Assert(!is_sequencesync_worker || !OidIsValid(relid));

On further reflection, is that added comment and added Assert even
needed? I think they can be removed because saying "tablesync workers
are only ones to have relid" seems to already cover what we needed to
say/assert.

~~~

logicalrep_worker_stop:
nitpick - /type/wtype/ for readability

~~~

7.
/*
 * Count the number of registered (not necessarily running) sync workers
 * for a subscription.
 */
int
logicalrep_sync_worker_count(Oid subid)

~

I thought this function should count the sequencesync worker as well.
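
To illustrate the idea, here is a rough sketch only (the loop mirrors the
existing implementation; isSequenceSyncWorker() is this patch's naming, so
treat it as a placeholder):

  int
  logicalrep_sync_worker_count(Oid subid)
  {
      int res = 0;

      Assert(LWLockHeldByMe(LogicalRepWorkerLock));

      for (int i = 0; i < max_logical_replication_workers; i++)
      {
          LogicalRepWorker *w = &LogicalRepCtx->workers[i];

          /* count both tablesync and sequencesync workers */
          if (w->subid == subid &&
              (isTablesyncWorker(w) || isSequenceSyncWorker(w)))
              res++;
      }

      return res;
  }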

==
.../replication/logical/sequencesync.c

fetch_remote_sequence_data:
nitpick - tweaked function comment
nitpick - /value/last_value/ for readability

~

8.
+ *lsn = DatumGetInt64(slot_getattr(slot, 4, &isnull));
+ Assert(!isnull);

Should that be DatumGetUInt64?
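
e.g. (trivial sketch; XLogRecPtr is an unsigned 64-bit type, so the
unsigned getter seems a better fit):

  *lsn = DatumGetUInt64(slot_getattr(slot, 4, &isnull));
  Assert(!isnull);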

~~~

copy_sequence:
nitpick - tweak function header.
nitpick - renamed the sequence vars for consistency, and declared them
all together.

==
src/backend/replication/logical/tablesync.c

9.
 void
 invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
 {
- table_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
+ relation_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
 }

I assume you changed the 'table_states_validity' name because this is
no longer exclusively for tables. So, should the function name also be
similarly changed?

~~~

process_syncing_sequences_for_apply:
nitpick - tweaked the function comment
nitpick - cannot just say "if there is not one already." A sequence
sync worker might not even be needed.
nitpick - added blank line for readability

~

10.
+ if (syncworker)
+ {
+ /* Now safe to release the LWLock */
+ LWLockRelease(LogicalRepWorkerLock);
+ break;
+ }
+ else
+ {

This 'else' can be removed if you wish to pull back all the indentation.
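
e.g. something like this (sketch only -- the former 'else' body would
simply continue at the outer indentation level):

  if (syncworker)
  {
      /* Now safe to release the LWLock */
      LWLockRelease(LogicalRepWorkerLock);
      break;
  }

  /* ... code previously inside the 'else' block, now unindented ... */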

~~~

11.
process_syncing_tables(XLogRecPtr current_lsn)

Is the function name still OK given that it is now also syncing for sequences?

~~~

FetchTableStates:
nitpick - Reworded some of the function comment
nitpick - Function comment is stale because it is still referring to
the function parameter which this patch removed.
nitpick - tweak a comment

==
src/include/commands/sequence.h

12.
+#define SEQ_LOG_CNT_INVALID (-1)

See a previous review comment (#2 above) where I wondered why not use
value 0 for this.

~~~

13.
 extern void SequenceChangePersistence(Oid relid, char newrelpersistence);
 extern void DeleteSequenceTuple(Oid relid);
 extern void ResetSequence(Oid seq_relid);
+extern void do_setval(Oid relid, int64 next, bool iscalled, int64 logcnt);
 extern void ResetSequenceCaches(void);

do_setval() was an OK function name when it was static, but as an
exposed API it seems like a terrible name. IMO rename it to something
like 'SetSequence' to match the other API functions nearby.
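
e.g. only illustrating the rename here ('SetSequence' is just a
suggestion; keep whatever parameter names are finally chosen):

  extern void SetSequence(Oid relid, int64 next, bool iscalled, int64 logcnt);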

~

nitpick - same change to the parameter names as suggested for the
implementation.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 22b2a93..16c427e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8110,9 +8110,10 @@ 

Re: Pgoutput not capturing the generated columns

2024-07-29 Thread Peter Smith
Thanks for the patch updates.

Here are my review comments for v21-0001.

I think this patch is mostly OK now except there are still some
comments about the TAP test.

==
Commit Message

0.
Using Create Subscription:
CREATE SUBSCRIPTION sub2_gen_to_gen CONNECTION '$publisher_connstr' PUBLICATION
pub1 WITH (include_generated_columns = true, copy_data = false)"

If you are going to give an example, I think a gen-to-nogen example
would be a better choice. That's because the gen-to-gen (as you have
here) is not going to replicate anything due to the subscriber-side
column being generated.

==
src/test/subscription/t/011_generated.pl

1.
+$node_subscriber2->safe_psql('postgres',
+ "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 22) STORED, c int)"
+);

The subscriber2 node was intended only for all the tables where we
need include_generated_columns to be true. Mostly those are the
combination tests (tab_gen_to_nogen, tab_nogen_to_gen, etc.). OTOH,
table 'tab1' already existed. I don't think we need to bother
subscribing to tab1 from subscriber2 because every combination is
already covered by the combination tests. Let's leave this one alone.


~~~

2.
Huh? Where is the "tab_nogen_to_gen" combination test that I sent to
you off-list?

~~~

3.
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_order (c int GENERATED ALWAYS AS (a * 22) STORED,
a int, b int)"
+);

Maybe you can test 'tab_order' on both subscriber nodes but I think
it is overkill. IMO it is enough to test it on subscriber2.

~~~

4.
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_alter (a int, b int, c int GENERATED ALWAYS AS (a
* 22) STORED)"
+);

Ditto above. Maybe you can test 'tab_alter' on both subscriber nodes
but I think it is overkill. IMO it is enough to test it on
subscriber2.

~~~

5.
Don't forget to add initial data for the missing nogen_to_gen table/test.

~~~

6.
 $node_publisher->safe_psql('postgres',
- "CREATE PUBLICATION pub1 FOR ALL TABLES");
+ "CREATE PUBLICATION pub1 FOR TABLE tab1, tab_gen_to_gen,
tab_gen_to_nogen, tab_gen_to_missing, tab_missing_to_gen, tab_order");
+
 $node_subscriber->safe_psql('postgres',
  "CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub1"
 );

It is not a bad idea to reduce the number of publications as you have
done, but IMO jamming all the tables into 1 publication is too much
because it makes it less understandable instead of simpler.

How about this:
- leave the 'pub1' just for 'tab1'.
- have a 'pub_combo' for publication all the gen_to_nogen,
nogen_to_gen etc combination tests.
- and a 'pub_misc' for any other misc tables like tab_order.

~~~

7.
+#
 # Wait for initial sync of all subscriptions
+#

I think you should write a note here that you have deliberately set
copy_data = false because COPY and include_generated_columns are not
allowed at the same time for patch 0001. And that is why all expected
results on subscriber2 will be empty. Also, say this limitation will
be changed in patch 0002.

~~~

(I didn't yet check 011_generated.pl file results beyond this point...
I'll wait for v22-0001 to review further)

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical Replication of sequences

2024-07-25 Thread Peter Smith
Hi Vignesh,

There are still pending changes from my previous review of the
0720-0003 patch [1], but here are some new review comments for your
latest patch v20240525-0003.

==
doc/src/sgml/catalogs.sgml

nitpick - fix plurals and tweak the description.

~~~

1.
   
-   This catalog only contains tables known to the subscription after running
-   either CREATE
SUBSCRIPTION or
-   ALTER SUBSCRIPTION
... REFRESH
+   This catalog only contains tables and sequences known to the subscription
+   after running either
+   CREATE
SUBSCRIPTION
+   or ALTER
SUBSCRIPTION ... REFRESH
PUBLICATION.
   

Shouldn't this mention "REFRESH PUBLICATION SEQUENCES" too?

==
src/backend/commands/sequence.c

SetSequenceLastValue:
nitpick - maybe change: /log_cnt/new_log_cnt/ for consistency with the
other parameter, and to emphasise the old log_cnt is overwritten

==
src/backend/replication/logical/sequencesync.c

2.
+/*
+ * fetch_remote_sequence_data
+ *
+ * Fetch sequence data (current state) from the remote node, including
+ * the latest sequence value from the publisher and the Page LSN for the
+ * sequence.
+ */
+static int64
+fetch_remote_sequence_data(WalReceiverConn *conn, Oid remoteid,
+int64 *log_cnt, XLogRecPtr *lsn)

2a.
Now you are also returning the 'log_cnt' but that is not mentioned by
the function comment.

~

2b.
Is it better to name these returned by-ref ptrs like 'ret_log_cnt',
and 'ret_lsn' to emphasise they are output variables? YMMV.
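
e.g. a sketch of the suggested naming only:

  static int64
  fetch_remote_sequence_data(WalReceiverConn *conn, Oid remoteid,
                             int64 *ret_log_cnt, XLogRecPtr *ret_lsn);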

~~~

3.
+ /* Process the sequence. */
+ slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
+ while (tuplestore_gettupleslot(res->tuplestore, true, false, slot))

This will have one-and-only-one tuple for the discovered sequence,
won't it? So, why is this a while loop?
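
If exactly one tuple is expected then a plain 'if' (perhaps with an
error/Assert for the not-found case) might be clearer. A rough sketch:

  slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
  if (tuplestore_gettupleslot(res->tuplestore, true, false, slot))
  {
      /* ... read last_value, log_cnt, lsn from 'slot' ... */
  }
  ExecDropSingleTupleTableSlot(slot);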

==
src/include/commands/sequence.h

nitpick - maybe change: /log_cnt/new_log_cnt/ (same as earlier in this post)

==
src/test/subscription/t/034_sequences.pl

4.
Q. Should we be suspicious that log_cnt changes from '32' to '31', or
is there a valid explanation? It smells like some calculation is
off-by-one, but without debugging I can't tell if it is right or
wrong.

==
Please also see the attached diffs patch, which implements the
nitpicks mentioned above.

==
[1] 0720-0003 review -
https://www.postgresql.org/message-id/CAHut%2BPsfsfzyBrmo8E43qFMp9_bmen2tuCsNYN8sX%3Dfa86SdfA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 19d04b1..dcd0b98 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8102,8 +8102,8 @@ SCRAM-SHA-256$<iteration 
count>:&l
   
 
   
-   The catalog pg_subscription_rel contains the
-   state for each replicated tables and sequences in each subscription.  This
+   The catalog pg_subscription_rel stores the
+   state of each replicated table and sequence for each subscription.  This
is a many-to-many mapping.
   
 
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index a3e7c79..f292fbc 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -342,7 +342,7 @@ ResetSequence(Oid seq_relid)
  * logical replication.
  */
 void
-SetSequenceLastValue(Oid seq_relid, int64 new_last_value, int64 log_cnt)
+SetSequenceLastValue(Oid seq_relid, int64 new_last_value, int64 new_log_cnt)
 {
SeqTableelm;
Relationseqrel;
@@ -370,7 +370,7 @@ SetSequenceLastValue(Oid seq_relid, int64 new_last_value, 
int64 log_cnt)
 
seq->last_value = new_last_value;
seq->is_called = true;
-   seq->log_cnt = log_cnt;
+   seq->log_cnt = new_log_cnt;
 
MarkBufferDirty(buf);
 
diff --git a/src/include/commands/sequence.h b/src/include/commands/sequence.h
index a302890..4c6aee0 100644
--- a/src/include/commands/sequence.h
+++ b/src/include/commands/sequence.h
@@ -60,7 +60,7 @@ extern ObjectAddress AlterSequence(ParseState *pstate, 
AlterSeqStmt *stmt);
 extern void SequenceChangePersistence(Oid relid, char newrelpersistence);
 extern void DeleteSequenceTuple(Oid relid);
 extern void ResetSequence(Oid seq_relid);
-extern void SetSequenceLastValue(Oid seq_relid, int64 new_last_value, int64 
log_cnt);
+extern void SetSequenceLastValue(Oid seq_relid, int64 new_last_value, int64 
new_log_cnt);
 extern void ResetSequenceCaches(void);
 
 extern void seq_redo(XLogReaderState *record);


Re: Logical Replication of sequences

2024-07-25 Thread Peter Smith
Here are some review comments for latest patch v20240725-0002

==
doc/src/sgml/ref/create_publication.sgml

nitpick - tweak to the description of the example.

==
src/backend/parser/gram.y

preprocess_pub_all_objtype_list:
nitpick - typo "allbjects_list"
nitpick - reword function header
nitpick - /alltables/all_tables/
nitpick - /allsequences/all_sequences/
nitpick - I think code is safe as-is because makeNode internally does
palloc0, but OTOH adding Assert would be nicer just to remove any
doubts.

==
src/bin/psql/describe.c

1.
+ /* Print any publications */
+ if (pset.sversion >= 18)
+ {
+ int tuples = 0;

No need to assign value 0 here, because this will be unconditionally
assigned before use anyway.



2. describePublications

  has_pubviaroot = (pset.sversion >= 13);
+ has_pubsequence = (pset.sversion >= 18000);

That's a bug! Should be 18, not 18000.

==

And, please see the attached diffs patch, which implements the
nitpicks mentioned above.

======
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/ref/create_publication.sgml 
b/doc/src/sgml/ref/create_publication.sgml
index 7dcfe37..783874f 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -420,7 +420,7 @@ CREATE PUBLICATION users_filtered FOR TABLE users (user_id, 
firstname);
 
 
   
-   Create a publication that synchronizes all the sequences:
+   Create a publication that publishes all sequences for synchronization:
 
 CREATE PUBLICATION all_sequences FOR ALL SEQUENCES;
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 585f61e..9b3cad1 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -215,9 +215,9 @@ static void processCASbits(int cas_bits, int location, 
const char *constrType,
 static PartitionStrategy parsePartitionStrategy(char *strategy);
 static void preprocess_pubobj_list(List *pubobjspec_list,
   
core_yyscan_t yyscanner);
-static void preprocess_pub_all_objtype_list(List *allbjects_list,
-   
bool *alltables,
-   
bool *allsequences,
+static void preprocess_pub_all_objtype_list(List *all_objects_list,
+   
bool *all_tables,
+   
bool *all_sequences,

core_yyscan_t yyscanner);
 static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node 
*query);
 
@@ -19440,39 +19440,42 @@ parsePartitionStrategy(char *strategy)
 }
 
 /*
- * Process all_objects_list to check if the options have been specified more 
than
- * once and set alltables/allsequences.
+ * Process all_objects_list to set all_tables/all_sequences.
+ * Also, checks if the pub_object_type has been specified more than once.
  */
 static void
-preprocess_pub_all_objtype_list(List *all_objects_list, bool *alltables,
-   bool 
*allsequences, core_yyscan_t yyscanner)
+preprocess_pub_all_objtype_list(List *all_objects_list, bool *all_tables,
+   bool 
*all_sequences, core_yyscan_t yyscanner)
 {
if (!all_objects_list)
return;
 
+   Assert(all_tables && *all_tables == false);
+   Assert(all_sequences && *all_sequences == false);
+
foreach_ptr(PublicationAllObjSpec, obj, all_objects_list)
{
if (obj->pubobjtype == PUBLICATION_ALL_TABLES)
{
-   if (*alltables)
+   if (*all_tables)
ereport(ERROR,
errcode(ERRCODE_SYNTAX_ERROR),
errmsg("invalid publication 
object list"),
errdetail("TABLES can be 
specified only once."),

parser_errposition(obj->location));
 
-   *alltables = true;
+   *all_tables = true;
}
else if (obj->pubobjtype == PUBLICATION_ALL_SEQUENCES)
{
-   if (*allsequences)
+   if (*all_sequences)
ereport(ERROR,
errcode(ERRCODE_SYNTAX_ERROR),
errmsg("invalid publication 
object list"),
 

Re: Logical Replication of sequences

2024-07-25 Thread Peter Smith
; maths seems unnecessarily tricky. Can't we
just increment 'cur_seq' before this calculation?

~

nitpick - simplify the comment about batching
nitpick - added a comment to the commit

==
src/backend/replication/logical/tablesync.c

finish_sync_worker:
nitpick - added an Assert so the if/else is less risky.
nitpick - modify the comment about failure time when it is a clean exit

~~~

15. process_syncing_sequences_for_apply

+ /* We need up-to-date sync state info for subscription sequences here. */
+ FetchTableStates(&started_tx, SUB_REL_KIND_ALL);

Should that say SUB_REL_KIND_SEQUENCE?

~

16.
+ /*
+ * If there are free sync worker slot(s), start a new sequence
+ * sync worker, and break from the loop.
+ */
+ if (nsyncworkers < max_sync_workers_per_subscription)

Should this "if" have some "else" code to log a warning if we have run
out of free workers? Otherwise, how will the user know that the system
may need tuning?
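
e.g. maybe something like this (the message and hint wording here are
only placeholders):

  if (nsyncworkers < max_sync_workers_per_subscription)
  {
      /* start a new sequencesync worker ... */
  }
  else
      ereport(WARNING,
              errmsg("out of sequence synchronization worker slots"),
              errhint("You might need to increase \"%s\".",
                      "max_sync_workers_per_subscription"));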

~~~

17. FetchTableStates

  /* Fetch all non-ready tables. */
- rstates = GetSubscriptionRelations(MySubscription->oid, true);
+ rstates = GetSubscriptionRelations(MySubscription->oid, rel_type, true);

This feels risky. IMO there needs to be some prior Assert about the
rel_type. For example, if it happened to be SUB_REL_KIND_SEQUENCE then
this function code doesn't seem to make sense.
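
e.g. perhaps just something like:

  Assert(rel_type == SUB_REL_KIND_TABLE || rel_type == SUB_REL_KIND_ALL);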

~~~

==
src/backend/replication/logical/worker.c

18. SetupApplyOrSyncWorker

+
+ if (isSequenceSyncWorker(MyLogicalRepWorker))
+ before_shmem_exit(logicalrep_seqsyncworker_failuretime, (Datum) 0);

Probably that should be using macro am_sequencesync_worker(), right?

==
src/include/catalog/pg_subscription_rel.h

19.
+typedef enum
+{
+ SUB_REL_KIND_TABLE,
+ SUB_REL_KIND_SEQUENCE,
+ SUB_REL_KIND_ALL,
+} SubscriptionRelKind;
+

I was not sure how helpful this is; it might not be needed. e.g. see
review comment for GetSubscriptionRelations

~~~

20.
+extern List *GetSubscriptionRelations(Oid subid, SubscriptionRelKind reltype,
+   bool not_ready);

There is a mismatch between the 'not_ready' parameter name here and the
one in the function implementation.

==
src/test/subscription/t/034_sequences.pl

nitpick - removed a blank line

==
99.
Please also see the attached diffs patch which implements all the
nitpicks mentioned above.

==
[1] syntax - 
https://www.postgresql.org/message-id/CAHut%2BPuFH1OCj-P1UKoRQE2X4-0zMG%2BN1V7jdn%3DtOQV4RNbAbw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/catalog/pg_subscription.c 
b/src/backend/catalog/pg_subscription.c
index 5610c07..04d322a 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -494,14 +494,18 @@ HasSubscriptionRelations(Oid subid)
 /*
  * Get the relations for the subscription.
  *
- * If rel_type is SUB_REL_KIND_SEQUENCE, get only the sequences. If rel_type is
- * SUB_REL_KIND_TABLE, get only the tables. If rel_type is SUB_REL_KIND_ALL,
- * get both tables and sequences.
+ * rel_type:
+ * If SUB_REL_KIND_SEQUENCE, return only the sequences.
+ * If SUB_REL_KIND_TABLE, return only the tables.
+ * If SUB_REL_KIND_ALL, return both tables and sequences.
+ *
+ * not_all_relations:
  * If not_all_relations is true for SUB_REL_KIND_TABLE and SUB_REL_KIND_ALL,
  * return only the relations that are not in a ready state, otherwise return 
all
  * the relations of the subscription. If not_all_relations is true for
  * SUB_REL_KIND_SEQUENCE, return only the sequences that are in init state,
  * otherwise return all the sequences of the subscription.
+ *
  * The returned list is palloc'ed in the current memory context.
  */
 List *
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index d23901a..2f9ff8b 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -879,11 +879,13 @@ CreateSubscription(ParseState *pstate, 
CreateSubscriptionStmt *stmt,
  * Update the subscription to refresh both the publication and the publication
  * objects associated with the subscription.
  *
- * If the copy_data parameter is true, the function will set the state
- * to "init"; otherwise, it will set the state to "ready". When the
- * validate_publications is provided with a publication list, the function
- * checks that the  specified publications exist on the publisher. If
- * refresh_all_sequences is  true, it will mark all sequences with "init" state
+ * If 'copy_data' parameter is true, the function will set the state
+ * to "init"; otherwise, it will set the state to "ready".
+ *
+ * When 'validate_publications' is provided with a publication list, the 
function
+ * checks that the specified publications exist on the publisher.
+ *
+ * If 'refresh_all_sequences' is true, it will mark all sequences with "init" 
state
  * for re-synchronization; oth

Re: Logical Replication of sequences

2024-07-23 Thread Peter Smith
Hi, here are some review comments for patch v20240720-0003.

This review is a WIP. This post is only about the docs (*.sgml) of patch 0003.

==
doc/src/sgml/ref/alter_subscription.sgml

1. REFRESH PUBLICATION and copy_data
nitpicks:
- IMO the "synchronize the sequence data" info was misleading because
synchronization should only occur when copy_data=true.
- I also felt it was strange to mention pg_subscription_rel for
sequences, but not for tables. I modified this part too.
- Then I moved the information about re/synchronization of sequences
into the "copy_data" part.
- And added another link to ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES

Anyway, in summary, I have updated this page quite a lot according to
my understanding. Please take a look at the attached nitpick for my
suggestions.

nitpick - /The supported options are:/The only supported option is:/

~~~

2. REFRESH PUBLICATION SEQUENCES
nitpick - tweaked the wording
nitpick - typo /syncronizes/synchronizes/

==
3. catalogs.sgml

IMO something is missing in Section "1.55. pg_subscription_rel".

Currently, this page only talks of relations/tables, but I think it
should mention "sequences" here too, particularly since now we are
linking to here from ALTER SUBSCRIPTION when talking about sequences.

==
99.
Please see the attached diffs patch which implements any nitpicks
mentioned above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/ref/alter_subscription.sgml 
b/doc/src/sgml/ref/alter_subscription.sgml
index ba8c2b1..666d9b0 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -153,7 +153,7 @@ ALTER SUBSCRIPTION name RENAME TO <
 REFRESH PUBLICATION
 
  
-  Fetch missing table information from publisher.  This will start
+  Fetch missing table information from the publisher.  This will start
   replication of tables that were added to the subscribed-to publications
   since 
   CREATE SUBSCRIPTION or
@@ -161,29 +161,26 @@ ALTER SUBSCRIPTION name RENAME TO <
  
 
  
-  It also fetches the missing sequence information from the publisher and
-  synchronize the sequence data for newly added sequences with the
-  publisher. This will start synchronizing of sequences that were added to
-  the subscribed-to publications since 
-  CREATE SUBSCRIPTION or the last invocation of
-  REFRESH PUBLICATION. Additionally, it will remove any
-  sequences that are no longer part of the publication from the
-  pg_subscription_rel
-  system catalog. Sequences that have already been synchronized will not be
-  re-synchronized.
+  Also, fetch missing sequence information from the publisher.
+ 
+
+ 
+  The system catalog pg_subscription_rel
+  is updated to record all tables and sequences known to the subscription,
+  that are still part of the publication.
  
 
  
   refresh_option specifies additional options 
for the
-  refresh operation.  The supported options are:
+  refresh operation.  The only supported option is:
 
   

 copy_data (boolean)
 
  
-  Specifies whether to copy pre-existing data for tables and sequences
-  in the publications that are being subscribed to when the replication
+  Specifies whether to copy pre-existing data for tables and 
synchronize
+  sequences in the publications that are being subscribed to when the 
replication
   starts. The default is true.
  
  
@@ -191,6 +188,11 @@ ALTER SUBSCRIPTION name RENAME TO <
   filter WHERE clause has since been modified.
  
  
+  Previously subscribed sequences are not re-synchronized. To do that,
+  see 
+  ALTER SUBSCRIPTION ... REFRESH PUBLICATION 
SEQUENCES
+ 
+ 
   See  for details of
   how copy_data = true can interact with the
   origin
@@ -212,11 +214,11 @@ ALTER SUBSCRIPTION name RENAME TO <
 REFRESH PUBLICATION SEQUENCES
 
  
-  Fetch missing sequences information from publisher and re-synchronize the
+  Fetch missing sequence information from the publisher, then 
re-synchronize
   sequence data with the publisher. Unlike 
   ALTER SUBSCRIPTION ... REFRESH PUBLICATION 
which
-  only syncronizes the newly added sequences, this option will also
-  re-synchronize the sequence data for sequences that were previously 
added.
+  only synchronizes newly added sequences, REFRESH PUBLICATION 
SEQUENCES
+  will re-synchronize the sequence data for all subscribed sequences.
  
 



Re: Logical Replication of sequences

2024-07-22 Thread Peter Smith
Here are some review comments for patch v20240720-0002.

==
1. Commit message:

1a.
The commit message is stale. It is still referring to functions and
views that have been moved to patch 0003.

1b.
"ALL SEQUENCES" is not a command. It is a clause of the CREATE
PUBLICATION command.

==
doc/src/sgml/ref/create_publication.sgml

nitpick - publication name in the example /allsequences/all_sequences/

==
src/bin/psql/describe.c

2. describeOneTableDetails

Although it's not the fault of this patch, this patch propagates the
confusion of 'result' versus 'res'. Basically, I did not understand
the need for the variable 'result'. There is already a "PGresult
*res", and unless I am mistaken we can just keep re-using that instead
of introducing a 2nd variable having almost the same name and purpose.

~

nitpick - comment case
nitpick - rearrange comment

==
src/test/regress/expected/publication.out

(see publication.sql)

==
src/test/regress/sql/publication.sql

nitpick - tweak comment

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/ref/create_publication.sgml 
b/doc/src/sgml/ref/create_publication.sgml
index c9c1b92..7dcfe37 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -422,7 +422,7 @@ CREATE PUBLICATION users_filtered FOR TABLE users (user_id, 
firstname);
   
Create a publication that synchronizes all the sequences:
 
-CREATE PUBLICATION allsequences FOR ALL SEQUENCES;
+CREATE PUBLICATION all_sequences FOR ALL SEQUENCES;
 
   
  
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 0f3f86b..a92af54 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1851,7 +1851,7 @@ describeOneTableDetails(const char *schemaname,
}
PQclear(result);
 
-   /* print any publications */
+   /* Print any publications */
if (pset.sversion >= 18)
{
int tuples = 0;
@@ -1867,8 +1867,8 @@ describeOneTableDetails(const char *schemaname,
if (!result)
goto error_return;
 
-   tuples = PQntuples(result);
/* Might be an empty set - that's ok */
+   tuples = PQntuples(result);
if (tuples > 0)
{
printTableAddFooter(&cont, _("Publications:"));
diff --git a/src/test/regress/expected/publication.out 
b/src/test/regress/expected/publication.out
index 3ea2224..6c573a1 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -259,7 +259,7 @@ Publications:
 SET client_min_messages = 'ERROR';
 CREATE PUBLICATION regress_pub_forallsequences2 FOR ALL SEQUENCES;
 RESET client_min_messages;
--- Check describe sequence lists both the publications
+-- check that describe sequence lists all publications the sequence belongs to
 \d+ pub_test.regress_pub_seq1
  Sequence "pub_test.regress_pub_seq1"
   Type  | Start | Minimum |   Maximum   | Increment | Cycles? | Cache 
diff --git a/src/test/regress/sql/publication.sql 
b/src/test/regress/sql/publication.sql
index 8d553ed..ac77fe4 100644
--- a/src/test/regress/sql/publication.sql
+++ b/src/test/regress/sql/publication.sql
@@ -134,7 +134,7 @@ SET client_min_messages = 'ERROR';
 CREATE PUBLICATION regress_pub_forallsequences2 FOR ALL SEQUENCES;
 RESET client_min_messages;
 
--- Check describe sequence lists both the publications
+-- check that describe sequence lists all publications the sequence belongs to
 \d+ pub_test.regress_pub_seq1
 
 --- FOR ALL specifying both TABLES and SEQUENCES


Re: Pgoutput not capturing the generated columns

2024-07-22 Thread Peter Smith
On Fri, Jul 19, 2024 at 4:01 PM Shlok Kyal  wrote:
>
> On Thu, 18 Jul 2024 at 13:55, Peter Smith  wrote:
> >
> > Hi, here are some review comments for v19-0002
> > ==
> > src/test/subscription/t/004_sync.pl
> >
> > 1.
> > This new test is not related to generated columns. IIRC, this is just
> > some test that we discovered missing during review of this thread. As
> > such, I think this change can be posted/patched separately from this
> > thread.
> >
> I have removed the test for this thread.
>
> I have also addressed the remaining comments for v19-0002 patch.

Hi, I have no more review comments for patch v20-0002 at this time.

I saw that the above test was removed from this thread as suggested,
but I could not find any new thread that was started to propose this
valuable missing test.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-07-22 Thread Peter Smith
Hi, Patch v22-0001 LGTM apart from the following nitpicks

==
src/sgml/ref/alter_subscription.sgml

nitpick - /one needs to/you need to/

==
src/backend/commands/subscriptioncmds.c

CheckAlterSubOption:
nitpick = "ideally we could have..." doesn't make sense because the
code uses a more consistent/simpler way. So other option was not ideal
after all.

AlterSubscription
nitpick - typo /syncronization/synchronization/
nitpick - plural fix

==
Kind Regards,
Peter Smith.
Fujitsu Australia.
diff --git a/doc/src/sgml/ref/alter_subscription.sgml 
b/doc/src/sgml/ref/alter_subscription.sgml
index cbba1ee..6af6d0d 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -272,7 +272,7 @@ ALTER SUBSCRIPTION name RENAME TO <
   logical replication worker corresponding to a particular subscription 
have
   the following pattern: pg_gid_%u_%u
   (parameters: subscription oid, remote transaction 
id xid).
-  To resolve such transactions manually, one needs to roll back all
+  To resolve such transactions manually, you need to roll back all
   the prepared transactions with corresponding subscription IDs in their
   names. Applications can check
   pg_prepared_xacts
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index 27560f1..b21f5c0 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1090,9 +1090,9 @@ CheckAlterSubOption(Subscription *sub, const char *option,
 * publisher cannot be modified if the slot is currently acquired by the
 * existing walsender.
 *
-* Note that the two_phase is enabled (aka changed from 'false' to 
'true')
-* on the publisher by the existing walsender so, ideally, we can allow
-* that even when a subscription is enabled. But we kept this 
restriction
+* Note that two_phase is enabled (aka changed from 'false' to 'true')
+* on the publisher by the existing walsender, so we could have allowed
+* that even when the subscription is enabled. But we kept this 
restriction
 * for the sake of consistency and simplicity.
 */
if (sub->enabled)
@@ -1281,7 +1281,7 @@ AlterSubscription(ParseState *pstate, 
AlterSubscriptionStmt *stmt,
 * We need to update both the slot and 
the subscription
 * for two_phase option. We can enable 
the two_phase
 * option for a slot only once the 
initial data
-* syncronization is done. This is to 
avoid missing some
+* synchronization is done. This is to 
avoid missing some
 * data as explained in comments atop 
worker.c.
 */
update_two_phase = !opts.twophase;
@@ -1306,9 +1306,9 @@ AlterSubscription(ParseState *pstate, 
AlterSubscriptionStmt *stmt,
 *
 * Ensure workers have already been 
exited to avoid
 * getting prepared transactions while 
we are disabling
-* two_phase option. Otherwise, the 
changes of already
-* prepared transactions can be 
replicated again along
-* with its corresponding commit 
leading to duplicate data
+* two_phase option. Otherwise, the 
changes of an already
+* prepared transaction can be 
replicated again along
+* with its corresponding commit, 
leading to duplicate data
 * or errors.
 */
if (logicalrep_workers_find(subid, 
true, true))


Re: walsender.c fileheader comment

2024-07-19 Thread Peter Smith
Hi, Thank you for taking the time to look at this and reply.

>
> I did look at this, and while the explanation in the current comment may
> seem a bit confusing, I'm not sure the suggested changes improve the
> situation very much.
>
> This suggests the two comments somehow disagree, but it does not say in
> what exactly, so perhaps I just missed it :-(
>
> ISTM there's a bit of confusion what is meant by "stopping" state - you
> seem to be interpreting it as a general concept, where the walsender is
> requested to stop (through the signal), and starts doing stuff to exit.
> But the comments actually talk about WalSnd->state, where "stopping"
> means it needs to be set to WALSNDSTATE_STOPPING.

Yes, I interpreted the "stopping" state as meaning when the boolean
flag 'got_STOPPING' is assigned true.

>
> And we only ever switch to that state in two places - in WalSndPhysical
> and exec_replication_command. And that does not happen in regular
> logical replication (which is what "logical replication is in progress"
> refers to) - if you have a walsender just replicating DML, it will never
> see the WALSNDSTATE_STOPPING state. It will simply do the cleanup while
> still in WALSNDSTATE_STREAMING state, and then just exit.
>
> So from this point of view, the suggestion is actually wrong.

OK.

>
> To conclude, I think this probably makes the comments more confusing. If
> we want to make it clearer, I'd probably start by clarifying what the
> "stopping" state means. Also, it's a bit surprising we may not actually
> go through the "stopping" state during shutdown.
>

I agree. My interpretation of the (ambiguous) "stopping" state led me
to believe the comment was quite wrong. So, this thread was only
intended as a trivial comment fix in passing but clearly there is more
to this than I anticipated. I would be happy if someone with more
knowledge about the WALSNDSTATE_STOPPING versus got_STOPPING could
disambiguate the file header comment, but that's not me, so I have
withdrawn this from the Commitfest.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-07-18 Thread Peter Smith
On Thu, Jul 18, 2024 at 9:42 PM Amit Kapila  wrote:
>
...
> I agree and have done that in the attached. I have made some
> additional changes: (a) removed the unrelated change of two_phase in
> protocol.sgml, (b) tried to make the two_phase change before failover
> option wherever it makes sense to keep the code consistent, (c)
> changed/added comments and doc changes at various places.
>
> I'll continue my review and testing of the patch but I thought of
> sharing what I have done till now.
>

Here are some minor comments for patch v21

==
You wrote "tried to make the two_phase change before failover option
wherever it makes sense to keep the code consistent". But failover is
still coded first in lots of places:
- libpqrcv_alter_slot
- ReplicationSlotAlter
- AlterReplicationSlot
etc.

Q. Why not change those ones?

==
src/backend/access/transam/twophase.c

IsTwoPhaseTransactionGidForSubid:
nitpick - nicer to rename the temporary gid variable: /gid_generated/gid_tmp/

==
src/backend/commands/subscriptioncmds.c

CheckAlterSubOption:
nitpick - function comment period/plural.
nitpick - typo /Samilar/Similar/

==
src/include/replication/slot.h

1.
-extern void ReplicationSlotAlter(const char *name, bool failover);
+extern void ReplicationSlotAlter(const char *name, bool *failover,
+ bool *two_phase);

Use const?
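
e.g. a sketch of what I mean:

  extern void ReplicationSlotAlter(const char *name, const bool *failover,
                                   const bool *two_phase);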

==
99.
Please see attached diffs implementing the nitpicks mentioned above

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/access/transam/twophase.c 
b/src/backend/access/transam/twophase.c
index 90df32f..da97836 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -2712,7 +2712,7 @@ IsTwoPhaseTransactionGidForSubid(Oid subid, char *gid)
int ret;
Oid subid_from_gid;
TransactionId xid_from_gid;
-   chargid_generated[GIDSIZE];
+   chargid_tmp[GIDSIZE];
 
/* Extract the subid and xid from the given GID */
ret = sscanf(gid, "pg_gid_%u_%u", &subid_from_gid, &xid_from_gid);
@@ -2729,10 +2729,9 @@ IsTwoPhaseTransactionGidForSubid(Oid subid, char *gid)
 * the given GID and check whether the temporary GID and the given GID
 * match.
 */
-   TwoPhaseTransactionGid(subid, xid_from_gid, gid_generated,
-  sizeof(gid_generated));
+   TwoPhaseTransactionGid(subid, xid_from_gid, gid_tmp, sizeof(gid_tmp));
 
-   return strcmp(gid, gid_generated) == 0;
+   return strcmp(gid, gid_tmp) == 0;
 }
 
 /*
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index 2e9f329..5f11235 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1071,7 +1071,7 @@ AlterSubscription_refresh(Subscription *sub, bool 
copy_data,
 }
 
 /*
- * Common checks for altering failover and two_phase option
+ * Common checks for altering failover and two_phase options.
  */
 static void
 CheckAlterSubOption(Subscription *sub, const char *option,
@@ -1337,8 +1337,8 @@ AlterSubscription(ParseState *pstate, 
AlterSubscriptionStmt *stmt,
if (IsSet(opts.specified_opts, SUBOPT_FAILOVER))
{
/*
-* Samilar to two_phase, we need to 
update the failover
-* option for the slot and the 
subscription.
+* Similar to the two_phase case above, 
we need to update
+* the failover option for the slot and 
the subscription.
 */
update_failover = true;
 


Re: Pgoutput not capturing the generated columns

2024-07-18 Thread Peter Smith
Hi, here are some review comments for patch v19-0003

==
src/backend/catalog/pg_publication.c

1.
/*
 * Translate a list of column names to an array of attribute numbers
 * and a Bitmapset with them; verify that each attribute is appropriate
 * to have in a publication column list (no system or generated attributes,
 * no duplicates).  Additional checks with replica identity are done later;
 * see pub_collist_contains_invalid_column.
 *
 * Note that the attribute numbers are *not* offset by
 * FirstLowInvalidHeapAttributeNumber; system columns are forbidden so this
 * is okay.
 */
static void
publication_translate_columns(Relation targetrel, List *columns,
  int *natts, AttrNumber **attrs)

~

I thought the above comment ought to change: /or generated
attributes/or virtual generated attributes/

IIRC this was already addressed back in v16, but somehow that fix has
been lost (???).

==
src/backend/replication/logical/tablesync.c

fetch_remote_table_info:
nitpick - missing end space in this comment /* TODO: use
ATTRIBUTE_GENERATED_VIRTUAL*/

==

2.
(in patch v19-0001)
+# tab3:
+# publisher-side tab3 has generated col 'b'.
+# subscriber-side tab3 has generated col 'b', using a different computation.

(here, in patch v19-0003)
 # tab3:
-# publisher-side tab3 has generated col 'b'.
-# subscriber-side tab3 has generated col 'b', using a different computation.
+# publisher-side tab3 has stored generated col 'b' but
+# subscriber-side tab3 has DIFFERENT COMPUTATION stored generated col 'b'.

It has become difficult to review these TAP tests, particularly when
different patches are modifying the same comment. e.g. I post
suggestions to modify comments for patch 0001. Those get addressed OK,
only to vanish in subsequent patches, as has happened in the above
example.

Really this patch 0003 was only supposed to add the word "stored", not
revert the entire comment to something from an earlier version. Please
take care that all comment changes are carried forward correctly from
one patch to the next.

==
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Pgoutput not capturing the generated columns

2024-07-18 Thread Peter Smith
Hi, here are some review comments for v19-0002

==
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:
nitpick - tweak function comment
nitpick - tweak other comments

~~~

fetch_remote_table_info:
nitpick - add space after "if"
nitpick - removed a comment because logic is self-evident from the variable name

==
src/test/subscription/t/004_sync.pl

1.
This new test is not related to generated columns. IIRC, this is just
a test that we discovered was missing during review of this thread. As
such, I think this change can be posted/patched separately from this
thread.

==
src/test/subscription/t/011_generated.pl

nitpick - change some comment wording to be more consistent with patch 0001.

==
99.
Please see the nitpicks diff attachment which implements any nitpicks
mentioned above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/replication/logical/tablesync.c 
b/src/backend/replication/logical/tablesync.c
index 935be7f..2e90d42 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -693,8 +693,8 @@ process_syncing_tables(XLogRecPtr current_lsn)
 }
 
 /*
- * Create list of columns for COPY based on logical relation mapping. Do not
- * include generated columns of the subscription table in the column list.
+ * Create list of columns for COPY based on logical relation mapping.
+ * Exclude columns that are subscription table generated columns.
  */
 static List *
 make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool *remotegenlist)
@@ -707,7 +707,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool 
*remotegenlist)
localgenlist = palloc0(rel->remoterel.natts * sizeof(bool));
 
/*
-* This loop checks for generated columns on subscription table.
+* This loop checks for generated columns of the subscription table.
 */
for (int i = 0; i < desc->natts; i++)
{
@@ -1010,15 +1010,12 @@ fetch_remote_table_info(char *nspname, char *relname, 
bool **remotegenlist_res,
 " WHERE a.attnum > 0::pg_catalog.int2"
 "   AND NOT a.attisdropped", 
lrel->remoteid);
 
-   if(server_version >= 12)
+   if (server_version >= 12)
{
bool gencols_allowed = server_version >= 18 && 
MySubscription->includegencols;
 
-   if(!gencols_allowed)
-   {
-   /* Replication of generated cols is not supported. */
+   if (!gencols_allowed)
appendStringInfo(&cmd, " AND a.attgenerated = ''");
-   }
}
 
appendStringInfo(&cmd,
diff --git a/src/test/subscription/t/011_generated.pl 
b/src/test/subscription/t/011_generated.pl
index 1814628..4537c6c 100644
--- a/src/test/subscription/t/011_generated.pl
+++ b/src/test/subscription/t/011_generated.pl
@@ -46,9 +46,9 @@ $node_subscriber->safe_psql('postgres',
"CREATE TABLE tab3 (a int, b int GENERATED ALWAYS AS (a + 20) STORED)");
 
 # tab4:
-# publisher-side tab4 has generated cols 'b' and 'c' but
-# subscriber-side tab4 has non-generated col 'b', and generated-col 'c'
-# where columns on publisher/subscriber are in a different order
+# Publisher-side tab4 has generated cols 'b' and 'c'.
+# Subscriber-side tab4 has non-generated col 'b', and generated-col 'c'.
+# Columns on publisher/subscriber are in a different order.
 $node_publisher->safe_psql('postgres',
"CREATE TABLE tab4 (a int, b int GENERATED ALWAYS AS (a * 2) STORED, c 
int GENERATED ALWAYS AS (a * 2) STORED)"
 );
@@ -57,14 +57,14 @@ $node_subscriber->safe_psql('postgres',
 );
 
 # tab5:
-# publisher-side tab5 has non-generated col 'b' but
-# subscriber-side tab5 has generated col 'b'
+# Publisher-side tab5 has non-generated col 'b'.
+# Subscriber-side tab5 has generated col 'b'.
 $node_publisher->safe_psql('postgres', "CREATE TABLE tab5 (a int, b int)");
 $node_subscriber->safe_psql('postgres',
"CREATE TABLE tab5 (a int, b int GENERATED ALWAYS AS (a * 22) STORED)");
 
 # tab6:
-# tables for testing ALTER SUBSCRIPTION ... REFRESH PUBLICATION
+# Tables for testing ALTER SUBSCRIPTION ... REFRESH PUBLICATION
 $node_publisher->safe_psql('postgres',
"CREATE TABLE tab6 (a int, b int GENERATED ALWAYS AS (a * 2) STORED, c 
int GENERATED ALWAYS AS (a * 2) STORED)"
 );
@@ -73,8 +73,8 @@ $node_subscriber->safe_psql('postgres',
 );
 
 # tab7:
-# publisher-side tab7 has generated col 'b' but
-# subscriber-side tab7 do not have col 'b'
+# Publisher-side tab7 

Re: Pgoutput not capturing the generated columns

2024-07-17 Thread Peter Smith
Hi Shubham, here are my review comments for patch v19-0001.

==
src/backend/replication/pgoutput/pgoutput.c

1.
  /*
  * Columns included in the publication, or NULL if all columns are
  * included implicitly.  Note that the attnums in this bitmap are not
+ * publication and include_generated_columns option: other reasons should
+ * be checked at user side.  Note that the attnums in this bitmap are not
  * shifted by FirstLowInvalidHeapAttributeNumber.
  */
  Bitmapset  *columns;

You replied [1] "The attached Patches contain all the suggested
changes." but as I previously commented [2, #1], since there is no
change to the interpretation of the 'columns' BMS caused by this
patch, I expected this comment would be unchanged (i.e. the same as
the HEAD code). But this fix was missed in v19-0001.

OTOH, if you do think there was a reason to change the comment then
the above is still not good because "are not publication and
include_generated_columns option" wording doesn't make sense.

==
src/test/subscription/t/011_generated.pl

Observation -- I added (in nitpicks diffs) some more comments for
'tab1' (to make all comments consistent with the new tests added). But
when I was doing that I observed that tab1 and tab3 test scenarios are
very similar. It seems only the subscription parameter is not
specified (so 'include_generated_cols' default wll be tested). IIRC
the default for that parameter is "false", so tab1 is not really
testing that properly -- e.g. I thought maybe to test the default
parameter it's better the subscriber-side 'b' should be not-generated?
But doing that would make 'tab1' the same as 'tab2'. Anyway, something
seems amiss -- it seems either something is not tested or is duplicate
tested. Please revisit what the tab1 test intention was and make sure
we are doing the right thing for it...

==
99.
The attached nitpicks diff patch has some tweaked comments.

==
[1] 
https://www.postgresql.org/message-id/CAHv8Rj%2BR0cj%3Dz1bTMAgQKQWx1EKvkMEnV9QsHGvOqTdnLUQi1A%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAHut%2BPtVfrbx0jb42LCmS%3D-LcMTtWxm%2BvhaoArkjg7Z0mvuXbg%40mail.gmail.com


Kind Regards,
Peter Smith.
Fujitsu Australia.
diff --git a/src/include/catalog/pg_subscription.h 
b/src/include/catalog/pg_subscription.h
index 50c5911..e066426 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -98,8 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) 
BKI_SHARED_RELATION BKI_ROW
 * slots) in 
the upstream database are enabled
 * to be 
synchronized to the standbys. */
 
-   boolsubincludegencols;  /* True if generated columns 
must be
-* 
published */
+   boolsubincludegencols;  /* True if generated columns 
should
+* be 
published */
 
 #ifdef CATALOG_VARLEN  /* variable-length fields start here */
/* Connection string to the publisher */
diff --git a/src/test/subscription/t/011_generated.pl 
b/src/test/subscription/t/011_generated.pl
index fe32987..d13d0a0 100644
--- a/src/test/subscription/t/011_generated.pl
+++ b/src/test/subscription/t/011_generated.pl
@@ -20,24 +20,28 @@ $node_subscriber->start;
 
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
 
+#
+# tab1:
+# Publisher-side tab1 has generated col 'b'.
+# Subscriber-side tab1 has generated col 'b', using a different computation,
+# and also an additional column 'c'.
 $node_publisher->safe_psql('postgres',
"CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 
2) STORED)"
 );
-
 $node_subscriber->safe_psql('postgres',
"CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 
22) STORED, c int)"
 );
 
 # tab2:
-# publisher-side tab2 has generated col 'b'.
-# subscriber-side tab2 has non-generated col 'b'.
+# Publisher-side tab2 has generated col 'b'.
+# Subscriber-side tab2 has non-generated col 'b'.
 $node_publisher->safe_psql('postgres',
"CREATE TABLE tab2 (a int, b int GENERATED ALWAYS AS (a * 2) STORED)");
 $node_subscriber->safe_psql('postgres', "CREATE TABLE tab2 (a int, b int)");
 
 # tab3:
-# publisher-side tab3 has generated col 'b'.
-# subscriber-side tab3 has generated col 'b', using a different computation.
+# Publisher-side tab3 has generated col 'b'.
+# Subscriber-side tab3 has generated col 'b', using a different computation.
 $

Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-07-17 Thread Peter Smith
Hi Kuroda-San, here are some review comment for patch v19-1

==
doc/src/sgml/ref/alter_subscription.sgml

The previous patches have common failover/two_phase code checking for
"Do not allow changing the option if the subscription is enabled", but
it seems the docs were mentioning that only for "two_phase" and not
for "failover".

I'm not 100% sure if mentioning the disabled requirement was necessary, but
certainly it should be all-or-nothing, not just saying it for one of
the parameters. Anyway, I chose to add the missing info. Please see
the attached nitpicks diff.

==
Kind Regards,
Peter Smith.
Fujitsu Australia.
diff --git a/doc/src/sgml/ref/alter_subscription.sgml 
b/doc/src/sgml/ref/alter_subscription.sgml
index 58db97f..e0ce83a 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -256,10 +256,15 @@ ALTER SUBSCRIPTION name RENAME TO <
  
 
  
-  The two_phase
-  parameter can only be altered when the subscription is disabled.
-  When altering the parameter from true
-  to false, the backend process checks for any 
incomplete
+  The failover
+  and two_phase
+  parameters can only be altered when the subscription is disabled.
+ 
+
+ 
+  When altering two_phase
+  from true to false,
+  the backend process checks for any incomplete
   prepared transactions done by the logical replication worker (from when
   two_phase parameter was still true)
   and, if any are found, an error is reported. If this happens, you can


  1   2   3   4   5   6   7   8   9   10   >