Re: Add contrib/pg_logicalsnapinspect
My review comments for v8-0001

==
contrib/pg_logicalinspect/pg_logicalinspect.c

1.
+/*
+ * Lookup table for SnapBuildState.
+ */
+
+#define SNAPBUILD_STATE_INCR 1
+
+static const char *const SnapBuildStateDesc[] = {
+	[SNAPBUILD_START + SNAPBUILD_STATE_INCR] = "start",
+	[SNAPBUILD_BUILDING_SNAPSHOT + SNAPBUILD_STATE_INCR] = "building",
+	[SNAPBUILD_FULL_SNAPSHOT + SNAPBUILD_STATE_INCR] = "full",
+	[SNAPBUILD_CONSISTENT + SNAPBUILD_STATE_INCR] = "consistent",
+};
+
+/*

nit - the SNAPBUILD_STATE_INCR made this code appear more complicated
than it is. Please take a look at the attachment for an alternative
implementation which includes an explanatory comment. YMMV. Feel free
to ignore it.

==
src/include/replication/snapbuild.h

2.
+ * Please keep SnapBuildStateDesc[] (located in the pg_logicalinspect module)
+ * updated should a change needs to be done in SnapBuildState.

nit - "...should a change needs to be done" -- the word "needs" is
incorrect here. How about:

"...if a change needs to be made to SnapBuildState."
"...if a change is made to SnapBuildState."
"...if SnapBuildState is changed."

==
Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/contrib/pg_logicalinspect/pg_logicalinspect.c b/contrib/pg_logicalinspect/pg_logicalinspect.c
index 100b82f..2419df1 100644
--- a/contrib/pg_logicalinspect/pg_logicalinspect.c
+++ b/contrib/pg_logicalinspect/pg_logicalinspect.c
@@ -24,19 +24,6 @@ PG_FUNCTION_INFO_V1(pg_get_logical_snapshot_meta);
 PG_FUNCTION_INFO_V1(pg_get_logical_snapshot_info);
 
 /*
- * Lookup table for SnapBuildState.
- */
-
-#define SNAPBUILD_STATE_INCR 1
-
-static const char *const SnapBuildStateDesc[] = {
-	[SNAPBUILD_START + SNAPBUILD_STATE_INCR] = "start",
-	[SNAPBUILD_BUILDING_SNAPSHOT + SNAPBUILD_STATE_INCR] = "building",
-	[SNAPBUILD_FULL_SNAPSHOT + SNAPBUILD_STATE_INCR] = "full",
-	[SNAPBUILD_CONSISTENT + SNAPBUILD_STATE_INCR] = "consistent",
-};
-
-/*
+/*
  * NOTE: For any code change or issue fix here, it is highly recommended to
  * give a thought about doing the same in SnapBuildRestore() as well.
  */
@@ -186,6 +173,16 @@ Datum
 pg_get_logical_snapshot_info(PG_FUNCTION_ARGS)
 {
 #define PG_GET_LOGICAL_SNAPSHOT_INFO_COLS 14
+	/*
+	 * Lookup table for SnapBuildState. The lookup index is offset by 1
+	 * because the consecutive SnapBuildState enum values start at -1.
+	 */
+	static const char *const SnapBuildStateDesc[] = {
+		[1 + SNAPBUILD_START] = "start",
+		[1 + SNAPBUILD_BUILDING_SNAPSHOT] = "building",
+		[1 + SNAPBUILD_FULL_SNAPSHOT] = "full",
+		[1 + SNAPBUILD_CONSISTENT] = "consistent",
+	};
 	SnapBuildOnDisk ondisk;
 	XLogRecPtr	lsn;
 	HeapTuple	tuple;
@@ -209,8 +206,7 @@ pg_get_logical_snapshot_info(PG_FUNCTION_ARGS)
 
 	memset(nulls, 0, sizeof(nulls));
 
-	values[i++] = CStringGetTextDatum(SnapBuildStateDesc[ondisk.builder.state +
-														 SNAPBUILD_STATE_INCR]);
+	values[i++] = CStringGetTextDatum(SnapBuildStateDesc[ondisk.builder.state + 1]);
 	values[i++] = TransactionIdGetDatum(ondisk.builder.xmin);
 	values[i++] = TransactionIdGetDatum(ondisk.builder.xmax);
 	values[i++] = LSNGetDatum(ondisk.builder.start_decoding_at);
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 78df2d1..e844a89 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -17,7 +17,7 @@
 /*
  * Please keep SnapBuildStateDesc[] (located in the pg_logicalinspect module)
- * updated should a change needs to be done in SnapBuildState.
+ * updated if a change needs to be made to SnapBuildState.
  */
 typedef enum
 {
Re: Pgoutput not capturing the generated columns
On Fri, Sep 20, 2024 at 2:26 PM Amit Kapila wrote:
>
> On Fri, Sep 20, 2024 at 4:16 AM Peter Smith wrote:
> >
> > On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada
> > wrote:
> > >
> > > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila
> > > wrote:
> > > >
> > > >
> > > > Users can use a publication like "create publication pub1 for table
> > > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > > but not for t2. They can specify the generated column name in the
> > > > column list of t1 in that case even though the rest of the tables
> > > > won't publish generated columns.
> > >
> > > Agreed.
> > >
> > > I think that users can use the publish_generated_column option when
> > > they want to publish all generated columns, instead of specifying all
> > > the columns in the column list. It's another advantage of this option
> > > that it will also include the future generated columns.
> > >
> >
> > OK. Let me give some examples below to help understand this idea.
> > Please correct me if these are incorrect.
> >
> > Examples, when publish_generated_columns=true:
> >
> > CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> > (publish_generated_columns=true)
> > t1 -> publishes a, b, gen2 (e.g. what column list says)
> > t2 -> publishes c, d + ALSO gen1, gen2
> >
> > CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH
> > (publish_generated_columns=true)
> > t1 -> publishes a, b + ALSO gen1, gen2
> > t2 -> publishes gen1 (e.g. what column list says)
> >
>
> These two could be controversial because one could expect that if
> "publish_generated_columns=true" then publish generated columns
> irrespective of whether they are mentioned in column_list. I am of the
> opinion that column_list should take priority and the results should
> be as mentioned by you, but let us see if anyone thinks otherwise.
>
> >
> > ==
> >
> > The idea LGTM, although now the parameter name
> > ('publish_generated_columns') seems a bit misleading since sometimes
> > generated columns get published "irrespective of the option".
> >
> > So, I think the original parameter name 'include_generated_columns'
> > might be better here because IMO "include" seems more like "add them
> > if they are not already specified", which is exactly what this idea is
> > doing.
> >
>
> I still prefer 'publish_generated_columns' because it matches with
> other publication option names. One can also deduce from
> 'include_generated_columns' that add all the generated columns even
> when some of them are specified in column_list.
>

Fair point. Anyway, to avoid surprises it will be important for the
precedence rules to be documented clearly (probably with some examples).

==
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Pgoutput not capturing the generated columns
On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada wrote:
>
> On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila wrote:
> > ...
> > I think that the column list should take priority and we should
> > publish the generated column if it is mentioned in irrespective of
> > the option.
>
> Agreed.
>
> > ...
> >
> > Users can use a publication like "create publication pub1 for table
> > t1(c1, c2), t2;" where they want t1's generated column to be published
> > but not for t2. They can specify the generated column name in the
> > column list of t1 in that case even though the rest of the tables
> > won't publish generated columns.
>
> Agreed.
>
> I think that users can use the publish_generated_column option when
> they want to publish all generated columns, instead of specifying all
> the columns in the column list. It's another advantage of this option
> that it will also include the future generated columns.
>

OK. Let me give some examples below to help understand this idea.
Please correct me if these are incorrect.

==

Assuming these tables:
t1(a,b,gen1,gen2)
t2(c,d,gen1,gen2)

Examples, when publish_generated_columns=false:

CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH (publish_generated_columns=false)
t1 -> publishes a, b, gen2 (e.g. what column list says)
t2 -> publishes c, d

CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes gen1 (e.g. what column list says)

CREATE PUBLICATION pub1 FOR t1, t2 WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes c, d

CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=false)
t1 -> publishes a, b
t2 -> publishes c, d

~~

Examples, when publish_generated_columns=true:

CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH (publish_generated_columns=true)
t1 -> publishes a, b, gen2 (e.g. what column list says)
t2 -> publishes c, d + ALSO gen1, gen2

CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes gen1 (e.g. what column list says)

CREATE PUBLICATION pub1 FOR t1, t2 WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes c, d + ALSO gen1, gen2

CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=true)
t1 -> publishes a, b + ALSO gen1, gen2
t2 -> publishes c, d + ALSO gen1, gen2

==

The idea LGTM, although now the parameter name
('publish_generated_columns') seems a bit misleading since sometimes
generated columns get published "irrespective of the option".

So, I think the original parameter name 'include_generated_columns'
might be better here because IMO "include" seems more like "add them
if they are not already specified", which is exactly what this idea is
doing.

Thoughts?

==
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Add contrib/pg_logicalsnapinspect
Thanks for the updated patch. Here are a few more trivial comments for
the patch v7-0001.

==

1. Should the extension descriptions all be identical?

I noticed small variations:

contrib/pg_logicalinspect/Makefile
+PGFILEDESC = "pg_logicalinspect - functions to inspect logical decoding components"

contrib/pg_logicalinspect/meson.build
+  '--FILEDESC', 'pg_logicalinspect - functions to inspect contents of logical snapshots',])

contrib/pg_logicalinspect/pg_logicalinspect.control
+comment = 'functions to inspect logical decoding components'

==
.../expected/logical_inspect.out

2.
+step s1_get_logical_snapshot_info: SELECT (pg_get_logical_snapshot_info(f.name::pg_lsn)).state,(pg_get_logical_snapshot_info(f.name::pg_lsn)).catchange_count,array_length((pg_get_logical_snapshot_info(f.name::pg_lsn)).catchange_xip,1),(pg_get_logical_snapshot_info(f.name::pg_lsn)).committed_count,array_length((pg_get_logical_snapshot_info(f.name::pg_lsn)).committed_xip,1) FROM (SELECT replace(replace(name,'.snap',''),'-','/') AS name FROM pg_ls_logicalsnapdir()) AS f ORDER BY 2;
+state|catchange_count|array_length|committed_count|array_length
+-----+---------------+------------+---------------+------------
+    2|              0|            |              2|           2
+    2|              2|           2|              0|
+(2 rows)
+

2a. Would it be better to rearrange those columns so 'committed' stuff
comes before 'catchange' stuff, to make this table order consistent
with the structure/code?

~

2b. Maybe those 2 'array_length' columns could have aliases to
uniquely identify them? e.g. 'catchange_array_length' and
'committed_array_length'.

==
contrib/pg_logicalinspect/pg_logicalinspect.c

3.
+/*
+ * Validate the logical snapshot file.
+ */
+static void
+ValidateAndRestoreSnapshotFile(XLogRecPtr lsn, SnapBuildOnDisk *ondisk,
+							   const char *path)

Since the name was updated then should the function comment also be
updated to include something like the SnapBuildRestoreContents
function comment? e.g. "Validate the logical snapshot file, and read
the contents of the serialized snapshot to 'ondisk'."

~~~

pg_get_logical_snapshot_info:

4.
nit - Add/remove some blank lines to help visually associate the array
counts with their arrays.

==
.../specs/logical_inspect.spec

5.
+setup
+{
+DROP TABLE IF EXISTS tbl1;
+CREATE TABLE tbl1 (val1 integer, val2 integer);
+ CREATE EXTENSION pg_logicalinspect;
+}
+
+teardown
+{
+DROP TABLE tbl1;
+SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+ DROP EXTENSION pg_logicalinspect;
+}

Different indentation for the CREATE/DROP EXTENSION?

==

The attached file shows the whitespace nit (#4)

==
Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/contrib/pg_logicalinspect/pg_logicalinspect.c b/contrib/pg_logicalinspect/pg_logicalinspect.c
index 185f36a..308c653 100644
--- a/contrib/pg_logicalinspect/pg_logicalinspect.c
+++ b/contrib/pg_logicalinspect/pg_logicalinspect.c
@@ -205,8 +205,8 @@ pg_get_logical_snapshot_info(PG_FUNCTION_ARGS)
 	values[i++] = BoolGetDatum(ondisk.builder.in_slot_creation);
 	values[i++] = LSNGetDatum(ondisk.builder.last_serialized_snapshot);
 	values[i++] = TransactionIdGetDatum(ondisk.builder.next_phase_at);
-	values[i++] = Int64GetDatum(ondisk.builder.committed.xcnt);
 
+	values[i++] = Int64GetDatum(ondisk.builder.committed.xcnt);
 	if (ondisk.builder.committed.xcnt > 0)
 	{
 		Datum	   *arrayelems;
@@ -223,7 +223,6 @@ pg_get_logical_snapshot_info(PG_FUNCTION_ARGS)
 		nulls[i++] = true;
 
 	values[i++] = Int64GetDatum(ondisk.builder.catchange.xcnt);
-
 	if (ondisk.builder.catchange.xcnt > 0)
 	{
 		Datum	   *arrayelems;
Re: Add contrib/pg_logicalsnapinspect
Hi, here are some mostly minor review comments for the patch v5-0001.

==
Commit message

1.
Do you think you should also name the new functions here?

==
contrib/pg_logicalinspect/pg_logicalinspect.c

2.
Regarding the question about static function declarations:

Shveta wrote: I was somehow under the impression that this is the way
in the postgres i.e. not add redundant declarations. Will be good to
know what others think on this.

FWIW, my understanding is the convention is just to be consistent with
whatever the module currently does. If it declares static functions,
then declare them all (redundant or not). If it doesn't declare static
functions, then don't add one. But, in the current case, since this is
a new module, I guess it is entirely up to you whatever you want to
do.

~~~

3.
+/*
+ * NOTE: For any code change or issue fix here, it is highly recommended to
+ * give a thought about doing the same in SnapBuildRestore() as well.
+ */
+

nit - I think this NOTE should be part of this module's header
comment. (e.g. like the tablesync.c NOTES)

~~~

ValidateSnapshotFile:

4.
+ValidateSnapshotFile(XLogRecPtr lsn, SnapBuildOnDisk *ondisk, const char *path)
+{
+	int			fd;
+	Size		sz;

nit - The 'sz' is overwritten a few times. I think declaring it at
each scope where used would be tidier.

~~~

5.
+	fsync_fname(path, false);
+	fsync_fname(PG_LOGICAL_SNAPSHOTS_DIR, true);
+
+

nit - remove some excessive blank lines

~~~

6.
+	/* read statically sized portion of snapshot */
+	SnapBuildRestoreContents(fd, (char *) ondisk, SnapBuildOnDiskConstantSize, path);

Should that say "fixed size portion"?

~~~

pg_get_logical_snapshot_info:

7.
+	if (ondisk.builder.committed.xcnt > 0)
+	{
+		Datum	   *arrayelems;
+		int			narrayelems;
+
+		arrayelems = (Datum *) palloc(ondisk.builder.committed.xcnt * sizeof(Datum));
+		narrayelems = 0;
+
+		for (narrayelems = 0; narrayelems < ondisk.builder.committed.xcnt; narrayelems++)
+			arrayelems[narrayelems] = Int64GetDatum((int64) ondisk.builder.committed.xip[narrayelems]);

nit - Why the double assignment of narrayelems = 0? It is simpler to
assign at the declaration and then remove both others.

~~~

8.
+	if (ondisk.builder.catchange.xcnt > 0)
+	{
+		Datum	   *arrayelems;
+		int			narrayelems;
+
+		arrayelems = (Datum *) palloc(ondisk.builder.catchange.xcnt * sizeof(Datum));
+		narrayelems = 0;
+
+		for (narrayelems = 0; narrayelems < ondisk.builder.catchange.xcnt; narrayelems++)
+			arrayelems[narrayelems] = Int64GetDatum((int64) ondisk.builder.catchange.xip[narrayelems]);

nit - ditto previous comment

==
doc/src/sgml/pglogicalinspect.sgml

9.
+
+  The pg_logicalinspect module provides SQL functions
+  that allow you to inspect the contents of logical decoding components. It
+  allows to inspect serialized logical snapshots of a running
+  PostgreSQL database cluster, which is useful
+  for debugging or educational purposes.
+

nit - /It allows to inspect/It allows the inspection of/

~~~

10.
+   example:

nit - /example:/For example:/ (this is in a couple of places)

==
src/include/replication/snapbuild_internal.h

11.
+#ifndef INTERNAL_SNAPBUILD_H
+#define INTERNAL_SNAPBUILD_H

Shouldn't these be SNAPBUILD_INTERNAL_H to match the filename?

~~~

12.
The contents of the snapbuild.c that got moved into
snapbuild_internal.h also got shuffled around a bit. e.g.
originally the typedef struct SnapBuildOnDisk:

+/*
+ * We store current state of struct SnapBuild on disk in the following manner:
+ *
+ * struct SnapBuildOnDisk;
+ * TransactionId * committed.xcnt; (*not xcnt_space*)
+ * TransactionId * catchange.xcnt;
+ *
+ */
+typedef struct SnapBuildOnDisk

was directly beneath the comment:

-/* ---
- * Snapshot serialization support
- * ---
- */
-

The macros were also defined immediately after the SnapBuildOnDisk
fields they referred to.

Wasn't that original ordering better than how it is now ordered in
snapbuild_internal.h?

==
Please also see the attachment, which implements some of those nits
mentioned above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/contrib/pg_logicalinspect/pg_logicalinspect.c b/contrib/pg_logicalinspect/pg_logicalinspect.c
index dc9041a..2111202 100644
--- a/contrib/pg_logicalinspect/pg_logicalinspect.c
+++ b/contrib/pg_logicalinspect/pg_logicalinspect.c
@@ -1,13 +1,17 @@
 /*-
  *
  * pg_logicalinspect.c
- *	  Functions to inspect contents of PostgreSQL logical snapshots
+ *	  Functions to inspect contents of PostgreSQL logical snapshots
  *
  * Copyright (c) 2024, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
- *	  contrib/pg_logicalinspect/pg_logicalinspect.c
+ *	  contrib/pg_logicalinspect/pg_logicalinspect.c
  *
+ *
+ * NOTES
+ *	  For any co
Re: Pgoutput not capturing the generated columns
Hi, here are my review comments for patch v31-0002.

==

1. General.

IMO patches 0001 and 0002 should be merged when next posted. IIUC the
reason for the split was only because there were 2 different authors
but that seems to be not relevant anymore.

==
Commit message

2.
When 'copy_data' is true, during the initial sync, the data is
replicated from the publisher to the subscriber using the COPY
command. The normal COPY command does not copy generated columns, so
when 'publish_generated_columns' is true, we need to copy using the
syntax: 'COPY (SELECT column_name FROM table_name) TO STDOUT'.

~

2a. Should clarify that 'copy_data' is a SUBSCRIPTION parameter.

2b. Should clarify that 'publish_generated_columns' is a PUBLICATION
parameter.

==
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:

3.
-	for (i = 0; i < rel->remoterel.natts; i++)
+	desc = RelationGetDescr(rel->localrel);
+	localgenlist = palloc0(rel->remoterel.natts * sizeof(bool));

Each time I review this code I am tricked into thinking it is wrong to
use rel->remoterel.natts here for the localgenlist. AFAICT the code is
actually fine because you do not store *all* the subscriber gencols in
'localgenlist' -- you only store those with matching names on the
publisher table. It might be good if you could add an explanatory
comment about that to prevent any future doubts.

~~~

4.
+			if (!remotegenlist[remote_attnum])
+				ereport(ERROR,
+						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						 errmsg("logical replication target relation \"%s.%s\" has a generated column \"%s\" "
+								"but corresponding column on source relation is not a generated column",
+								rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));

This error message has lots of good information. OTOH, I think when
copy_data=false the error would report the subscriber column just as
"missing", which is maybe less helpful.
Perhaps that other copy_data=false "missing" case can be improved to
share the same error message that you have here.

~~~

fetch_remote_table_info:

5.
IIUC, this logic needs to be more sophisticated to handle the case
that was being discussed earlier with Sawada-san [1]. e.g. when the
same table has gencols but there are multiple subscribed publications
where the 'publish_generated_columns' parameter differs.

Also, you'll need test cases for this scenario, because it is too
difficult to judge correctness just by visual inspection of the code.

6.
nit - Change 'hasgencolpub' to 'has_pub_with_pubgencols' for
readability, and initialize it to 'false' to make it easy to use
later.

~~~

7.
-	 * Get column lists for each relation.
+	 * Get column lists for each relation and check if any of the publication
+	 * has generated column option.

and

+	/* Check if any of the publication has generated column option */
+	if (server_version >= 18)

nit - tweak the comments to name the publication parameter properly.

~~~

8.
	foreach(lc, MySubscription->publications)
	{
		if (foreach_current_index(lc) > 0)
			appendStringInfoString(&pub_names, ", ");
		appendStringInfoString(&pub_names, quote_literal_cstr(strVal(lfirst(lc))));
	}

I know this is existing code, but shouldn't all this be done by using
the purpose-built function 'get_publications_str'?

~~~

9.
+			ereport(ERROR,
+					errcode(ERRCODE_CONNECTION_FAILURE),
+					errmsg("could not fetch gencolumns information from publication list: %s",
+						   pub_names.data));

and

+					errcode(ERRCODE_UNDEFINED_OBJECT),
+					errmsg("failed to fetch tuple for gencols from publication list: %s",
+						   pub_names.data));

nit - /gencolumns information/generated column publication
information/ to make the errmsg more human-readable

~~~

10.
+		bool		gencols_allowed = server_version >= 18 && hasgencolpub;
+
+		if (!gencols_allowed)
+			appendStringInfo(&cmd, " AND a.attgenerated = ''");

Can the 'gencols_allowed' var be removed, and the condition just be
replaced with if (!has_pub_with_pubgencols)?
It seems equivalent unless I am mistaken.

==

Please refer to the attachment which implements some of the nits
mentioned above.

==
[1] https://www.postgresql.org/message-id/CAD21AoBun9crSWaxteMqyu8A_zme2ppa2uJvLJSJC2E3DJxQVA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 723c44c..6d17984 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -850,7 +850,7 @@ fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist_res,
 	Oid			qualRow[] = {TEXTOID};
 	bool		isnull;
 	bool
Re: Pgoutput not capturing the generated columns
Review comments for v31-0001.

(I tried to give only new comments, but there might be some overlap
with comments I previously made for v30-0001)

==
src/backend/catalog/pg_publication.c

1.
+
+	if (publish_generated_columns_given)
+	{
+		values[Anum_pg_publication_pubgencolumns - 1] = BoolGetDatum(publish_generated_columns);
+		replaces[Anum_pg_publication_pubgencolumns - 1] = true;
+	}

nit - unnecessary whitespace above here.

==
src/backend/replication/pgoutput/pgoutput.c

2. prepare_all_columns_bms

+			/* Iterate the cols until generated columns are found. */
+			cols = bms_add_member(cols, i + 1);

How does the comment relate to the statement that follows it?

~~~

3.
+			 * Skip generated column if pubgencolumns option was not
+			 * specified.

nit - /pubgencolumns option/publish_generated_columns parameter/

==
src/bin/pg_dump/pg_dump.c

4. getPublications:

nit - /i_pub_gencolumns/i_pubgencols/ (it's the same information but simpler)

==
src/bin/pg_dump/pg_dump.h

5.
+	bool		pubgencolumns;
 } PublicationInfo;

nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)

==
src/bin/psql/describe.c

6.
 	bool		has_pubviaroot;
+	bool		has_pubgencol;

nit - /has_pubgencol/has_pubgencols/ (plural consistency)

==
src/include/catalog/pg_publication.h

7.
+	/* true if generated columns data should be published */
+	bool		pubgencolumns;
 } FormData_pg_publication;

nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)

~~~

8.
+	bool		pubgencolumns;
 	PublicationActions pubactions;
 } Publication;

nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)

==
src/test/regress/sql/publication.sql

9.
+-- Test the publication with or without 'PUBLISH_GENERATED_COLUMNS' parameter
+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION pub1 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=1);
+\dRp+ pub1
+
+CREATE PUBLICATION pub2 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=0);
+\dRp+ pub2

9a.
nit - Use lowercase for the parameters.

~

9b.
nit - Fix the comment to say what the test is actually doing:
"Test the publication 'publish_generated_columns' parameter enabled or
disabled"

==
src/test/subscription/t/031_column_list.pl

10.
Later I think you should add another test here to cover the scenario
that I was discussing with Sawada-San -- e.g. when there are 2
publications for the same table subscribed by just 1 subscription but
having different values of the 'publish_generated_columns' for the
publications.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 2e7804e..cca54bc 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -515,8 +515,8 @@ CREATE TABLE people (
     Generated columns may be skipped during logical replication according to the
-    CREATE PUBLICATION option
-
+    CREATE PUBLICATION parameter
+
     publish_generated_columns.
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index e133dc3..cd20bd4 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -223,7 +223,7 @@ CREATE PUBLICATION name
 
-
+
        publish_generated_columns (boolean)
 
@@ -231,14 +231,6 @@ CREATE PUBLICATION name
         associated with the publication should be replicated. The default is
         false.
 
-
-        This option is only available for replicating generated column data from the publisher
-        to a regular, non-generated column in the subscriber.
-
-
-        This parameter can only be set true if copy_data is
-        set to false.
-
 
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 272b6a1..7ebb851 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -999,7 +999,7 @@ GetPublication(Oid pubid)
 	pub->pubactions.pubdelete = pubform->pubdelete;
 	pub->pubactions.pubtruncate = pubform->pubtruncate;
 	pub->pubviaroot = pubform->pubviaroot;
-	pub->pubgencolumns = pubform->pubgencolumns;
+	pub->pubgencols = pubform->pubgencols;
 
 	ReleaseSysCache(tup);
 
@@ -1211,7 +1211,7 @@ pg_get_publication_tables(PG_FUNCTION_ARGS)
 			if (att->attisdropped)
 				continue;
 
-			if (att->attgenerated && !pub->pubgencolumns)
+			if (att->attgenerated && !pub->pubgencols)
 				continue;
 
 			attnums[nattnums++] = att->attnum;
diff --git a/src/backend/commands/public
Re: Pgoutput not capturing the generated columns
Here are some review comments for v31-0001 (for the docs only)

There may be some overlap here with some comments already made for
v30-0001 which are not yet addressed in v31-0001.

==
Commit message

1.
When introducing the 'publish_generated_columns' parameter, you must
also say this is a PUBLICATION parameter.

~~~

2.
With this enhancement, users can now include the
'include_generated_columns' option when querying logical replication
slots using either the pgoutput plugin or the test_decoding plugin.
This option, when set to 'true' or '1', instructs the replication
system to include generated column information and data in the
replication stream.

~

The above is stale information because it still refers to the old name
'include_generated_columns', and to test_decoding which was already
removed in this patch.

==
doc/src/sgml/ddl.sgml

3.
+    Generated columns may be skipped during logical replication according to the
+    CREATE PUBLICATION option
+
+    publish_generated_columns.

3a.
nit - The linkend is based on the old name instead of the new name.

3b.
nit - Better to call this a parameter instead of an option because
that is what the CREATE PUBLICATION docs call it.

==
doc/src/sgml/protocol.sgml

4.
+
+      publish_generated_columns
+
+
+       Boolean option to enable generated columns. This option controls
+       whether generated columns should be included in the string
+       representation of tuples during logical decoding in PostgreSQL.
+
+
+

Is this even needed anymore? Now that the implementation is using a
PUBLICATION parameter, isn't everything determined just by that
parameter? I don't see the reason why a protocol change is needed
anymore. And, if there is no protocol change needed, then this
documentation change is also not needed.

5.
-      Next, the following message part appears for each column included in
-      the publication (except generated columns):
+      Next, the following message parts appear for each column included in
+      the publication (generated columns are excluded unless the parameter
+
+      publish_generated_columns specifies otherwise):

Like the previous comment above, I think everything is now determined
by the PUBLICATION parameter. So maybe this should just be referring
to that instead.

==
doc/src/sgml/ref/create_publication.sgml

6.
+
+       publish_generated_columns (boolean)
+

nit - the ID is based on the old parameter name.

~

7.
+
+        This option is only available for replicating generated column data from the publisher
+        to a regular, non-generated column in the subscriber.
+

IMO remove this paragraph. I really don't think you should be
mentioning the subscriber here at all. AFAIK this parameter is only
for determining if the generated column will be published or not. What
happens at the other end (e.g. logic whether it gets ignored or not by
the subscriber) is more like a matrix of behaviours that could be
documented in the "Logical Replication" section. But not here. (I
removed this in my nitpicks attachment)

~~~

8.
+
+        This parameter can only be set true if copy_data is
+        set to false.
+

IMO remove this paragraph too. The user can create a PUBLICATION
before a SUBSCRIPTION even exists so to say it "can only be set..." is
not correct. Sure, your patch 0001 does not support the COPY of
generated columns but if you want to document that then it should be
documented in the CREATE SUBSCRIBER docs. But not here. (I removed
this in my nitpicks attachment)

TBH, it would be better if patches 0001 and 0002 were merged then you
can avoid all this. IIUC they were only separate in the first place
because 2 different people wrote them. It is not making reviews easier
with them split.

==
Please see the attachment which implements some of the nits above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 2e7804e..cca54bc 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -515,8 +515,8 @@ CREATE TABLE people (
     Generated columns may be skipped during logical replication according to the
-    CREATE PUBLICATION option
-
+    CREATE PUBLICATION parameter
+
     publish_generated_columns.
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index e133dc3..cd20bd4 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -223,7 +223,7 @@ CREATE PUBLICATION name
 
-
+
        publish_generated_columns (boolean)
 
@@ -231,14 +231,6 @@ CREATE PUBLICATION name
         associated with the publication should
Re: Pgoutput not capturing the generated columns
On Tue, Sep 17, 2024 at 4:15 PM Masahiko Sawada wrote: > > On Mon, Sep 16, 2024 at 8:09 PM Peter Smith wrote: > > > > I thought that the option "publish_generated_columns" is more related > > to "column lists" than "row filters". > > > > Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'. > > > > > And > > PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false); > > is equivalent to > > PUBLICATION pub2 FOR TABLE t1(a,b,c); > > This makes sense to me as it preserves the current behavior. > > > Then: > > PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true); > > is equivalent to > > PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2); > > This also makes sense. It would also include future generated columns. > > > So, I would expect this to fail because the SUBSCRIPTION docs say > > "Subscriptions having several publications in which the same table has > > been published with different column lists are not supported." > > So I agree that it would raise an error if users subscribe to both > pub1 and pub2. > > And looking back at your examples, > > > > > e.g.1 > > > > - > > > > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = > > > > true); > > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = > > > > false); > > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2; > > > > - > > > > > > > > e.g.2 > > > > - > > > > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns > > > > = true); > > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = > > > > false); > > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2; > > > > - > > Both examples would not be supported. > > > > > > > > > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having > > > > several publications in which the same table has been published with > > > > different column lists are not supported." 
> > > > > > > > Perhaps the user is supposed to deduce that the example above would > > > > work OK if table 't1' has no generated cols. OTOH, if it did have > > > > generated cols then the PUBLICATION column lists must be different and > > > > therefore it is "not supported" (??). > > > > > > With the patch, how should this feature work when users specify a > > > generated column to the column list and set publish_generated_column = > > > false, in the first place? raise an error (as we do today)? or always > > > send NULL? > > > > For this scenario, I suggested (see [1] #3) that the code could give a > > WARNING. As I wrote up-thread: This combination doesn't seem > > like something a user would do intentionally, so just silently > > ignoring it (which the current patch does) is likely going to give > > someone unexpected results/grief. > > It gives a WARNING, and then publishes the specified generated column > data (even if publish_generated_column = false)? If so, it would mean > that specifying the generated column to the column list means to > publish its data regardless of the publish_generated_column parameter > value. > No. I meant only that it can give the WARNING to tell the user "Hey, there is a conflict here because you said publish_generated_column = false, but you also specified gencols in the column list". But it is always the option "publish_generated_column" that determines the final publishing behaviour. So if it says publish_generated_column=false then it would NOT publish generated columns even if there are gencols in the column list. I think this makes sense because when there is no column list specified then that implicitly means "all columns" and the table might have some gencols, but still 'publish_generated_columns' is what determines the behaviour. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
On Tue, Sep 17, 2024 at 7:02 AM Masahiko Sawada wrote: > > On Wed, Sep 11, 2024 at 10:30 PM Peter Smith wrote: > > > > Because this feature is now being implemented as a PUBLICATION option, > > there is another scenario that might need consideration; I am thinking > > about where the same table is published by multiple PUBLICATIONS (with > > different option settings) that are subscribed by a single > > SUBSCRIPTION. > > > > e.g.1 > > - > > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = > > true); > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = > > false); > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2; > > - > > > > e.g.2 > > - > > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = > > true); > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = > > false); > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2; > > - > > > > Do you know if this case is supported? If yes, then which publication > > option value wins? > > I would expect these option values are processed with OR. That is, we > publish changes of the generated columns if at least one publication > sets publish_generated_columns to true. It seems to me that we treat > multiple row filters in the same way. > I thought that the option "publish_generated_columns" is more related to "column lists" than "row filters". Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'. Then: PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true); is equivalent to PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2); And PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false); is equivalent to PUBLICATION pub2 FOR TABLE t1(a,b,c); So, I would expect this to fail because the SUBSCRIPTION docs say "Subscriptions having several publications in which the same table has been published with different column lists are not supported." 
~~ Here's another example: PUBLICATION pub3 FOR TABLE t1(a,b); PUBLICATION pub4 FOR TABLE t1(c); Won't it be strange (e.g. difficult to explain) why pub1 and pub2 table column lists are allowed to be combined in one subscription, but pub3 and pub4 in one subscription are not supported due to the different column lists? > > > > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having > > several publications in which the same table has been published with > > different column lists are not supported." > > > > Perhaps the user is supposed to deduce that the example above would > > work OK if table 't1' has no generated cols. OTOH, if it did have > > generated cols then the PUBLICATION column lists must be different and > > therefore it is "not supported" (??). > > With the patch, how should this feature work when users specify a > generated column to the column list and set publish_generated_column = > false, in the first place? raise an error (as we do today)? or always > send NULL? For this scenario, I suggested (see [1] #3) that the code could give a WARNING. As I wrote up-thread: This combination doesn't seem like something a user would do intentionally, so just silently ignoring it (which the current patch does) is likely going to give someone unexpected results/grief. == [1] https://www.postgresql.org/message-id/CAHut%2BPuaitgE4tu3nfaR%3DPCQEKjB%3DmpDtZ1aWkbwb%3DJZE8YvqQ%40mail.gmail.com Kind Regards, Peter Smith Fujitsu Australia
Re: Introduce XID age and inactive timeout based replication slot invalidation
Here are a few comments for the patch v46-0001. == src/backend/replication/slot.c 1. ReportSlotInvalidation On Mon, Sep 16, 2024 at 8:01 PM Bharath Rupireddy wrote: > > On Mon, Sep 9, 2024 at 1:11 PM Peter Smith wrote: > > 3. ReportSlotInvalidation > > > > I didn't understand why there was a hint for: > > "You might need to increase \"%s\".", "max_slot_wal_keep_size" > > > > Why aren't these similar cases consistent? > > It looks misleading and not very useful. What happens if the removed > WAL (that's needed for the slot) is put back into pg_wal somehow (by > manually copying from archive or by some tool/script)? Can the slot > invalidated due to wal_removed start sending WAL to its clients? > > > But you don't have an equivalent hint for timeout invalidation: > > "You might need to increase \"%s\".", "replication_slot_inactive_timeout" > > I removed this per review comments upthread. IIUC the errors are quite similar, so my previous review comment was mostly about the unexpected inconsistency of why one of them has a hint and the other one does not. I don't have a strong opinion about whether they should both *have* or *not have* hints, so long as they are treated the same. If you think the current code hint is not useful then maybe we need a new thread to address that existing issue. For example, maybe it should be removed or reworded. ~~~ 2. InvalidatePossiblyObsoleteSlot: + case RS_INVAL_INACTIVE_TIMEOUT: + + if (!SlotInactiveTimeoutCheckAllowed(s)) + break; + + /* + * Check if the slot needs to be invalidated due to + * replication_slot_inactive_timeout GUC. + */ + if (TimestampDifferenceExceeds(s->inactive_since, now, +replication_slot_inactive_timeout * 1000)) nit - it might be tidier to avoid multiple breaks by just combining these conditions. See the nitpick attachment. ~~~ 3. 
* - RS_INVAL_WAL_LEVEL: is logical + * - RS_INVAL_INACTIVE_TIMEOUT: inactive timeout occurs nit - use comment wording "inactive slot timeout has occurred", to make it identical to the comment in slot.h == src/test/recovery/t/050_invalidate_slots.pl 4. +# Despite inactive timeout being set, the synced slot won't get invalidated on +# its own on the standby. So, we must not see invalidation message in server +# log. +$standby1->safe_psql('postgres', "CHECKPOINT"); +ok( !$standby1->log_contains( + "invalidating obsolete replication slot \"sync_slot1\"", + $logstart), + 'check that synced slot sync_slot1 has not been invalidated on standby' +); + It seems kind of brittle to check the logs for something that is NOT there because any change to the message will make this accidentally pass. Apart from that, it might anyway be more efficient just to check the pg_replication_slots again to make sure the 'invalidation_reason' remains NULL. == Please see the attachment which implements some of the nit changes mentioned above. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 851120e..0076e4b 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1716,15 +1716,12 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, invalidation_cause = cause; break; case RS_INVAL_INACTIVE_TIMEOUT: - - if (!SlotInactiveTimeoutCheckAllowed(s)) - break; - /* * Check if the slot needs to be invalidated due to * replication_slot_inactive_timeout GUC. 
*/ - if (TimestampDifferenceExceeds(s->inactive_since, now, + if (SlotInactiveTimeoutCheckAllowed(s) && + TimestampDifferenceExceeds(s->inactive_since, now, replication_slot_inactive_timeout * 1000)) { invalidation_cause = cause; @@ -1894,7 +1891,7 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, * - RS_INVAL_HORIZON: requires a snapshot <= the given horizon in the given * db; dboid may be InvalidOid for shared relations * - RS_INVAL_WAL_LEVEL: is logical - * - RS_INVAL_INACTIVE_TIMEOUT: inactive timeout occurs + * - RS_INVAL_INACTIVE_TIMEOUT: inactive slot timeout has occurred
Re: Pgoutput not capturing the generated columns
> > > > > > > > I think we can create a publication for a single table, so what we can > > > > > do with this feature can be done also by the idea you described below. > > > > > > > > > > > Yet another idea is to keep this as a publication option > > > > > > (include_generated_columns or publish_generated_columns) similar to > > > > > > "publish_via_partition_root". Normally, "publish_via_partition_root" > > > > > > is used when tables on either side have different partitions > > > > > > hierarchies which is somewhat the case here. > > > > > > > > > > It sounds more useful to me. > > > > > > > > > > > > > Fair enough. Let's see if anyone else has any preference among the > > > > proposed methods or can think of a better way. > > > > > > I have fixed the current issue. I have added the option > > > 'publish_generated_columns' to the publisher side and created the new > > > test cases accordingly. > > > The attached patches contain the desired changes. > > > > > > > Thank you for updating the patches. I have some comments: > > > > Do we really need to add this option to test_decoding? I think it > > would be good if this improves the test coverage. Otherwise, I'm not > > sure we need this part. If we want to add it, I think it would be > > better to have it in a separate patch. > > > > I have removed the option from the test_decoding file. > > > --- > > + > > + If the publisher-side column is also a generated column > > then this option > > + has no effect; the publisher column will be filled as normal > > with the > > + publisher-side computed or default data. > > + > > > > I don't understand this description. Why does this option have no > > effect if the publisher-side column is a generated column? > > > > The documentation was incorrect. Currently, replicating from a > publisher table with a generated column to a subscriber table with a > generated column will result in an error. This has now been updated. 
> > > --- > > + > > + This parameter can only be set true if > > copy_data is > > + set to false. > > + > > > > If I understand this patch correctly, it doesn't disallow to set > > copy_data to true when the publish_generated_columns option is > > specified. But do we want to disallow it? I think it would be more > > useful and understandable if we allow to use both > > publish_generated_columns (publisher option) and copy_data (subscriber > > option) at the same time. > > > > Support for tablesync with generated columns was not included in the > initial patch, and this was reflected in the documentation. The > functionality for syncing generated column data has been introduced > with the 0002 patch. > Since nothing was said otherwise, I assumed my v30-0001 comments were addressed in v31, but the new code seems to have quite a few of my suggested changes missing. If you haven't addressed my review comments for patch 0001 yet, please say so. OTOH, please give reasons for any rejected comments. > The attached v31 patches contain the changes for the same. I won't be > posting the test patch for now. I will share it once this patch has > been stabilized. How can the patch become "stabilized" without associated tests to verify the behaviour is not broken? e.g. I can write a stable function that says 2+2=5. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Because this feature is now being implemented as a PUBLICATION option, there is another scenario that might need consideration; I am thinking about where the same table is published by multiple PUBLICATIONS (with different option settings) that are subscribed by a single SUBSCRIPTION. e.g.1 - CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true); CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false); CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2; - e.g.2 - CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true); CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false); CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2; - Do you know if this case is supported? If yes, then which publication option value wins? The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having several publications in which the same table has been published with different column lists are not supported." Perhaps the user is supposed to deduce that the example above would work OK if table 't1' has no generated cols. OTOH, if it did have generated cols then the PUBLICATION column lists must be different and therefore it is "not supported" (??). I have not tried this to see what happens, but even if it behaves as expected, there should probably be some comments/docs/tests for this scenario to clarify it for the user. Notice that "publish_via_partition_root" has a similar conundrum, but in that case, the behaviour is documented in the CREATE PUBLICATION docs [2]. So, maybe "publish_generated_columns" should be documented a bit like that. == [1] https://www.postgresql.org/docs/devel/sql-createsubscription.html [2] https://www.postgresql.org/docs/devel/sql-createpublication.html Kind Regards, Peter Smith. Fujitsu Australia
Remove shadowed declaration warnings
o’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:841:46: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:961:47: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1104:41: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1142:41: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1182:45: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1242:54: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1272:56: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1314:70: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1363:60: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1553:57: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1627:55: warning: 
declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1681:64: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1739:69: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] pg_createsubscriber.c:1830:64: warning: declaration of ‘dbinfo’ shadows a global declaration [-Wshadow] pg_createsubscriber.c:121:31: warning: shadowed declaration is here [-Wshadow] == Kind Regards, Peter Smith. Fujitsu Australia controldata_utils.c:52:29: warning: declaration of ‘DataDir’ shadows a global declaration [-Wshadow] ../../src/include/miscadmin.h:172:26: warning: shadowed declaration is here [-Wshadow] controldata_utils.c:189:32: warning: declaration of ‘DataDir’ shadows a global declaration [-Wshadow] ../../src/include/miscadmin.h:172:26: warning: shadowed declaration is here [-Wshadow] brin.c:685:16: warning: declaration of ‘tmp’ shadows a previous local [-Wshadow] brin.c:579:11: warning: shadowed declaration is here [-Wshadow] gistbuild.c:1159:23: warning: declaration of ‘splitinfo’ shadows a previous local [-Wshadow] gistbuild.c:1059:11: warning: shadowed declaration is here [-Wshadow] xlogdesc.c:40:26: warning: declaration of ‘wal_level’ shadows a global declaration [-Wshadow] ../../../../src/include/access/xlog.h:96:24: warning: shadowed declaration is here [-Wshadow] xlogdesc.c:165:9: warning: declaration of ‘wal_level’ shadows a global declaration [-Wshadow] ../../../../src/include/access/xlog.h:96:24: warning: shadowed declaration is here [-Wshadow] xlogreader.c:106:24: warning: declaration of ‘wal_segment_size’ shadows a global declaration [-Wshadow] ../../../../src/include/access/xlog.h:37:24: warning: shadowed declaration is here [-Wshadow] 
xlogrecovery.c:1210:13: warning: declaration of ‘backupEndRequired’ shadows a global declaration [-Wshadow] xlogrecovery.c:284:13: warning: shadowed declaration is here [-Wshadow] xlogrecovery.c:1920:33: warning: declaration of ‘xlogreader’ shadows a global declaration [-Wshadow] xlogrecovery.c:189:25: warning: shadowed declaration is here [-Wshadow] xlogrecovery.c:3144:28: warning: declaration of ‘xlogprefetcher’ shadows a global declaration [-Wshadow] xlogrecovery.c:192:24: warning: shadowed declaration is here [-Wshadow] xlogrecovery.c:3148:19: warning: declaration of ‘xlogreader’ shadows a global declaration [-Wshadow] xlogrecovery.c:189:25: warning: shadowed declaration is here [-Wshadow] xlogrecovery.c:3311:31: warning: declaration of ‘xlogreader’ shadows a global declaration [-Wshadow] xlogrecovery.c:189:25: warning: shadowed declaration is here [-Wshadow] xlogrecovery.c:4062:38: warning: declaration of ‘xlogprefetcher’ shadows a global declaration [
Re: Pgoutput not capturing the generated columns
Hi Shubham, Here are my general comments about the v30-0002 TAP test patch. == 1. As mentioned in a previous post [1] there are still several references to the old 'include_generated_columns' option remaining in this patch. They need replacing. ~~~ 2. +# Furthermore, all combinations are tested for publish_generated_columns=false +# (see subscription sub1 of database 'postgres'), and +# publish_generated_columns=true (see subscription sub2 of database +# 'test_igc_true'). Those 'see subscription' notes and 'test_igc_true' are from the old implementation. Those need fixing. BTW, 'test_pgc_true' is a better name for the database now that the option name is changed. In the previous implementation, the TAP test environment was: - a common publication pub, on the 'postgres' database - a subscription sub1 with option include_generated_columns=false, on the 'postgres' database - a subscription sub2 with option include_generated_columns=true, on the 'test_igc_true' database Now it is like: - a publication pub1, on the 'postgres' database, with option publish_generated_columns=false - a publication pub2, on the 'postgres' database, with option publish_generated_columns=true - a subscription sub1, on the 'postgres' database for publication pub1 - a subscription sub2, on the 'test_pgc_true' database for publication pub2 It would be good to document the above convention because knowing how the naming/numbering works makes it a lot easier to read the subsequent test cases. Of course, it is really important to name/number everything consistently, otherwise these tests become hard to follow. AFAICT it is mostly OK, but the generated -> generated publication should be called 'regress_pub2_gen_to_gen' ~~~ 3. +# Create table. +$node_publisher->safe_psql( + 'postgres', qq( + CREATE TABLE tab_gen_to_nogen (a int, b int GENERATED ALWAYS AS (a * 2) STORED); + INSERT INTO tab_gen_to_nogen (a) VALUES (1), (2), (3); +)); + +# Create publication with publish_generated_columns=false. 
+$node_publisher->safe_psql('postgres', + "CREATE PUBLICATION regress_pub1_gen_to_nogen FOR TABLE tab_gen_to_nogen WITH (publish_generated_columns = false)" +); + +# Create table and subscription with copy_data=true. +$node_subscriber->safe_psql( + 'postgres', qq( + CREATE TABLE tab_gen_to_nogen (a int, b int); + CREATE SUBSCRIPTION regress_sub1_gen_to_nogen CONNECTION '$publisher_connstr' PUBLICATION regress_pub1_gen_to_nogen WITH (copy_data = true); +)); + +# Create publication with publish_generated_columns=true. +$node_publisher->safe_psql('postgres', + "CREATE PUBLICATION regress_pub2_gen_to_nogen FOR TABLE tab_gen_to_nogen WITH (publish_generated_columns = true)" +); + The code can be restructured to be simpler. Both publications are always created on the 'postgres' database at the publisher node, so let's just create them at the same time as creating the publisher table. It also makes readability much better e.g. # Create table, and publications $node_publisher->safe_psql( 'postgres', qq( CREATE TABLE tab_gen_to_nogen (a int, b int GENERATED ALWAYS AS (a * 2) STORED); INSERT INTO tab_gen_to_nogen (a) VALUES (1), (2), (3); CREATE PUBLICATION regress_pub1_gen_to_nogen FOR TABLE tab_gen_to_nogen WITH (publish_generated_columns = false); CREATE PUBLICATION regress_pub2_gen_to_nogen FOR TABLE tab_gen_to_nogen WITH (publish_generated_columns = true); )); AFAICT this same simplification can be repeated multiple times in this TAP file. ~~ Similarly, it would be neater to combine the DROP PUBLICATIONs together too. ~~~ 4. Hopefully, the generated column 'copy_data' can be implemented again soon for subscriptions, and then the initial sync tests here can be properly implemented instead of the placeholders currently in patch 0002. == [1] https://www.postgresql.org/message-id/CAHut%2BPuDJToG%3DV-ogTi9_6fnhhn2S0%2BsVRGPynhcf9mEh0Q%3DLA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Here are some more review comments for patch v30-0001. == src/sgml/ref/create_publication.sgml 1. + + If the publisher-side column is also a generated column then this option + has no effect; the publisher column will be filled as normal with the + publisher-side computed or default data. + It should say "subscriber-side"; not "publisher-side". The same was already reported by Sawada-San [1]. ~~~ 2. + + This parameter can only be set true if copy_data is + set to false. + IMO this limitation should be addressed by patch 0001 like it was already done in the previous patches (e.g. v22-0002). I think Sawada-san suggested the same [1]. Anyway, 'copy_data' is not a PUBLICATION option, so the fact it is mentioned like this without any reference to the SUBSCRIPTION seems like a cut/paste error from the previous implementation. == src/backend/catalog/pg_publication.c 3. pub_collist_validate - if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated) - ereport(ERROR, - errcode(ERRCODE_INVALID_COLUMN_REFERENCE), - errmsg("cannot use generated column \"%s\" in publication column list", -colname)); - Instead of just removing this ERROR entirely here, I thought it would be more user-friendly to give a WARNING if the PUBLICATION's explicit column list includes generated cols when the option "publish_generated_columns" is false. This combination doesn't seem like something a user would do intentionally, so just silently ignoring it (like the current patch does) is likely going to give someone unexpected results/grief. == src/backend/replication/logical/proto.c 4. logicalrep_write_tuple, and logicalrep_write_attrs: - if (att->attisdropped || att->attgenerated) + if (att->attisdropped) continue; Why aren't you also checking the new PUBLICATION option here and skipping all gencols if the "publish_generated_columns" option is false? Or is the BMS of pgoutput_column_list_init handling this case? Maybe there should be an Assert for this? == src/backend/replication/pgoutput/pgoutput.c 5. 
send_relation_and_attrs - if (att->attisdropped || att->attgenerated) + if (att->attisdropped) continue; Same question as #4. ~~~ 6. prepare_all_columns_bms and pgoutput_column_list_init + if (att->attgenerated && !pub->pubgencolumns) + cols = bms_del_member(cols, i + 1); IIUC, the algorithm seems overly tricky, filling the BMS with all columns, before straight away conditionally removing the generated columns. Can't it be refactored to assign all the correct columns up-front, to avoid calling bms_del_member()? == src/bin/pg_dump/pg_dump.c 7. getPublications IIUC, there is lots of missing SQL code here (for all older versions) that should be saying "false AS pubgencolumns". e.g. compare the SQL with how "false AS pubviaroot" is used. == src/bin/pg_dump/t/002_pg_dump.pl 8. Missing tests? I expected to see a pg_dump test for this new PUBLICATION option. == src/test/regress/sql/publication.sql 9. Missing tests? How about adding another test case that checks this new option must be "Boolean"? ~~~ 10. Missing tests? --- error: generated column "d" can't be in list +-- ok: generated columns can be in the list too ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d); +ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5; (see my earlier comment #3) IMO there should be another test case for a WARNING here if the user attempts to include generated column 'd' in an explicit PUBLICATION column list while the "publish_generated_columns" is false. == [1] https://www.postgresql.org/message-id/CAD21AoA-tdTz0G-vri8KM2TXeFU8RCDsOpBXUBCgwkfokF7%3DjA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Disallow altering invalidated replication slots
On Wed, Sep 11, 2024 at 3:54 AM Bharath Rupireddy wrote: > > Hi, > > Thanks for reviewing. > > On Tue, Sep 10, 2024 at 8:40 AM Peter Smith wrote: > > > > Commit message > > > > 1. > > ALTER_REPLICATION_SLOT on invalidated replication slots is unnecessary > > as there is no way... > > > > suggestion: > > ALTER_REPLICATION_SLOT for invalid replication slots should not be > > allowed because there is no way... > > Modified. > > > == > > 2. Missing docs update > > > > Should this docs page [1] be updated to say ALTER_REPLICATION_SLOT is > > not allowed for invalid slots? > > Haven't noticed for other ERROR cases in the docs, e.g. slots being > synced, temporary slots. Not sure if it's worth adding every ERROR > case to the docs. > OK. > > == > > src/backend/replication/slot.c > > > > 3. > > + if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE) > > + ereport(ERROR, > > + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > + errmsg("cannot alter replication slot \"%s\"", name), > > + errdetail("This replication slot was invalidated due to \"%s\".", > > + SlotInvalidationCauses[MyReplicationSlot->data.invalidated])); > > + > > > > I thought including the reason "invalid" (e.g. "cannot alter invalid > > replication slot \"%s\"") in the message might be better, but OTOH I > > see the patch message is the same as an existing one. Maybe see what > > others think. > > Changed. > > > == > > src/test/recovery/t/035_standby_logical_decoding.pl > > > > 3. > > There is already a comment about this test: > > ## > > # Recovery conflict: Invalidate conflicting slots, including in-use slots > > # Scenario 1: hot_standby_feedback off and vacuum FULL > > # > > # In passing, ensure that replication slot stats are not removed when the > > # active slot is invalidated. > > ## > > > > IMO we should update that "In passing..." 
sentence to something like: > > > > In passing, ensure that replication slot stats are not removed when > > the active slot is invalidated, and check that an error occurs when > > attempting to alter the invalid slot. > > Added. But, keeping it closer to the test case doesn't hurt. > > Please find the attached v2 patch also having Shveta's review comments > addressed. > The v2 patch looks OK to me. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
IIUC, previously there was a subscriber side option 'include_generated_columns', but now since v30* there is a publisher side option 'publish_generated_columns'. Fair enough, but in the v30* patches I can still see remnants of the old name 'include_generated_columns' all over the place: - in the commit message - in the code (struct field names, param names etc) - in the comments - in the docs If the decision is to call the new PUBLICATION option 'publish_generated_columns', then can't we please use that one name *everywhere* -- e.g. replace all cases where any old name is still lurking? == Kind Regards, Peter Smith. Fujitsu Australia
Re: GUC names in messages
On Wed, Sep 4, 2024 at 3:54 PM Michael Paquier wrote: > ... > 0001 and 0004 have been applied with these tweaks. I am still not > sure about the changes for DateStyle and IntervalStyle in 0002 and > 0003. Perhaps others have an opinion that could drive to a consensus. > Thanks for pushing the patches 0001 and 0004. I have rebased the two remaining patches. See v12 attached. == Kind Regards, Peter Smith. Fujitsu Australia v12-0002-GUC-names-fix-case-datestyle.patch Description: Binary data v12-0001-GUC-names-fix-case-intervalstyle.patch Description: Binary data
Re: Disallow altering invalidated replication slots
Hi, here are some review comments for patch v1. == Commit message 1. ALTER_REPLICATION_SLOT on invalidated replication slots is unnecessary as there is no way... suggestion: ALTER_REPLICATION_SLOT for invalid replication slots should not be allowed because there is no way... == 2. Missing docs update Should this docs page [1] be updated to say ALTER_REPLICATION_SLOT is not allowed for invalid slots? == src/backend/replication/slot.c 3. + if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE) + ereport(ERROR, + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot alter replication slot \"%s\"", name), + errdetail("This replication slot was invalidated due to \"%s\".", + SlotInvalidationCauses[MyReplicationSlot->data.invalidated])); + I thought including the reason "invalid" (e.g. "cannot alter invalid replication slot \"%s\"") in the message might be better, but OTOH I see the patch message is the same as an existing one. Maybe see what others think. == src/test/recovery/t/035_standby_logical_decoding.pl 3. There is already a comment about this test: ## # Recovery conflict: Invalidate conflicting slots, including in-use slots # Scenario 1: hot_standby_feedback off and vacuum FULL # # In passing, ensure that replication slot stats are not removed when the # active slot is invalidated. ## IMO we should update that "In passing..." sentence to something like: In passing, ensure that replication slot stats are not removed when the active slot is invalidated, and check that an error occurs when attempting to alter the invalid slot. == [1] docs - https://www.postgresql.org/docs/devel/protocol-replication.html Kind Regards, Peter Smith. Fujitsu Australia
Re: Introduce XID age and inactive timeout based replication slot invalidation
~~~ 16. +{ + my ($node, $slot, $offset, $inactive_timeout) = @_; + my $name = $node->name; The variable $name seems too vague. How about $node_name? ~~~ 17. + # Wait for invalidation reason to be set + $node->poll_query_until( + 'postgres', qq[ + SELECT COUNT(slot_name) = 1 FROM pg_replication_slots + WHERE slot_name = '$slot' AND + invalidation_reason = 'inactive_timeout'; + ]) + or die + "Timed out while waiting for invalidation reason of slot $slot to be set on node $name"; 17a. nit - /# Wait for invalidation reason to be set/# Check that the invalidation reason is 'inactive_timeout'/ IIUC, the 'trigger_slot_invalidation' function has already invalidated the slot at this point, so we are not really "Waiting..."; we are "Checking..." that the reason was correctly set. ~ 17b. I think this code fragment maybe would be better put inside the 'trigger_slot_invalidation' function. (I've done this in the nitpicks attachment) ~~~ 18. + # Check that invalidated slot cannot be acquired + my ($result, $stdout, $stderr); + + ($result, $stdout, $stderr) = $node->psql( + 'postgres', qq[ + SELECT pg_replication_slot_advance('$slot', '0/1'); + ]); 18a. s/Check that invalidated slot/Check that an invalidated slot/ ~ 18b. nit - Remove some blank lines, because the comment applies to all below it. == sub trigger_slot_invalidation 19. +# Trigger slot invalidation and confirm it in server log +sub trigger_slot_invalidation nit - s/confirm it in server log/confirm it in the server log/ ~ 20. +{ + my ($node, $slot, $offset, $inactive_timeout) = @_; + my $name = $node->name; + my $invalidated = 0; (same as the other subroutine) nit - The variable $name seems too vague. How about $node_name? == Please refer to the attached nitpicks top-up patch which implements most of the above nits. == Kind Regards, Peter Smith. 
Fujitsu Australia diff --git a/src/test/recovery/t/050_invalidate_slots.pl b/src/test/recovery/t/050_invalidate_slots.pl index 669d6cc..34b46d5 100644 --- a/src/test/recovery/t/050_invalidate_slots.pl +++ b/src/test/recovery/t/050_invalidate_slots.pl @@ -12,10 +12,10 @@ use Time::HiRes qw(usleep); # = # Testcase start # -# Invalidate streaming standby slot and logical failover slot on primary due to -# inactive timeout. Also, check logical failover slot synced to standby from -# primary doesn't invalidate on its own, but gets the invalidated state from the -# primary. +# Invalidate a streaming standby slot and logical failover slot on the primary +# due to inactive timeout. Also, check that a logical failover slot synced to +# the standby from the primary doesn't invalidate on its own, but gets the +# invalidated state from the primary. # Initialize primary my $primary = PostgreSQL::Test::Cluster->new('primary'); @@ -45,7 +45,7 @@ primary_slot_name = 'sb_slot1' primary_conninfo = '$connstr dbname=postgres' )); -# Create sync slot on primary +# Create sync slot on the primary $primary->psql('postgres', q{SELECT pg_create_logical_replication_slot('sync_slot1', 'test_decoding', false, false, true);} ); @@ -57,13 +57,13 @@ $primary->safe_psql( $standby1->start; -# Wait until standby has replayed enough data +# Wait until the standby has replayed enough data $primary->wait_for_catchup($standby1); -# Sync primary slot to standby +# Sync the primary slots to the standby $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();"); -# Confirm that logical failover slot is created on standby +# Confirm that the logical failover slot is created on the standby is( $standby1->safe_psql( 'postgres', q{SELECT count(*) = 1 FROM pg_replication_slots @@ -73,24 +73,24 @@ is( $standby1->safe_psql( 'logical slot sync_slot1 has synced as true on standby'); my $logstart = -s $primary->logfile; -my $inactive_timeout = 1; -# Set timeout so that next checkpoint will invalidate 
inactive slot +# Set timeout GUC so that the next checkpoint will invalidate inactive slots +my $inactive_timeout = 1; $primary->safe_psql( 'postgres', qq[ ALTER SYSTEM SET replication_slot_inactive_timeout TO '${inactive_timeout}s'; ]); $primary->reload; -# Check for logical failover slot to become inactive on primary. Note that +# Wait for logical failover slot to become inactive on the primary. Note that # nobody has acquired slot yet, so it must get invalidated due to # inactive timeout. -check_for_slot_invalidation($primary, 'sync_slot1', $logstart, +wait_for_slot_invalidation($primary, 'sync_slot1', $logstart, $ina
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, here are some review comments for v45-0001 (excluding the test code) == doc/src/sgml/config.sgml 1. +Note that the inactive timeout invalidation mechanism is not +applicable for slots on the standby server that are being synced +from primary server (i.e., standby slots having nit - /from primary server/from the primary server/ == src/backend/replication/slot.c 2. ReplicationSlotAcquire + errmsg("can no longer get changes from replication slot \"%s\"", + NameStr(s->data.name)), + errdetail("This slot has been invalidated because it was inactive for longer than the amount of time specified by \"%s\".", +"replication_slot_inactive_timeout."))); nit - "replication_slot_inactive_timeout." - should be no period inside that GUC name literal ~~~ 3. ReportSlotInvalidation I didn't understand why there was a hint for: "You might need to increase \"%s\".", "max_slot_wal_keep_size" But you don't have an equivalent hint for timeout invalidation: "You might need to increase \"%s\".", "replication_slot_inactive_timeout" Why aren't these similar cases consistent? ~~~ 4. RestoreSlotFromDisk + /* Use the same inactive_since time for all the slots. */ + if (now == 0) + now = GetCurrentTimestamp(); + Is the deferred assignment really necessary? Why not just unconditionally assign the 'now' just before the for-loop? Or even at the declaration? e.g. The 'replication_slot_inactive_timeout' is measured in seconds so I don't think 'inactive_since' being wrong by a millisecond here will make any difference. == src/include/replication/slot.h 5. ReplicationSlotSetInactiveSince +/* + * Set slot's inactive_since property unless it was previously invalidated due + * to inactive timeout. 
+ */ +static inline void +ReplicationSlotSetInactiveSince(ReplicationSlot *s, TimestampTz *now, + bool acquire_lock) +{ + if (acquire_lock) + SpinLockAcquire(&s->mutex); + + if (s->data.invalidated != RS_INVAL_INACTIVE_TIMEOUT) + s->inactive_since = *now; + + if (acquire_lock) + SpinLockRelease(&s->mutex); +} Is the logic correct? What if the slot was already invalid due to some reason other than RS_INVAL_INACTIVE_TIMEOUT? Is an Assert needed? == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 27b2285..97b4fb5 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4582,7 +4582,7 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows Note that the inactive timeout invalidation mechanism is not applicable for slots on the standby server that are being synced -from primary server (i.e., standby slots having +from the primary server (i.e., standby slots having pg_replication_slots.synced value true). Synced slots are always considered to be inactive because they don't diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index d92b92b..8cc67b4 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -640,7 +640,7 @@ retry: errmsg("can no longer get changes from replication slot \"%s\"", NameStr(s->data.name)), errdetail("This slot has been invalidated because it was inactive for longer than the amount of time specified by \"%s\".", - "replication_slot_inactive_timeout."))); + "replication_slot_inactive_timeout"))); } /*
Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description
On Mon, Sep 9, 2024 at 12:20 PM David G. Johnston wrote: > > > > On Sun, Sep 8, 2024, 18:55 Peter Smith wrote: >> >> Saying "The time..." is fine, but the suggestions given seem backwards to me: >> - The time this slot was inactivated >> - The time when the slot became inactive. >> - The time when the slot was deactivated. >> >> e.g. It is not like light switch. So, there is no moment when the >> active slot "became inactive" or "was deactivated". > > > While this is plausible the existing wording and the name of the field > definitely fail to convey this. > >> >> Rather, the 'inactive_since' timestamp field is simply: >> - The time the slot was last active. >> - The last time the slot was active. > > > I see your point but that wording is also quite confusing when an active slot > returns null for this field. > > At this point I'm confused enough to need whatever wording is taken to be > supported by someone explaining the code that interacts with this field. > Me too. I created this thread primarily to get the description changed to clarify this field represents a moment in time, rather than a duration. So I will be happy with any wording that addresses that. > I suppose I'm expecting something like: The time the last activity finished, > or null if an activity is in-progress. == Kind Regards, Peter Smith. Fujitsu Australia
Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description
Saying "The time..." is fine, but the suggestions given seem backwards to me: - The time this slot was inactivated - The time when the slot became inactive. - The time when the slot was deactivated. e.g. It is not like a light switch. So, there is no moment when the active slot "became inactive" or "was deactivated". Rather, the 'inactive_since' timestamp field is simply: - The time the slot was last active. - The last time the slot was active. == Kind Regards, Peter Smith. Fujitsu Australia
Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description
On Tue, Sep 3, 2024 at 4:12 PM shveta malik wrote: > ... > Shall we make the change in code-comment as well: > > typedef struct ReplicationSlot > { > ... > /* The time since the slot has become inactive */ > TimestampTz inactive_since; > } > Hi Shveta, Yes, I think so. I hadn't bothered to include this in the v1 patch only because the docs are user-facing, but this code comment isn't. However, now that you've mentioned it, I made the same change here also. Thanks. See patch v2. == Kind Regards, Peter Smith. Fujitsu Australia v2-0001-fix-description-for-inactive_since.patch Description: Binary data
Re: Collect statistics about conflicts in logical replication
On Tue, Sep 3, 2024 at 9:23 PM Zhijie Hou (Fujitsu) wrote: > > On Tuesday, September 3, 2024 7:12 PM Amit Kapila > wrote: > > > > Testing the stats for all types of conflicts is not required for this patch > > especially because they increase the timings by 3-4s. We can add tests for > > one > > or two types of conflicts. > > ... > > Thanks for the comments. I have addressed the comments and adjusted the tests. > In the V6 patch, Only insert_exists and delete_missing are tested. > > I confirmed that it only increased the testing time by 1 second on my machine. > > Best Regards, > Hou zj It seems a pity to throw away perfectly good test cases just because they increase how long the suite takes to run. This seems like yet another example of where we could have made good use of the 'PG_TEST_EXTRA' environment variable. I have been trying to propose adding "subscription" support for this in another thread [1]. By using this variable to make some tests conditional, we could have the best of both worlds. e.g. - retain all tests, but - by default, only run a subset of those tests (to keep default test execution time low). I hope that if the idea to use PG_TEST_EXTRA for "subscription" tests gets accepted then later we can revisit this, and put all the removed extra test cases back in again. == [1] https://www.postgresql.org/message-id/CAHut%2BPsgtnr5BgcqYwD3PSf-AsUtVDE_j799AaZeAjJvE6HGtA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: GUC names in messages
On Tue, Sep 3, 2024 at 4:35 PM Michael Paquier wrote: > > On Tue, Sep 03, 2024 at 12:00:19PM +1000, Peter Smith wrote: > > Here is the rebased patch set v10*. Everything is the same as before > > except now there are only 7 patches instead of 8. The previous v9-0001 > > ("bool") patch no longer exists because those changes are now already > > present in HEAD. > > > > I hope these might be pushed soon to avoid further rebasing. > > 0001~0004 could just be merged, they're the same thing, for different > GUC types. The consensus mentioned in 17974ec25946 makes that clear. > > 0007 is a good thing for translators, indeed.. I'll see about doing > something here, at least. > -- > Michael Hi Michael, thanks for your interest. I have merged the patches 0001-0004 as suggested. Please see v11 attachments. == Kind Regards, Peter Smith. Fujitsu Australia v11-0001-Add-quotes-for-GUCs.patch Description: Binary data v11-0003-GUC-names-fix-case-datestyle.patch Description: Binary data v11-0004-GUC-names-make-common-translatable-messages.patch Description: Binary data v11-0002-GUC-names-fix-case-intervalstyle.patch Description: Binary data
Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description
On Tue, Sep 3, 2024 at 4:35 PM Bertrand Drouvot wrote: > > Hi, > > On Tue, Sep 03, 2024 at 10:43:14AM +0530, Amit Kapila wrote: > > On Mon, Sep 2, 2024 at 9:14 AM shveta malik wrote: > > > > > > On Mon, Sep 2, 2024 at 5:47 AM Peter Smith wrote: > > > > > > > > > > > > To summarize, the current description wrongly describes the field as a > > > > time duration: > > > > "The time since the slot has become inactive." > > > > > > > > I suggest replacing it with: > > > > "The slot has been inactive since this time." > > > > > > > > > > +1 for the change. If I had read the document without knowing about > > > the patch, I too would have interpreted it as a duration. > > > > > > > The suggested change looks good to me as well. I'll wait for a day or > > two before pushing to see if anyone thinks otherwise. > > I'm not 100% convinced the current wording is confusing because: > > - the field type is described as a "timestamptz". > - there is no duration unit in the wording (if we were to describe a duration, > we would probably add an unit to it, like "The time (in s)..."). > Hmm. I assure you it is confusing because in English "The time since" implies duration, and that makes the sentence contrary to the timestamptz field type. Indeed, I cited Chat-GPT's interpretation above specifically so that people would not just take this as my opinion. > That said, if we want to highlight that this is not a duration, what about? > > "The time (not duration) since the slot has become inactive." > There is no need to "highlight" anything. To avoid confusion we only need to say what we mean. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Introduce XID age and inactive timeout based replication slot invalidation
idation here, because it is the other subroutine doing the waiting for the invalidation message in the logs. Instead, here I think you are just confirming the 'invalidation_reason' got set correctly. The comment should say what it is really doing. == sub check_for_slot_invalidation_in_server_log 10. +# Check for invalidation of slot in server log +sub check_for_slot_invalidation_in_server_log +{ I think the main function of this subroutine is the CHECKPOINT and the waiting for the server log to say invalidation happened. It is doing a loop of a) CHECKPOINT then b) inspecting the server log for the slot invalidation, and c) waiting for a bit. Repeat 10 times. A comment describing the logic for this subroutine would be helpful. The most important side-effect of this function is the CHECKPOINT because without that nothing will ever get invalidated due to inactivity, but this key point is not obvious from the subroutine name. IMO it would be better to name this differently to reflect what it is really doing: e.g. "CHECKPOINT_and_wait_for_slot_invalidation_in_server_log" == Kind Regards, Peter Smith. Fujitsu Australia
Re: GUC names in messages
Hi. The cfbot was reporting my patches needed to be rebased. Here is the rebased patch set v10*. Everything is the same as before except now there are only 7 patches instead of 8. The previous v9-0001 ("bool") patch no longer exists because those changes are now already present in HEAD. I hope these might be pushed soon to avoid further rebasing. == Kind Regards, Peter Smith. Fujitsu Australia v10-0001-Add-quotes-for-GUCs-int.patch Description: Binary data v10-0004-Add-quotes-for-GUCs-enum.patch Description: Binary data v10-0003-Add-quotes-for-GUCs-string.patch Description: Binary data v10-0002-Add-quotes-for-GUCs-real.patch Description: Binary data v10-0005-GUC-names-fix-case-intervalstyle.patch Description: Binary data v10-0006-GUC-names-fix-case-datestyle.patch Description: Binary data v10-0007-GUC-names-make-common-translatable-messages.patch Description: Binary data
Re: Collect statistics about conflicts in logical replication
On Mon, Sep 2, 2024 at 1:28 PM shveta malik wrote: > > On Mon, Sep 2, 2024 at 4:20 AM Peter Smith wrote: > > > > On Fri, Aug 30, 2024 at 4:24 PM shveta malik wrote: > > > > > > On Fri, Aug 30, 2024 at 10:53 AM Peter Smith > > > wrote: > > > > > > ... > > > > 2. Arrange all the counts into an intuitive/natural order > > > > > > > > There is an intuitive/natural ordering for these counts. For example, > > > > the 'confl_*' count fields are in the order insert -> update -> > > > > delete, which LGTM. > > > > > > > > Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not > > > > in a good order. > > > > > > > > IMO it makes more sense if everything is ordered as: > > > > 'sync_error_count', then 'apply_error_count', then all the 'confl_*' > > > > counts. > > > > > > > > This comment applies to lots of places, e.g.: > > > > - docs (doc/src/sgml/monitoring.sgml) > > > > - function pg_stat_get_subscription_stats (pg_proc.dat) > > > > - view pg_stat_subscription_stats (src/backend/catalog/system_views.sql) > > > > - TAP test SELECTs (test/subscription/t/026_stats.pl) > > > > > > > > As all those places are already impacted by this patch, I think it > > > > would be good if (in passing) we (if possible) also swapped the > > > > sync/apply counts so everything is ordered intuitively top-to-bottom > > > > or left-to-right. > > > > > > Not sure about this though. It does not seem to belong to the current > > > patch. > > > > > > > Fair enough. But, besides being inappropriate to include in the > > current patch, do you think the suggestion to reorder them made sense? > > If it has some merit, then I will propose it again as a separate > > thread. > > > > Yes, I think it makes sense. 
With respect to internal code, it might > be still okay as is, but when it comes to pg_stat_subscription_stats, > I think it is better if user finds it in below order: > subid | subname | sync_error_count | apply_error_count | confl_* > > rather than the existing one: > subid | subname | apply_error_count | sync_error_count | confl_* > Hi Shveta, Thanks. FYI - I created a new thread for this here [1]. == [1] https://www.postgresql.org/message-id/CAHut+PvbOw90wgGF4aV1HyYtX=6pjWc+pn8_fep7L=alxwx...@mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
pg_stat_subscription_stats order of the '*_count' columns
Hi, While reviewing another thread I was looking at the 'pg_stat_subscription_stats' view, in particular at the order of its "*_count" columns. IMO there is an intuitive/natural ordering for the logical replication operations (LR) being counted. For example, LR "initial tablesync" always comes before LR "apply". I propose that the columns of the view should also be in this same intuitive order: Specifically, "sync_error_count" should come before "apply_error_count" (left-to-right in view, top-to-bottom in docs). Currently, they are not arranged that way. The view today has only 2 count columns in HEAD, so this proposal seems trivial, but there is another patch [2] soon to be pushed, which will add more conflict count columns. As the number of columns increases IMO it becomes more important that each column is where you would intuitively expect to find it. ~ Changes would be needed in several places: - docs (doc/src/sgml/monitoring.sgml) - function pg_stat_get_subscription_stats (pg_proc.dat) - view pg_stat_subscription_stats (src/backend/catalog/system_views.sql) - TAP test SELECTs (test/subscription/t/026_stats.pl) ~ Thoughts? == [1] docs - https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-SUBSCRIPTION-STATS [2] stats for conflicts - https://www.postgresql.org/message-id/flat/OS0PR01MB57160A07BD575773045FC214948F2%40OS0PR01MB5716.jpnprd01.prod.outlook.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Introduce XID age and inactive timeout based replication slot invalidation
test/recovery/t/050_invalidate_slots.pl ~~~ Please refer to the attached file which implements some of the nits mentioned above. == [1] v43 review - https://www.postgresql.org/message-id/CAHut%2BPuFzCHPCiZbpoQX59kgZbebuWT0gR0O7rOe4t_sdYu%3DOA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 970b496..0537714 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4564,8 +4564,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows -Invalidates replication slots that are inactive for longer than -specified amount of time. If this value is specified without units, +Invalidate replication slots that are inactive for longer than this +amount of time. If this value is specified without units, it is taken as seconds. A value of zero (which is default) disables the timeout mechanism. This parameter can only be set in the postgresql.conf file or on the server @@ -4573,11 +4573,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows -This invalidation check happens either when the slot is acquired -for use or during checkpoint. The time since the slot has become -inactive is known from its -inactive_since value using which the -timeout is measured. +Slot invalidation due to inactivity timeout occurs during checkpoint. +The duration of slot inactivity is calculated using the slot's +inactive_since field value. @@ -4585,9 +4583,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows applicable for slots on the standby server that are being synced from primary server (i.e., standby slots having synced field true). -Because such synced slots are typically considered not active -(for them to be later considered as inactive) as they don't perform -logical decoding to produce the changes. +Synced slots are always considered to be inactive because they don't +perform logical decoding to produce changes. 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index acc0370..bb06592 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -551,12 +551,11 @@ ReplicationSlotName(int index, Name name) * An error is raised if nowait is true and the slot is currently in use. If * nowait is false, we sleep until the slot is released by the owning process. * - * An error is raised if check_for_invalidation is true and the slot has been + * An error is raised if error_if_invalid is true and the slot has been * invalidated previously. */ void -ReplicationSlotAcquire(const char *name, bool nowait, - bool check_for_invalidation) +ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid) { ReplicationSlot *s; int active_pid; @@ -635,11 +634,10 @@ retry: MyReplicationSlot = s; /* -* Error out if the slot has been invalidated previously. Because there's -* no use in acquiring the invalidated slot. +* An error is raised if error_if_invalid is true and the slot has been +* invalidated previously. */ - if (check_for_invalidation && - s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT) + if (error_if_invalid && s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT) { Assert(s->inactive_since > 0); ereport(ERROR, @@ -1565,6 +1563,7 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause, _("You might need to increase \"%s\"."), "max_slot_wal_keep_size"); break; } + case RS_INVAL_HORIZON: appendStringInfo(&err_detail, _("The slot conflicted with xid horizon %u."), snapshotConflictHorizon); @@ -1573,6 +1572,7 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause, case RS_INVAL_WAL_LEVEL: appendStringInfoString(&err_detail, _("Logical decoding on standby requires \"wal_level\" >= \"logical\" on the primary server.")); break; + case RS_INVAL_INACTIVE_TIMEOUT: Assert(inactive_since > 0); appendStringInfo(&err_detail, @@ -1584,6 +15
DOCS - pg_replication_slot . Fix the 'inactive_since' description
Hi hackers. While reviewing another thread I had cause to look at the docs for the pg_replication_slot 'inactive_since' field [1] for the first time. I was confused by the description, which is as follows: inactive_since timestamptz The time since the slot has become inactive. NULL if the slot is currently being used. Then I found the github history for the patch [3], and the accompanying long thread discussion [2] about the renaming of that field. I have no intention to re-open that can-of-worms, but OTOH I feel the first sentence of the field description is wrong and needs fixing. Specifically, IMO describing something as "The time since..." means some amount of elapsed time since some occurrence, but that is not the correct description for this timestamp field. This is not just a case of me being pedantic. For example, here is what Chat-GPT had to say: I asked: What does "The time since the slot has become inactive." mean? ChatGPT said: "The time since the slot has become inactive" refers to the duration that has passed from the moment a specific slot (likely a database replication slot or a similar entity) stopped being active. In other words, it measures how much time has elapsed since the slot transitioned from an active state to an inactive state. For example, if a slot became inactive 2 hours ago, "the time since the slot has become inactive" would be 2 hours. To summarize, the current description wrongly describes the field as a time duration: "The time since the slot has become inactive." I suggest replacing it with: "The slot has been inactive since this time." The attached patch makes this suggested change. == [1] docs - https://www.postgresql.org/docs/devel/view-pg-replication-slots.html [2] thread - https://www.postgresql.org/message-id/ca+tgmob_ta-t2ty8qrkhbgnnlrf4zycwhghgfsuuofraedw... [3] push - https://github.com/postgres/postgres/commit/6d49c8d4b4f4a20eb5b4c501d78cf894fa13c0ea Kind Regards, Peter Smith. 
Fujitsu Australia v1-0001-fix-description-for-inactive_since.patch Description: Binary data
Re: Collect statistics about conflicts in logical replication
On Fri, Aug 30, 2024 at 4:24 PM shveta malik wrote: > > On Fri, Aug 30, 2024 at 10:53 AM Peter Smith wrote: > > ... > > 2. Arrange all the counts into an intuitive/natural order > > > > There is an intuitive/natural ordering for these counts. For example, > > the 'confl_*' count fields are in the order insert -> update -> > > delete, which LGTM. > > > > Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not > > in a good order. > > > > IMO it makes more sense if everything is ordered as: > > 'sync_error_count', then 'apply_error_count', then all the 'confl_*' > > counts. > > > > This comment applies to lots of places, e.g.: > > - docs (doc/src/sgml/monitoring.sgml) > > - function pg_stat_get_subscription_stats (pg_proc.dat) > > - view pg_stat_subscription_stats (src/backend/catalog/system_views.sql) > > - TAP test SELECTs (test/subscription/t/026_stats.pl) > > > > As all those places are already impacted by this patch, I think it > > would be good if (in passing) we (if possible) also swapped the > > sync/apply counts so everything is ordered intuitively top-to-bottom > > or left-to-right. > > Not sure about this though. It does not seem to belong to the current patch. > Fair enough. But, besides being inappropriate to include in the current patch, do you think the suggestion to reorder them made sense? If it has some merit, then I will propose it again as a separate thread. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Collect statistics about conflicts in logical replication
On Fri, Aug 30, 2024 at 6:36 PM shveta malik wrote: > > On Fri, Aug 30, 2024 at 12:15 PM Zhijie Hou (Fujitsu) > wrote: > > > > > > Here is V5 patch which addressed above and Shveta's[1] comments. > > > > The patch looks good to me. > Patch v5 LGTM. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Collect statistics about conflicts in logical replication
Hi Hou-San. Here are my review comments for v4-0001. == 1. Add links in the docs IMO it would be good for all these confl_* descriptions (in doc/src/sgml/monitoring.sgml) to include links back to where each of those conflict types was defined [1]. Indeed, when links are included to the original conflict type information then I think you should remove mentioning "track_commit_timestamp": + counted only when the + track_commit_timestamp + option is enabled on the subscriber. It should be obvious that you cannot count a conflict if the conflict does not happen, but I don't think we should scatter/duplicate those rules in different places saying when certain conflicts can/can't happen -- we should just link everywhere back to the original description for those rules. ~~~ 2. Arrange all the counts into an intuitive/natural order There is an intuitive/natural ordering for these counts. For example, the 'confl_*' count fields are in the order insert -> update -> delete, which LGTM. Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not in a good order. IMO it makes more sense if everything is ordered as: 'sync_error_count', then 'apply_error_count', then all the 'confl_*' counts. This comment applies to lots of places, e.g.: - docs (doc/src/sgml/monitoring.sgml) - function pg_stat_get_subscription_stats (pg_proc.dat) - view pg_stat_subscription_stats (src/backend/catalog/system_views.sql) - TAP test SELECTs (test/subscription/t/026_stats.pl) As all those places are already impacted by this patch, I think it would be good if (in passing) we (if possible) also swapped the sync/apply counts so everything is ordered intuitively top-to-bottom or left-to-right. == [1] https://www.postgresql.org/docs/devel/logical-replication-conflicts.html#LOGICAL-REPLICATION-CONFLICTS Kind Regards, Peter Smith. Fujitsu Australia
Re: Introduce XID age and inactive timeout based replication slot invalidation
sign now = GetCurrentTimestamp(); here? ~ 11. + * Note that we don't invalidate synced slots because, + * they are typically considered not active as they don't + * perform logical decoding to produce the changes. nit - tweaked punctuation ~ 12. + * If the slot can be acquired, do so or if the slot is already ours, + * then mark it invalidated. Otherwise we'll signal the owning + * process, below, and retry. nit - tidied this comment. Suggestion: If the slot can be acquired, do so and mark it as invalidated. If the slot is already ours, mark it as invalidated. Otherwise, we'll signal the owning process below and retry. ~ 13. + if (active_pid == 0 || + (MyReplicationSlot != NULL && + MyReplicationSlot == s && + active_pid == MyProcPid)) You are already checking MyReplicationSlot == s here, so that extra check for MyReplicationSlot != NULL is redundant, isn't it? ~~~ 14. CheckPointReplicationSlots /* - * Flush all replication slots to disk. + * Flush all replication slots to disk. Also, invalidate slots during + * non-shutdown checkpoint. * * It is convenient to flush dirty replication slots at the time of checkpoint. * Additionally, in case of a shutdown checkpoint, we also identify the slots nit - /Also, invalidate slots/Also, invalidate obsolete slots/ == src/backend/utils/misc/guc_tables.c 15. + {"replication_slot_inactive_timeout", PGC_SIGHUP, REPLICATION_SENDING, + gettext_noop("Sets the amount of time to wait before invalidating an " + "inactive replication slot."), nit - that is maybe a bit misleading because IIUC there is no real "waiting" happening anywhere. Suggest: Sets the amount of time a replication slot can remain inactive before it will be invalidated. == Please take a look at the attached top-up patches. These include changes for many of the nits above. == Kind Regards, Peter Smith. 
Fujitsu Australia

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 7009350..c96ae53 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -671,9 +671,10 @@ retry:
 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("can no longer get changes from replication slot \"%s\"",
 NameStr(s->data.name)),
-errdetail("This slot has been invalidated because it was inactive since %s for more than %d seconds specified by \"replication_slot_inactive_timeout\".",
+errdetail("The slot became invalid because it was inactive since %s, which is more than %d seconds ago.",
 timestamptz_to_str(s->inactive_since),
- replication_slot_inactive_timeout)));
+ replication_slot_inactive_timeout),
+errhint("You might need to increase \"%s\".", "replication_slot_inactive_timeout")));
 }
 }
@@ -1738,9 +1739,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 * is disabled or slot is currently being used or the slot
 * on standby is currently being synced from the primary.
 *
-* Note that we don't invalidate synced slots because,
-* they are typically considered not active as they don't
-* perform logical decoding to produce the changes.
+* Note that we don't invalidate synced slots because
+* they are typically considered not active, as they don't
+* perform logical decoding to produce changes.
 */
 if (replication_slot_inactive_timeout == 0 ||
 s->inactive_since == 0 ||
@@ -1789,9 +1790,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 active_pid = s->active_pid;
 /*
-* If the slot can be acquired, do so or if the slot is already ours,
-* then mark it invalidated. Otherwise we'll signal the owning
-* process, below, and retry.
+* If the slot can be acquired, do so and mark it as invalidated.
+* If the slot is already ours, mark it as inv
Re: Collect statistics about conflicts in logical replication
Hi Hou-San.

I tried an experiment where I deliberately violated a primary key
during initial table synchronization. For example:

test_sub=# create table t1(a int primary key);
CREATE TABLE
test_sub=# insert into t1 values(1);
INSERT 0 1
test_sub=# create subscription sub1 connection 'dbname=test_pub' publication pub1 with (enabled=false);
2024-08-29 09:53:21.172 AEST [24186] WARNING:  subscriptions created by regression test cases should have names starting with "regress_"
WARNING:  subscriptions created by regression test cases should have names starting with "regress_"
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# select * from pg_stat_subscription_stats;
 subid | subname | apply_error_count | sync_error_count | insert_exists_count | update_differ_count | update_exists_count | update_missing_count | delete_differ_count | delete_missing_count | stats_reset
-------+---------+-------------------+------------------+---------------------+---------------------+---------------------+----------------------+---------------------+----------------------+-------------
 16390 | sub1    |                 0 |                0 |                   0 |                   0 |                   0 |                    0 |                   0 |                    0 |
(1 row)

test_sub=# alter subscription sub1 enable;
ALTER SUBSCRIPTION
test_sub=# 2024-08-29 09:53:57.245 AEST [4345] LOG:  logical replication apply worker for subscription "sub1" has started
2024-08-29 09:53:57.258 AEST [4347] LOG:  logical replication table synchronization worker for subscription "sub1", table "t1" has started
2024-08-29 09:53:57.311 AEST [4347] ERROR:  duplicate key value violates unique constraint "t1_pkey"
2024-08-29 09:53:57.311 AEST [4347] DETAIL:  Key (a)=(1) already exists.
2024-08-29 09:53:57.311 AEST [4347] CONTEXT:  COPY t1, line 1
2024-08-29 09:53:57.312 AEST [23501] LOG:  background worker "logical replication tablesync worker" (PID 4347) exited with exit code 1
2024-08-29 09:54:02.385 AEST [4501] LOG:  logical replication table synchronization worker for subscription "sub1", table "t1" has started
2024-08-29 09:54:02.462 AEST [4501] ERROR:  duplicate key value violates unique constraint "t1_pkey"
2024-08-29 09:54:02.462 AEST [4501] DETAIL:  Key (a)=(1) already exists.
2024-08-29 09:54:02.462 AEST [4501] CONTEXT:  COPY t1, line 1
2024-08-29 09:54:02.463 AEST [23501] LOG:  background worker "logical replication tablesync worker" (PID 4501) exited with exit code 1
2024-08-29 09:54:07.512 AEST [4654] LOG:  logical replication table synchronization worker for subscription "sub1", table "t1" has started
2024-08-29 09:54:07.580 AEST [4654] ERROR:  duplicate key value violates unique constraint "t1_pkey"
2024-08-29 09:54:07.580 AEST [4654] DETAIL:  Key (a)=(1) already exists.
2024-08-29 09:54:07.580 AEST [4654] CONTEXT:  COPY t1, line 1
...
test_sub=# alter subscription sub1 disable;
ALTER SUBSCRIPTION
2024-08-29 09:55:10.329 AEST [4345] LOG:  logical replication worker for subscription "sub1" will stop because the subscription was disabled
test_sub=# select * from pg_stat_subscription_stats;
 subid | subname | apply_error_count | sync_error_count | insert_exists_count | update_differ_count | update_exists_count | update_missing_count | delete_differ_count | delete_missing_count | stats_reset
-------+---------+-------------------+------------------+---------------------+---------------------+---------------------+----------------------+---------------------+----------------------+-------------
 16390 | sub1    |                 0 |               15 |                   0 |                   0 |                   0 |                    0 |                   0 |                    0 |
(1 row)

~~~

Notice how after a while there were multiple (15) 'sync_error_count'
recorded. According to the docs: 'insert_exists' happens when
"Inserting a row that violates a NOT DEFERRABLE unique constraint.".
So why are there not the same number of 'insert_exists_count' recorded
in pg_stat_subscription_stats?

The 'insert_exists' is either not happening or is not being counted
during table synchronization. Either way, it was not what I was
expecting. If it is not a bug, maybe the docs need to explain more
about the rules for 'insert_exists' during the initial table sync.

==

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Collect statistics about conflicts in logical replication
On Wed, Aug 28, 2024 at 9:19 PM Amit Kapila wrote:
>
> On Wed, Aug 28, 2024 at 11:43 AM shveta malik wrote:
> >
> > Thanks for the patch. Just thinking out loud, since we have names like
> > 'apply_error_count', 'sync_error_count' which tells that they are
> > actually error-count, will it be better to have something similar in
> > conflict-count cases, like 'insert_exists_conflict_count',
> > 'delete_missing_conflict_count' and so on. Thoughts?
> >
>
> It would be better to have conflict in the names but OTOH it will make
> the names a bit longer. The other alternatives could be (a)
> insert_exists_confl_count, etc. (b) confl_insert_exists_count, etc.
> (c) confl_insert_exists, etc. These are based on the column names in
> the existing view pg_stat_database_conflicts [1]. The (c) looks better
> than other options but it will make the conflict-related columns
> different from error-related columns.
>
> Yet another option is to have a different view like
> pg_stat_subscription_conflicts but that sounds like going too far.
>
> [1] - https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-DATABASE-CONFLICTS-VIEW

Option (c) looked good to me.

Removing the suffix "_count" is OK. For example, try searching all of
Chapter 27 ("The Cumulative Statistics System") [1] for columns
described as "Number of ..." and you will find that a "_count" suffix
is used only rarely.

Adding the prefix "confl_" is OK. As mentioned, there is a precedent
for this. See "pg_stat_database_conflicts" [2].

Mixing column names where some have and some do not have "_count"
suffixes may not be ideal, but I see no problem because there are
precedents for that too. E.g. see "pg_stat_replication_slots" [3], and
"pg_stat_all_tables" [4].

==

[1] https://www.postgresql.org/docs/devel/monitoring-stats.html
[2] https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-DATABASE-CONFLICTS-VIEW
[3] https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-SLOTS-VIEW
[4] https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Conflict detection and logging in logical replication
On Wed, Aug 28, 2024 at 3:53 PM shveta malik wrote:
>
> On Wed, Aug 28, 2024 at 9:44 AM Zhijie Hou (Fujitsu) wrote:
> >
> > > > +1 on 'update_origin_differs' instead of 'update_origins_differ' as
> > > > the former is somewhat similar to other conflict names 'insert_exists'
> > > > and 'update_exists'.
> > >
> > > Since we reached a consensus on this, I am attaching a small patch to
> > > rename as suggested.
> >
> > Sorry, I attached the wrong patch. Here is correct one.
> >
> LGTM.
>

LGTM.

==

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Collect statistics about conflicts in logical replication
On Mon, Aug 26, 2024 at 10:13 PM Zhijie Hou (Fujitsu) wrote:
>
> On Monday, August 26, 2024 3:30 PM Peter Smith wrote:
> >
> > ==
> > src/include/replication/conflict.h
> >
> > nit - defined 'NUM_CONFLICT_TYPES' inside the enum (I think this way is
> > often used in other PG source enums)
>
> I think we have recently tended to avoid doing that, as it has been commented
> that this style is somewhat deceptive and can cause confusion. See a previous
> similar comment[1]. The current style follows the other existing examples
> like:
>
> #define IOOBJECT_NUM_TYPES (IOOBJECT_TEMP_RELATION + 1)
> #define IOCONTEXT_NUM_TYPES (IOCONTEXT_VACUUM + 1)
> #define IOOP_NUM_TYPES (IOOP_WRITEBACK + 1)
> #define BACKEND_NUM_TYPES (B_LOGGER + 1)

OK.

> > ==
> > src/test/subscription/t/026_stats.pl
> >
> > 1.
> > + # Delete data from the test table on the publisher. This delete
> > + operation # should be skipped on the subscriber since the table is already
> > empty.
> > + $node_publisher->safe_psql($db, qq(DELETE FROM $table_name;));
> > +
> > + # Wait for the subscriber to report tuple missing conflict.
> > + $node_subscriber->poll_query_until(
> > + $db,
> > + qq[
> > + SELECT update_missing_count > 0 AND delete_missing_count > 0 FROM
> > + pg_stat_subscription_stats WHERE subname = '$sub_name'
> > + ])
> > + or die
> > + qq(Timed out while waiting for tuple missing conflict for
> > subscription '$sub_name');
> >
> > Can you write a comment to explain why the replicated DELETE is
> > expected to increment both the 'update_missing_count' and the
> > 'delete_missing_count'?
>
> I think the comments several lines above the wait explained the reason[2]. I
> slightly modified the comments to make it clear.

1. Right, but it still was not obvious to me what caused the
'update_missing_count'. On further study, I see it was a hangover from
the earlier UPDATE test case which was still stuck in an ERROR loop
attempting to do the update operation. e.g. before it was giving the
expected 'update_exists' conflicts but after the subscriber table
TRUNCATE the update conflict changes to give a 'update_missing'
conflict instead.

I've updated the comment to reflect my understanding. Please have a
look to see if you agree.

2. I separated the tests for 'update_missing' and 'delete_missing',
putting the update_missing test *before* the DELETE. I felt the
expected results were much clearer when each test did just one thing.
Please have a look to see if you agree.

~~~

3.
+# Enable track_commit_timestamp to detect origin-differ conflicts in logical
+# replication. Reduce wal_retrieve_retry_interval to 1ms to accelerate the
+# restart of the logical replication worker after encountering a conflict.
+$node_subscriber->append_conf(
+ 'postgresql.conf', q{
+track_commit_timestamp = on
+wal_retrieve_retry_interval = 1ms
+});

Later, after CDR resolvers are implemented, it might be good to
revisit these conflict test cases and re-write them to use some
conflict resolvers like 'skip'. Then the subscriber won't give ERRORs
and restart apply workers all the time behind the scenes, so you won't
need the above configuration for accelerating the worker restarts. In
other words, running these tests might be more efficient if you can
avoid restarting workers all the time.

I suggest putting an XXX comment here as a reminder that these tests
should be revisited to make use of conflict resolvers in the future.

~~~

nit - not caused by this patch, but other comment inconsistencies
about "stats_reset timestamp" can be fixed in passing too.

==

Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index 0df31a6..d9589f0 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -134,24 +134,37 @@ sub create_sub_pub_w_errors
 or die qq(Timed out while waiting for update_exists conflict for subscription '$sub_name');

-	# Truncate the test table to ensure that the conflicting update operations
-	# are skipped, allowing the test to continue.
-	$node_subscriber->safe_psql($db, qq(TRUNCATE $table_name));
+# Truncate the subscriber side test table. Now that the table is empty,
+# the update conflict ('update_existing') ERRORs will stop happening. A
+# single update conflict 'update_missing' will be reported, but the update
+# will be skipped on the subscriber, allowing the test to continue.
+$node_subscriber->safe_psql($db, qq(TRUNCATE $table
Re: Conflict detection and logging in logical replication
On Mon, Aug 26, 2024 at 7:52 PM Amit Kapila wrote:
>
> On Thu, Aug 22, 2024 at 2:21 PM Amit Kapila wrote:
> >
> > On Thu, Aug 22, 2024 at 1:33 PM Peter Smith wrote:
> > >
> > > Do you think the documentation for the 'column_value' parameter of the
> > > conflict logging should say that the displayed value might be
> > > truncated?
> > >
> >
> > I updated the patch to mention this and pushed it.
> >
>
> Peter Smith mentioned to me off-list that the names of conflict types
> 'update_differ' and 'delete_differ' are not intuitive as compared to
> all other conflict types like insert_exists, update_missing, etc. The
> other alternative that comes to mind for those conflicts is to name
> them as 'update_origin_differ'/'delete_origin_differ'.
>

For things to "differ" there must be more than one of them. The plural
of origin is origins.
e.g. 'update_origins_differ'/'delete_origins_differ'.

OTOH, you could say "differs" instead of differ:
e.g. 'update_origin_differs'/'delete_origin_differs'.

==

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Collect statistics about conflicts in logical replication
Hi Hou-san. Here are some review comments for your patch v1-0001.

==
doc/src/sgml/logical-replication.sgml

nit - added a comma.

==
doc/src/sgml/monitoring.sgml

nit - use for 'apply_error_count'.
nit - added a period when there are multiple sentences.
nit - adjusted field descriptions using Chat-GPT clarification suggestions

==
src/include/pgstat.h

nit - change the param to 'type' -- ie. same as the implementation calls it

==
src/include/replication/conflict.h

nit - defined 'NUM_CONFLICT_TYPES' inside the enum (I think this way is
often used in other PG source enums)

==
src/test/subscription/t/026_stats.pl

1.
+ # Delete data from the test table on the publisher. This delete operation
+ # should be skipped on the subscriber since the table is already empty.
+ $node_publisher->safe_psql($db, qq(DELETE FROM $table_name;));
+
+ # Wait for the subscriber to report tuple missing conflict.
+ $node_subscriber->poll_query_until(
+ $db,
+ qq[
+ SELECT update_missing_count > 0 AND delete_missing_count > 0
+ FROM pg_stat_subscription_stats
+ WHERE subname = '$sub_name'
+ ])
+ or die
+ qq(Timed out while waiting for tuple missing conflict for subscription '$sub_name');

Can you write a comment to explain why the replicated DELETE is
expected to increment both the 'update_missing_count' and the
'delete_missing_count'?

~

nit - update several "Apply and Sync errors..." comments that did not
mention conflicts
nit - tweak comments wording for update_differ and delete_differ
nit - /both > 0/> 0/
nit - /both 0/0/

==

Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index f3e3641..f682369 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -1585,7 +1585,7 @@ test_sub=# SELECT * FROM t1 ORDER BY id;
-     Additional logging is triggered and the conflict statistics are collected (displayed in the
+     Additional logging is triggered, and the conflict statistics are collected (displayed in the
      pg_stat_subscription_stats view) in the following conflict cases:

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ea36d46..ac3c773 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2159,7 +2159,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
      Number of times an error occurred while applying changes. Note that
      any conflict resulting in an apply error will be counted in both
-     apply_error_count and the corresponding conflict count.
+     apply_error_count and the corresponding conflict count.
@@ -2179,8 +2179,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage
      Number of times a row insertion violated a
-     NOT DEFERRABLE unique constraint while applying
-     changes
+     NOT DEFERRABLE unique constraint during the
+     application of changes
@@ -2189,11 +2189,11 @@ description | Waiting for a newly initialized WAL file to reach durable storage
      update_differ_count bigint
-     Number of times an update was performed on a row that was previously
-     modified by another source while applying changes. This conflict is
+     Number of times an update was applied to a row that had been previously
+     modified by another source during the application of changes. This conflict is
      counted only when the track_commit_timestamp
-     option is enabled on the subscriber
+     option is enabled on the subscriber.
@@ -2202,9 +2202,9 @@ description | Waiting for a newly initialized WAL file to reach durable storage
      update_exists_count bigint
-     Number of times that the updated value of a row violated a
-     NOT DEFERRABLE unique constraint while applying
-     changes
+     Number of times that an updated row value violated a
+     NOT DEFERRABLE unique constraint during the
+     application of changes
@@ -2213,8 +2213,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage
      update_missing_count bigint
-     Number of times that the tuple to be updated was not found while applying
-     changes
+     Number of times the tuple to be updated was not found during the
+     application of changes
@@ -2223,11 +2223,11 @@ description | Waiting for a newly initialized WAL file to reach durable storage
      delete_differ_count bigint
-     Number of times a delete was performed on a row that was previously
-     modified by another source while applying changes. This conflict is
-     counted only when
Re: Conflict Detection and Resolution
On Thu, Aug 22, 2024 at 8:15 PM shveta malik wrote:
>
> On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond wrote:
> >
> > The patches have been rebased on the latest pgHead following the merge
> > of the conflict detection patch [1].
>
> Thanks for working on patches.
>
> Summarizing the issues which need some suggestions/thoughts.
>
> 1)
> For subscription based resolvers, currently the syntax implemented is:
>
> 1a)
> CREATE SUBSCRIPTION
> CONNECTION PUBLICATION
> CONFLICT RESOLVER
> (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3,...);
>
> 1b)
> ALTER SUBSCRIPTION CONFLICT RESOLVER
> (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3,...);
>
> Earlier the syntax suggested in [1] was:
> CREATE SUBSCRIPTION CONNECTION PUBLICATION
> CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
> CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';
>
> I think the currently implemented syntax is good as it has less
> repetition, unless others think otherwise.
>
> ~~
>
> 2)
> For subscription based resolvers, do we need a RESET command to reset
> resolvers to default? Any one of below or both?
>
> 2a) reset all at once:
> ALTER SUBSCRIPTION RESET CONFLICT RESOLVERS
>
> 2b) reset one at a time:
> ALTER SUBSCRIPTION RESET CONFLICT RESOLVER for 'conflict_type';
>
> The issue I see here is, to implement 1a and 1b, we have introduced
> the 'RESOLVER' keyword. If we want to implement 2a, we will have to
> introduce the 'RESOLVERS' keyword as well. But we can come up with
> some alternative syntax if we plan to implement these. Thoughts?
>

Hi Shveta,

I felt it would be better to keep the syntax similar to the existing
INSERT ... ON CONFLICT [1]. I'd suggest a syntax like this:

... ON CONFLICT ['conflict_type'] DO { 'conflict_action' | DEFAULT }

~~~

e.g.

To configure conflict resolvers for the SUBSCRIPTION:

CREATE SUBSCRIPTION subname CONNECTION coninfo PUBLICATION pubname
ON CONFLICT 'conflict_type1' DO 'conflict_action1',
ON CONFLICT 'conflict_type2' DO 'conflict_action2';

Likewise, for ALTER:

ALTER SUBSCRIPTION
ON CONFLICT 'conflict_type1' DO 'conflict_action1',
ON CONFLICT 'conflict_type2' DO 'conflict_action2';

To RESET all at once:

ALTER SUBSCRIPTION ON CONFLICT DO DEFAULT;

And, to RESET one at a time:

ALTER SUBSCRIPTION ON CONFLICT 'conflict_type1' DO DEFAULT;

~~~

Although your list format "('conflict_type1' = 'conflict_action1',
'conflict_type2' = 'conflict_action2')" is clear and without
repetition, I predict this terse style could end up being troublesome
because it does not offer much flexibility for whatever the future
might hold for CDR.

e.g. ability to handle the conflict with a user-defined resolver
e.g. ability to handle the conflict conditionally (e.g. with a WHERE clause...)
e.g. ability to handle all conflicts with a common resolver
etc.

Advantages of my suggestion:
- Close to existing SQL syntax
- No loss of clarity by removing the word "RESOLVER"
- No requirement for new keyword/s
- The commands now read more like English
- Offers more flexibility for any unknown future requirements
- The setup (via create subscription) and the alter/reset all look the same.

==

[1] https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Hi Shubham,

I have reviewed v28* and posted updated v29 versions of patches 0001
and 0002. If you are OK with these changes, the next task would be to
pg_indent them, then rebase the remaining patches (0003 etc.) and
include those with the next patchset version.

//

Patch v29-0001 changes:

nit - Fixed typo in comments.
nit - Removed an unnecessary format change for the unchanged
send_relation_and_attrs declaration.

//

Patch v29-0002 changes:

1. Made fixes to address Vignesh's review comments [1].

2. Added the missing test cases for tab_gen_to_gen, and tab_alter.

3. Multiple other modifications include:

nit - Renamed the test database /test/test_igc_true/ because 'test'
was too vague.
nit - This patch does not need to change most of the existing 'tab1'
test. So we should not be reformatting the existing test code for no
reason.
nit - I added a summary comment to describe the test combinations
nit - The "Testcase end" comments were unnecessary and prone to error,
so I removed them.
nit - Change comments /incremental sync/incremental replication/
nit - Added XXX notes about copy_data=false. These are reminders to
change code in later TAP patches
nit - Rearranged test steps so the publisher does not do incremental
INSERT until all initial sync tests are done
nit - Added initial sync tests even if copy_data=false. This is for
completeness - these will be handled in a later TAP patch
nit - The table names are self-explanatory, so some of the test
"messages" were simplified

==

[1] https://www.postgresql.org/message-id/CALDaNm31LZQfeR8Vv1qNCOREGffvZbgGDrTp%3D3h%3DEHiHTEO2pQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

v29-0001-Enable-support-for-include_generated_columns-opt.patch
Description: Binary data

v29-0002-Tap-tests-for-generated-columns.patch
Description: Binary data
Re: Conflict detection and logging in logical replication
Hi Hou-san.

I was experimenting with some conflict logging and found that large
column values are truncated in the log DETAIL. E.g. Below I have a
table where I inserted a 3000 character text value 'bigbigbig...'.
Then I caused a replication conflict.

test_sub=# delete fr2024-08-22 17:50:17.181 AEST [14901] LOG:  logical replication apply worker for subscription "sub1" has started
2024-08-22 17:50:17.193 AEST [14901] ERROR:  conflict detected on relation "public.t1": conflict=insert_exists
2024-08-22 17:50:17.193 AEST [14901] DETAIL:  Key already exists in unique index "t1_pkey", modified in transaction 780.
Key (a)=(k3); existing local tuple (k3, bigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigbigb...); remote tuple (k3, this will clash).

~

Do you think the documentation for the 'column_value' parameter of the
conflict logging should say that the displayed value might be
truncated?

==

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: CREATE SUBSCRIPTION - add missing test case
On Thu, Aug 22, 2024 at 8:54 AM Peter Smith wrote:
>
> On Wed, Aug 21, 2024 at 8:48 PM Amit Kapila wrote:
> >
> > On Fri, Aug 16, 2024 at 9:45 AM vignesh C wrote:
> > >
> > > On Thu, 15 Aug 2024 at 12:55, Peter Smith wrote:
> > > >
> > > > Hi Hackers,
> > > >
> > > > While reviewing another logical replication thread [1], I found an
> > > > ERROR scenario that seems to be untested.
> > > >
> > > > TEST CASE: Attempt CREATE SUBSCRIPTION where the subscriber table is
> > > > missing some expected column(s).
> > > >
> > > > Attached is a patch to add the missing test for this error message.
> > >
> > > I agree currently there is no test to hit this code.
> > >
> >
> > I also don't see a test for this error condition. However, it is not
> > clear to me how important is it to cover this error code path. This
> > code has existed for a long time and I didn't notice any bugs related
> > to this. There is a possibility that in the future we might break
> > something because of a lack of this test but not sure if we want to
> > cover every code path via tests as each additional test also has some
> > cost. OTOH, If others think it is important or a good idea to have
> > this test then I don't have any objection to the same.
>
> Yes, AFAIK there were no bugs related to this; The test was proposed
> to prevent accidental future bugs.
>
> BACKGROUND
>
> Another pending feature thread (replication of generated columns) [1]
> required many test combinations to confirm all the different expected
> results which are otherwise easily accidentally broken without
> noticing. This *current* thread test shares one of the same error
> messages, which is how it was discovered missing in the first place.
>
> ~~~
>
> PROPOSAL
>
> I think this is not the first time a logical replication test has been
> questioned due mostly to concern about creeping "costs".
>
> How about we create a new test file and put test cases like this one
> into it, guarded by code like the below using PG_TEST_EXTRA [2]?
>
> Doing it this way we can have better code coverage and higher
> confidence when we want it, but zero test cost overheads when we don't
> want it.
>
> e.g.
>
> src/test/subscription/t/101_extra.pl:
>
> if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bsubscription\b/)
> {
> plan skip_all =>
> 'Due to execution costs these tests are skipped unless subscription
> is enabled in PG_TEST_EXTRA';
> }
>
> # Add tests here...
>

To help strengthen the above proposal, here are a couple of examples
where TAP tests already use this strategy to avoid tests for various
reasons.

[1] Avoids some tests because of cost

# WAL consistency checking is resource intensive so require opt-in with the
# PG_TEST_EXTRA environment variable.
if ( $ENV{PG_TEST_EXTRA}
  && $ENV{PG_TEST_EXTRA} =~ m/\bwal_consistency_checking\b/)
{
  $node_primary->append_conf('postgresql.conf',
    'wal_consistency_checking = all');
}

[2] Avoids some tests because of safety

if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bload_balance\b/)
{
  plan skip_all =>
    'Potentially unsafe test load_balance not enabled in PG_TEST_EXTRA';
}

==

[1] https://github.com/postgres/postgres/blob/master/src/test/recovery/t/027_stream_regress.pl
[2] https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/t/004_load_balance_dns.pl

Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Conflict detection and logging in logical replication
Hi Hou-San. Here is my review of the v20-0001 docs patch.

1. Restructure into sections

> I think that's a good idea. But I preferred to do that in a separate
> patch (maybe a third patch after the first and second are RFC), because AFAICS
> we would need to adjust some existing docs which falls outside the scope of
> the current patch.

OK. I thought deferring it would only make extra work/churn, given you
already know up-front that everything will require restructuring later
anyway.

~~~

2. Synopsis

2.1 synopsis wrapping.

> I thought about this, but wrapping the sentence would cause the words
> to be displayed in different lines after compiling. I think that's
> inconsistent with the real log which display the tuples in one line.

IMO the readability of the format is the most important objective for
the documentation. And, as you told Shveta, there is already a real
example where people can see the newlines if they want to.

nit - Anyway, FYI there are newline rendering problems here in v20.
Removed newlines to make all these optional parts appear on the same
line.

2.2 other stuff

nit - Add underscore to /detailed explanation/detailed_explanation/,
to make it more obvious this is a replacement parameter
nit - Added newline after for readability of the SGML file.

~~~

3. Case of literals

It's not apparent to me why the optional "Key" part should be
uppercase in the LOG but other (equally important?) literals of other
parts like "replica identity" are not. It seems inconsistent.

~~~

4. LOG parts

nit - IMO the "schema.tablename" and the "conflict_type" deserved to
have separate listitems.
nit - The "conflict_type" should have markup.

~~~

5. DETAIL parts

nit - added newline above this for readability of the SGML.
nit - Add underscore to detailed_explanation, and rearrange wording to
name the parameter up-front same as the other bullets do.
nit - change the case /key/Key/ to match the synopsis.

~~~

6.

+
+ The replica identity section includes the replica
+ identity key values that used to search for the existing local tuple to
+ be updated or deleted. This may include the full tuple value if the local
+ relation is marked with REPLICA IDENTITY FULL.
+

It might be good to also provide a link for that REPLICA IDENTITY
FULL. (I did this already in the attachment as an example)

~~~

7. Replacement parameters - column_name, column_value

I've included these for completeness. I think it is useful.

BTW, the column names seem sometimes optional but I did not know the
rules. It should be documented what makes these names be shown or not
shown.

~~~

Please see the attachment which implements most of the items mentioned
above.

==

Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 3df791a..a3a0eae 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -1670,11 +1670,10 @@ test_sub=# SELECT * FROM t1 ORDER BY id;
 The log format for logical replication conflicts is as follows:
 LOG: conflict detected on relation "schemaname.tablename": conflict=conflict_type
-DETAIL: detailed explanation.
-
-Key (column_name, ...)=(column_value, ...)
-; existing local tuple (column_name, ...)=(column_value, ...); remote tuple (column_name, ...)=(column_value, ...); replica identity {(column_name, ...)=(column_value, ...) | full (column_value, ...)}.
+DETAIL: detailed_explanation.
+Key (column_name, ...)=(column_value, ...); existing local tuple (column_name, ...)=(column_value, ...); remote tuple (column_name, ...)=(column_value, ...); replica identity {(column_name, ...)=(column_value, ...) | full (column_value, ...)}.
+
 The log provides the following information:
@@ -1683,28 +1682,34 @@ DETAIL: detailed explanation.
-    The name of the local relation involved in the conflict and the conflict
-    type (e.g., insert_exists,
-    update_exists).
+    schemaname.tablename
+    identifies the local relation involved in the conflict.
+
+
+
+
+    conflict_type is the type of conflict that occurred
+    (e.g., insert_exists, update_exists).
+
 DETAIL
-    The origin, transaction ID, and commit timestamp of the transaction that
-    modified the existing local tuple, if available, are included in the
-    detailed explanation.
+    detailed_explanation includes
+    the origin, transaction ID, and commit timestamp of the transaction that
+    modified the existing local tuple, if available.
-    The key secti
Re: CREATE SUBSCRIPTION - add missing test case
On Wed, Aug 21, 2024 at 8:48 PM Amit Kapila wrote: > > On Fri, Aug 16, 2024 at 9:45 AM vignesh C wrote: > > > > On Thu, 15 Aug 2024 at 12:55, Peter Smith wrote: > > > > > > Hi Hackers, > > > > > > While reviewing another logical replication thread [1], I found an > > > ERROR scenario that seems to be untested. > > > > > > TEST CASE: Attempt CREATE SUBSCRIPTION where the subscriber table is > > > missing some expected column(s). > > > > > > Attached is a patch to add the missing test for this error message. > > > > I agree currently there is no test to hit this code. > > > > I also don't see a test for this error condition. However, it is not > clear to me how important is it to cover this error code path. This > code has existed for a long time and I didn't notice any bugs related > to this. There is a possibility that in the future we might break > something because of a lack of this test but not sure if we want to > cover every code path via tests as each additional test also has some > cost. OTOH, If others think it is important or a good idea to have > this test then I don't have any objection to the same. Yes, AFAIK there were no bugs related to this. The test was proposed to prevent accidental future bugs. BACKGROUND Another pending feature thread (replication of generated columns) [1] required many test combinations to confirm all the different expected results, which are otherwise easily accidentally broken without noticing. The test in this *current* thread shares one of the same error messages, which is how it was discovered missing in the first place. ~~~ PROPOSAL I think this is not the first time a logical replication test has been questioned due mostly to concern about creeping "costs". How about we create a new test file and put test cases like this one into it, guarded by code like the below using PG_TEST_EXTRA [2]? Doing it this way we can have better code coverage and higher confidence when we want it, but zero test cost overheads when we don't want it.
e.g. src/test/subscription/t/101_extra.pl: if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bsubscription\b/) { plan skip_all => 'Due to execution costs these tests are skipped unless subscription is enabled in PG_TEST_EXTRA'; } # Add tests here... == [1] https://www.postgresql.org/message-id/flat/B80D17B2-2C8E-4C7D-87F2-E5B4BE3C069E%40gmail.com [2] https://www.postgresql.org/docs/devel/regress-run.html#REGRESS-ADDITIONAL Kind Regards, Peter Smith. Fujitsu Australia
Re: Conflict detection and logging in logical replication
Here are some review comments for the v19-0001 docs patch. The content seemed reasonable, but IMO it should be presented quite differently. 1. Use sub-sections I expect this logical replication "Conflicts" section is going to evolve into something much bigger. Surely, it's not going to be one humongous page of details, so it will be a section with lots of subsections like all the others in Chapter 29. IMO, you should be writing the docs in that kind of structure from the beginning. For example, I'm thinking something like below (this is just an example - surely lots more subsections will be needed for this topic): 29.6 Conflicts 29.6.1. Conflict types 29.6.2. Logging format 29.6.3. Examples Specifically, this v19-0001 patch information should be put into a subsection like the 29.6.2 shown above. ~~~ 2. Markup + +LOG: conflict detected on relation "schemaname.tablename": conflict=conflict_type +DETAIL: detailed explaination. +Key (column_name, ...)=(column_name, ...); existing local tuple (column_name, ...)=(column_name, ...); remote tuple (column_name, ...)=(column_name, ...); replica identity (column_name, ...)=(column_name, ...). + IMO this should be using markup more like the SQL syntax references - e.g. I suggest all the substitution parameters (e.g. detailed explanation, conflict_type, column_name, ...) in the log should use replaceable-parameter markup, and then use those same markups again wherever the parameters are mentioned later in these docs. ~ nit - typo /explaination/explanation/ ~ nit - The amount of scrolling needed makes this LOG format too hard to see. Try to wrap it better so it can fit without being so wide. ~~~ 3. Restructure the list + + I suggest restructuring all this to use a nested list like: LOG - conflict_type DETAIL - detailed_explanation - key - existing_local_tuple - remote_tuple - replica_identity Doing this means you can remove a great deal of the unnecessary junk words like "of the first sentence in the DETAIL", and "sentence of the DETAIL line" etc.
The result will be much less text but much simpler text too. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication of sequences
Hi Vignesh, Here are my only review comments for the latest patch set. v20240820-0003. nit - missing period for comment in FetchRelationStates nit - typo in function name 'ProcessSyncingTablesFoSync' == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/replication/logical/syncutils.c b/src/backend/replication/logical/syncutils.c index b8f9300..705a330 100644 --- a/src/backend/replication/logical/syncutils.c +++ b/src/backend/replication/logical/syncutils.c @@ -96,7 +96,7 @@ SyncProcessRelations(XLogRecPtr current_lsn) break; case WORKERTYPE_TABLESYNC: - ProcessSyncingTablesFoSync(current_lsn); + ProcessSyncingTablesForSync(current_lsn); break; case WORKERTYPE_APPLY: @@ -143,7 +143,7 @@ FetchRelationStates(bool *started_tx) *started_tx = true; } - /* Fetch tables that are in non-ready state */ + /* Fetch tables that are in non-ready state. */ rstates = GetSubscriptionRelations(MySubscription->oid, true); /* Allocate the tracking info in a permanent memory context. */ diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index ad92b84..c753f45 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -237,7 +237,7 @@ wait_for_worker_state_change(char expected_state) * SYNCDONE and finish. 
*/ void -ProcessSyncingTablesFoSync(XLogRecPtr current_lsn) +ProcessSyncingTablesForSync(XLogRecPtr current_lsn) { SpinLockAcquire(&MyLogicalRepWorker->relmutex); diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h index 24f74ab..6504b70 100644 --- a/src/include/replication/worker_internal.h +++ b/src/include/replication/worker_internal.h @@ -264,7 +264,7 @@ extern void UpdateTwoPhaseState(Oid suboid, char new_state); extern bool FetchRelationStates(bool *started_tx); extern bool WaitForRelationStateChange(Oid relid, char expected_state); -extern void ProcessSyncingTablesFoSync(XLogRecPtr current_lsn); +extern void ProcessSyncingTablesForSync(XLogRecPtr current_lsn); extern void ProcessSyncingTablesForApply(XLogRecPtr current_lsn); extern void SyncProcessRelations(XLogRecPtr current_lsn); extern void SyncInvalidateRelationStates(Datum arg, int cacheid,
Re: GUC names in messages
CFBot reported some failures, so I have attached the rebased patch set v9*. I'm hopeful the majority of these might be pushed to avoid more rebasing... == Kind Regards, Peter Smith. Fujitsu Australia v9-0001-Add-quotes-for-GUCs-bool.patch Description: Binary data v9-0005-Add-quotes-for-GUCs-enum.patch Description: Binary data v9-0003-Add-quotes-for-GUCs-real.patch Description: Binary data v9-0004-Add-quotes-for-GUCs-string.patch Description: Binary data v9-0002-Add-quotes-for-GUCs-int.patch Description: Binary data v9-0008-GUC-names-make-common-translatable-messages.patch Description: Binary data v9-0006-GUC-names-fix-case-intervalstyle.patch Description: Binary data v9-0007-GUC-names-fix-case-datestyle.patch Description: Binary data
Re: PG docs - Sequence CYCLE clause
On Tue, Aug 20, 2024 at 10:18 AM Bruce Momjian wrote: > > Great, patch applied to master. > Thanks for pushing! == Kind Regards, Peter Smith. Fujitsu Australia
Re: CREATE SUBSCRIPTION - add missing test case
On Fri, Aug 16, 2024 at 2:15 PM vignesh C wrote: > Thanks for the review. > > I agree currently there is no test to hit this code. I'm not sure if > this is the correct location for the test, should it be included in > the 008_diff_schema.pl file? Yes, that is a better home for this test. Done as suggested in attached patch v2. == Kind Regards, Peter Smith. Fujitsu Australia v2-0001-Add-missing-test-case.patch Description: Binary data
Re: Logical Replication of sequences
Hi Vignesh, Here are my review comments for the latest patchset v20240819-0001. No changes. No comments. v20240819-0002. No changes. No comments. v20240819-0003. See below. v20240819-0004. See below. v20240819-0005. No changes. No comments. /// PATCH v20240819-0003 == src/backend/replication/logical/syncutils.c 3.1. +typedef enum +{ + SYNC_RELATION_STATE_NEEDS_REBUILD, + SYNC_RELATION_STATE_REBUILD_STARTED, + SYNC_RELATION_STATE_VALID, +} SyncingRelationsState; + +static SyncingRelationsState relation_states_validity = SYNC_RELATION_STATE_NEEDS_REBUILD; There is some muddle of singular/plural names here. The typedef/values/var should all match: e.g. It could be like: SYNC_RELATION_STATE_xxx --> SYNC_RELATION_STATES_xxx SyncingRelationsState --> SyncRelationStates But, a more radical change might be better. typedef enum { RELATION_STATES_SYNC_NEEDED, RELATION_STATES_SYNC_STARTED, RELATION_STATES_SYNCED, } SyncRelationStates; ~~~ 3.2. GENERAL refactoring I don't think all of the functions moved into syncutils.c truly belong there. This new module was introduced to hold common/util functions for tablesync and sequencesync, but with each patchset it has been sucking in more and more functions that maybe do not quite belong here. For example, AFAIK these below have logic that is *solely* for TABLES (not for SEQUENCES). Perhaps it was convenient to dump them here because they are statically called, but I felt they still logically belong in tablesync.c: - process_syncing_tables_for_sync(XLogRecPtr current_lsn) - process_syncing_tables_for_apply(XLogRecPtr current_lsn) - AllTablesyncsReady(void) ~~~ 3.3. +static bool +FetchRelationStates(bool *started_tx) +{ If this function can remain static then the name should change to be like fetch_table_states, right? == src/include/replication/worker_internal.h 3.4.
+extern bool wait_for_relation_state_change(Oid relid, char expected_state); If this previously static function will be exposed now (it may not need to be if some other functions are returned to tablesync.c) then the function name should also be changed, right? PATCH v20240819-0004 == src/backend/replication/logical/syncutils.c 4.1 GENERAL refactoring (this is similar to review comment #3.2 above) Functions like below have logic that is *solely* for SEQUENCES (not for TABLES). I felt they logically belong in sequencesync.c, not here. - process_syncing_sequences_for_apply(void) ~~~ FetchRelationStates: nit - the comment change about "not-READY tables" (instead of relations) should already be in patch 0003. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Hi Shubham, here are my review comments for the TAP tests patch v27-0002 == Commit message Tap tests for 'include-generated-columns' ~ But, it's more than that -- these are the TAP tests for all combinations of replication related to generated columns. i.e. both with and without the 'include_generated_columns' option enabled. == src/test/subscription/t/011_generated.pl I was mistaken, thinking that the v27-0002 had already been refactored according to Vignesh's last review, but it is not done yet, so I am not going to post detailed review comments until the restructuring is completed. ~ OTOH, there are some problems I felt have crept into v26-0001 (TAP test is same as v27-0002), so maybe try to also take care of them (see below) in v28-0002. In no particular order: * I felt it is almost useless now to have the "combo" ("regress_pub_combo") publication. It used to have many tables when you first created it, but with every version posted it is publishing less and less, so now there are only 2 tables in it. Better to have a specific publication for each table now and forget about "combos". * The "TEST tab_gen_to_gen initial sync" seems to be not even checking the table data. Why not? e.g. Even if you expect no data, you should test for it. * The "TEST tab_gen_to_gen replication" seems to be not even checking the table data. Why not? * Multiple XXX comments like "... it needs more study to determine if the above result was actually correct, or a PG17 bug..." should be removed. AFAIK we should understand the expected results for all combinations well by now. * The "TEST tab_order replication" is now getting an error. Now, that may be the correct error for this situation, but in that case I think the test is no longer testing what it was intended to test (i.e. that column order does not matter). Probably the table definition needs adjusting to make sure we are testing what we want to test, and not just making some random scenario "PASS".
* The expected empty result of the "# TEST tab_alter" test also seems unhelpful. It might be related to the previous bullet. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Hi, Here are my review comments for v27-0001. == contrib/test_decoding/expected/generated_columns.out contrib/test_decoding/sql/generated_columns.sql +-- By default, 'include-generated-columns' is enabled, so the values for the generated column 'b' will be replicated even if it is not explicitly specified. nit - The "default" is only like this for "test_decoding" (e.g., the CREATE SUBSCRIPTION option is the opposite), so let's make the comment clearer about that. nit - Use sentence case in the comments. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/contrib/test_decoding/expected/generated_columns.out b/contrib/test_decoding/expected/generated_columns.out index 48f900f..9b03f6d 100644 --- a/contrib/test_decoding/expected/generated_columns.out +++ b/contrib/test_decoding/expected/generated_columns.out @@ -1,13 +1,14 @@ --- test decoding of generated columns +-- Test decoding of generated columns. SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding'); ?column? -- init (1 row) --- column b' is a generated column +-- Column b' is a generated column. CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED); --- By default, 'include-generated-columns' is enabled, so the values for the generated column 'b' will be replicated even if it is not explicitly specified. +-- For 'test_decoding' the parameter 'include-generated-columns' is enabled by default, +-- so the values for the generated column 'b' will be replicated even if it is not explicitly specified. INSERT INTO gencoltable (a) VALUES (1); SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); data @@ -17,7 +18,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc COMMIT (3 rows) --- when 'include-generated-columns' is enabled, the values of the generated column 'b' will be replicated. 
+-- When 'include-generated-columns' is enabled, the values of the generated column 'b' will be replicated. INSERT INTO gencoltable (a) VALUES (2); SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1'); data @@ -27,7 +28,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc COMMIT (3 rows) --- when 'include-generated-columns' is disabled, the values of the generated column 'b' will not be replicated. +-- When 'include-generated-columns' is disabled, the values of the generated column 'b' will not be replicated. INSERT INTO gencoltable (a) VALUES (3); SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '0'); data @@ -37,7 +38,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc COMMIT (3 rows) --- with REPLICA IDENTITY = FULL, to show old-key data includes generated columns data for updates. +-- When REPLICA IDENTITY = FULL, show old-key data includes generated columns data for updates. ALTER TABLE gencoltable REPLICA IDENTITY FULL; UPDATE gencoltable SET a = 10; SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1'); diff --git a/contrib/test_decoding/sql/generated_columns.sql b/contrib/test_decoding/sql/generated_columns.sql index fb156c2..7b455a1 100644 --- a/contrib/test_decoding/sql/generated_columns.sql +++ b/contrib/test_decoding/sql/generated_columns.sql @@ -1,23 +1,24 @@ --- test decoding of generated columns +-- Test decoding of generated columns. SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding'); --- column b' is a generated column +-- Column b' is a generated column. 
CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED); --- By default, 'include-generated-columns' is enabled, so the values for the generated column 'b' will be replicated even if it is not explicitly specified. +-- For 'test_decoding' the parameter 'include-generated-columns' is
Re: Logical Replication of sequences
Here are my review comments for the latest patchset v20240817-0001. No changes. No comments. v20240817-0002. No changes. No comments. v20240817-0003. See below. v20240817-0004. See below. v20240817-0005. No changes. No comments. // v20240817-0003 and 0004. (This is a repeat of the same comment as in previous reviews, but lots more functions seem affected now) IIUC, the LR code tries to follow function naming conventions (e.g. CamelCase/snake_case for exposed/static functions respectively), intended to make the code more readable. But, this only works if the conventions are followed. Now, patches 0003 and 0004 are shuffling more and more functions between modules while changing them from static to non-static (or vice versa). So, the function name conventions are being violated many times. IMO these functions ought to be renamed according to their new modifiers to avoid the confusion caused by ignoring the name conventions. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication of sequences
Hi Vignesh. I looked at the latest v20240815* patch set. I have only the following few comments for patch v20240815-0004, below. == Commit message. Please see the attachment for some suggested updates. == src/backend/commands/subscriptioncmds.c CreateSubscription: nit - fix wording in one of the XXX comments == .../replication/logical/sequencesync.c report_mismatched_sequences: nit - param name /warning_sequences/mismatched_seqs/ append_mismatched_sequences: nit - param name /warning_sequences/mismatched_seqs/ LogicalRepSyncSequences: nit - var name /warning_sequences/mismatched_seqs/ == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 534d385..e799a41 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -800,9 +800,9 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, * export it. * * XXX: If the subscription is for a sequence-only publication, -* creating this slot. It can be created later during the ALTER -* SUBSCRIPTION ... REFRESH command, if the publication is updated -* to include tables or tables in schema. +* creating this slot is unnecessary. It can be created later +* during the ALTER SUBSCRIPTION ... REFRESH command, if the +* publication is updated to include tables or tables in schema. */ if (opts.create_slot) { diff --git a/src/backend/replication/logical/sequencesync.c b/src/backend/replication/logical/sequencesync.c index 1aa5ab8..934646f 100644 --- a/src/backend/replication/logical/sequencesync.c +++ b/src/backend/replication/logical/sequencesync.c @@ -264,17 +264,17 @@ copy_sequence(WalReceiverConn *conn, Relation rel, * Report any sequence mismatches as a single warning log. 
*/ static void -report_mismatched_sequences(StringInfo warning_sequences) +report_mismatched_sequences(StringInfo mismatched_seqs) { - if (warning_sequences->len) + if (mismatched_seqs->len) { ereport(WARNING, errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("parameters differ for the remote and local sequences (%s) for subscription \"%s\"", - warning_sequences->data, MySubscription->name), + mismatched_seqs->data, MySubscription->name), errhint("Alter/Re-create local sequences to have the same parameters as the remote sequences.")); - resetStringInfo(warning_sequences); + resetStringInfo(mismatched_seqs); } } @@ -282,15 +282,15 @@ report_mismatched_sequences(StringInfo warning_sequences) * append_mismatched_sequences * * Appends details of sequences that have discrepancies between the publisher - * and subscriber to the warning_sequences string. + * and subscriber to the mismatched_seqs string. */ static void -append_mismatched_sequences(StringInfo warning_sequences, Relation seqrel) +append_mismatched_sequences(StringInfo mismatched_seqs, Relation seqrel) { - if (warning_sequences->len) - appendStringInfoString(warning_sequences, ", "); + if (mismatched_seqs->len) + appendStringInfoString(mismatched_seqs, ", "); - appendStringInfo(warning_sequences, "\"%s.%s\"", + appendStringInfo(mismatched_seqs, "\"%s.%s\"", get_namespace_name(RelationGetNamespace(seqrel)), RelationGetRelationName(seqrel)); } @@ -314,7 +314,7 @@ LogicalRepSyncSequences(void) boolstart_txn = true; Oid subid = MyLogicalRepWorker->subid; MemoryContext oldctx; - StringInfo warning_sequences = makeStringInfo(); + StringInfo mismatched_seqs = makeStringInfo(); /* * Synchronizing each sequence individually incurs overhead from starting @@ -425,15 +425,15 @@ LogicalRepSyncSequences(void) PG_CATCH(); { if (sequence_mismatch) - append_mismatched_sequences(warning_sequences, sequence_rel); + append_mismatched_sequences(mismatched_seqs, sequence_rel); - 
report_mismatched_sequences(warning_sequences); + report_mismatched_sequences(mismatched_seqs);
Re: Pgoutput not capturing the generated columns
On Tue, Jul 23, 2024 at 9:23 AM Peter Smith wrote: > > On Fri, Jul 19, 2024 at 4:01 PM Shlok Kyal wrote: > > > > On Thu, 18 Jul 2024 at 13:55, Peter Smith wrote: > > > > > > Hi, here are some review comments for v19-0002 > > > == > > > src/test/subscription/t/004_sync.pl > > > > > > 1. > > > This new test is not related to generated columns. IIRC, this is just > > > some test that we discovered missing during review of this thread. As > > > such, I think this change can be posted/patched separately from this > > > thread. > > > > > I have removed the test for this thread. > > > > I have also addressed the remaining comments for v19-0002 patch. > > Hi, I have no more review comments for patch v20-0002 at this time. > > I saw that the above test was removed from this thread as suggested, > but I could not find that any new thread was started to propose this > valuable missing test. > I still did not find any new thread for adding the missing test case, so I started one myself [1]. == [1] https://www.postgresql.org/message-id/CAHut+PtX8P0EGhsk9p=hqguhrzxecszanxsmkovyilx-ejd...@mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
CREATE SUBSCRIPTION - add missing test case
Hi Hackers, While reviewing another logical replication thread [1], I found an ERROR scenario that seems to be untested. TEST CASE: Attempt CREATE SUBSCRIPTION where the subscriber table is missing some expected column(s). Attached is a patch to add the missing test for this error message. == [1] https://www.postgresql.org/message-id/CAHut%2BPt5FqV7J9GnnWFRNW_Z1KOMMAZXNTRcRNdtFrfMBz_GLA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia v1-0001-Add-missing-test-case.patch Description: Binary data
Re: Logical Replication of sequences
Hi Vignesh, I have reviewed your latest patchset: v20240814-0001. No comments v20240814-0002. No comments v20240814-0003. No comments v20240814-0004. See below v20240814-0005. No comments // v20240814-0004. == src/backend/commands/subscriptioncmds.c CreateSubscription: nit - XXX comments AlterSubscription_refresh: nit - unnecessary parens in ereport AlterSubscription: nit - unnecessary parens in ereport fetch_sequence_list: nit - unnecessary parens in ereport == .../replication/logical/sequencesync.c 1. fetch_remote_sequence_data + * Returns: + * - TRUE if there are discrepancies between the sequence parameters in + * the publisher and subscriber. + * - FALSE if the parameters match. + */ +static bool +fetch_remote_sequence_data(WalReceiverConn *conn, Oid relid, Oid remoteid, +char *nspname, char *relname, int64 *log_cnt, +bool *is_called, XLogRecPtr *page_lsn, +int64 *last_value) IMO it is more natural to return TRUE for good results and FALSE for bad ones. (FYI, I have implemented this reversal in the nitpicks attachment). ~ nit - swapped columns seqmin and seqmax in the SQL to fetch them in the natural order nit - unnecessary parens in ereport ~~~ copy_sequence: nit - update function comment to document the output parameter nit - Assert that *sequence_mismatch is false on entry to this function nit - tweak wrapping and add \n in the SQL nit - unnecessary parens in ereport report_sequence_mismatch: nit - modify function comment nit - function name changed /report_sequence_mismatch/report_mismatched_sequences/ (now plural, and more like the other one) append_mismatched_sequences: nit - param name /rel/seqrel/ ~~~ 2. LogicalRepSyncSequences: + Relation sequence_rel; + XLogRecPtr sequence_lsn; + bool sequence_mismatch; The 'sequence_mismatch' variable must be initialized to false, otherwise we cannot trust it gets assigned.
~ LogicalRepSyncSequences: nit - unnecessary parens in ereport nit - move the for-loop variable declaration nit - remove a blank line process_syncing_sequences_for_apply: nit - variable declaration indent == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 9fff288..22115bd 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -726,10 +726,10 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, recordDependencyOnOwner(SubscriptionRelationId, subid, owner); /* -* XXX: If the subscription is for a sequence-only publication, -* creating this origin is unnecessary at this point. It can be created -* later during the ALTER SUBSCRIPTION ... REFRESH command, if the -* publication is updated to include tables or tables in schemas. +* XXX: If the subscription is for a sequence-only publication, creating +* this origin is unnecessary. It can be created later during the ALTER +* SUBSCRIPTION ... REFRESH command, if the publication is updated to +* include tables or tables in schemas. */ ReplicationOriginNameForLogicalRep(subid, InvalidOid, originname, sizeof(originname)); replorigin_create(originname); @@ -800,9 +800,9 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, * export it. * * XXX: If the subscription is for a sequence-only publication, -* creating this slot is not necessary at the moment. It can be -* created during the ALTER SUBSCRIPTION ... REFRESH command if the -* publication is updated to include tables or tables in schema. +* creating this slot. It can be created later during the ALTER +* SUBSCRIPTION ... REFRESH command, if the publication is updated +* to include tables or tables in schema. */ if (opts.create_slot) { @@ -1021,9 +1021,9 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data, copy_data ? 
SUBREL_STATE_INIT : SUBREL_STATE_READY, InvalidXLogRecPtr, true); ereport(DEBUG1, - (errmsg_internal("%s \"%s.%s\" added to subscription \"%s\"", - relkind == RELKIND_SEQUENCE ? "sequence" : "table", - rv->schemaname, rv->relname, sub->name))); +
Re: Logical Replication of sequences
Hi Vignesh, Here are my review comments for the latest patchset: Patch v20240813-0001. No comments Patch v20240813-0002. No comments Patch v20240813-0003. No comments Patch v20240813-0004. See below Patch v20240813-0005. No comments // Patch v20240813-0004 == src/backend/catalog/pg_subscription. GetSubscriptionRelations: nit - modify a condition for readability == src/backend/commands/subscriptioncmds.c fetch_sequence_list: nit - changed the WARNING message. /parameters differ between.../parameters differ for.../ (FYI, Chat-GPT agrees that 2nd way is more correct) nit - other minor changes to the message and hint == .../replication/logical/sequencesync.c 1. LogicalRepSyncSequences + ereport(DEBUG1, + errmsg("logical replication synchronization for subscription \"%s\", sequence \"%s\" has finished", +get_subscription_name(subid, false), get_rel_name(done_seq->relid))); DEBUG logs should use errmsg_internal. (fixed also in the nitpicks attachment). ~ nit - minor change to the log message counting the batched sequences ~~~ process_syncing_sequences_for_apply: nit - /sequence sync worker/sequencesync worker/ == src/backend/utils/misc/guc_tables.c nit - /max workers/maximum number of workers/ (for consistency because all other GUCs are verbose like this; nothing just says "max".) == src/test/subscription/t/034_sequences.pl nit - adjust the expected WARNING message (which was modified above) == Kind Regards, Peter Smith.
Fujitsu Australia diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c index d938e57..af2bfe1 100644 --- a/src/backend/catalog/pg_subscription.c +++ b/src/backend/catalog/pg_subscription.c @@ -565,9 +565,11 @@ GetSubscriptionRelations(Oid subid, bool get_tables, bool get_sequences, relkind = get_rel_relkind(subrel->srrelid); /* Skip sequences if they were not requested */ - if ((relkind == RELKIND_SEQUENCE && !get_sequences) || - /* Skip tables if they were not requested */ - (relkind != RELKIND_SEQUENCE && !get_tables)) + if (relkind == RELKIND_SEQUENCE && !get_sequences) + continue; + + /* Skip tables if they were not requested */ + if (relkind != RELKIND_SEQUENCE && !get_tables) continue; relstate = (SubscriptionRelState *) palloc(sizeof(SubscriptionRelState)); diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 4166210..7d0be40 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -2577,9 +2577,9 @@ fetch_sequence_list(WalReceiverConn *wrconn, char *subname, List *publications) */ ereport(WARNING, errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), - errmsg("parameters differ between the remote and local sequences %s for subscription \"%s\"", + errmsg("parameters differ for the remote and local sequences (%s) for subscription \"%s\"", warning_sequences->data, subname), - errhint("Alter/Re-create the sequence using the same parameter as in remote.")); + errhint("Alter/Re-create local sequences to have the same parameters as the remote sequences.")); pfree(warning_sequences); } diff --git a/src/backend/replication/logical/sequencesync.c b/src/backend/replication/logical/sequencesync.c index aaedba9..e2e0421 100644 --- a/src/backend/replication/logical/sequencesync.c +++ b/src/backend/replication/logical/sequencesync.c @@ -342,12 +342,12 @@ LogicalRepSyncSequences(void) done_seq = (SubscriptionRelState *) 
lfirst(list_nth_cell(sequences_not_synced, i)); ereport(DEBUG1, - errmsg("logical replication synchronization for subscription \"%s\", sequence \"%s\" has finished", + errmsg_internal("logical replication synchronization for subscription \"%s\", sequence \"%s\" has finished", get_subscription_name(subid, false), get_rel_name(done_seq->relid))); } ereport(LOG, - errmsg("logical replication synchronized %d sequences of %d sequences for subscription \"%s\" ", + errmsg("logical replication synch
Re: Logical Replication of sequences
On Tue, Aug 13, 2024 at 10:00 PM vignesh C wrote: > > On Tue, 13 Aug 2024 at 09:19, Peter Smith wrote: > > > > 3.1. GENERAL > > > > Hmm. I am guessing this was provided as a separate patch to aid review > > by showing that existing functions are moved? OTOH you can't really > > judge this patch properly without already knowing details of what will > > come next in the sequencesync. i.e. As a *standalone* patch without > > the sequencesync.c the refactoring doesn't make much sense. > > > > Maybe it is OK later to combine patches 0003 and 0004. Alternatively, > > keep this patch separated but give greater emphasis in the comment > > header to say this patch only exists separately in order to help the > > review. > > I have kept this patch only to show that this patch as such has no > code changes. If we move this to the next patch it will be difficult > for reviewers to know which is new code and which is old code. During > commit we can merge this with the next one. I felt it is better to add > it in the commit message instead of comment header so updated the > commit message. > Yes, I wrote "comment header" but it was a typo; I meant "commit header". What you did looks good now. Thanks. > > ~ > > > > 3.4. function names > > > > With the re-shuffling that this patch does, and changing several from > > static to not-static, should the function names remain as they are? > > They look random to me. > > - finish_sync_worker(void) > > - invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue) > > - FetchTableStates(bool *started_tx) > > - process_syncing_tables(XLogRecPtr current_lsn) > > > > I think using a consistent naming convention would be better. e.g. > > SyncFinishWorker > > SyncInvalidateTableStates > > SyncFetchTableStates > > SyncProcessTables > > One advantage with keeping the existing names the same wherever > possible will help while merging the changes to back-branches. So I'm > not making this change. 
> According to my understanding, the logical replication code tries to maintain naming conventions for static functions (snake_case) and for non-static functions (CamelCase) as an aid for code readability. I think we should either do our best to abide by those conventions, or we might as well just forget them and have a naming free-for-all. Since the new syncutils.c module is being introduced by this patch, my guess is that any future merging to back-branches will be affected regardless. IMO this is an ideal opportunity to try to nudge the function names in the right direction. YMMV. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication of sequences
" has finished 2024-08-13 16:41:47.436 AEST [11735] LOG: logical replication synchronization for subscription "sub3", sequence "seq_0566" has finished 2024-08-13 16:41:47.436 AEST [11735] LOG: logical replication synchronization for subscription "sub3", sequence "seq_0568" has finished 2024-08-13 16:41:47.436 AEST [11735] LOG: logical replication synchronization for subscription "sub3", sequence "seq_0569" has finished 2024-08-13 16:41:47.436 AEST [11735] LOG: logical replication synchronization for subscription "sub3", sequence "seq_0570" has finished 2024-08-13 16:41:47.436 AEST [11735] LOG: logical replication synchronization for subscription "sub3", sequence "seq_0571" has finished 2024-08-13 16:41:47.436 AEST [11735] LOG: logical replication synchronization for subscription "sub3", sequence "seq_0582" has finished ... Is there a way to refresh sequences in a more natural (e.g. alphabetical) order to make these logs more readable? == Kind Regards, Peter Smith. Fujitsu Australia
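For illustration only — one possible approach, assuming the patchset exposes the published sequences through a view analogous to pg_publication_tables (hypothetically named pg_publication_sequences here): if the publisher-side query that builds the sequence list ordered its result, the sync worker would process, and therefore log, the sequences alphabetically:

```sql
-- Hypothetical sketch: order the fetched sequence list so the
-- per-sequence "has finished" log lines appear in alphabetical order.
SELECT schemaname, sequencename
  FROM pg_publication_sequences
 WHERE pubname = 'pub1'
 ORDER BY schemaname, sequencename;
```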
PG docs - Sequence CYCLE clause
Hi hackers. While reviewing another thread I had cause to be looking more carefully at the SEQUENCE documentation. I found it curious that, unlike other clauses, the syntax of the CYCLE clause was not displayed on a line by itself. I have changed that, and at the same time I have moved the CYCLE syntax (plus its description) to be adjacent to MINVALUE/MAXVALUE, which IMO is where it naturally belongs. Please see the attached patch. Thoughts? == Kind Regards, Peter Smith. Fujitsu Australia docs_sequence_cycle.diff Description: Binary data
Re: Logical Replication of sequences
On Mon, Aug 12, 2024 at 11:07 PM vignesh C wrote: > > On Mon, 12 Aug 2024 at 10:40, Peter Smith wrote: > > > > Hi Vignesh, > > > > I found that when 2 subscriptions are both subscribing to a > > publication publishing sequences, an ERROR occurs on refresh. > > > > == > > > > Publisher: > > -- > > > > test_pub=# create publication pub1 for all sequences; > > > > Subscriber: > > --- > > > > test_sub=# create subscription sub1 connection 'dbname=test_pub' > > publication pub1; > > > > test_sub=# create subscription sub2 connection 'dbname=test_pub' > > publication pub1; > > > > test_sub=# alter subscription sub1 refresh publication sequences; > > 2024-08-12 15:04:04.947 AEST [7306] LOG: sequence "public.seq1" of > > subscription "sub1" set to INIT state > > 2024-08-12 15:04:04.947 AEST [7306] STATEMENT: alter subscription > > sub1 refresh publication sequences; > > 2024-08-12 15:04:04.947 AEST [7306] LOG: sequence "public.seq1" of > > subscription "sub1" set to INIT state > > 2024-08-12 15:04:04.947 AEST [7306] STATEMENT: alter subscription > > sub1 refresh publication sequences; > > 2024-08-12 15:04:04.947 AEST [7306] ERROR: tuple already updated by self > > 2024-08-12 15:04:04.947 AEST [7306] STATEMENT: alter subscription > > sub1 refresh publication sequences; > > ERROR: tuple already updated by self > > > > test_sub=# alter subscription sub2 refresh publication sequences; > > 2024-08-12 15:04:30.427 AEST [7306] LOG: sequence "public.seq1" of > > subscription "sub2" set to INIT state > > 2024-08-12 15:04:30.427 AEST [7306] STATEMENT: alter subscription > > sub2 refresh publication sequences; > > 2024-08-12 15:04:30.427 AEST [7306] LOG: sequence "public.seq1" of > > subscription "sub2" set to INIT state > > 2024-08-12 15:04:30.427 AEST [7306] STATEMENT: alter subscription > > sub2 refresh publication sequences; > > 2024-08-12 15:04:30.427 AEST [7306] ERROR: tuple already updated by self > > 2024-08-12 15:04:30.427 AEST [7306] STATEMENT: alter subscription > > sub2 
refresh publication sequences; > > ERROR: tuple already updated by self > > This issue is fixed in the v20240812 version attached at [1]. > [1] - > https://www.postgresql.org/message-id/CALDaNm3hS58W0RTbgsMTk-YvXwt956uabA%3DkYfLGUs3uRNC2Qg%40mail.gmail.com > Yes, I confirmed it is now fixed. Thanks! == Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication of sequences
Hi Vignesh, Here are my review comments for the latest v20240812* patchset: patch v20240812-0001. No comments. patch v20240812-0002. Fixed docs. LGTM patch v20240812-0003. This is new refactoring. See below. patch v20240812-0004. (was 0003). See below. patch v20240812-0005. (was 0004). No comments. // patch v20240812-0003. 3.1. GENERAL Hmm. I am guessing this was provided as a separate patch to aid review by showing that existing functions are moved? OTOH you can't really judge this patch properly without already knowing details of what will come next in the sequencesync. i.e. As a *standalone* patch without the sequencesync.c, the refactoring doesn't make much sense. Maybe it is OK later to combine patches 0003 and 0004. Alternatively, keep this patch separated but give greater emphasis in the comment header to say this patch only exists separately in order to help the review. == Commit message 3.2. Reorganized tablesync code to generate a syncutils file which will help in sequence synchronization worker code. ~ "generate" ?? == src/backend/replication/logical/syncutils.c 3.3. "common code" ?? FYI - There are multiple code comments mentioning "common code..." which, in the absence of the sequencesync worker (which comes in the next patch), have nothing "common" about them at all. Fixing them and then fixing them again in the next patch might cause unnecessary code churn, but OTOH they aren't correct as-is either. I have left them alone for now. ~ 3.4. function names With the re-shuffling that this patch does, and changing several from static to not-static, should the function names remain as they are? They look random to me. - finish_sync_worker(void) - invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue) - FetchTableStates(bool *started_tx) - process_syncing_tables(XLogRecPtr current_lsn) I think using a consistent naming convention would be better. e.g. 
SyncFinishWorker SyncInvalidateTableStates SyncFetchTableStates SyncProcessTables ~~~ nit - file header comment == src/backend/replication/logical/tablesync.c 3.5. -static void +void process_syncing_tables_for_sync(XLogRecPtr current_lsn) -static void +void process_syncing_tables_for_apply(XLogRecPtr current_lsn) Since these functions are no longer static, should those function names be changed to use the CamelCase convention for non-static API? // patch v20240812-0004. == src/backend/replication/logical/syncutils.c nit - file header comment (made same as patch 0003) ~ FetchRelationStates: nit - IIUC sequence states are only INIT -> READY. So the comments in this function don't need to specifically talk about sequence INIT state. == src/backend/utils/misc/guc_tables.c 4.1. {"max_sync_workers_per_subscription", PGC_SIGHUP, REPLICATION_SUBSCRIBERS, - gettext_noop("Maximum number of table synchronization workers per subscription."), + gettext_noop("Maximum number of relation synchronization workers per subscription."), NULL, }, I was wondering if "relation synchronization workers" is meaningful to the user because that seems like new terminology. Maybe it should say "... of table + sequence synchronization workers..." == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/replication/logical/syncutils.c b/src/backend/replication/logical/syncutils.c index 4e39836..4bbc481 100644 --- a/src/backend/replication/logical/syncutils.c +++ b/src/backend/replication/logical/syncutils.c @@ -1,5 +1,5 @@ /*- - * sequencesync.c + * syncutils.c * PostgreSQL logical replication: common synchronization code * * Copyright (c) 2024, PostgreSQL Global Development Group @@ -8,8 +8,8 @@ * src/backend/replication/logical/syncutils.c * * NOTES - * This file contains common code for synchronization of tables that will be - * help apply worker and table synchronization worker. 
+ * This file contains code common to table synchronization workers, and + * the sequence synchronization worker. *- */ diff --git a/src/backend/replication/logical/syncutils.c b/src/backend/replication/logical/syncutils.c index ed353f1..1702be9 100644 --- a/src/backend/replication/logical/syncutils.c +++ b/src/backend/replication/logical/syncutils.c @@ -8,9 +8,8 @@ * src/backend/replication/logical/syncutils.c * * NOTES - * This file contains common code for synchronization of tables and - * sequences that will be help apply worker, table synchronization worker - *and sequence synchronization. + * This file contains code common to table synchronization workers, and + *the sequence synchronization worker. *
Re: Logical Replication of sequences
Hi Vignesh, I found that when 2 subscriptions are both subscribing to a publication publishing sequences, an ERROR occurs on refresh. == Publisher: -- test_pub=# create publication pub1 for all sequences; Subscriber: --- test_sub=# create subscription sub1 connection 'dbname=test_pub' publication pub1; test_sub=# create subscription sub2 connection 'dbname=test_pub' publication pub1; test_sub=# alter subscription sub1 refresh publication sequences; 2024-08-12 15:04:04.947 AEST [7306] LOG: sequence "public.seq1" of subscription "sub1" set to INIT state 2024-08-12 15:04:04.947 AEST [7306] STATEMENT: alter subscription sub1 refresh publication sequences; 2024-08-12 15:04:04.947 AEST [7306] LOG: sequence "public.seq1" of subscription "sub1" set to INIT state 2024-08-12 15:04:04.947 AEST [7306] STATEMENT: alter subscription sub1 refresh publication sequences; 2024-08-12 15:04:04.947 AEST [7306] ERROR: tuple already updated by self 2024-08-12 15:04:04.947 AEST [7306] STATEMENT: alter subscription sub1 refresh publication sequences; ERROR: tuple already updated by self test_sub=# alter subscription sub2 refresh publication sequences; 2024-08-12 15:04:30.427 AEST [7306] LOG: sequence "public.seq1" of subscription "sub2" set to INIT state 2024-08-12 15:04:30.427 AEST [7306] STATEMENT: alter subscription sub2 refresh publication sequences; 2024-08-12 15:04:30.427 AEST [7306] LOG: sequence "public.seq1" of subscription "sub2" set to INIT state 2024-08-12 15:04:30.427 AEST [7306] STATEMENT: alter subscription sub2 refresh publication sequences; 2024-08-12 15:04:30.427 AEST [7306] ERROR: tuple already updated by self 2024-08-12 15:04:30.427 AEST [7306] STATEMENT: alter subscription sub2 refresh publication sequences; ERROR: tuple already updated by self == Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication of sequences
Hi Vignesh, I noticed it is not currently possible (there is no syntax for it) to ALTER an existing publication so that it will publish SEQUENCES. Isn't that a limitation? Why? For example, why should users be prevented from changing a FOR ALL TABLES publication into a FOR ALL TABLES, SEQUENCES one? Similarly, there are other combinations that are not possible:
- DROP ALL SEQUENCES from a publication that is FOR ALL TABLES, SEQUENCES
- DROP ALL TABLES from a publication that is FOR ALL TABLES, SEQUENCES
- ADD ALL TABLES to a publication that is FOR ALL SEQUENCES
... == Kind Regards, Peter Smith. Fujitsu Australia
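For concreteness, these are the kinds of commands that currently have no supported spelling. The syntax below is hypothetical, shown only to illustrate the gap; it is not accepted by the patchset:

```sql
-- Hypothetical syntax, not accepted by the current patchset:
ALTER PUBLICATION pub_all_tables ADD ALL SEQUENCES;  -- FOR ALL TABLES -> FOR ALL TABLES, SEQUENCES
ALTER PUBLICATION pub_both DROP ALL SEQUENCES;       -- FOR ALL TABLES, SEQUENCES -> FOR ALL TABLES
ALTER PUBLICATION pub_all_seqs ADD ALL TABLES;       -- FOR ALL SEQUENCES -> FOR ALL TABLES, SEQUENCES
```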
Re: Logical Replication of sequences
Hi Vignesh, v20240809-0001. No comments. v20240809-0002. See below. v20240809-0003. See below. v20240809-0004. No comments. // Here are my review comments for patch v20240809-0002. nit - Tweak wording in new docs example, because a publication only publishes the sequences; it doesn't "synchronize" anything. // Here are my review comments for patch v20240809-0003. fetch_sequence_list: nit - move comment nit - minor rewording for parameter WARNING message == .../replication/logical/sequencesync.c src/backend/replication/logical/tablesync.c 1. Currently the placement of the 'sequence_states_not_ready' list declaration seems backwards. IMO it makes more sense for the declaration to be in sequencesync.c, and the extern in the tablesync.c. (please also see review comment #3 below which might affect this too). ~~~ 2. static bool -FetchTableStates(bool *started_tx) +FetchTableStates(void) { - static bool has_subrels = false; - - *started_tx = false; + static bool has_subtables = false; + bool started_tx = false; Maybe give the explanation why 'has_subtables' is declared static here. ~~~ 3. I am not sure that it was an improvement to move the process_syncing_sequences_for_apply() function into the sequencesync.c. Calling the sequence code from the tablesync code still looks strange. OTOH, I see why you don't want to leave it in tablesync.c. Perhaps it would be better to refactor/move all following functions back to the (apply) worker.c instead: - process_syncing_relations - process_syncing_sequences_for_apply(void) - process_syncing_tables_for_apply(void) Actually, now that there are 2 kinds of 'sync' workers, maybe you should introduce a new module (e.g. 'commonsync.c' or 'syncworker.c', ...), where you can put functions such as process_syncing_relations() plus any other code common to both tablesync and sequencesync. That might make more sense than having one call to the other. == Kind Regards, Peter Smith. 
Fujitsu Australia diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml index 92758c7..64214ba 100644 --- a/doc/src/sgml/ref/create_publication.sgml +++ b/doc/src/sgml/ref/create_publication.sgml @@ -427,8 +427,8 @@ CREATE PUBLICATION all_sequences FOR ALL SEQUENCES; - Create a publication that publishes all changes in all tables and - synchronizes all sequences: + Create a publication that publishes all changes in all tables, and + all sequences for synchronization: CREATE PUBLICATION all_tables_sequences FOR ALL TABLES, SEQUENCES; diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 2ac63c7..c595873 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -2546,6 +2546,7 @@ fetch_sequence_list(WalReceiverConn *wrconn, List *publications) seqform = (Form_pg_sequence) GETSTRUCT(tup); + /* Build a list of sequences that don't match in publisher and subscriber */ if (seqform->seqtypid != seqtypid || seqform->seqmin != seqmin || seqform->seqmax != seqmax || seqform->seqstart != seqstart || seqform->seqincrement != seqincrement || @@ -2556,7 +2557,6 @@ fetch_sequence_list(WalReceiverConn *wrconn, List *publications) else appendStringInfoString(warning_sequences, ", "); - /* Add the sequences that don't match in publisher and subscriber */ appendStringInfo(warning_sequences, "\"%s.%s\"", get_namespace_name(get_rel_namespace(relid)), get_rel_name(relid)); @@ -2575,7 +2575,7 @@ fetch_sequence_list(WalReceiverConn *wrconn, List *publications) */ ereport(WARNING, errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), - errmsg("Sequence parameter in remote and local is not same for %s", + errmsg("Parameters differ for remote and local sequences %s", warning_sequences->data), errhint("Alter/Re-create the sequence using the same parameter as in remote.")); pfree(warning_sequences); diff --git a/src/test/subscription/t/034_sequences.pl 
b/src/test/subscription/t/034_sequences.pl index 7cc8c8c..52453cc 100644 --- a/src/test/subscription/t/034_sequences.pl +++ b/src/test/subscription/t/034_sequences.pl @@ -177,7 +177,7 @@ $node_subscriber->safe_psql( ALTER SUBSCRIPTION regress_seq_sub REFRESH PUBLICATION SEQUENCES"); like( $stder
Re: Logical Replication of sequences
Hi Vignesh, here are my review comments for the sequences docs patch v20240808-0004. == doc/src/sgml/logical-replication.sgml The new section content looked good. Just some nitpicks including: - renamed the section "Replicating Sequences" - added missing mention about how to publish sequences - rearranged the subscription commands into a more readable list - some sect2 titles were very long; I shortened them. - added markup for the sequence definition advice - other minor rewording and typo fixes ~ 1. IMO the "Caveats" section can be removed. - the advice to avoid changing the sequence definition is already given earlier in the "Sequence Definition Mismatches" section - the limitation of "incremental synchronization" is already stated in the logical replication "Limitations" section - (FYI, I removed it already in my nitpicks attachment) == doc/src/sgml/ref/alter_subscription.sgml nitpick - I reversed the paragraphs to keep the references in a natural order. == Kind Regards, Peter Smith. Fujitsu Australia On Fri, Aug 9, 2024 at 1:52 AM vignesh C wrote: > > On Thu, 8 Aug 2024 at 08:30, Peter Smith wrote: > > > > Hi Vignesh, Here are my v20240807-0003 review comments. > > > > 2a. > > The paragraph starts by saying "Sequence data is not replicated.". It > > seems wrong now. Doesn't that need rewording or removing? > > Changed it to incremental sequence changes. > > > ~ > > > > 2b. > > Should the info "If, however, some kind of switchover or failover..." > > be mentioned in the "Logical Replication Failover" section [2], > > instead of here? > > I think mentioning this here is appropriate. The other section focuses > more on how logical replication can proceed with a new primary. Once > the logical replication setup is complete, sequences can be refreshed > at any time. > > Rest of the comments are fixed, the attached v20240808 version patch > has the changes for the same. 
> > Regards, > Vignesh diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml index d0fdcde..bc2aacc 100644 --- a/doc/src/sgml/logical-replication.sgml +++ b/doc/src/sgml/logical-replication.sgml @@ -1571,49 +1571,88 @@ test_sub=# SELECT * FROM t1 ORDER BY id; - Sequences + Replicating Sequences - Sequences can be synchronized between a publisher and a subscriber using - CREATE SUBSCRIPTION - to initially synchronize sequences, - ALTER SUBSCRIPTION ... REFRESH PUBLICATION to - synchronize any newly added sequences and - ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES - to re-synchronize all sequences. A new sequence synchronization worker will - be started to synchronize the sequences after executing the above commands - and will exit once the sequences are synchronized. + To replicate sequences from a publisher to a subscriber, first publish the + sequence using + CREATE PUBLICATION ... FOR ALL SEQUENCES. - Sequence synchronization worker will be used from - + At the subscriber side: + + + + use CREATE SUBSCRIPTION + to initially synchronize the published sequences. + + + + + use + ALTER SUBSCRIPTION ... REFRESH PUBLICATION + to synchronize any newly added sequences. + + + + + use + ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES + to re-synchronize all sequences. + + + + + + + A new sequence synchronization worker will be started to synchronize the + sequences after executing any of the above subscriber commands, and will + will exit once the sequences are synchronized. + + + The ability to launch a sequence synchronization worker will be limited by + the max_sync_workers_per_subscription configuration. - - Differences in Sequence Definitions Between Publisher and Subscriber + + Sequence Definition Mismatches + + + If there are differences in sequence definitions between the publisher and + subscriber, a WARNING is logged. 
+ + -If there are differences in sequence definitions between the publisher and -subscriber, a WARNING is logged. To resolve this, use +To resolve this, use ALTER SEQUENCE to align the subscriber's sequence parameters with those of the publisher. Subsequently, execute ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES. -It is advisable not to change sequence definitions on either the publisher -or the subscriber until synchronization is complete and the + + +Changes to sequence definitions during the execution of + +ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES +may not be detected, potentially leading to inconsistent values
Re: Logical Replication of sequences
Hi Vignesh, I reviewed the latest v20240808-0003 patch. Attached are my minor change suggestions. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c index 4e2f960..a77e810 100644 --- a/src/backend/catalog/pg_subscription.c +++ b/src/backend/catalog/pg_subscription.c @@ -517,9 +517,9 @@ HasSubscriptionTables(Oid subid) * * all_states: * If getting tables, if all_states is true get all tables, otherwise - * only get tables that have not reached 'READY' state. + * only get tables that have not reached READY state. * If getting sequences, if all_states is true get all sequences, - * otherwise only get sequences that are in 'init' state. + * otherwise only get sequences that are in INIT state. * * The returned list is palloc'ed in the current memory context. */ diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index bb6aa8e..2833379 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -868,10 +868,12 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, * Update the subscription to refresh both the publication and the publication * objects associated with the subscription. * - * If 'copy_data' parameter is true, the function will set the state - * to "init"; otherwise, it will set the state to "ready". + * Parameters: * - * When 'validate_publications' is provided with a publication list, the + * If 'copy_data' is true, the function will set the state to INIT; otherwise, + * it will set the state to READY. + * + * If 'validate_publications' is provided with a publication list, the * function checks that the specified publications exist on the publisher. 
* * If 'refresh_tables' is true, update the subscription by adding or removing @@ -882,9 +884,22 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, * sequences that have been added or removed since the last subscription * creation or publication refresh. * - * If 'resync_all_sequences' is true, mark all objects with "init" state - * for re-synchronization; otherwise, only update the newly added tables and - * sequences based on the copy_data parameter. + * Note, this is a common function for handling different REFRESH commands + * according to the parameter 'resync_all_sequences' + * + * 1. ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES + *(when parameter resync_all_sequences is true) + * + *The function will mark all sequences with INIT state. + *Assert copy_data is true. + *Assert refresh_tables is false. + *Assert refresh_sequences is true. + * + * 2. ALTER SUBSCRIPTION ... REFRESH PUBLICATION [WITH (copy_data=true|false)] + *(when parameter resync_all_sequences is false) + * + *The function will update only the newly added tables and/or sequences + *based on the copy_data parameter. */ static void AlterSubscription_refresh(Subscription *sub, bool copy_data, @@ -910,11 +925,10 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data, WalReceiverConn *wrconn; boolmust_use_password; - /* resync_all_sequences cannot be specified with refresh_tables */ - Assert(!(resync_all_sequences && refresh_tables)); - - /* resync_all_sequences cannot be specified with copy_data as false */ - Assert(!(resync_all_sequences && !copy_data)); +#ifdef USE_ASSERT_CHECKING + if (resync_all_sequences) + Assert(copy_data && !refresh_tables && refresh_sequences); +#endif /* Load the library providing us libpq calls. */ load_file("libpqwalreceiver", false);
Re: Pgoutput not capturing the generated columns
Hi Shubham, I think the v25-0001 patch only half-fixes the problems reported in my v24-0001 review. ~ Background (from the commit message): This commit enables support for the 'include_generated_columns' option in logical replication, allowing the transmission of generated column information and data alongside regular table changes. ~ The broken TAP test scenario in question is replicating from a "not-generated" column to a "generated" column. As the generated column is not on the publishing side, IMO the 'include_generated_columns' option should have zero effect here. In other words, I expect this TAP test for 'include_generated_columns = true' case should also be failing, as I wrote already yesterday: +# FIXME +# Since there is no generated column on the publishing side this should give +# the same result as the previous test. -- e.g. something like: +# ERROR: logical replication target relation "public.tab_nogen_to_gen" is missing +# replicated column: "b" == Kind Regards, Peter Smith. Fujitsu Australia
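To make the broken scenario concrete, here is a minimal sketch of the setup being discussed. Only the table name comes from the TAP test; the generated-column expression is invented for illustration:

```sql
-- Publisher: column "b" is a plain (not generated) column.
CREATE TABLE tab_nogen_to_gen (a int, b int);

-- Subscriber: the same column is GENERATED, so it cannot receive
-- replicated data. Regardless of the include_generated_columns option,
-- applying changes should fail with something like:
--   ERROR: logical replication target relation "public.tab_nogen_to_gen"
--          is missing replicated column: "b"
CREATE TABLE tab_nogen_to_gen (a int, b int GENERATED ALWAYS AS (a + 1) STORED);
```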
Re: Logical Replication of sequences
On Thu, Aug 8, 2024 at 1:55 PM Amit Kapila wrote: > > On Wed, Aug 7, 2024 at 10:12 AM Peter Smith wrote: > > > > This is mostly a repeat of my previous mail from a while ago [1] but > > includes some corrections, answers, and more examples. I'm going to > > try to persuade one last time because the current patch is becoming > > stable, so I wanted to revisit this syntax proposal before it gets too > > late to change anything. > > > > If there is some problem with the proposed idea please let me know > > because I can see only the advantages and no disadvantages of doing it > > this way. > > > > ~~~ > > > > The current patchset offers two forms of subscription refresh: > > 1. ALTER SUBSCRIPTION name REFRESH PUBLICATION [ WITH ( refresh_option > > [= value] [, ... ] ) ] > > 2. ALTER SUBSCRIPTION name REFRESH PUBLICATION SEQUENCES > > > > Since 'copy_data' is the only supported refresh_option, really it is more > > like: > > 1. ALTER SUBSCRIPTION name REFRESH PUBLICATION [ WITH ( copy_data [= > > true|false] ) ] > > 2. ALTER SUBSCRIPTION name REFRESH PUBLICATION SEQUENCES > > > > ~~~ > > > > I proposed previously that instead of having 2 commands for refreshing > > subscriptions we should have a single refresh command: > > > > ALTER SUBSCRIPTION name REFRESH PUBLICATION [TABLES|SEQUENCES] [ WITH > > ( copy_data [= true|false] ) ] > > > > Why? > > > > - IMO it is less confusing than having 2 commands that both refresh > > sequences in slightly different ways > > > > - It is more flexible because apart from refreshing everything, a user > > can choose to refresh only tables or only sequences if desired; IMO > > more flexibility is always good. > > > > - There is no loss of functionality from the current implementation > > AFAICT. You can still say "ALTER SUBSCRIPTION sub REFRESH PUBLICATION > > SEQUENCES" exactly the same as the patchset allows. 
> > > > ~~~ > > > > So, to continue this proposal, let the meaning of 'copy_data' for > > SEQUENCES be as follows: > > > > - when copy_data == false: it means don't copy data (i.e. don't > > synchronize anything). Add/remove sequences from pg_subscriber_rel as > > needed. > > > > - when copy_data == true: it means to copy data (i.e. synchronize) for > > all sequences. Add/remove sequences from pg_subscriber_rel as needed) > > > > I find overloading the copy_data option more confusing than adding a > new variant for REFRESH. To make it clear, we can even think of > extending the command as ALTER SUBSCRIPTION name REFRESH PUBLICATION > ALL SEQUENCES or something like that. I don't know where there is a > need or not but one can imagine extending it as ALTER SUBSCRIPTION > name REFRESH PUBLICATION SEQUENCES [, , ..]. > This will allow to selectively refresh the sequences. > But, I haven't invented a new overloading for "copy_data" option (meaning "synchronize") for sequences. The current patchset already interprets copy_data exactly this way. For example, below are patch 0003 results: ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION WITH (copy_data=false) - this will add/remove new sequences in pg_subscription_rel, but it will *not* synchronize the new sequence ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION WITH (copy_data=true) - this will add/remove new sequences in pg_subscription_rel, and it *will* synchronize the new sequence ~ I only proposed that copy_data should apply to *all* sequences, not just new ones. == Kind Regards. Peter Smith. Fujitsu Australia.
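Restating the proposal as commands, for clarity (this unified syntax is what is being proposed in this thread, not what the current patchset implements):

```sql
-- Proposed single-command forms (illustrative only):
ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;             -- refresh both tables and sequences
ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION TABLES;      -- refresh only tables
ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION SEQUENCES
    WITH (copy_data = true);                             -- re-synchronize all sequences
```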
Re: Logical Replication of sequences
Hi Vignesh, Here are my v20240807-0003 review comments. == 1. GENERAL DOCS. IMO the replication of SEQUENCES is a big enough topic that it deserves to have its own section in the docs chapter 31 [1]. Some of the create/alter subscription docs content would stay where it is in, but a new chapter would just tie everything together better. It could also serve as a better place to describe the other sequence replication content like: (a) getting a WARNING for mismatched sequences and how to handle it. (b) how can the user know when a subscription refresh is required to (re-)synchronise sequences (c) pub/sub examples == doc/src/sgml/logical-replication.sgml 2. Restrictions Sequence data is not replicated. The data in serial or identity columns backed by sequences will of course be replicated as part of the table, but the sequence itself would still show the start value on the subscriber. If the subscriber is used as a read-only database, then this should typically not be a problem. If, however, some kind of switchover or failover to the subscriber database is intended, then the sequences would need to be updated to the latest values, either by executing ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES or by copying the current data from the publisher (perhaps using pg_dump) or by determining a sufficiently high value from the tables themselves. ~ 2a. The paragraph starts by saying "Sequence data is not replicated.". It seems wrong now. Doesn't that need rewording or removing? ~ 2b. Should the info "If, however, some kind of switchover or failover..." be mentioned in the "Logical Replication Failover" section [2], instead of here? == doc/src/sgml/ref/alter_subscription.sgml 3. Sequence values may occasionally become out of sync due to updates in the publisher. To verify this, compare the pg_subscription_rel.srsublsn on the subscriber with the page_lsn obtained from the pg_sequence_state for the sequence on the publisher. 
If the sequence is still using prefetched values, the page_lsn will not be updated. In such cases, you will need to directly compare the sequences and execute REFRESH PUBLICATION SEQUENCES if required.

~

3a. This whole paragraph may be better put in the new chapter that was suggested earlier in review comment #1.

~

3b. Is it only "Occasionally"? I expected subscriber-side sequences could become stale quite often.

~

3c. Is this advice very useful? It's saying if the LSN is different then the sequence is out of date, but if the LSN is not different then you cannot tell. Why not ignore the LSN altogether and just advise the user to directly compare the sequences in the first place?

==

Also, there are more minor suggestions in the attached nitpicks diff.

==

[1] https://www.postgresql.org/docs/current/logical-replication.html
[2] file:///usr/local/pg_oss/share/doc/postgresql/html/logical-replication-failover.html

Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 6b320b1..bb6aa8e 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -726,7 +726,7 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 	recordDependencyOnOwner(SubscriptionRelationId, subid, owner);

 	/*
-	 * XXX todo: If the subscription is for a sequence-only publication,
+	 * XXX: If the subscription is for a sequence-only publication,
 	 * creating this origin is unnecessary at this point. It can be created
 	 * later during the ALTER SUBSCRIPTION ... REFRESH command, if the
 	 * publication is updated to include tables or tables in schemas.
@@ -756,7 +756,7 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 	PG_TRY();
 	{
-		bool		hastables = false;
+		bool		has_tables;
 		List	   *relations;
 		char		table_state;
@@ -771,16 +771,14 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 		table_state = opts.copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY;

 		/*
-		 * Get the table list from publisher and build local table status
-		 * info.
+		 * Build local relation status info. Relations are for both tables and
+		 * sequences from the publisher.
 		 */
 		relations = fetch_table_list(wrconn, publications);
-		if (relations != NIL)
-			hastables = true;
-
-		/* Include the sequence list from publisher. */
+		has_tables = relations != NIL;
Re: Pgoutput not capturing the generated columns
Hi Shubham, Here are my review comments for patch v24-0001 I think the TAP tests have incorrect expected results for the nogen-to-gen case. Whereas the HEAD code will cause "ERROR" for this test scenario, patch 0001 does not. IMO the behaviour should be unchanged for this scenario which has no generated column on the publisher side. So it seems this is a bug in patch 0001. FYI, I have included "FIXME" comments in the attached top-up diff patch to show which test cases I think are expecting wrong results. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/test/subscription/t/011_generated.pl b/src/test/subscription/t/011_generated.pl index 13499a1..a9430f7 100644 --- a/src/test/subscription/t/011_generated.pl +++ b/src/test/subscription/t/011_generated.pl @@ -478,10 +478,11 @@ $node_publisher->safe_psql('postgres', # regress_sub1_combo_nogen_to_gen: (include_generated_columns = false) # -# XXX -# The test below shows that current PG17 behavior does not give an error, -# But this conflicts with the copy_data=true behavior so it might be a PG17 bug. -# Needs more study. +# FIXME +# I think the following expected result is wrong. IIUC it should give +# the same error as HEAD -- e.g. something like: +# ERROR: logical replication target relation "public.tab_nogen_to_gen" is missing +# replicated column: "b" $node_publisher->wait_for_catchup('regress_sub1_combo_nogen_to_gen_nocopy'); $result = $node_subscriber->safe_psql('postgres', @@ -495,9 +496,11 @@ is( $result, qq(4|88 # When copy_data=false, no COPY error occurs. # The col 'b' is not replicated; the subscriber-side generated value is inserted. # -# XXX -# It is correct for this to give the same result as above, but it needs more -# study to determine if the above result was actually correct, or a PG17 bug. +# FIXME +# Since there is no generated column on the publishing side this should give +# the same result as the previous test. -- e.g. 
something like: +# ERROR: logical replication target relation "public.tab_nogen_to_gen" is missing +# replicated column: "b" $node_publisher->wait_for_catchup('regress_sub2_combo_nogen_to_gen'); $result = $node_subscriber2->safe_psql('postgres',
Re: Logical Replication of sequences
t copy_data) ALTER SUBSCRIPTION sub REFRESH PUBLICATION - same as ex8 == [1] https://www.postgresql.org/message-id/CAHut%2BPuFH1OCj-P1UKoRQE2X4-0zMG%2BN1V7jdn%3DtOQV4RNbAbw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication of sequences
ENCES_SYNC_PER_BATCH several times so I changed the wording of one of them ~~~ fetch_remote_sequence_data: nitpick - all other params have the same name as sequence members, so change the parameter name /lsn/page_lsn/ ~ copy_sequence: nitpick - rename var /seq_lsn/seq_page_lsn/ == src/backend/replication/logical/tablesync.c 6. process_syncing_sequences_for_apply + * If a sequencesync worker is running already, there is no need to start a new + * one; the existing sequencesync worker will synchronize all the sequences. If + * there are still any sequences to be synced after the sequencesync worker + * exited, then a new sequencesync worker can be started in the next iteration. + * To prevent starting the sequencesync worker at a high frequency after a + * failure, we store its last failure time. We start the sync worker for the + * same relation after waiting at least wal_retrieve_retry_interval. Why is it talking about "We start the sync worker for the same relation ...". The sequencesync_failuretime is per sync worker, not per relation. And, I don't see any 'same relation' check in the code. == src/include/catalog/pg_subscription_rel.h GetSubscriptionRelations: nitpick - changed parameter name /all_relations/all_states/ == src/test/subscription/t/034_sequences.pl nitpick - add some ## comments to highlight the main test parts to make it easier to read. nitpick - fix typo /syned/synced/ 7. More test cases? IIUC you can also get a sequence mismatch warning during "ALTER ... REFRESH PUBLICATION", and "CREATE SUBSCRIPTION". So, should those be tested also? == Kind Regards, Peter Smith. 
Fujitsu Australia diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 16c427e..5c66797 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -8109,7 +8109,7 @@ SCRAM-SHA-256$<iteration count>:&l This catalog only contains tables and sequences known to the subscription - after running either + after running CREATE SUBSCRIPTION or ALTER SUBSCRIPTION ... REFRESH PUBLICATION or diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index eb7d544..f280019 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -200,6 +200,12 @@ ALTER SUBSCRIPTION name RENAME TO < ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES + See for recommendations on how + to handle any warnings about differences in the sequence definition + between the publisher and the subscriber, which might occur when + copy_data = true. + + See for details of how copy_data = true can interact with the origin @@ -211,11 +217,6 @@ ALTER SUBSCRIPTION name RENAME TO < parameter of CREATE SUBSCRIPTION for details about copying pre-existing data in binary format. - - See the copy_data - on how to handle the warnings regarding the difference in sequence - definition between the publisher and the subscriber. - @@ -230,12 +231,12 @@ ALTER SUBSCRIPTION name RENAME TO < sequence data with the publisher. Unlike ALTER SUBSCRIPTION ... REFRESH PUBLICATION which only synchronizes newly added sequences, REFRESH PUBLICATION SEQUENCES - will re-synchronize the sequence data for all subscribed sequences. The - sequence definition can differ between the publisher and the subscriber, - this is detected and a WARNING is logged to the user, but the warning is - only an indication of a potential problem; it is recommended to alter the - sequence to keep the sequence option same as the publisher and execute - the command again. + will re-synchronize the sequence data for all subscribed sequences. 
+ + + See for recommendations on how + to handle any warnings about differences in the sequence definition + between the publisher and the subscriber. diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index de3bdb8..e28ed96 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -264,12 +264,10 @@ CREATE SUBSCRIPTION subscription_nameorigin parameter. - The sequence definition can differ between the publisher and the - subscriber, this is detected and a WARNING is logged to the user, but - the warning is only an indication of a potential problem; it is - recommended to alter the sequence to keep the sequence option same as - the publisher and execute - ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES. + See for recommendations
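[Editorial aside: the restart-throttling behaviour discussed in review comment #6 above (store the sequencesync worker's last failure time and wait at least wal_retrieve_retry_interval before launching another worker) can be sketched as a toy Python model. The class name and the 5-second default are assumptions made here for illustration, not the patch's code.]

```python
# wal_retrieve_retry_interval defaults to 5s in PostgreSQL (assumption noted above)
RETRY_INTERVAL_MS = 5000


class SequenceSyncLauncher:
    """Rate-limit sequencesync worker restarts after a failure (illustrative model)."""

    def __init__(self):
        self.last_failure_ms = None  # None = no recorded failure yet

    def record_failure(self, now_ms: int) -> None:
        # Called when the sequencesync worker exits abnormally.
        self.last_failure_ms = now_ms

    def can_start(self, now_ms: int) -> bool:
        # A new worker may start immediately if there was no failure,
        # otherwise only after the retry interval has elapsed.
        if self.last_failure_ms is None:
            return True
        return now_ms - self.last_failure_ms >= RETRY_INTERVAL_MS


launcher = SequenceSyncLauncher()
assert launcher.can_start(1000)        # no prior failure: start immediately
launcher.record_failure(1000)          # sequencesync worker just failed
assert not launcher.can_start(2000)    # too soon: avoids a restart storm
assert launcher.can_start(6000)        # interval elapsed: retry allowed
```

Note this throttle is per worker, not per relation, which is the point of the review comment questioning the "same relation" wording.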
Re: Pgoutput not capturing the generated columns
(PID 11095) exited with exit code 1 2024-08-05 13:25:24.917 AEST [11225] LOG: logical replication apply worker for subscription "sub1_nocopy" has started 2024-08-05 13:25:24.926 AEST [11225] ERROR: logical replication target relation "public.tab_nogen_to_gen" is missing replicated column: "b" 2024-08-05 13:25:24.926 AEST [11225] CONTEXT: processing remote data for replication origin "pg_16390" during message type "INSERT" in transaction 742, finished at 0/1967BB0 2024-08-05 13:25:24.927 AEST [20039] LOG: background worker "logical replication apply worker" (PID 11225) exited with exit code 1 ... == [1] https://www.postgresql.org/message-id/CAHut%2BPvtT8fKOfvVYr4vANx_fr92vedas%2BZRbQxvMC097rks6w%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Hi Shubham. Here are some more review comments for v23-0001.

==

src/test/subscription/t/011_generated.pl

nitpick - renamed /regress_pub/regress_pub_tab1/ and /regress_sub1/regress_sub1_tab1/
nitpick - typo /inital data /initial data/
nitpick - typo /snode_subscriber2/node_subscriber2/
nitpick - tweak the combo initial sync comments and messages
nitpick - /#Cleanup/# cleanup/
nitpick - tweak all the combo normal replication comments
nitpick - removed blank line at the end

~~~

1. Refactor tab_gen_to_missing initial sync tests.

I moved the tab_gen_to_missing initial sync for node_subscriber2 to be back where all the other initial sync tests are done. See the nitpicks patch file.

~~~

2. Refactor tab_nogen_to_gen initial sync tests

I moved all the tab_nogen_to_gen initial sync tests back to where the other initial sync tests are done. See the nitpicks patch file.

~~~

3. Added another test case:

Because the (current PG17) nogen-to-gen initial sync test case (with copy_data=true) gives an ERROR, I have added another combination to cover normal replication (e.g. using copy_data=false). See the nitpicks patch file.

(This has exposed an inconsistency which IMO might be a PG17 bug. I have included TAP test comments about this, and plan to post a separate thread for it later.)

~

4. GUCs

Moving and adding more CREATE SUBSCRIPTION commands exceeded some default GUCs, so extra configuration was needed. See the nitpick patch file.

==

Kind Regards,
Peter Smith.
Fujitsu Australia diff --git a/src/test/subscription/t/011_generated.pl b/src/test/subscription/t/011_generated.pl index 0b596b7..2be06c6 100644 --- a/src/test/subscription/t/011_generated.pl +++ b/src/test/subscription/t/011_generated.pl @@ -12,16 +12,25 @@ use Test::More; my $node_publisher = PostgreSQL::Test::Cluster->new('publisher'); $node_publisher->init(allows_streaming => 'logical'); +$node_publisher->append_conf('postgresql.conf', + "max_wal_senders = 20 +max_replication_slots = 20"); $node_publisher->start; # All subscribers on this node will use parameter include_generated_columns = false my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber'); $node_subscriber->init; +$node_subscriber->append_conf('postgresql.conf', + "max_logical_replication_workers = 20 +max_worker_processes = 20"); $node_subscriber->start; # All subscribers on this node will use parameter include_generated_columns = true my $node_subscriber2 = PostgreSQL::Test::Cluster->new('subscriber2'); $node_subscriber2->init; +$node_subscriber2->append_conf('postgresql.conf', + "max_logical_replication_workers = 20 +max_worker_processes = 20"); $node_subscriber2->start; my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres'; @@ -139,7 +148,7 @@ $node_publisher->safe_psql('postgres', # pub_combo_gen_to_missing is not included in pub_combo, because some tests give errors. $node_publisher->safe_psql('postgres', - "CREATE PUBLICATION regress_pub FOR TABLE tab1"); + "CREATE PUBLICATION regress_pub_tab1 FOR TABLE tab1"); $node_publisher->safe_psql('postgres', "CREATE PUBLICATION regress_pub_combo FOR TABLE tab_gen_to_gen, tab_gen_to_nogen, tab_missing_to_gen" ); @@ -157,10 +166,10 @@ $node_publisher->safe_psql('postgres', # # Note that all subscriptions created on node_subscriber2 use copy_data = false, # because copy_data = true with include_generated_columns is not yet supported. -# For this reason, the expected inital data on snode_subscriber2 is always empty. 
+# For this reason, the expected inital data on node_subscriber2 is always empty. $node_subscriber->safe_psql('postgres', - "CREATE SUBSCRIPTION regress_sub1 CONNECTION '$publisher_connstr' PUBLICATION regress_pub" + "CREATE SUBSCRIPTION regress_sub1_tab1 CONNECTION '$publisher_connstr' PUBLICATION regress_pub_tab1" ); $node_subscriber->safe_psql('postgres', "CREATE SUBSCRIPTION regress_sub1_combo CONNECTION '$publisher_connstr' PUBLICATION regress_pub_combo" @@ -168,11 +177,18 @@ $node_subscriber->safe_psql('postgres', $node_subscriber->safe_psql('postgres', "CREATE SUBSCRIPTION regress_sub1_combo_gen_to_missing CONNECTION '$publisher_connstr' PUBLICATION regress_pub_combo_gen_to_missing" ); +# Note, regress_sub1_combo_nogen_to_gen is not created here due to expected errors. See later. $node_subscriber2->safe_psql('postgres', "CREATE SUBSCRIPTION regress_sub2_combo CONNECTION '$publisher_connstr' PUBLICATION regress
Re: Pgoutput not capturing the generated columns
Hi, Here are my review comments for patch v22-0001. All comments now are only for the TAP test.

==

src/test/subscription/t/011_generated.pl

1. I added all new code for the missing combination test case "gen-to-missing". See nitpicks diff.
- create a separate publication for this "tab_gen_to_missing" table because the test gives subscription errors.
- for the initial data
- for the replicated data

~~~

2. I added sub1 and sub2 subscriptions for every combo test (previously some were absent). See nitpicks diff.

~~~

3. There was a missing test case for the nogen-to-gen combination, and after experimenting with this I am getting a bit suspicious. Currently, it seems that if a COPY is attempted then the error would be like this:

2024-08-01 17:16:45.110 AEST [18942] ERROR: column "b" is a generated column
2024-08-01 17:16:45.110 AEST [18942] DETAIL: Generated columns cannot be used in COPY.

OTOH, if a COPY is not attempted (e.g. copy_data = false) then patch 0001 allows replication to happen. And the generated value of the subscriber "b" takes precedence. I have included these tests in the nitpicks diff of patch 0001.

Those results weren't exactly what I was expecting. That is why it is so important to include *every* test combination in these TAP tests -- because unless we know how it works today, we won't know if we are accidentally breaking the current behaviour with the other (0002, 0003) patches.

Please experiment in patches 0001 and 0002 using tab_nogen_to_gen more to make sure the (new?) patch errors make sense and don't overstep by giving ERRORs when they should not.

Also, many other smaller issues/changes were done:

~~~

Creating tables:
nitpick - rearranged to keep all combo test SQLs in a consistent order throughout this file
1/ gen-to-gen
2/ gen-to-nogen
3/ gen-to-missing
4/ missing-to-gen
5/ nogen-to-gen
nitpick - fixed the wrong comment for CREATE TABLE tab_nogen_to_gen.
nitpick - tweaked some CREATE TABLE comments for consistency.
nitpick - in the v22 patch many of the generated col 'b' use different computations for every test. It makes it unnecessarily difficult to read/review the expected results. So, I've made them all the same. Now the computation is "a * 2" on the publisher side, and "a * 22" on the subscriber side.

~~~

Creating Publications and Subscriptions:
nitpick - added comment for all the CREATE PUBLICATION
nitpick - added comment for all the CREATE SUBSCRIPTION
nitpick - I moved the note about copy_data = false to where all the node_subscriber2 subscriptions are created. Also, don't explicitly refer to "patch 000" in the comment, because that will not make any sense after getting pushed.
nitpick - I changed many subscriber names to consistently use "sub1" or "sub2" within the name (this is the visual cue of which node_subscriber they are on). e.g. /regress_sub_combo2/regress_sub2_combo/

~~~

Initial Sync tests:
nitpick - not sure if it is possible to do the initial data tests for "nogen_to_gen" in the normal place. For now, it is just replaced by a comment.
NOTE - Maybe this should be refactored later to put all the initial data checks in one place. I'll think about this point more in the next review.

~~~

nitpick - Changed cleanup to drop subscriptions before publications.
nitpick - remove the unnecessary blank line at the end.

==

Please see the attached diffs patch (apply it atop patch 0001) which includes all the nitpick changes mentioned above.

~~

BTW, for a quicker turnaround and less churning, please consider just posting the v23-0001 by itself instead of waiting to rebase all the subsequent patches. When 0001 settles down some more then rebase the others.

~~

Also, please run the indentation tool over this code ASAP.

==

Kind Regards,
Peter Smith.
Fujitsu Australia diff --git a/src/test/subscription/t/011_generated.pl b/src/test/subscription/t/011_generated.pl index 05b83f6..504714a 100644 --- a/src/test/subscription/t/011_generated.pl +++ b/src/test/subscription/t/011_generated.pl @@ -34,59 +34,60 @@ $node_subscriber->safe_psql('postgres', "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 22) STORED, c int)" ); +# tab_gen_to_gen: # publisher-side has generated col 'b'. # subscriber-side has generated col 'b', with different computation. $node_publisher->safe_psql('postgres', - "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a + 10) STORED)"); + "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a * 2) STORED)"); $node_subscriber->safe_psql('postgres', - "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS AS (a + 20) STORED)"); + "CREATE TABLE tab_gen_to_gen (a int, b int GENERATED ALWAYS A
Re: Logical Replication of sequences
Hi Vignesh,

I noticed that when replicating sequences (using the latest patches 0730_2*) the subscriber-side checks the *existence* of the sequence, but apparently it is not checking other sequence attributes.

For example, consider:

Publisher: "CREATE SEQUENCE s1 START 1 INCREMENT 2;" should be a sequence of only odd numbers.
Subscriber: "CREATE SEQUENCE s1 START 2 INCREMENT 2;" should be a sequence of only even numbers.

Because the names match, currently the patch allows replication of the s1 sequence. I think that might lead to unexpected results on the subscriber. IMO it might be safer to report an ERROR unless the sequences match properly (i.e. not just a name check).

Below is a demonstration of the problem:

==
Publisher:
==

(publisher sequence is odd numbers)

test_pub=# create sequence s1 start 1 increment 2;
CREATE SEQUENCE
test_pub=# select * from nextval('s1');
 nextval
---------
       1
(1 row)

test_pub=# select * from nextval('s1');
 nextval
---------
       3
(1 row)

test_pub=# select * from nextval('s1');
 nextval
---------
       5
(1 row)

test_pub=# CREATE PUBLICATION pub1 FOR ALL SEQUENCES;
CREATE PUBLICATION
test_pub=#

==
Subscriber:
==

(subscriber sequence is even numbers)

test_sub=# create sequence s1 start 2 increment 2;
CREATE SEQUENCE
test_sub=# SELECT * FROM nextval('s1');
 nextval
---------
       2
(1 row)

test_sub=# SELECT * FROM nextval('s1');
 nextval
---------
       4
(1 row)

test_sub=# SELECT * FROM nextval('s1');
 nextval
---------
       6
(1 row)

test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=test_pub' PUBLICATION pub1;
2024-08-01 08:43:04.198 AEST [24325] WARNING: subscriptions created by regression test cases should have names starting with "regress_"
WARNING: subscriptions created by regression test cases should have names starting with "regress_"
NOTICE: created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2024-08-01 08:43:04.294 AEST [26240] LOG: logical replication apply worker for subscription "sub1" has started
2024-08-01 08:43:04.309 AEST [26244] LOG: logical replication sequence synchronization worker for subscription "sub1" has started
2024-08-01 08:43:04.323 AEST [26244] LOG: logical replication synchronization for subscription "sub1", sequence "s1" has finished
2024-08-01 08:43:04.323 AEST [26244] LOG: logical replication sequence synchronization worker for subscription "sub1" has finished

(after the CREATE SUBSCRIPTION we are getting replicated odd values from the publisher, even though the subscriber side sequence was supposed to be even numbers)

test_sub=# SELECT * FROM nextval('s1');
 nextval
---------
       7
(1 row)

test_sub=# SELECT * FROM nextval('s1');
 nextval
---------
       9
(1 row)

test_sub=# SELECT * FROM nextval('s1');
 nextval
---------
      11
(1 row)

(Looking at the description you would expect odd values for this sequence to be impossible)

test_sub=# \dS+ s1
                            Sequence "public.s1"
  Type  | Start | Minimum |       Maximum       | Increment | Cycles? | Cache
--------+-------+---------+---------------------+-----------+---------+-------
 bigint |     2 |       1 | 9223372036854775807 |         2 | no      |     1

==

Kind Regards,
Peter Smith.
Fujitsu Australia
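[Editorial aside: the arithmetic behind the demonstration above can be captured in a toy model. This is an illustration only, not PostgreSQL's sequence code.]

```python
class ToySequence:
    """Minimal nextval model: state is just last_value and increment."""

    def __init__(self, start: int, increment: int):
        # offset so the first nextval() returns start
        self.last_value = start - increment
        self.increment = increment

    def nextval(self) -> int:
        self.last_value += self.increment
        return self.last_value


pub = ToySequence(start=1, increment=2)  # odd numbers: 1, 3, 5, ...
sub = ToySequence(start=2, increment=2)  # even numbers: 2, 4, 6, ...

assert [pub.nextval() for _ in range(3)] == [1, 3, 5]

# A name-only match lets synchronization blindly copy the publisher's position:
sub.last_value = pub.last_value

# The "even" subscriber sequence now produces odd numbers: 7, 9, 11, ...
assert [sub.nextval() for _ in range(3)] == [7, 9, 11]
```

Copying last_value across sequences whose definitions differ silently breaks the subscriber sequence's own parity guarantee, which is why an attribute check (not just a name check) seems safer.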
Re: Logical Replication of sequences
Hi Vignesh,

I have a question about the subscriber-side behaviour of currval().

==

AFAIK it is normal for currval() to give an error if nextval() has not yet been called [1]. For example:

test_pub=# create sequence s1;
CREATE SEQUENCE
test_pub=# select * from currval('s1');
2024-08-01 07:42:48.619 AEST [24131] ERROR: currval of sequence "s1" is not yet defined in this session
2024-08-01 07:42:48.619 AEST [24131] STATEMENT: select * from currval('s1');
ERROR: currval of sequence "s1" is not yet defined in this session
test_pub=# select * from nextval('s1');
 nextval
---------
       1
(1 row)

test_pub=# select * from currval('s1');
 currval
---------
       1
(1 row)

test_pub=#

~~~

OTOH, I was hoping to be able to use currval() at the subscriber-side to see the current sequence value after issuing ALTER .. REFRESH PUBLICATION SEQUENCES. Unfortunately, it has the same behaviour where currval() cannot be used without nextval(). But, on the subscriber, you probably never want to do an explicit nextval() independently of the publisher.

Is this currently a bug, or maybe a quirk that should be documented?

For example:

Publisher
==

test_pub=# create sequence s1;
CREATE SEQUENCE
test_pub=# CREATE PUBLICATION pub1 FOR ALL SEQUENCES;
CREATE PUBLICATION
test_pub=# select * from nextval('s1');
 nextval
---------
       1
(1 row)

test_pub=# select * from nextval('s1');
 nextval
---------
       2
(1 row)

test_pub=# select * from nextval('s1');
 nextval
---------
       3
(1 row)

test_pub=#

Subscriber
==

(Notice currval() always gives an error unless nextval() is used prior).
test_sub=# create sequence s1; CREATE SEQUENCE test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=test_pub' PUBLICATION pub1; 2024-08-01 07:51:06.955 AEST [24325] WARNING: subscriptions created by regression test cases should have names starting with "regress_" WARNING: subscriptions created by regression test cases should have names starting with "regress_" NOTICE: created replication slot "sub1" on publisher CREATE SUBSCRIPTION test_sub=# 2024-08-01 07:51:07.023 AEST [4211] LOG: logical replication apply worker for subscription "sub1" has started 2024-08-01 07:51:07.037 AEST [4213] LOG: logical replication sequence synchronization worker for subscription "sub1" has started 2024-08-01 07:51:07.063 AEST [4213] LOG: logical replication synchronization for subscription "sub1", sequence "s1" has finished 2024-08-01 07:51:07.063 AEST [4213] LOG: logical replication sequence synchronization worker for subscription "sub1" has finished test_sub=# SELECT * FROM currval('s1'); 2024-08-01 07:51:19.688 AEST [24325] ERROR: currval of sequence "s1" is not yet defined in this session 2024-08-01 07:51:19.688 AEST [24325] STATEMENT: SELECT * FROM currval('s1'); ERROR: currval of sequence "s1" is not yet defined in this session test_sub=# ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION SEQUENCES; ALTER SUBSCRIPTION test_sub=# 2024-08-01 07:51:35.298 AEST [4993] LOG: logical replication sequence synchronization worker for subscription "sub1" has started test_sub=# 2024-08-01 07:51:35.321 AEST [4993] LOG: logical replication synchronization for subscription "sub1", sequence "s1" has finished 2024-08-01 07:51:35.321 AEST [4993] LOG: logical replication sequence synchronization worker for subscription "sub1" has finished test_sub=# test_sub=# SELECT * FROM currval('s1'); 2024-08-01 07:51:41.438 AEST [24325] ERROR: currval of sequence "s1" is not yet defined in this session 2024-08-01 07:51:41.438 AEST [24325] STATEMENT: SELECT * FROM currval('s1'); ERROR: currval of sequence "s1" is 
not yet defined in this session
test_sub=#
test_sub=# SELECT * FROM nextval('s1');
 nextval
---------
       4
(1 row)

test_sub=# SELECT * FROM currval('s1');
 currval
---------
       4
(1 row)

==

[1] https://www.postgresql.org/docs/current/functions-sequence.html

Kind Regards,
Peter Smith.
Fujitsu Australia.
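[Editorial aside: the per-session behaviour being questioned can be sketched as a toy model of session-local currval state. This is an illustration, not PostgreSQL's implementation; the class and method names are invented here.]

```python
class ServerSequence:
    """Shared on-disk sequence state."""

    def __init__(self, value: int = 0):
        self.value = value


class Session:
    """currval is per-session: defined only after nextval runs in that session."""

    def __init__(self, server: ServerSequence):
        self.server = server
        self.currval_value = None  # None models "not yet defined in this session"

    def nextval(self) -> int:
        self.server.value += 1
        self.currval_value = self.server.value
        return self.currval_value

    def currval(self) -> int:
        if self.currval_value is None:
            raise RuntimeError("currval of sequence is not yet defined in this session")
        return self.currval_value


seq = ServerSequence()
seq.value = 3          # a sequencesync worker advanced the on-disk value to 3
sess = Session(seq)

try:                   # the synced value alone does not define currval
    sess.currval()
    raise AssertionError("expected an error")
except RuntimeError:
    pass

assert sess.nextval() == 4   # matches the demo: nextval continues from 3
assert sess.currval() == 4
```

The model shows why synchronization that only advances the shared value can never make currval() usable: the session-local state is untouched until that session itself calls nextval().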
Re: Logical Replication of sequences
E_INIT, +InvalidXLogRecPtr); (This is a continuation of my doubts regarding 'all_relations' in the previous review comment #4 above) Here are some more questions about it: ~ 5a. Why is this an 'else' of the !bsearch? It needs more explanation what this case means. ~ 5b. Along with more description, it might be better to reverse the !bsearch condition, so this ('else') code is not so distantly separated from the condition. ~ 5c. Saying "only supported for sequences" seems strange: e.g. what would it even mean to "re-synchronize" tables? They would all have to be truncated first -- so if re-sync for tables has no meaning maybe the parameter is misnamed and should just be 'resync_all_sequences' or similar? In any case, an Assert here might be good. == src/backend/replication/logical/launcher.c logicalrep_worker_find: nitpick - I feel the function comment "We are only interested in..." is now redundant since you are passing the exact worker type you want. nitpick - I added an Assert for the types you are expecting to look for nitpick - The comment "Search for attached worker..." is stale now because there are more search criteria nitpick - IMO the "Skip parallel apply workers." code is no longer needed now that you are matching the worker type. ~~~ 6. logicalrep_worker_launch * - must be valid worker type * - tablesync workers are only ones to have relid * - parallel apply worker is the only kind of subworker + * - sequencesync workers will not have relid */ Assert(wtype != WORKERTYPE_UNKNOWN); Assert(is_tablesync_worker == OidIsValid(relid)); Assert(is_parallel_apply_worker == (subworker_dsm != DSM_HANDLE_INVALID)); + Assert(!is_sequencesync_worker || !OidIsValid(relid)); On further reflection, is that added comment and added Assert even needed? I think they can be removed because saying "tablesync workers are only ones to have relid" seems to already cover what we needed to say/assert. ~~~ logicalrep_worker_stop: nitpick - /type/wtype/ for readability ~~~ 7. 
/*
 * Count the number of registered (not necessarily running) sync workers
 * for a subscription.
 */
int
logicalrep_sync_worker_count(Oid subid)

~

I thought this function should count the sequencesync worker as well.

==

.../replication/logical/sequencesync.c

fetch_remote_sequence_data:
nitpick - tweaked function comment
nitpick - /value/last_value/ for readability

~

8.
+ *lsn = DatumGetInt64(slot_getattr(slot, 4, &isnull));
+ Assert(!isnull);

Should that be DatumGetUInt64?

~~~

copy_sequence:
nitpick - tweak function header.
nitpick - renamed the sequence vars for consistency, and declared them all together.

==

src/backend/replication/logical/tablesync.c

9.
void
invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
{
- table_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
+ relation_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
}

I assume you changed the 'table_states_validity' name because this is no longer exclusively for tables. So, should the function name also be similarly changed?

~~~

process_syncing_sequences_for_apply:
nitpick - tweaked the function comment
nitpick - cannot just say "if there is not one already."; a sequence sync worker might not even be needed.
nitpick - added blank line for readability

~

10.
+ if (syncworker)
+ {
+ /* Now safe to release the LWLock */
+ LWLockRelease(LogicalRepWorkerLock);
+ break;
+ }
+ else
+ {

This 'else' can be removed if you wish to pull back all the indentation.

~~~

11.
process_syncing_tables(XLogRecPtr current_lsn)

Is the function name still OK given that it is now also syncing for sequences?

~~~

FetchTableStates:
nitpick - Reworded some of the function comment
nitpick - Function comment is stale because it is still referring to the function parameter which this patch removed.
nitpick - tweak a comment

==

src/include/commands/sequence.h

12.
+#define SEQ_LOG_CNT_INVALID (-1)

See a previous review comment (#2 above) where I wondered why not use value 0 for this.

~~~

13.
extern void SequenceChangePersistence(Oid relid, char newrelpersistence); extern void DeleteSequenceTuple(Oid relid); extern void ResetSequence(Oid seq_relid); +extern void do_setval(Oid relid, int64 next, bool iscalled, int64 logcnt); extern void ResetSequenceCaches(void); do_setval() was an OK function name when it was static, but as an exposed API it seems like a terrible name. IMO rename it to something like 'SetSequence' to match the other API functions nearby. ~ nitpick - same change to the parameter names as suggested for the implementation. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 22b2a93..16c427e 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -8110,9 +8110,10 @@
Re: Pgoutput not capturing the generated columns
Thanks for the patch updates. Here are my review comments for v21-0001.

I think this patch is mostly OK now except there are still some comments about the TAP test.

==

Commit Message

0.
Using Create Subscription:
CREATE SUBSCRIPTION sub2_gen_to_gen CONNECTION '$publisher_connstr' PUBLICATION pub1 WITH (include_generated_columns = true, copy_data = false)"

If you are going to give an example, I think a gen-to-nogen example would be a better choice. That's because the gen-to-gen (as you have here) is not going to replicate anything due to the subscriber-side column being generated.

==

src/test/subscription/t/011_generated.pl

1.
+$node_subscriber2->safe_psql('postgres',
+ "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 22) STORED, c int)"
+);

The subscriber2 node was intended only for all the tables where we need include_generated_columns to be true. Mostly that is the combination tests (tab_gen_to_nogen, tab_nogen_to_gen, etc). OTOH, table 'tab1' already existed. I don't think we need to bother subscribing to tab1 from subscriber2 because every combination is already covered by the combination tests. Let's leave this one alone.

~~~

2.
Huh? Where is the "tab_nogen_to_gen" combination test that I sent to you off-list?

~~~

3.
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_order (c int GENERATED ALWAYS AS (a * 22) STORED, a int, b int)"
+);

Maybe you can test 'tab_order' on both subscriber nodes but I think it is overkill. IMO it is enough to test it on subscriber2.

~~~

4.
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab_alter (a int, b int, c int GENERATED ALWAYS AS (a * 22) STORED)"
+);

Ditto above. Maybe you can test 'tab_alter' on both subscriber nodes but I think it is overkill. IMO it is enough to test it on subscriber2.

~~~

5.
Don't forget to add initial data for the missing nogen_to_gen table/test.

~~~

6.
$node_publisher->safe_psql('postgres', - "CREATE PUBLICATION pub1 FOR ALL TABLES"); + "CREATE PUBLICATION pub1 FOR TABLE tab1, tab_gen_to_gen, tab_gen_to_nogen, tab_gen_to_missing, tab_missing_to_gen, tab_order"); + $node_subscriber->safe_psql('postgres', "CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub1" ); It is not a bad idea to reduce the number of publications as you have done, but IMO jamming all the tables into 1 publication is too much because it makes it less understandable instead of simpler. How about this: - leave the 'pub1' just for 'tab1'. - have a 'pub_combo' for publication all the gen_to_nogen, nogen_to_gen etc combination tests. - and a 'pub_misc' for any other misc tables like tab_order. ~~~ 7. +# # Wait for initial sync of all subscriptions +# I think you should write a note here that you have deliberately set copy_data = false because COPY and include_generated_columns are not allowed at the same time for patch 0001. And that is why all expected results on subscriber2 will be empty. Also, say this limitation will be changed in patch 0002. ~~~ (I didn't yet check 011_generated.pl file results beyond this point... I'll wait for v22-0001 to review further) == Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication of sequences
Hi Vignesh, There are still pending changes from my previous review of the 0720-0003 patch [1], but here are some new review comments for your latest patch v20240525-0003. == doc/src/sgml/catalogs.sgml nitpick - fix plurals and tweak the description. ~~~ 1. - This catalog only contains tables known to the subscription after running - either CREATE SUBSCRIPTION or - ALTER SUBSCRIPTION ... REFRESH + This catalog only contains tables and sequences known to the subscription + after running either + CREATE SUBSCRIPTION + or ALTER SUBSCRIPTION ... REFRESH PUBLICATION. Shouldn't this mention "REFRESH PUBLICATION SEQUENCES" too? == src/backend/commands/sequence.c SetSequenceLastValue: nitpick - maybe change: /log_cnt/new_log_cnt/ for consistency with the other parameter, and to emphasise the old log_cnt is overwritten == src/backend/replication/logical/sequencesync.c 2. +/* + * fetch_remote_sequence_data + * + * Fetch sequence data (current state) from the remote node, including + * the latest sequence value from the publisher and the Page LSN for the + * sequence. + */ +static int64 +fetch_remote_sequence_data(WalReceiverConn *conn, Oid remoteid, +int64 *log_cnt, XLogRecPtr *lsn) 2a. Now you are also returning the 'log_cnt' but that is not mentioned by the function comment. ~ 2b. Is it better to name these returned by-ref ptrs like 'ret_log_cnt', and 'ret_lsn' to emphasise they are output variables? YMMV. ~~~ 3. + /* Process the sequence. */ + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); + while (tuplestore_gettupleslot(res->tuplestore, true, false, slot)) This will have one-and-only-one tuple for the discovered sequence, won't it? So, why is this a while loop? == src/include/commands/sequence.h nitpick - maybe change: /log_cnt/new_log_cnt/ (same as earlier in this post) == src/test/subscription/t/034_sequences.pl 4. Q. Should we be suspicious that log_cnt changes from '32' to '31', or is there a valid explanation? 
It smells like some calculation is off-by-one, but without debugging I can't tell if it is right or wrong. == Please also see the attached diffs patch, which implements the nitpicks mentioned above. == [1] 0720-0003 review - https://www.postgresql.org/message-id/CAHut%2BPsfsfzyBrmo8E43qFMp9_bmen2tuCsNYN8sX%3Dfa86SdfA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 19d04b1..dcd0b98 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -8102,8 +8102,8 @@ SCRAM-SHA-256$<iteration count>:&l - The catalog pg_subscription_rel contains the - state for each replicated tables and sequences in each subscription. This + The catalog pg_subscription_rel stores the + state of each replicated table and sequence for each subscription. This is a many-to-many mapping. diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c index a3e7c79..f292fbc 100644 --- a/src/backend/commands/sequence.c +++ b/src/backend/commands/sequence.c @@ -342,7 +342,7 @@ ResetSequence(Oid seq_relid) * logical replication. 
*/ void -SetSequenceLastValue(Oid seq_relid, int64 new_last_value, int64 log_cnt) +SetSequenceLastValue(Oid seq_relid, int64 new_last_value, int64 new_log_cnt) { SeqTable elm; Relation seqrel; @@ -370,7 +370,7 @@ SetSequenceLastValue(Oid seq_relid, int64 new_last_value, int64 log_cnt) seq->last_value = new_last_value; seq->is_called = true; - seq->log_cnt = log_cnt; + seq->log_cnt = new_log_cnt; MarkBufferDirty(buf); diff --git a/src/include/commands/sequence.h b/src/include/commands/sequence.h index a302890..4c6aee0 100644 --- a/src/include/commands/sequence.h +++ b/src/include/commands/sequence.h @@ -60,7 +60,7 @@ extern ObjectAddress AlterSequence(ParseState *pstate, AlterSeqStmt *stmt); extern void SequenceChangePersistence(Oid relid, char newrelpersistence); extern void DeleteSequenceTuple(Oid relid); extern void ResetSequence(Oid seq_relid); -extern void SetSequenceLastValue(Oid seq_relid, int64 new_last_value, int64 log_cnt); +extern void SetSequenceLastValue(Oid seq_relid, int64 new_last_value, int64 new_log_cnt); extern void ResetSequenceCaches(void); extern void seq_redo(XLogReaderState *record);
Re: Logical Replication of sequences
Here are some review comments for latest patch v20240725-0002 == doc/src/sgml/ref/create_publication.sgml nitpick - tweak to the description of the example. == src/backend/parser/gram.y preprocess_pub_all_objtype_list: nitpick - typo "allbjects_list" nitpick - reword function header nitpick - /alltables/all_tables/ nitpick - /allsequences/all_sequences/ nitpick - I think code is safe as-is because makeNode internally does palloc0, but OTOH adding Assert would be nicer just to remove any doubts. == src/bin/psql/describe.c 1. + /* Print any publications */ + if (pset.sversion >= 18) + { + int tuples = 0; No need to assign value 0 here, because this will be unconditionally assigned before use anyway. 2. describePublications has_pubviaroot = (pset.sversion >= 13); + has_pubsequence = (pset.sversion >= 18000); That's a bug! Should be 18, not 18000. == And, please see the attached diffs patch, which implements the nitpicks mentioned above. ====== Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml index 7dcfe37..783874f 100644 --- a/doc/src/sgml/ref/create_publication.sgml +++ b/doc/src/sgml/ref/create_publication.sgml @@ -420,7 +420,7 @@ CREATE PUBLICATION users_filtered FOR TABLE users (user_id, firstname); - Create a publication that synchronizes all the sequences: + Create a publication that publishes all sequences for synchronization: CREATE PUBLICATION all_sequences FOR ALL SEQUENCES; diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 585f61e..9b3cad1 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -215,9 +215,9 @@ static void processCASbits(int cas_bits, int location, const char *constrType, static PartitionStrategy parsePartitionStrategy(char *strategy); static void preprocess_pubobj_list(List *pubobjspec_list, core_yyscan_t yyscanner); -static void preprocess_pub_all_objtype_list(List *allbjects_list, - bool *alltables, - bool 
*allsequences, +static void preprocess_pub_all_objtype_list(List *all_objects_list, + bool *all_tables, + bool *all_sequences, core_yyscan_t yyscanner); static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); @@ -19440,39 +19440,42 @@ parsePartitionStrategy(char *strategy) } /* - * Process all_objects_list to check if the options have been specified more than - * once and set alltables/allsequences. + * Process all_objects_list to set all_tables/all_sequences. + * Also, checks if the pub_object_type has been specified more than once. */ static void -preprocess_pub_all_objtype_list(List *all_objects_list, bool *alltables, - bool *allsequences, core_yyscan_t yyscanner) +preprocess_pub_all_objtype_list(List *all_objects_list, bool *all_tables, + bool *all_sequences, core_yyscan_t yyscanner) { if (!all_objects_list) return; + Assert(all_tables && *all_tables == false); + Assert(all_sequences && *all_sequences == false); + foreach_ptr(PublicationAllObjSpec, obj, all_objects_list) { if (obj->pubobjtype == PUBLICATION_ALL_TABLES) { - if (*alltables) + if (*all_tables) ereport(ERROR, errcode(ERRCODE_SYNTAX_ERROR), errmsg("invalid publication object list"), errdetail("TABLES can be specified only once."), parser_errposition(obj->location)); - *alltables = true; + *all_tables = true; } else if (obj->pubobjtype == PUBLICATION_ALL_SEQUENCES) { - if (*allsequences) + if (*all_sequences) ereport(ERROR, errcode(ERRCODE_SYNTAX_ERROR), errmsg("invalid publication object list"),
Re: Logical Replication of sequences
; maths seems unnecessarily tricky. Can't we just increment the cur_seq? before this calculation? ~ nitpick - simplify the comment about batching nitpick - added a comment to the commit == src/backend/replication/logical/tablesync.c finish_sync_worker: nitpick - added an Assert so the if/else is less risky. nitpick - modify the comment about failure time when it is a clean exit ~~~ 15. process_syncing_sequences_for_apply + /* We need up-to-date sync state info for subscription sequences here. */ + FetchTableStates(&started_tx, SUB_REL_KIND_ALL); Should that say SUB_REL_KIND_SEQUENCE? ~ 16. + /* + * If there are free sync worker slot(s), start a new sequence + * sync worker, and break from the loop. + */ + if (nsyncworkers < max_sync_workers_per_subscription) Should this "if" have some "else" code to log a warning if we have run out of free workers? Otherwise, how will the user know that the system may need tuning? ~~~ 17. FetchTableStates /* Fetch all non-ready tables. */ - rstates = GetSubscriptionRelations(MySubscription->oid, true); + rstates = GetSubscriptionRelations(MySubscription->oid, rel_type, true); This feels risky. IMO there needs to be some prior Assert about the rel_type. For example, if it happened to be SUB_REL_KIND_SEQUENCE then this function code doesn't seem to make sense. ~~~ == src/backend/replication/logical/worker.c 18. SetupApplyOrSyncWorker + + if (isSequenceSyncWorker(MyLogicalRepWorker)) + before_shmem_exit(logicalrep_seqsyncworker_failuretime, (Datum) 0); Probably that should be using macro am_sequencesync_worker(), right? == src/include/catalog/pg_subscription_rel.h 19. +typedef enum +{ + SUB_REL_KIND_TABLE, + SUB_REL_KIND_SEQUENCE, + SUB_REL_KIND_ALL, +} SubscriptionRelKind; + I was not sure how helpful this is; it might not be needed. e.g. see review comment for GetSubscriptionRelations ~~~ 20. 
+extern List *GetSubscriptionRelations(Oid subid, SubscriptionRelKind reltype, + bool not_ready); There is a mismatch with the ‘not_ready’ parameter name here and in the function implementation == src/test/subscription/t/034_sequences.pl nitpick - removed a blank line == 99. Please also see the attached diffs patch which implements all the nitpicks mentioned above. == [1] syntax - https://www.postgresql.org/message-id/CAHut%2BPuFH1OCj-P1UKoRQE2X4-0zMG%2BN1V7jdn%3DtOQV4RNbAbw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c index 5610c07..04d322a 100644 --- a/src/backend/catalog/pg_subscription.c +++ b/src/backend/catalog/pg_subscription.c @@ -494,14 +494,18 @@ HasSubscriptionRelations(Oid subid) /* * Get the relations for the subscription. * - * If rel_type is SUB_REL_KIND_SEQUENCE, get only the sequences. If rel_type is - * SUB_REL_KIND_TABLE, get only the tables. If rel_type is SUB_REL_KIND_ALL, - * get both tables and sequences. + * rel_type: + * If SUB_REL_KIND_SEQUENCE, return only the sequences. + * If SUB_REL_KIND_TABLE, return only the tables. + * If SUB_REL_KIND_ALL, return both tables and sequences. + * + * not_all_relations: * If not_all_relations is true for SUB_REL_KIND_TABLE and SUB_REL_KIND_ALL, * return only the relations that are not in a ready state, otherwise return all * the relations of the subscription. If not_all_relations is true for * SUB_REL_KIND_SEQUENCE, return only the sequences that are in init state, * otherwise return all the sequences of the subscription. + * * The returned list is palloc'ed in the current memory context. 
*/ List * diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index d23901a..2f9ff8b 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -879,11 +879,13 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, * Update the subscription to refresh both the publication and the publication * objects associated with the subscription. * - * If the copy_data parameter is true, the function will set the state - * to "init"; otherwise, it will set the state to "ready". When the - * validate_publications is provided with a publication list, the function - * checks that the specified publications exist on the publisher. If - * refresh_all_sequences is true, it will mark all sequences with "init" state + * If 'copy_data' parameter is true, the function will set the state + * to "init"; otherwise, it will set the state to "ready". + * + * When 'validate_publications' is provided with a publication list, the function + * checks that the specified publications exist on the publisher. + * + * If 'refresh_all_sequences' is true, it will mark all sequences with "init" state * for re-synchronization; oth
Re: Logical Replication of sequences
Hi, here are some review comments for patch v20240720-0003. This review is a WIP. This post is only about the docs (*.sgml) of patch 0003. == doc/src/sgml/ref/alter_subscription.sgml 1. REFRESH PUBLICATION and copy_data nitpicks: - IMO the "synchronize the sequence data" info was misleading because synchronization should only occur when copy_data=true. - I also felt it was strange to mention pg_subscription_rel for sequences, but not for tables. I modified this part too. - Then I moved the information about re/synchronization of sequences into the "copy_data" part. - And added another link to ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES Anyway, in summary, I have updated this page quite a lot according to my understanding. Please take a look at the attached nitpick for my suggestions. nitpick - /The supported options are:/The only supported option is:/ ~~~ 2. REFRESH PUBLICATION SEQUENCES nitpick - tweaked the wording nitpick - typo /syncronizes/synchronizes/ == 3. catalogs.sgml IMO something is missing in Section "1.55. pg_subscription_rel". Currently, this page only talks of relations/tables, but I think it should mention "sequences" here too, particularly since now we are linking to here from ALTER SUBSCRIPTION when talking about sequences. == 99. Please see the attached diffs patch which implements any nitpicks mentioned above. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index ba8c2b1..666d9b0 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -153,7 +153,7 @@ ALTER SUBSCRIPTION name RENAME TO < REFRESH PUBLICATION - Fetch missing table information from publisher. This will start + Fetch missing table information from the publisher. 
This will start replication of tables that were added to the subscribed-to publications since CREATE SUBSCRIPTION or @@ -161,29 +161,26 @@ ALTER SUBSCRIPTION name RENAME TO < - It also fetches the missing sequence information from the publisher and - synchronize the sequence data for newly added sequences with the - publisher. This will start synchronizing of sequences that were added to - the subscribed-to publications since - CREATE SUBSCRIPTION or the last invocation of - REFRESH PUBLICATION. Additionally, it will remove any - sequences that are no longer part of the publication from the - pg_subscription_rel - system catalog. Sequences that have already been synchronized will not be - re-synchronized. + Also, fetch missing sequence information from the publisher. + + + + The system catalog pg_subscription_rel + is updated to record all tables and sequences known to the subscription, + that are still part of the publication. refresh_option specifies additional options for the - refresh operation. The supported options are: + refresh operation. The only supported option is: copy_data (boolean) - Specifies whether to copy pre-existing data for tables and sequences - in the publications that are being subscribed to when the replication + Specifies whether to copy pre-existing data for tables and synchronize + sequences in the publications that are being subscribed to when the replication starts. The default is true. @@ -191,6 +188,11 @@ ALTER SUBSCRIPTION name RENAME TO < filter WHERE clause has since been modified. + Previously subscribed sequences are not re-synchronized. To do that, + see + ALTER SUBSCRIPTION ... 
REFRESH PUBLICATION SEQUENCES + + See for details of how copy_data = true can interact with the origin @@ -212,11 +214,11 @@ ALTER SUBSCRIPTION name RENAME TO < REFRESH PUBLICATION SEQUENCES - Fetch missing sequences information from publisher and re-synchronize the + Fetch missing sequence information from the publisher, then re-synchronize sequence data with the publisher. Unlike ALTER SUBSCRIPTION ... REFRESH PUBLICATION which - only syncronizes the newly added sequences, this option will also - re-synchronize the sequence data for sequences that were previously added. + only synchronizes newly added sequences, REFRESH PUBLICATION SEQUENCES + will re-synchronize the sequence data for all subscribed sequences.
Re: Logical Replication of sequences
Here are some review comments for patch v20240720-0002. == 1. Commit message: 1a. The commit message is stale. It is still referring to functions and views that have been moved to patch 0003. 1b. "ALL SEQUENCES" is not a command. It is a clause of the CREATE PUBLICATION command. == doc/src/sgml/ref/create_publication.sgml nitpick - publication name in the example /allsequences/all_sequences/ == src/bin/psql/describe.c 2. describeOneTableDetails Although it's not the fault of this patch, this patch propagates the confusion of 'result' versus 'res'. Basically, I did not understand the need for the variable 'result'. There is already a "PGresult *res", and unless I am mistaken we can just keep re-using that instead of introducing a 2nd variable having almost the same name and purpose. ~ nitpick - comment case nitpick - rearrange comment == src/test/regress/expected/publication.out (see publication.sql) == src/test/regress/sql/publication.sql nitpick - tweak comment == Kind Regards, Peter Smith.
Fujitsu Australia diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml index c9c1b92..7dcfe37 100644 --- a/doc/src/sgml/ref/create_publication.sgml +++ b/doc/src/sgml/ref/create_publication.sgml @@ -422,7 +422,7 @@ CREATE PUBLICATION users_filtered FOR TABLE users (user_id, firstname); Create a publication that synchronizes all the sequences: -CREATE PUBLICATION allsequences FOR ALL SEQUENCES; +CREATE PUBLICATION all_sequences FOR ALL SEQUENCES; diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 0f3f86b..a92af54 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -1851,7 +1851,7 @@ describeOneTableDetails(const char *schemaname, } PQclear(result); - /* print any publications */ + /* Print any publications */ if (pset.sversion >= 18) { int tuples = 0; @@ -1867,8 +1867,8 @@ describeOneTableDetails(const char *schemaname, if (!result) goto error_return; - tuples = PQntuples(result); /* Might be an empty set - that's ok */ + tuples = PQntuples(result); if (tuples > 0) { printTableAddFooter(&cont, _("Publications:")); diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out index 3ea2224..6c573a1 100644 --- a/src/test/regress/expected/publication.out +++ b/src/test/regress/expected/publication.out @@ -259,7 +259,7 @@ Publications: SET client_min_messages = 'ERROR'; CREATE PUBLICATION regress_pub_forallsequences2 FOR ALL SEQUENCES; RESET client_min_messages; --- Check describe sequence lists both the publications +-- check that describe sequence lists all publications the sequence belongs to \d+ pub_test.regress_pub_seq1 Sequence "pub_test.regress_pub_seq1" Type | Start | Minimum | Maximum | Increment | Cycles? 
| Cache diff --git a/src/test/regress/sql/publication.sql b/src/test/regress/sql/publication.sql index 8d553ed..ac77fe4 100644 --- a/src/test/regress/sql/publication.sql +++ b/src/test/regress/sql/publication.sql @@ -134,7 +134,7 @@ SET client_min_messages = 'ERROR'; CREATE PUBLICATION regress_pub_forallsequences2 FOR ALL SEQUENCES; RESET client_min_messages; --- Check describe sequence lists both the publications +-- check that describe sequence lists all publications the sequence belongs to \d+ pub_test.regress_pub_seq1 --- FOR ALL specifying both TABLES and SEQUENCES
Re: Pgoutput not capturing the generated columns
On Fri, Jul 19, 2024 at 4:01 PM Shlok Kyal wrote: > > On Thu, 18 Jul 2024 at 13:55, Peter Smith wrote: > > > > Hi, here are some review comments for v19-0002 > > == > > src/test/subscription/t/004_sync.pl > > > > 1. > > This new test is not related to generated columns. IIRC, this is just > > some test that we discovered missing during review of this thread. As > > such, I think this change can be posted/patched separately from this > > thread. > > > I have removed the test for this thread. > > I have also addressed the remaining comments for v19-0002 patch. Hi, I have no more review comments for patch v20-0002 at this time. I saw that the above test was removed from this thread as suggested, but I could not find that any new thread was started to propose this valuable missing test. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi, Patch v22-0001 LGTM apart from the following nitpicks == src/sgml/ref/alter_subscription.sgml nitpick - /one needs to/you need to/ == src/backend/commands/subscriptioncmds.c CheckAlterSubOption: nitpick - "ideally we could have..." doesn't make sense because the code uses a more consistent/simpler way. So the other option was not ideal after all. AlterSubscription nitpick - typo /syncronization/synchronization/ nitpick - plural fix == Kind Regards, Peter Smith. Fujitsu Australia. diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index cbba1ee..6af6d0d 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -272,7 +272,7 @@ ALTER SUBSCRIPTION name RENAME TO < logical replication worker corresponding to a particular subscription have the following pattern: pg_gid_%u_%u (parameters: subscription oid, remote transaction id xid). - To resolve such transactions manually, one needs to roll back all + To resolve such transactions manually, you need to roll back all the prepared transactions with corresponding subscription IDs in their names. Applications can check pg_prepared_xacts diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 27560f1..b21f5c0 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -1090,9 +1090,9 @@ CheckAlterSubOption(Subscription *sub, const char *option, * publisher cannot be modified if the slot is currently acquired by the * existing walsender. * -* Note that the two_phase is enabled (aka changed from 'false' to 'true') -* on the publisher by the existing walsender so, ideally, we can allow -* that even when a subscription is enabled. But we kept this restriction +* Note that two_phase is enabled (aka changed from 'false' to 'true') +* on the publisher by the existing walsender, so we could have allowed +* that even when the subscription is enabled. 
But we kept this restriction * for the sake of consistency and simplicity. */ if (sub->enabled) @@ -1281,7 +1281,7 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, * We need to update both the slot and the subscription * for two_phase option. We can enable the two_phase * option for a slot only once the initial data -* syncronization is done. This is to avoid missing some +* synchronization is done. This is to avoid missing some * data as explained in comments atop worker.c. */ update_two_phase = !opts.twophase; @@ -1306,9 +1306,9 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, * * Ensure workers have already been exited to avoid * getting prepared transactions while we are disabling -* two_phase option. Otherwise, the changes of already -* prepared transactions can be replicated again along -* with its corresponding commit leading to duplicate data +* two_phase option. Otherwise, the changes of an already +* prepared transaction can be replicated again along +* with its corresponding commit, leading to duplicate data * or errors. */ if (logicalrep_workers_find(subid, true, true))
Re: walsender.c fileheader comment
Hi, Thank you for taking the time to look at this and reply. > > I did look at this, and while the explanation in the current comment may > seem a bit confusing, I'm not sure the suggested changes improve the > situation very much. > > This suggests the two comments somehow disagree, but it does not say in > what exactly, so perhaps I just missed it :-( > > ISTM there's a bit of confusion what is meant by "stopping" state - you > seem to be interpreting it as a general concept, where the walsender is > requested to stop (through the signal), and starts doing stuff to exit. > But the comments actually talk about WalSnd->state, where "stopping" > means it needs to be set to WALSNDSTATE_STOPPING. Yes, I interpreted the "stopping" state as meaning the point when the boolean flag 'got_STOPPING' is assigned true. > > And we only ever switch to that state in two places - in WalSndPhysical > and exec_replication_command. And that does not happen in regular > logical replication (which is what "logical replication is in progress" > refers to) - if you have a walsender just replicating DML, it will never > see the WALSNDSTATE_STOPPING state. It will simply do the cleanup while > still in WALSNDSTATE_STREAMING state, and then just exit. > > So from this point of view, the suggestion is actually wrong. OK. > > To conclude, I think this probably makes the comments more confusing. If > we want to make it clearer, I'd probably start by clarifying what the > "stopping" state means. Also, it's a bit surprising we may not actually > go through the "stopping" state during shutdown. > I agree. My interpretation of the (ambiguous) "stopping" state led me to believe the comment was quite wrong. So, this thread was only intended as a trivial comment fix in passing but clearly there is more to this than I anticipated. 
I would be happy if someone with more knowledge about the WALSNDSTATE_STOPPING versus got_STOPPING could disambiguate the file header comment, but that's not me, so I have withdrawn this from the Commitfest. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Thu, Jul 18, 2024 at 9:42 PM Amit Kapila wrote: > ... > I agree and have done that in the attached. I have made some > additional changes: (a) removed the unrelated change of two_phase in > protocol.sgml, (b) tried to make the two_phase change before failover > option wherever it makes sense to keep the code consistent, (c) > changed/added comments and doc changes at various places. > > I'll continue my review and testing of the patch but I thought of > sharing what I have done till now. > Here are some minor comments for patch v21 == You wrote "tried to make the two_phase change before failover option wherever it makes sense to keep the code consistent". But, still failover is coded first in lots of places: - libpqrcv_alter_slot - ReplicationSlotAlter - AlterReplicationSlot etc. Q. Why not change those ones? == src/backend/access/transam/twophase.c IsTwoPhaseTransactionGidForSubid: nitpick - nicer to rename the temporary gid variable: /gid_generated/gid_tmp/ == src/backend/commands/subscriptioncmds.c CheckAlterSubOption: nitpick - function comment period/plural. nitpick - typo /Samilar/Similar/ == src/include/replication/slot.h 1. -extern void ReplicationSlotAlter(const char *name, bool failover); +extern void ReplicationSlotAlter(const char *name, bool *failover, + bool *two_phase); Use const? == 99. Please see attached diffs implementing the nitpicks mentioned above == Kind Regards, Peter Smith. 
Fujitsu Australia diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index 90df32f..da97836 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -2712,7 +2712,7 @@ IsTwoPhaseTransactionGidForSubid(Oid subid, char *gid) int ret; Oid subid_from_gid; TransactionId xid_from_gid; - char gid_generated[GIDSIZE]; + char gid_tmp[GIDSIZE]; /* Extract the subid and xid from the given GID */ ret = sscanf(gid, "pg_gid_%u_%u", &subid_from_gid, &xid_from_gid); @@ -2729,10 +2729,9 @@ IsTwoPhaseTransactionGidForSubid(Oid subid, char *gid) * the given GID and check whether the temporary GID and the given GID * match. */ - TwoPhaseTransactionGid(subid, xid_from_gid, gid_generated, - sizeof(gid_generated)); + TwoPhaseTransactionGid(subid, xid_from_gid, gid_tmp, sizeof(gid_tmp)); - return strcmp(gid, gid_generated) == 0; + return strcmp(gid, gid_tmp) == 0; } /* diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 2e9f329..5f11235 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -1071,7 +1071,7 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data, } /* - * Common checks for altering failover and two_phase option + * Common checks for altering failover and two_phase options. */ static void CheckAlterSubOption(Subscription *sub, const char *option, @@ -1337,8 +1337,8 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, if (IsSet(opts.specified_opts, SUBOPT_FAILOVER)) { /* -* Samilar to two_phase, we need to update the failover -* option for the slot and the subscription. +* Similar to the two_phase case above, we need to update +* the failover option for the slot and the subscription. */ update_failover = true;
Re: Pgoutput not capturing the generated columns
Hi, here are some review comments for patch v19-0003 == src/backend/catalog/pg_publication.c 1. /* * Translate a list of column names to an array of attribute numbers * and a Bitmapset with them; verify that each attribute is appropriate * to have in a publication column list (no system or generated attributes, * no duplicates). Additional checks with replica identity are done later; * see pub_collist_contains_invalid_column. * * Note that the attribute numbers are *not* offset by * FirstLowInvalidHeapAttributeNumber; system columns are forbidden so this * is okay. */ static void publication_translate_columns(Relation targetrel, List *columns, int *natts, AttrNumber **attrs) ~ I thought the above comment ought to change: /or generated attributes/or virtual generated attributes/ IIRC this was already addressed back in v16, but somehow that fix has been lost (???). == src/backend/replication/logical/tablesync.c fetch_remote_table_info: nitpick - missing end space in this comment /* TODO: use ATTRIBUTE_GENERATED_VIRTUAL*/ == 2. (in patch v19-0001) +# tab3: +# publisher-side tab3 has generated col 'b'. +# subscriber-side tab3 has generated col 'b', using a different computation. (here, in patch v19-0003) # tab3: -# publisher-side tab3 has generated col 'b'. -# subscriber-side tab3 has generated col 'b', using a different computation. +# publisher-side tab3 has stored generated col 'b' but +# subscriber-side tab3 has DIFFERENT COMPUTATION stored generated col 'b'. It has become difficult to review these TAP tests, particularly when different patches are modifying the same comment. e.g. I post suggestions to modify comments for patch 0001. Those get addressed OK, only to vanish in subsequent patches, as has happened in the above example. Really this patch 0003 was only supposed to add the word "stored", not revert the entire comment to something from an earlier version. Please take care that all comment changes are carried forward correctly from one patch to the next. 
== Kind Regards, Peter Smith. Fujitsu Australia.
Re: Pgoutput not capturing the generated columns

Hi, here are some review comments for v19-0002

==
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:
nitpick - tweak function comment
nitpick - tweak other comments

~~~

fetch_remote_table_info:
nitpick - add space after "if"
nitpick - removed a comment because the logic is self-evident from the variable name

==
src/test/subscription/t/004_sync.pl

1.
This new test is not related to generated columns. IIRC, this is just
some test that we discovered was missing during review of this thread.
As such, I think this change can be posted/patched separately from this
thread.

==
src/test/subscription/t/011_generated.pl

nitpick - change some comment wording to be more consistent with patch 0001.

==
99.
Please see the nitpicks diff attachment which implements any nitpicks
mentioned above.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 935be7f..2e90d42 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -693,8 +693,8 @@ process_syncing_tables(XLogRecPtr current_lsn)
 }
 
 /*
- * Create list of columns for COPY based on logical relation mapping. Do not
- * include generated columns of the subscription table in the column list.
+ * Create list of columns for COPY based on logical relation mapping.
+ * Exclude columns that are subscription table generated columns.
  */
 static List *
 make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool *remotegenlist)
@@ -707,7 +707,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool *remotegenlist)
 	localgenlist = palloc0(rel->remoterel.natts * sizeof(bool));
 
 	/*
-	 * This loop checks for generated columns on subscription table.
+	 * This loop checks for generated columns of the subscription table.
 	 */
 	for (int i = 0; i < desc->natts; i++)
 	{
@@ -1010,15 +1010,12 @@ fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist_res,
 					 " WHERE a.attnum > 0::pg_catalog.int2"
 					 "   AND NOT a.attisdropped", lrel->remoteid);
 
-	if(server_version >= 12)
+	if (server_version >= 12)
 	{
 		bool		gencols_allowed = server_version >= 18 && MySubscription->includegencols;
 
-		if(!gencols_allowed)
-		{
-			/* Replication of generated cols is not supported. */
+		if (!gencols_allowed)
 			appendStringInfo(&cmd, " AND a.attgenerated = ''");
-		}
 	}
 
 	appendStringInfo(&cmd,
diff --git a/src/test/subscription/t/011_generated.pl b/src/test/subscription/t/011_generated.pl
index 1814628..4537c6c 100644
--- a/src/test/subscription/t/011_generated.pl
+++ b/src/test/subscription/t/011_generated.pl
@@ -46,9 +46,9 @@ $node_subscriber->safe_psql('postgres',
 	"CREATE TABLE tab3 (a int, b int GENERATED ALWAYS AS (a + 20) STORED)");
 
 # tab4:
-# publisher-side tab4 has generated cols 'b' and 'c' but
-# subscriber-side tab4 has non-generated col 'b', and generated-col 'c'
-# where columns on publisher/subscriber are in a different order
+# Publisher-side tab4 has generated cols 'b' and 'c'.
+# Subscriber-side tab4 has non-generated col 'b', and generated-col 'c'.
+# Columns on publisher/subscriber are in a different order.
 $node_publisher->safe_psql('postgres',
 	"CREATE TABLE tab4 (a int, b int GENERATED ALWAYS AS (a * 2) STORED, c int GENERATED ALWAYS AS (a * 2) STORED)"
 );
@@ -57,14 +57,14 @@ $node_subscriber->safe_psql('postgres',
 );
 
 # tab5:
-# publisher-side tab5 has non-generated col 'b' but
-# subscriber-side tab5 has generated col 'b'
+# Publisher-side tab5 has non-generated col 'b'.
+# Subscriber-side tab5 has generated col 'b'.
 $node_publisher->safe_psql('postgres', "CREATE TABLE tab5 (a int, b int)");
 $node_subscriber->safe_psql('postgres',
 	"CREATE TABLE tab5 (a int, b int GENERATED ALWAYS AS (a * 22) STORED)");
 
 # tab6:
-# tables for testing ALTER SUBSCRIPTION ... REFRESH PUBLICATION
+# Tables for testing ALTER SUBSCRIPTION ... REFRESH PUBLICATION
 $node_publisher->safe_psql('postgres',
 	"CREATE TABLE tab6 (a int, b int GENERATED ALWAYS AS (a * 2) STORED, c int GENERATED ALWAYS AS (a * 2) STORED)"
 );
@@ -73,8 +73,8 @@ $node_subscriber->safe_psql('postgres',
 );
 
 # tab7:
-# publisher-side tab7 has generated col 'b' but
-# subscriber-side tab7 do not have col 'b'
+# Publisher-side tab7
Re: Pgoutput not capturing the generated columns

Hi Shubham, here are my review comments for patch v19-0001.

==
src/backend/replication/pgoutput/pgoutput.c

1.
	/*
	 * Columns included in the publication, or NULL if all columns are
	 * included implicitly. Note that the attnums in this bitmap are not
	 * publication and include_generated_columns option: other reasons should
	 * be checked at user side. Note that the attnums in this bitmap are not
	 * shifted by FirstLowInvalidHeapAttributeNumber.
	 */
	Bitmapset  *columns;

You replied [1] "The attached Patches contain all the suggested
changes." but, as I previously commented [2, #1], since this patch
causes no change to the interpretation of the 'columns' BMS, I expected
this comment would be unchanged (i.e. the same as the HEAD code). But
this fix was missed in v19-0001.

OTOH, if you do think there was a reason to change the comment, then
the above is still not good, because the "are not publication and
include_generated_columns option" wording doesn't make sense.

==
src/test/subscription/t/011_generated.pl

Observation -- I added (in the nitpicks diff) some more comments for
'tab1' (to make all comments consistent with the new tests added). But
when I was doing that I observed that the tab1 and tab3 test scenarios
are very similar. It seems only the subscription parameter is not
specified (so the 'include_generated_cols' default will be tested).
IIRC the default for that parameter is "false", so tab1 is not really
testing that properly -- e.g. I thought maybe to test the default
parameter it would be better for the subscriber-side 'b' to be
non-generated? But doing that would make 'tab1' the same as 'tab2'.
Anyway, something seems amiss -- it seems either something is not
tested or is duplicate-tested. Please revisit what the tab1 test
intention was and make sure we are doing the right thing for it...

==
99.
The attached nitpicks diff patch has some tweaked comments.

==
[1] https://www.postgresql.org/message-id/CAHv8Rj%2BR0cj%3Dz1bTMAgQKQWx1EKvkMEnV9QsHGvOqTdnLUQi1A%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPtVfrbx0jb42LCmS%3D-LcMTtWxm%2BvhaoArkjg7Z0mvuXbg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.

diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
index 50c5911..e066426 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -98,8 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
 								 * slots) in the upstream database are enabled
 								 * to be synchronized to the standbys. */
 
-	bool		subincludegencols;	/* True if generated columns must be
-									 * published */
+	bool		subincludegencols;	/* True if generated columns should
+									 * be published */
 
 #ifdef CATALOG_VARLEN			/* variable-length fields start here */
 	/* Connection string to the publisher */
diff --git a/src/test/subscription/t/011_generated.pl b/src/test/subscription/t/011_generated.pl
index fe32987..d13d0a0 100644
--- a/src/test/subscription/t/011_generated.pl
+++ b/src/test/subscription/t/011_generated.pl
@@ -20,24 +20,28 @@ $node_subscriber->start;
 
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
 
+#
+# tab1:
+# Publisher-side tab1 has generated col 'b'.
+# Subscriber-side tab1 has generated col 'b', using a different computation,
+# and also an additional column 'c'.
 $node_publisher->safe_psql('postgres',
 	"CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED)"
 );
-
 $node_subscriber->safe_psql('postgres',
 	"CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 22) STORED, c int)"
 );
 
 # tab2:
-# publisher-side tab2 has generated col 'b'.
-# subscriber-side tab2 has non-generated col 'b'.
+# Publisher-side tab2 has generated col 'b'.
+# Subscriber-side tab2 has non-generated col 'b'.
 $node_publisher->safe_psql('postgres',
 	"CREATE TABLE tab2 (a int, b int GENERATED ALWAYS AS (a * 2) STORED)");
 $node_subscriber->safe_psql('postgres', "CREATE TABLE tab2 (a int, b int)");
 
 # tab3:
-# publisher-side tab3 has generated col 'b'.
-# subscriber-side tab3 has generated col 'b', using a different computation.
+# Publisher-side tab3 has generated col 'b'.
+# Subscriber-side tab3 has generated col 'b', using a different computation.
 $
Re: Slow catchup of 2PC (twophase) transactions on replica in LR

Hi Kuroda-San, here are some review comments for patch v19-1

==
doc/src/sgml/ref/alter_subscription.sgml

The previous patches have common failover/two_phase code checking for
"Do not allow changing the option if the subscription is enabled", but
it seems the docs mentioned that only for "two_phase" and not for
"failover". I'm not 100% sure mentioning the disabled requirement was
necessary at all, but certainly it should be all-or-nothing, not just
saying it for one of the parameters. Anyway, I chose to add the
missing info. Please see the attached nitpicks diff.

==
Kind Regards,
Peter Smith.
Fujitsu Australia.

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 58db97f..e0ce83a 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -256,10 +256,15 @@ ALTER SUBSCRIPTION name RENAME TO <
-       The two_phase
-       parameter can only be altered when the subscription is disabled.
-       When altering the parameter from true
-       to false, the backend process checks for any incomplete
+       The failover
+       and two_phase
+       parameters can only be altered when the subscription is disabled.
+      </para>
+
+      <para>
+       When altering two_phase
+       from true to false,
+       the backend process checks for any incomplete
        prepared transactions done by the logical replication worker (from
        when two_phase parameter was still true)
        and, if any are found, an error is reported. If this happens, you can