Hi, Amit
(adding hackers for discussion)

>We forgot to set/unset the flag in functions
>systable_beginscan_ordered and systable_endscan_ordered. BTW,

Thank you for the clarification.

>shouldn't this occur even without prepare transaction? If so, we need
>to backpatch this till 14.

Yes, it occurred also at PG14.
I did some tests using 015_stream.pl with some modification like
below, which tests the subscription about stream is on but two-phase
is off.
The same issue occurred at both current head source code and PG14.

```
--- a/src/test/subscription/t/015_stream.pl
+++ b/src/test/subscription/t/015_stream.pl
@@ -134,9 +134,11 @@ my $node_subscriber =
PostgreSQL::Test::Cluster->new('subscriber');
 $node_subscriber->init;
 $node_subscriber->start;

+my $default = join('', map {chr(65 + rand 26)} (1 .. 10000));
+
 # Create some preexisting content on publisher
 $node_publisher->safe_psql('postgres',
-       "CREATE TABLE test_tab (a int primary key, b bytea)");
+       "CREATE TABLE test_tab (a int primary key, b bytea, t text
DEFAULT '$default')");
 $node_publisher->safe_psql('postgres',
        "INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");

@@ -144,7 +146,7 @@ $node_publisher->safe_psql('postgres', "CREATE
TABLE test_tab_2 (a int)");

 # Setup structure on subscriber
 $node_subscriber->safe_psql('postgres',
-       "CREATE TABLE test_tab (a int primary key, b bytea, c
timestamptz DEFAULT now(), d bigint DEFAULT 999)"
+       "CREATE TABLE test_tab (a int primary key, b bytea, t text
DEFAULT '$default', c timestamptz DEFAULT now(), d bigint DEFAULT
999)"
 );
```

Logs from head source:
```
2024-10-01 12:34:56.694 UTC [575202] LOG:  starting PostgreSQL 18devel
on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat
11.4.1-2), 64-bit
...
2024-10-01 12:34:57.506 UTC [575258] 015_stream.pl LOG:  statement: BEGIN;
2024-10-01 12:34:57.506 UTC [575258] 015_stream.pl LOG:  statement:
INSERT INTO test_tab SELECT i, sha256(i::text::bytea) FROM
generate_series(3, 5000) s(i);
2024-10-01 12:34:57.524 UTC [575242] tap_sub ERROR:  unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 12:34:57.524 UTC [575242] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.525 UTC [575242] tap_sub LOG:  released logical
replication slot "tap_sub"
2024-10-01 12:34:57.525 UTC [575242] tap_sub LOG:  could not send data
to client: Broken pipe
2024-10-01 12:34:57.525 UTC [575242] tap_sub FATAL:  connection to client lost
...
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.829 UTC [575260] tap_sub LOG:  starting logical
decoding for slot "tap_sub"
2024-10-01 12:34:57.829 UTC [575260] tap_sub DETAIL:  Streaming
transactions committing after 0/1583A68, reading WAL from 0/1583A30.
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.829 UTC [575260] tap_sub LOG:  logical decoding
found consistent point at 0/1583A30
2024-10-01 12:34:57.829 UTC [575260] tap_sub DETAIL:  There are no
running transactions.
2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.831 UTC [575260] tap_sub ERROR:  unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 12:34:57.831 UTC [575260] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4',
streaming 'on', origin 'any', publication_names '"tap_pub"')
2024-10-01 12:34:57.832 UTC [575260] tap_sub LOG:  released logical
replication slot "tap_sub"
2024-10-01 12:34:57.832 UTC [575260] tap_sub LOG:  could not send data
to client: Broken pipe
2024-10-01 12:34:57.832 UTC [575260] tap_sub FATAL:  connection to client lost
```

Logs from PG14 source:

```
2024-10-01 13:20:29.409 UTC [593696] LOG:  starting PostgreSQL 14.9 on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat
11.4.1-2), 64-bit
...
2024-10-01 13:20:31.285 UTC [593750] 015_stream.pl LOG:  statement: BEGIN;
2024-10-01 13:20:31.285 UTC [593750] 015_stream.pl LOG:  statement:
INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3,
5000) s(i);
2024-10-01 13:20:31.301 UTC [593740] tap_sub ERROR:  unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 13:20:31.301 UTC [593740] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.302 UTC [593740] tap_sub LOG:  could not send data
to client: Broken pipe
2024-10-01 13:20:31.302 UTC [593740] tap_sub FATAL:  connection to client lost
...
2024-10-01 13:20:31.830 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.830 UTC [593756] tap_sub LOG:  logical decoding
found consistent point at 0/172C758
2024-10-01 13:20:31.830 UTC [593756] tap_sub DETAIL:  There are no
running transactions.
2024-10-01 13:20:31.830 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.834 UTC [593756] tap_sub ERROR:  unexpected
table_index_fetch_tuple call during logical decoding
2024-10-01 13:20:31.834 UTC [593756] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
streaming 'on', publication_names '"tap_pub"')
2024-10-01 13:20:31.835 UTC [593756] tap_sub LOG:  could not send data
to client: Broken pipe
2024-10-01 13:20:31.835 UTC [593756] tap_sub FATAL:  connection to client lost
```

>Also, it is better to have a test for this, and let's ensure that the
>new test doesn't increase the regression time too much if possible.

Sure. I'm going to add some test codes in a few days.

>  The indentation in the first comment line seems off.

Opps, thank you. Fixed.

regards,
Takeshi Ideriha
From 2a4116ebf5f6b838dd3df64c78322c4e70375b6f Mon Sep 17 00:00:00 2001
From: Ideriha Takeshi <iderihatakeshi@gmail.com>
Date: Mon, 30 Sep 2024 00:24:13 +0000
Subject: [PATCH] Set and unset bsysscan flag in systable_*_ordered.

We call HandleConcurrentAbort in systable_getnext_ordered so that
we can handle concurrent abort case in logical decoding same as
systable_getnext. Set and unset bsysscan flag is forgotten.
---
 src/backend/access/index/genam.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 52fde5cc4d..2ecc3e8b31 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -713,6 +713,14 @@ systable_beginscan_ordered(Relation heapRelation,
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
+	/*
+	 * If CheckXidAlive is set then set a flag to indicate that system table
+	 * scan is in-progress.  See detailed comments in xact.c where these
+	 * variables are declared.
+	 */
+	if (TransactionIdIsValid(CheckXidAlive))
+		bsysscan = true;
+	
 	return sysscan;
 }
 
@@ -757,6 +765,14 @@ systable_endscan_ordered(SysScanDesc sysscan)
 	index_endscan(sysscan->iscan);
 	if (sysscan->snapshot)
 		UnregisterSnapshot(sysscan->snapshot);
+
+	/*
+	 * Reset the bsysscan flag at the end of the systable scan.  See detailed
+	 * comments in xact.c where these variables are declared.
+	 */
+	if (TransactionIdIsValid(CheckXidAlive))
+		bsysscan = false;
+	
 	pfree(sysscan);
 }
 
-- 
2.40.1

Reply via email to