Hi, Amit (adding hackers for discussion) >We forgot to set/unset the flag in functions >systable_beginscan_ordered and systable_endscan_ordered. BTW,
Thank you for the clarification. >shouldn't this occur even without prepare transaction? If so, we need >to backpatch this till 14. Yes, it occurred also at PG14. I did some tests using 015_stream.pl with some modification like below, which tests the subscription about stream is on but two-phase is off. The same issue occurred at both current head source code and PG14. ``` --- a/src/test/subscription/t/015_stream.pl +++ b/src/test/subscription/t/015_stream.pl @@ -134,9 +134,11 @@ my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber'); $node_subscriber->init; $node_subscriber->start; +my $default = join('', map {chr(65 + rand 26)} (1 .. 10000)); + # Create some preexisting content on publisher $node_publisher->safe_psql('postgres', - "CREATE TABLE test_tab (a int primary key, b bytea)"); + "CREATE TABLE test_tab (a int primary key, b bytea, t text DEFAULT '$default')"); $node_publisher->safe_psql('postgres', "INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')"); @@ -144,7 +146,7 @@ $node_publisher->safe_psql('postgres', "CREATE TABLE test_tab_2 (a int)"); # Setup structure on subscriber $node_subscriber->safe_psql('postgres', - "CREATE TABLE test_tab (a int primary key, b bytea, c timestamptz DEFAULT now(), d bigint DEFAULT 999)" + "CREATE TABLE test_tab (a int primary key, b bytea, t text DEFAULT '$default', c timestamptz DEFAULT now(), d bigint DEFAULT 999)" ); ``` Logs from head source: ``` 2024-10-01 12:34:56.694 UTC [575202] LOG: starting PostgreSQL 18devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2), 64-bit ... 2024-10-01 12:34:57.506 UTC [575258] 015_stream.pl LOG: statement: BEGIN; 2024-10-01 12:34:57.506 UTC [575258] 015_stream.pl LOG: statement: INSERT INTO test_tab SELECT i, sha256(i::text::bytea) FROM generate_series(3, 5000) s(i); 2024-10-01 12:34:57.524 UTC [575242] tap_sub ERROR: unexpected table_index_fetch_tuple call during logical decoding 2024-10-01 12:34:57.524 UTC [575242] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4', streaming 'on', origin 'any', publication_names '"tap_pub"') 2024-10-01 12:34:57.525 UTC [575242] tap_sub LOG: released logical replication slot "tap_sub" 2024-10-01 12:34:57.525 UTC [575242] tap_sub LOG: could not send data to client: Broken pipe 2024-10-01 12:34:57.525 UTC [575242] tap_sub FATAL: connection to client lost ... 2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4', streaming 'on', origin 'any', publication_names '"tap_pub"') 2024-10-01 12:34:57.829 UTC [575260] tap_sub LOG: starting logical decoding for slot "tap_sub" 2024-10-01 12:34:57.829 UTC [575260] tap_sub DETAIL: Streaming transactions committing after 0/1583A68, reading WAL from 0/1583A30. 2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4', streaming 'on', origin 'any', publication_names '"tap_pub"') 2024-10-01 12:34:57.829 UTC [575260] tap_sub LOG: logical decoding found consistent point at 0/1583A30 2024-10-01 12:34:57.829 UTC [575260] tap_sub DETAIL: There are no running transactions. 2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4', streaming 'on', origin 'any', publication_names '"tap_pub"') 2024-10-01 12:34:57.831 UTC [575260] tap_sub ERROR: unexpected table_index_fetch_tuple call during logical decoding 2024-10-01 12:34:57.831 UTC [575260] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4', streaming 'on', origin 'any', publication_names '"tap_pub"') 2024-10-01 12:34:57.832 UTC [575260] tap_sub LOG: released logical replication slot "tap_sub" 2024-10-01 12:34:57.832 UTC [575260] tap_sub LOG: could not send data to client: Broken pipe 2024-10-01 12:34:57.832 UTC [575260] tap_sub FATAL: connection to client lost ``` Logs from PG14 source: ``` 2024-10-01 13:20:29.409 UTC [593696] LOG: starting PostgreSQL 14.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2), 64-bit ... 2024-10-01 13:20:31.285 UTC [593750] 015_stream.pl LOG: statement: BEGIN; 2024-10-01 13:20:31.285 UTC [593750] 015_stream.pl LOG: statement: INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i); 2024-10-01 13:20:31.301 UTC [593740] tap_sub ERROR: unexpected table_index_fetch_tuple call during logical decoding 2024-10-01 13:20:31.301 UTC [593740] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2', streaming 'on', publication_names '"tap_pub"') 2024-10-01 13:20:31.302 UTC [593740] tap_sub LOG: could not send data to client: Broken pipe 2024-10-01 13:20:31.302 UTC [593740] tap_sub FATAL: connection to client lost ... 2024-10-01 13:20:31.830 UTC [593756] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2', streaming 'on', publication_names '"tap_pub"') 2024-10-01 13:20:31.830 UTC [593756] tap_sub LOG: logical decoding found consistent point at 0/172C758 2024-10-01 13:20:31.830 UTC [593756] tap_sub DETAIL: There are no running transactions. 2024-10-01 13:20:31.830 UTC [593756] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2', streaming 'on', publication_names '"tap_pub"') 2024-10-01 13:20:31.834 UTC [593756] tap_sub ERROR: unexpected table_index_fetch_tuple call during logical decoding 2024-10-01 13:20:31.834 UTC [593756] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2', streaming 'on', publication_names '"tap_pub"') 2024-10-01 13:20:31.835 UTC [593756] tap_sub LOG: could not send data to client: Broken pipe 2024-10-01 13:20:31.835 UTC [593756] tap_sub FATAL: connection to client lost ``` >Also, it is better to have a test for this, and let's ensure that the >new test doesn't increase the regression time too much if possible. Sure. I'm going to add some test codes in a few days. > The indentation in the first comment line seems off. Opps, thank you. Fixed. regards, Takeshi Ideriha
From 2a4116ebf5f6b838dd3df64c78322c4e70375b6f Mon Sep 17 00:00:00 2001 From: Ideriha Takeshi <iderihatakeshi@gmail.com> Date: Mon, 30 Sep 2024 00:24:13 +0000 Subject: [PATCH] Set and unset bsysscan flag in systable_*_ordered. We call HandleConcurrentAbort in systable_getnext_ordered so that we can handle concurrent abort case in logical decoding same as systable_getnext. Set and unset bsysscan flag is forgotten. --- src/backend/access/index/genam.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c index 52fde5cc4d..2ecc3e8b31 100644 --- a/src/backend/access/index/genam.c +++ b/src/backend/access/index/genam.c @@ -713,6 +713,14 @@ systable_beginscan_ordered(Relation heapRelation, index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0); sysscan->scan = NULL; + /* + * If CheckXidAlive is set then set a flag to indicate that system table + * scan is in-progress. See detailed comments in xact.c where these + * variables are declared. + */ + if (TransactionIdIsValid(CheckXidAlive)) + bsysscan = true; + return sysscan; } @@ -757,6 +765,14 @@ systable_endscan_ordered(SysScanDesc sysscan) index_endscan(sysscan->iscan); if (sysscan->snapshot) UnregisterSnapshot(sysscan->snapshot); + + /* + * Reset the bsysscan flag at the end of the systable scan. See detailed + * comments in xact.c where these variables are declared. + */ + if (TransactionIdIsValid(CheckXidAlive)) + bsysscan = false; + pfree(sysscan); } -- 2.40.1