Hi,

On 4/11/23 10:20 AM, Drouvot, Bertrand wrote:
Hi,

On 4/11/23 7:36 AM, Noah Misch wrote:
On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote:
--- /dev/null
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -0,0 +1,720 @@
+# logical decoding on standby : test logical decoding,
+# recovery conflict and standby promotion.
...
+$node_primary->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+});

Buildfarm member hoverfly stopped reporting in when this test joined the tree.
It's currently been stuck here for 140 minutes:


Thanks for the report!

It's looping on:

2023-04-11 02:57:52.516 UTC [62718288:5] 035_standby_logical_decoding.pl LOG:  00000: statement: SELECT restart_lsn IS NOT NULL
                         FROM pg_catalog.pg_replication_slots WHERE slot_name = 'promotion_inactiveslot'

And the reason is that the slot is not being created:

$ grep "CREATE_REPLICATION_SLOT" 035_standby_logical_decoding_standby.log | tail -2
2023-04-11 02:57:47.287 UTC [9241178:15] 035_standby_logical_decoding.pl STATEMENT:  CREATE_REPLICATION_SLOT "otherslot" LOGICAL "test_decoding" ( SNAPSHOT 'nothing')
2023-04-11 02:57:47.622 UTC [9241178:23] 035_standby_logical_decoding.pl STATEMENT:  CREATE_REPLICATION_SLOT "otherslot" LOGICAL "test_decoding" ( SNAPSHOT 'nothing')

Not sure why the slot is not being created.

The "replication apply delay" is also increasing:

2023-04-11 02:57:49.183 UTC [13304488:253] DEBUG:  00000: sendtime 2023-04-11 02:57:49.111363+00 receipttime 2023-04-11 02:57:49.183512+00 replication apply delay 644 ms transfer latency 73 ms
2023-04-11 02:57:49.184 UTC [13304488:259] DEBUG:  00000: sendtime 2023-04-11 02:57:49.183461+00 receipttime 2023-04-11 02:57:49.1842+00 replication apply delay 645 ms transfer latency 1 ms
2023-04-11 02:57:49.221 UTC [13304488:265] DEBUG:  00000: sendtime 2023-04-11 02:57:49.184166+00 receipttime 2023-04-11 02:57:49.221059+00 replication apply delay 682 ms transfer latency 37 ms
2023-04-11 02:57:49.222 UTC [13304488:271] DEBUG:  00000: sendtime 2023-04-11 02:57:49.221003+00 receipttime 2023-04-11 02:57:49.222144+00 replication apply delay 683 ms transfer latency 2 ms
2023-04-11 02:57:49.222 UTC [13304488:277] DEBUG:  00000: sendtime 2023-04-11 02:57:49.222095+00 receipttime 2023-04-11 02:57:49.2228+00 replication apply delay 684 ms transfer latency 1 ms
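
To check the "increasing" observation without reading the values by eye, the delays can be pulled out of the DEBUG lines. This is only a rough sketch: the helper names (apply_delays, is_strictly_increasing) are made up for illustration, and the sample text is the excerpt above as it was wrapped in the mail, so the pattern deliberately tolerates line breaks between words.

```python
import re

# DEBUG lines copied from the standby log excerpt above, wrapped as in the mail.
LOG = """\
2023-04-11 02:57:49.183 UTC [13304488:253] DEBUG:  00000: sendtime 2023-04-11
02:57:49.111363+00 receipttime 2023-04-11 02:57:49.183512+00 replication apply
delay 644 ms transfer latency 73 ms
2023-04-11 02:57:49.184 UTC [13304488:259] DEBUG:  00000: sendtime 2023-04-11
02:57:49.183461+00 receipttime 2023-04-11 02:57:49.1842+00 replication apply
delay 645 ms transfer latency 1 ms
2023-04-11 02:57:49.221 UTC [13304488:265] DEBUG:  00000: sendtime 2023-04-11
02:57:49.184166+00 receipttime 2023-04-11 02:57:49.221059+00 replication apply
delay 682 ms transfer latency 37 ms
2023-04-11 02:57:49.222 UTC [13304488:271] DEBUG:  00000: sendtime 2023-04-11
02:57:49.221003+00 receipttime 2023-04-11 02:57:49.222144+00 replication apply
delay 683 ms transfer latency 2 ms
2023-04-11 02:57:49.222 UTC [13304488:277] DEBUG:  00000: sendtime 2023-04-11
02:57:49.222095+00 receipttime 2023-04-11 02:57:49.2228+00 replication apply
delay 684 ms transfer latency 1 ms
"""

# \s+ between the words so that a line wrap inside the phrase still matches.
DELAY_RE = re.compile(r"replication\s+apply\s+delay\s+(\d+)\s+ms")


def apply_delays(log_text):
    """Return the apply delays (in ms) in the order they appear in the log."""
    return [int(ms) for ms in DELAY_RE.findall(log_text)]


def is_strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


delays = apply_delays(LOG)
print(delays)                          # [644, 645, 682, 683, 684]
print(is_strictly_increasing(delays))  # True
```

On this excerpt the delay grows on every message, which is consistent with the standby's replay lagging behind what it receives.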

Noah, I think hoverfly is yours. Would it be possible to get access to it (though
I'm not an AIX expert), or could you check whether a slot creation is hanging and,
if so, why?


Well, we can see in 035_standby_logical_decoding_standby.log:

2023-04-11 02:57:49.180 UTC [62718258:5] [unknown] FATAL:  3D000: database "testdb" does not exist

While, on the primary:

2023-04-11 02:57:48.505 UTC [62718254:5] 035_standby_logical_decoding.pl LOG:  00000: statement: CREATE DATABASE testdb

The TAP test is doing:

"
##################################################
# Test standby promotion and logical decoding behavior
# after the standby gets promoted.
##################################################

$node_standby->reload;

$node_primary->psql('postgres', q[CREATE DATABASE testdb]);
$node_primary->safe_psql('testdb', qq[CREATE TABLE decoding_test(x integer, y text);]);

# create the logical slots
create_logical_slots($node_standby, 'promotion_');
"

I think we might want to add:

$node_primary->wait_for_replay_catchup($node_standby);

before calling the slot creation.

It's done in the attached patch. Would it be possible to give it a try, please?
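
For anyone who wants the race spelled out, here is a toy model of it. It is not PostgreSQL code and makes no claim about how the real wait_for_replay_catchup is implemented: the Primary and Standby classes, their methods, and the integer "LSN" are all hypothetical stand-ins. The only assumption carried over from the logs above is that the standby rejects slot creation in a database whose creation it has not replayed yet.

```python
# Toy model only: every name below is hypothetical, and the "LSN" is just
# the number of WAL records written so far.

class Primary:
    def __init__(self):
        self.wal = []  # ordered WAL records; each one is a created database name

    def create_database(self, name):
        self.wal.append(name)

    @property
    def write_lsn(self):
        return len(self.wal)


class Standby:
    def __init__(self, primary):
        self.primary = primary
        self.replay_lsn = 0
        self.databases = set()

    def replay_one(self):
        """Apply the next WAL record, if any. Returns True if one was applied."""
        if self.replay_lsn < self.primary.write_lsn:
            self.databases.add(self.primary.wal[self.replay_lsn])
            self.replay_lsn += 1
            return True
        return False

    def create_logical_slot(self, dbname):
        # Mirrors the FATAL seen in the standby log when the database
        # has not been replayed yet.
        if dbname not in self.databases:
            raise LookupError(f'database "{dbname}" does not exist')
        return f"slot on {dbname}"


def wait_for_replay_catchup(primary, standby):
    """Return once the standby has replayed everything written so far.

    In this toy model the loop drives replay itself; a real standby replays
    on its own and the test would only poll for progress.
    """
    target = primary.write_lsn
    while standby.replay_lsn < target:
        standby.replay_one()


primary = Primary()
standby = Standby(primary)
primary.create_database("testdb")

try:
    standby.create_logical_slot("testdb")  # too early: not replayed yet
except LookupError as err:
    print(err)                             # database "testdb" does not exist

wait_for_replay_catchup(primary, standby)
print(standby.create_logical_slot("testdb"))  # slot on testdb
```

The point of the sketch is only the ordering: the same slot creation fails before the wait and succeeds after it, which is what the added line in the test is meant to guarantee.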

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ba98a18bd2..ad845aee28 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -653,6 +653,9 @@ $node_standby->reload;
 $node_primary->psql('postgres', q[CREATE DATABASE testdb]);
 $node_primary->safe_psql('testdb', qq[CREATE TABLE decoding_test(x integer, y text);]);
 
+# Wait for the standby to catchup before creating the slots
+$node_primary->wait_for_replay_catchup($node_standby);
+
 # create the logical slots
 create_logical_slots($node_standby, 'promotion_');
 
