Hello,

To test postgres 11, I still regularly run series of short sessions of pgbench-over-logical-replication (basically the same thing that I used last year [1] - now in a perl incarnation). Most of the time the replication is stable and finishes correctly but sometimes (rarely) I get:

TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(safeXid, snap->xmin))", File: "snapbuild.c", Line: 580)

This will probably be difficult to reproduce and to act upon but I wanted to report it anyway as in the course of the last few months I have seen it several times, on several machines. Always rarely, always postgres 11 (I did not try other versions).


Erik Rijkers


[1] https://www.postgresql.org/message-id/3897361c7010c4ac03f358173adbcd60%40xs4all.nl


source/version: bf2d0462cd73 / REL_11_STABLE  (compiled 20180828_1629)

1 primary, 4 replicas (on 1 host).
Basically: pgbench --port=6515 --quiet --initialize --scale=5 postgres
  then: pgbench -M prepared -c 93 -j 8 -T 10 -P 2 -p 6515 postgres
  then: wait for log-repl sync.
(I also added extra data type columns to the basic pgbench tables to test all the different data types)

Below is a grep for the assert message, with 15 surrounding lines from the latest occurrence.

$ p=REL_11_STABLE; cd /var/data1/pg_stuff/tmp/cascade/$p ; grep -A 15 -B 15 -Ei 'trap|assert' 65*/logfile.65* | less 6516JTq_E8/logfile.6516-2018-08-29 14:28:13.747 CEST [139409] LOG: logical decoding found consistent point at 0/89E038E0 6516JTq_E8/logfile.6516-2018-08-29 14:28:13.747 CEST [139409] DETAIL: There are no running transactions. 6516JTq_E8/logfile.6516-2018-08-29 14:28:14.522 CEST [139539] LOG: received replication command: CREATE_REPLICATION_SLOT "sub2_6517_6517_18834_sync_18804" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT 6516JTq_E8/logfile.6516-2018-08-29 14:28:15.681 CEST [139539] LOG: logical decoding found initial starting point at 0/8AF5D590 6516JTq_E8/logfile.6516-2018-08-29 14:28:15.681 CEST [139539] DETAIL: Waiting for transactions (approximately 1) older than 886297 to end. 6516JTq_E8/logfile.6516-2018-08-29 14:28:15.720 CEST [139539] LOG: logical decoding found consistent point at 0/8B018310 6516JTq_E8/logfile.6516-2018-08-29 14:28:15.720 CEST [139539] DETAIL: There are no running transactions. 6516JTq_E8/logfile.6516-2018-08-29 14:28:15.860 CEST [139407] LOG: received replication command: START_REPLICATION SLOT "sub2_6517_6517_18834_sync_18793" LOGICAL 0/89E03918 (proto_version '1', publication_names '"pub1_6516"') 6516JTq_E8/logfile.6516-2018-08-29 14:28:15.861 CEST [139407] LOG: starting logical decoding for slot "sub2_6517_6517_18834_sync_18793" 6516JTq_E8/logfile.6516-2018-08-29 14:28:15.861 CEST [139407] DETAIL: Streaming transactions committing after 0/89E03918, reading WAL from 0/841EE960. 6516JTq_E8/logfile.6516-2018-08-29 14:28:16.064 CEST [139407] LOG: logical decoding found consistent point at 0/841EF3E0 6516JTq_E8/logfile.6516-2018-08-29 14:28:16.064 CEST [139407] DETAIL: Logical decoding will begin using saved snapshot. 6516JTq_E8/logfile.6516-2018-08-29 14:28:18.953 CEST [139541] LOG: received replication command: CREATE_REPLICATION_SLOT "sub2_6517_6517_18834_sync_18814" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.002 CEST [139541] LOG: logical decoding found consistent point at 0/8C6BA408 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.002 CEST [139541] DETAIL: There are no running transactions. 6516JTq_E8/logfile.6516:TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(safeXid, snap->xmin))", File: "snapbuild.c", Line: 580) 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130398] LOG: server process (PID 139541) was terminated by signal 6: Aborted 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130398] DETAIL: Failed process was running: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130398] LOG: terminating any other active server processes 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [139405] WARNING: terminating connection because of crash of another server process 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [139405] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [139405] HINT: In a moment you should be able to reconnect to the database and repeat your command. 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [139405] CONTEXT: slot "sub2_6517_6517", output plugin "pgoutput", in the change callback, associated LSN 0/8B11CF00 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130411] WARNING: terminating connection because of crash of another server process 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130411] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130411] HINT: In a moment you should be able to reconnect to the database and repeat your command. 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130405] WARNING: terminating connection because of crash of another server process 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130405] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130405] HINT: In a moment you should be able to reconnect to the database and repeat your command. 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.116 CEST [130398] LOG: all server processes terminated; reinitializing 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.515 CEST [139548] LOG: database system was interrupted; last known up at 2018-08-29 14:26:14 CEST




Reply via email to