On Sat, 19 Aug 2023 at 11:53, Amit Kapila <[email protected]> wrote:
>
> It's entirely possible for a logical slot to have a confirmed_flush
> LSN higher than the last value saved on disk while not being marked as
> dirty. It's currently not a problem to lose that value during a clean
> shutdown / restart cycle, but to support the upgrade of logical slots
> [1] (see latest patch at [2]), we seem to rely on that value being
> properly persisted to disk. During the upgrade, we need to verify that
> all the data prior to the shutdown_checkpoint for the logical slots has
> been consumed; otherwise, the downstream may miss some data. Now, to
> ensure the same, we are planning to compare the confirmed_flush LSN
> location with the latest shutdown_checkpoint location, which means that
> the confirmed_flush LSN should be updated after restart.
>
> I think this is inefficient even without an upgrade because, after the
> restart, this may lead to decoding some data again. Say, we process
> some transactions for which we didn't send anything downstream (the
> changes got filtered), but the confirmed_flush LSN is updated due to
> keepalives. As we don't flush the latest value of the confirmed_flush
> LSN, it may lead to processing the same changes again.
I was able to test and verify that we were not processing the same
changes again.

Note: The 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch
adds a log to print when a decoded transaction is skipped and a log to
mention when an operation is filtered.

The test.sh script performs the following steps:
a) set up logical replication for a table
b) insert into a table that is published (this insert will be replicated
   to the subscriber)
c) insert into a table that is not published (this insert will be
   filtered and not replicated)
d) sleep for 5 seconds
e) stop the server
f) start the server

I used the following steps. First, in HEAD:
a) apply the 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch
   patch in HEAD and build the binaries
b) execute test.sh
c) check the N1.log file to see that the insert operations were filtered
   again, as shown by the following logs:

LOG:  Filter insert for table tbl2
...
===restart===
...
LOG:  Skipping transaction 0/156AD10 as start decode at is greater 0/156AE40
...
LOG:  Filter insert for table tbl2

We can see that the insert operations on tbl2, which were filtered
before the server was stopped, are filtered again after restart in HEAD.

Let's verify that the same changes are not processed again with the patch:
a) apply v4-0001-Persist-to-disk-logical-slots-during-a-shutdown-c.patch
   from [1] along with
   0001-Add-logs-to-skip-transaction-filter-insert-operation.patch and
   build the binaries
b) execute test.sh
c) check the N1.log file to see that the insert operations were skipped
   after the server restart, as shown by the following logs:

LOG:  Filter insert for table tbl2
...
===restart===
...
Skipping transaction 0/156AD10 as start decode at is greater 0/156AFB0
...
Skipping transaction 0/156AE80 as start decode at is greater 0/156AFB0

We can see that the insert operations on tbl2 are not processed again
after restart with the patch.
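The "start decode at is greater" log lines above compare two LSNs, each written as two 32-bit hex halves separated by a slash. As an illustrative sketch (the `lsn_to_int` helper is my own name, not anything from the patch), the comparison is plain 64-bit arithmetic:

```shell
#!/bin/sh
# Hypothetical helper: convert an LSN of the form X/Y (two hex halves)
# into a single decimal number so two LSNs can be compared numerically.
lsn_to_int() {
	hi=${1%/*}
	lo=${1#*/}
	# An LSN is a 64-bit WAL position: high 32 bits before the slash,
	# low 32 bits after it.
	echo $(( 0x$hi * 4294967296 + 0x$lo ))
}

# The log line "Skipping transaction 0/156AD10 as start decode at is
# greater 0/156AE40" means the transaction's LSN orders before the
# decoding start point:
a=$(lsn_to_int 0/156AD10)
b=$(lsn_to_int 0/156AE40)
if [ "$a" -lt "$b" ]; then
	echo "0/156AD10 is before 0/156AE40: transaction is skipped"
fi
```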
[1] - https://www.postgresql.org/message-id/CALDaNm0VrAt24e2FxbOX6eJQ-G_tZ0gVpsFBjzQM99NxG0hZfg%40mail.gmail.com

Regards,
Vignesh
#!/bin/bash
port_publisher=5431
port_subscriber=5432
bindir=/home/vignesh/postgres/inst/bin
# Make sure the locally built binaries are the ones invoked below
PATH=$bindir:$PATH
echo '##########'
echo '#Clean up#'
echo '##########'
pg_ctl stop -D data_N2
pg_ctl stop -D data_N1
rm -r data_N1 data_N2 *log
echo '########'
echo '#Set up#'
echo '########'
initdb -D data_N1 -U postgres
initdb -D data_N2 -U postgres
cat << EOF >> data_N1/postgresql.conf
wal_level = logical
port = $port_publisher
wal_sender_timeout = 5s
#log_min_messages = debug3
EOF
cat << EOF >> data_N2/postgresql.conf
wal_level = logical
port = $port_subscriber
EOF
# Boot database instances
pg_ctl -D data_N1 start -w -l N1.log
pg_ctl -D data_N2 start -w -l N2.log
# Setup as publisher/subscriber
psql -U postgres -p $port_publisher -c "CREATE TABLE tbl (a int, b int);"
psql -U postgres -p $port_publisher -c "CREATE TABLE tbl2 (a int, b int);"
psql -U postgres -p $port_publisher -c "CREATE EXTENSION pg_walinspect"
psql -U postgres -p $port_publisher -c "CREATE PUBLICATION pub FOR TABLE tbl;"
psql -U postgres -p $port_subscriber -c "CREATE TABLE tbl (a int, b int);"
psql -U postgres -p $port_subscriber -c "CREATE TABLE tbl2 (a int, b int);"
psql -U postgres -p $port_subscriber -c "CREATE SUBSCRIPTION sub CONNECTION 'user=postgres dbname=postgres port=$port_publisher' PUBLICATION pub WITH (copy_data = off)"
# do INSERT on publisher
psql -U postgres -p $port_publisher -c "INSERT INTO tbl VALUES (generate_series(1, 5))"
psql -U postgres -p $port_publisher -c "INSERT INTO tbl2 VALUES (generate_series(1, 5))"
# Wait short time to make sure subscriber is caught up
sleep 5s
psql -U postgres -p $port_subscriber -c "SELECT COUNT(*) FROM tbl;"
# Stop both nodes and reboot to emulate pg_upgrade
pg_ctl stop -D data_N2
pg_ctl stop -D data_N1
pg_ctl -D data_N1 start -w -l N1.log -o "-b"
echo '###############################################################'
echo '#Check difference of WAL position between publisher/subscriber#'
echo '###############################################################'
psql -U postgres -p $port_publisher -c "
WITH tmp as (
SELECT confirmed_flush_lsn FROM pg_replication_slots
)
SELECT row_number() over (), start_lsn, end_lsn, prev_lsn, xid, resource_manager, record_type,
record_length, main_data_length, fpi_length, description, block_ref
FROM tmp, pg_get_wal_records_info(tmp.confirmed_flush_lsn, 'FFFFFFFF/FFFFFFFF');
"
pg_ctl -D data_N2 start -w -l N2.log -o "-b"
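The upgrade check discussed above compares each slot's confirmed_flush_lsn against the shutdown checkpoint location. As an illustrative sketch (not part of the patch), that checkpoint location can be read from pg_controldata output; `checkpoint_lsn` is a hypothetical helper, and canned output is used here so the parsing step stands alone, with the real invocations shown only as comments:

```shell
#!/bin/sh
# Hypothetical helper: extract the "Latest checkpoint location" value
# from pg_controldata output.
checkpoint_lsn() {
	printf '%s\n' "$1" | awk -F': *' '$1 == "Latest checkpoint location" {print $2}'
}

# Against a real cluster this would be fed with, e.g.:
#   ctl_out=$($bindir/pg_controldata data_N1)
#   slot_lsn=$(psql -U postgres -p $port_publisher -Atc \
#     "SELECT confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name = 'sub';")
# Canned output keeps the example self-contained:
ctl_out='Latest checkpoint location:           0/156AFB0
Latest checkpoint REDO location:      0/156AF78'

cp_lsn=$(checkpoint_lsn "$ctl_out")
echo "shutdown checkpoint at $cp_lsn"
# With the patch applied, a cleanly shut-down slot should report a
# confirmed_flush_lsn matching this location.
```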
0001-Add-logs-to-skip-transaction-filter-insert-operation.patch
