This issue was reported on the <pgsql-bugs> list at the link below, but received almost no response:

https://www.postgresql.org/message-id/18280-4c8060178cb41750%40postgresql.org

Hoping for some feedback from kernel hackers, thanks!
Hi, hackers,

I've encountered a problem with logical decoding history snapshots. The specific error message is: "ERROR: could not map filenode "base/5/16390" to relation OID".

If a subtransaction that modified the catalog ends before the restart_lsn of the logical replication slot, and the commit WAL record of its top transaction is after the restart_lsn, the WAL record related to the subtransaction won't be decoded during logical decoding. Therefore, the subtransaction won't be marked as having modified the catalog, resulting in its absence from the snapshot's committed list.

The issue seems to be caused by SnapBuildXidSetCatalogChanges (introduced in 272248a) skipping checks for subtransactions when the top transaction is marked as containing catalog changes.

The following steps can reproduce the problem (I increased the value of LOG_SNAPSHOT_INTERVAL_MS to avoid the impact of bgwriter writing XLOG_RUNNING_XACTS WAL records):

session 1:
```
CREATE TABLE tbl1 (val1 integer, val2 integer);
CREATE TABLE tbl1_part (val1 integer) PARTITION BY RANGE (val1);
SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
BEGIN;
SAVEPOINT sp1;
CREATE TABLE tbl1_part_p1 PARTITION OF tbl1_part FOR VALUES FROM (0) TO (10);
RELEASE SAVEPOINT sp1;
```

session 2:
```
CHECKPOINT;
```

session 1:
```
CREATE TABLE tbl1_part_p2 PARTITION OF tbl1_part FOR VALUES FROM (10) TO (20);
COMMIT;
BEGIN;
TRUNCATE tbl1;
```

session 2:
```
CHECKPOINT;
SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
INSERT INTO tbl1_part VALUES (1);
SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
```

To fix this issue, it is sufficient to remove the condition check for ReorderBufferXidHasCatalogChanges in SnapBuildXidSetCatalogChanges.
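To make the effect of that early return concrete, here is a small toy model in Python. It is only a sketch of the behaviour described above, not PostgreSQL code: the function names, the `skip_if_top_marked` flag, and the xids 100 and 101 are all hypothetical, and the real logic lives in C in the snapshot builder.

```python
def mark_catalog_changes(marked, initial_running, xid, subxids, skip_if_top_marked):
    """Toy stand-in for the commit-time marking step described above.

    marked: set of xids already flagged as catalog-changing by decoding.
    initial_running: xids that were in progress when decoding started.
    """
    if not initial_running:
        return
    # Models the check the report proposes removing (an assumption about
    # its shape, not a copy of the real C code).
    if skip_if_top_marked and xid in marked:
        return
    if xid in initial_running:
        marked.add(xid)
        marked.update(subxids)


def committed_catalog_xids(skip_if_top_marked):
    top, sub = 100, 101      # hypothetical xids for the top xact and sp1
    initial_running = {top}  # the transaction was in progress at restart_lsn
    # Only the second CREATE TABLE (run by the top-level xid) lies after
    # restart_lsn, so decoding has flagged the top xid but not the subxact.
    marked = {top}
    mark_catalog_changes(marked, initial_running, top, [sub], skip_if_top_marked)
    # Only xids flagged as catalog-changing enter the committed list.
    return sorted(x for x in (top, sub) if x in marked)


print(committed_catalog_xids(True))   # [100]: subxact 101 is missing
print(committed_catalog_xids(False))  # [100, 101]: subxact 101 is included
```

With the check in place the subtransaction never reaches the committed list, which corresponds to the failure in the reproduction steps; without it, every subtransaction of the top transaction is marked.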
This fix may add subtransactions that didn't change the catalog to the committed list, which looks like a false positive. However, this is acceptable since we only use the snapshot built during decoding to read system catalogs, as stated in 272248a's commit message.

I have verified that the attached patch resolves the issue described above, and I have added some test cases.

I am eager to hear your suggestions on this!

Best Regards,
Fei Changhong
Alibaba Cloud Computing Ltd.
fix_wrong_snapshot_for_logical_decoding.patch