I have a hot_standby system and use it to carry the load of various reporting queries that take 15-60 minutes each. To avoid long pauses in recovery, I set vacuum_defer_cleanup_age to roughly three hours' worth of the master's transactions. Even so, I kept seeing recovery pause for the duration of a long-running query. In each case, the culprit record was an XLOG_BTREE_DELETE arising from on-the-fly deletion of an index tuple. The attached test script demonstrates the behavior (on HEAD): the index tuple reclamation conflicts with a concurrent "SELECT pg_sleep(600)" on the standby.
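As a rough sizing sketch (the rate r below is a hypothetical figure, not a measurement from this system): vacuum_defer_cleanup_age counts transactions, not time, so a three-hour window works out to the master's transaction rate multiplied by 10,800 seconds.

```latex
\begin{aligned}
\texttt{vacuum\_defer\_cleanup\_age} &\approx r \times 3\,\text{h} = r \times 10\,800\,\text{s} \\
r = 50\ \text{xact/s} \;&\Rightarrow\; 50 \times 10\,800 = 540\,000\ \text{transactions}
\end{aligned}
```

Whatever the exact figure, the setting only moves the cleanup horizon back; it does nothing for tuples that HeapTupleSatisfiesVacuum already reports as dead for other reasons.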
Since the inserting transaction in this test aborts, HeapTupleSatisfiesVacuum reports HEAPTUPLE_DEAD regardless of vacuum_defer_cleanup_age, and we go ahead and remove the index tuples. On the standby, btree_xlog_delete_get_latestRemovedXid does not consider the inserting transaction's outcome, so btree_redo goes on to conflict with snapshots that have visibility over that transaction. Would it be correct to improve this by teaching btree_xlog_delete_get_latestRemovedXid to ignore tuples of aborted transactions, as well as tuples inserted and deleted within one transaction?

Thanks,
nm
Attachment: repro-btree-cleanup.sh (Bourne shell script)
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)