Hi! One of our customers stumble onto a significant performance degradation while running multiple OLAP-like queries on a replica. After some investigation, it became clear that the problem is in accessing old_snapshot_threshold parameter.
Accessing old_snapshot_threshold parameter is guarded by mutex_threshold. This is not a problem on primary server, since we rarely call GetOldSnapshotThresholdTimestamp: 5028 void 5029 TestForOldSnapshot_impl(Snapshot snapshot, Relation relation) 5030 { 5031 ····if (RelationAllowsEarlyPruning(relation) 5032 ········&& (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp()) 5033 ········ereport(ERROR, 5034 ················(errcode(ERRCODE_SNAPSHOT_TOO_OLD), 5035 ················ errmsg("snapshot too old"))); But in case of a replica, we have to call GetOldSnapshotThresholdTimestamp much often. So, this become a bottleneck. The customer solve this issue by setting old_snapshot_threshold to 0. But, I think, we can do something about it. Some more investigation: -- On primary -- $ ./bin/psql postgres -c "create database benchmark" CREATE DATABASE $ ./bin/pgbench -i -Uorlov -s300 benchmark dropping old tables... NOTICE: table "pgbench_accounts" does not exist, skipping ... creating tables... generating data (client-side)... 30000000 of 30000000 tuples (100%) done (elapsed 142.37 s, remaining 0.00 s) vacuuming... creating primary keys... done in 177.67 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 144.45 s, vacuum 0.59 s, primary keys 32.61 s). -- On secondary -- $ touch 1.sql $ vim 1.sql $ cat 1.sql \set bid random(1, 300) BEGIN; SELECT sum(aid) FROM pgbench_accounts where bid = :bid GROUP BY bid; END; $ ./bin/pgbench -f 1.sql -p5433 -Uorlov -j10 -c100 -T720 -P1 -n benchmark pgbench (16devel) progress: 1.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed ... progress: 20.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed $ perf record -F 99 -a -g --call-graph=dwarf sleep 5 $ perf script --header --fields comm,pid,tid,time,event,ip,sym,dso > file $ grep s_lock file | wc -l 3486 My proposal is to use atomic for threshold_timestamp and threshold_xid. PFA 0001 patch. With patch 0001 we got: $ grep s_lock file2 | wc -l 8 Maybe, we shall go farther and remove mutex_threshold here? This will lead to inconsistency of threshold_timestamp and threshold_xid, but is this really a problem? Thoughts? -- Best regards, Maxim Orlov.
0001-PGPRO-7624-use-atomic-old_snapshot_threshold.patch
Description: Binary data