In particular:
exec_bind_message()
PushActiveSnapshot(GetTransactionSnapshot());
By suppressing this, I've achieved over 1.9M TXNs per second on a select-only
pgbench run on a 48-core box, about 50% faster than without the change. The CPU
usage of GetSnapshotData drops from about 22% to 4.5%.
If no input functions needed this snapshot, no reparsing or reanalysis were
needed, and we knew all of this up front, it would be a huge win. We could
test for a number of conditions on the first parse/optimization of the query
and set a flag to suppress the snapshot for subsequent executions.
NOTE:
In GetSnapshotData, because pgxact is declared volatile, the compiler will not
combine the following two if tests into a single test:
if (pgxact->vacuumFlags & PROC_IN_LOGICAL_DECODING)
    continue;
if (pgxact->vacuumFlags & PROC_IN_VACUUM)
    continue;
You can reduce the code path in the inner loop by coding this as:
if (pgxact->vacuumFlags & (PROC_IN_LOGICAL_DECODING | PROC_IN_VACUUM))
    continue;
I'm still working on quantifying any gain. Note that it isn't just one L1 cache
fetch and one conditional branch that are eliminated. Because the pgxact cache
line is updated so frequently with single-statement TXNs, a certain number of
full cache misses occur, due to invalidation, whenever a given pgxact is
updated between the first and the second fetch of vacuumFlags.