Hi hackers, In pg_stat_statements there is entry's mutex spinlock which is used to be able to modify counters without holding pgss->lock exclusively.
Here is a comment from the top of pg_stat_statements.c: * Note about locking issues: to create or delete an entry in the shared * hashtable, one must hold pgss->lock exclusively. Modifying any field * in an entry except the counters requires the same. To look up an entry, * one must hold the lock shared. To read or update the counters within * an entry, one must hold the lock shared or exclusive (so the entry doesn't * disappear!) and also take the entry's mutex spinlock. * The shared state variable pgss->extent (the next free spot in the external * query-text file) should be accessed only while holding either the * pgss->mutex spinlock, or exclusive lock on pgss->lock. We use the mutex to * allow reserving file space while holding only shared lock on pgss->lock. * Rewriting the entire external query-text file, eg for garbage collection, * requires holding pgss->lock exclusively; this allows individual entries * in the file to be read or written while holding only shared lock. And a comment from pgssEntry declaration: slock_t mutex; /* protects the counters only */ Both comments say that spinlocking mutex should be used to protect counters only. But in pg_stat_statements_internal we do read entry->stats_since and entry->minmax_stats_since holding the entry's mutex spinlock. This is unnecessary since we only modify stats_since and minmax_stats_since while holding pgss->lock exclusively, and in pg_stat_statements_internal we are holding pgss->lock when reading them. So even without holding the entry's mutex spinlock there should be no race. I suggest eliminating holding the excessive lock. See the attached patch. This would also restore the consistency between the code and the comments about entry's mutex spinlock usage. Best regards, Karina Litskevich Postgres Professional: http://postgrespro.com/
From c817fc9373ca08488088414a052eaf77dbb55bde Mon Sep 17 00:00:00 2001 From: Karina Litskevich <litskevichkar...@gmail.com> Date: Tue, 5 Nov 2024 20:27:25 +0300 Subject: [PATCH v1] pg_stat_statements: Avoid excessive lock holding Entry's mutex spinlock is used to be able to modify counters without holding pgss->lock exclusively (though holding it at least shared so the entry doesn't disappear). We never actually modify stats_since and minmax_stats_since without holding pgss->lock exclusively, so there is no need to hold entry's mutex spinlock when reading stats_since and minmax_stats_since. This also restores the consistency between the code and the comments about entry's mutex spinlock usage. --- contrib/pg_stat_statements/pg_stat_statements.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 1798e1d016..a66ad99a37 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -1869,9 +1869,16 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo, /* copy counters to a local variable to keep locking time short */ SpinLockAcquire(&entry->mutex); tmp = entry->counters; + SpinLockRelease(&entry->mutex); + + /* + * There is no need to hold entry->mutex when reading stats_since and + * minmax_stats_since for (unlike counters) they are always written + * while holding pgss->lock exclusively. We are holding pgss->lock + * shared so there should be no race here. + */ stats_since = entry->stats_since; minmax_stats_since = entry->minmax_stats_since; - SpinLockRelease(&entry->mutex); /* Skip entry if unexecuted (ie, it's a pending "sticky" entry) */ if (IS_STICKY(tmp)) -- 2.34.1