On 09.01.2025 05:29, Sami Imseih wrote:
>> Unfortunately, these changes do not achieve the intended sampling goal.
>> I looked into this more deeply: while the sampled-out queries do not
>> appear in pg_stat_statements, an entry is still allocated in the hash
>> table after normalization, which, in my view, should not happen when
>> sampling is in effect. Therefore, patch v9 is unlikely to meet our needs.
> pg_stat_statements creates entries as "sticky" initially to give them
> more time to stay in the hash table before their first execution
> completes. It's not perfect, but it works for the majority of cases.
> So, what you are observing is how pg_stat_statements currently works.
>
> If an entry is popular enough, we will need it anyway (even with the
> proposed sampling). An entry that's not popular will eventually be
> aged out.
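
That explains the entry I was seeing. For my own understanding, here is a
tiny standalone model of that aging behaviour (just a sketch, not the actual
pg_stat_statements code; I believe the decay factors in pg_stat_statements.c
are 0.50 for sticky entries and 0.99 for executed ones, so those values are
used below):

/*
 * Toy model (not the actual pg_stat_statements code) of how a "sticky",
 * never-executed entry ages out of the hash table faster than an entry
 * with completed executions.  The decay factors mirror the ones I believe
 * pg_stat_statements.c uses for sticky vs. executed entries.
 */
#include <stdio.h>

#define STICKY_DECREASE_FACTOR  0.50    /* decay for never-executed entries */
#define USAGE_DECREASE_FACTOR   0.99    /* decay for executed entries */

typedef struct Entry
{
    long        calls;      /* completed executions */
    double      usage;      /* eviction priority: lower is evicted sooner */
} Entry;

static void
decay(Entry *e)
{
    /* "sticky" simply means no execution has completed yet */
    e->usage *= (e->calls == 0) ? STICKY_DECREASE_FACTOR
                                : USAGE_DECREASE_FACTOR;
}

int
main(void)
{
    Entry       sticky = {.calls = 0, .usage = 10.0};
    Entry       popular = {.calls = 1000, .usage = 10.0};

    for (int pass = 1; pass <= 10; pass++)
    {
        decay(&sticky);
        decay(&popular);
        printf("pass %2d: sticky usage = %.4f, executed usage = %.4f\n",
               pass, sticky.usage, popular.usage);
    }
    return 0;
}

After ten eviction passes the never-executed entry's usage has dropped to
roughly 0.1% of its starting value, while the executed entry is still above
90%, which matches the "eventually aged out" behaviour described above.
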
> From my understanding, what the proposed sampling will do is reduce the
> overhead of incrementing the counters of popular entries, since each
> update has to take the entry's spinlock. This is particularly the case
> with high concurrency on large machines (high CPU count), and especially
> when there is a small set of popular entries.
>
> IMO, this patch should also have a benchmark that proves a user can
> benefit from sampling in those types of workloads.
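
To illustrate the kind of contention on a hot entry that sampling is meant
to reduce, here is a small standalone sketch I can use as a starting point
for that benchmark (not PostgreSQL code: a pthread mutex stands in for the
per-entry spinlock, and the thread count and 10% sample rate are only
illustrative):

/*
 * Toy sketch (not PostgreSQL code): NTHREADS workers hammer one shared
 * counter, standing in for a single popular pg_stat_statements entry.
 * A pthread mutex stands in for the per-entry spinlock.  With
 * SAMPLE_RATE < 1.0, most "executions" skip counter maintenance entirely,
 * which is the contention the proposed sampling is meant to avoid.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS    8
#define ITERATIONS  1000000
#define SAMPLE_RATE 0.1         /* record stats for ~10% of executions */

static pthread_mutex_t entry_lock = PTHREAD_MUTEX_INITIALIZER;
static long long calls = 0;     /* the hot entry's "calls" counter */

static void *
worker(void *arg)
{
    unsigned int seed = (unsigned int) (size_t) arg;

    for (int i = 0; i < ITERATIONS; i++)
    {
        /* the sampling gate: skip the lock most of the time */
        if ((double) rand_r(&seed) / RAND_MAX <= SAMPLE_RATE)
        {
            pthread_mutex_lock(&entry_lock);
            calls++;
            pthread_mutex_unlock(&entry_lock);
        }
    }
    return NULL;
}

int
main(void)
{
    pthread_t   threads[NTHREADS];

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *) (size_t) (i + 1));
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);

    printf("recorded calls: %lld of %lld executions\n",
           calls, (long long) NTHREADS * ITERATIONS);
    return 0;
}

Built with -pthread, varying SAMPLE_RATE between 1.0 and a small fraction
shows how much less often the hot entry's lock is taken; the real benchmark
would of course have to measure this with pgbench-style load on a
high-CPU-count machine.
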
Ah, so patch version 9 might be the best fit to achieve this. I’ll need
to benchmark it on a large, high-concurrency machine then.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.