On 28.01.2025 23:50, Ilia Evdokimov wrote:
If anyone has the capability to run this benchmark on machines with
more CPUs or with different queries, it would be nice. I’d appreciate
any suggestions or feedback.
I wanted to share some additional benchmarks I ran on an r8g.48xlarge
(192 vCPUs, 1,536 GiB of memory) configured with 16GB of
shared_buffers. I have also attached the benchmark.sh script used to
generate the output.
The benchmark runs the select-only pgbench workload, so we have a
single heavily contended pg_stat_statements entry, which is the
worst case.
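For reference, here is a minimal sketch of the per-run driver; the
attached benchmark.sh may differ, and it assumes the GUC from the patch
is named pg_stat_statements.sample_rate and can be changed with a
reload:

# sweep the sampling rates tested below (assumed GUC name from the patch)
for rate in 1 0.75 0.5 0.25 0; do
    psql -c "ALTER SYSTEM SET pg_stat_statements.sample_rate = $rate"
    psql -c "SELECT pg_reload_conf()"
    psql -c "SELECT pg_stat_statements_reset()"
    pgbench -c192 -j20 -S -Mprepared -T120 --progress 10
done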
The test shows that the spinlock (SpinDelay waits)
becomes an issue at high connection counts and will
get worse on larger machines. Going from sample_rate = 1
to .75 nearly doubles throughput (~484k to ~909k tps) and cuts
the SpinDelay samples by more than half; but this is on a single
contended entry, and most workloads will likely not see this type
of improvement. As expected, I also could not observe
this kind of difference on smaller machines (i.e. 32 vCPUs).
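The wait counts below presumably come from polling pg_stat_activity
while pgbench runs; a minimal sampler along those lines (an assumption
on my part, the attached benchmark.sh may collect them differently):

# sample active backends' wait events ~10x/second over the 120s run,
# mapping a NULL wait_event to "CPU", then aggregate the samples
for i in $(seq 1 1200); do
    psql -Atc "SELECT coalesce(wait_event, 'CPU')
               FROM pg_stat_activity
               WHERE state = 'active' AND pid <> pg_backend_pid()"
    sleep 0.1
done | sort | uniq -c | sort -rn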
## init
pgbench -i -s500
### 192 connections
pgbench -c192 -j20 -S -Mprepared -T120 --progress 10
sample_rate = 1
tps = 484338.769799 (without initial connection time)
waits
-----
11107 SpinDelay
9568 CPU
929 ClientRead
13 DataFileRead
3 BufferMapping
sample_rate = .75
tps = 909547.562124 (without initial connection time)
waits
-----
12079 CPU
4781 SpinDelay
2100 ClientRead
sample_rate = .5
tps = 1028594.555273 (without initial connection time)
waits
-----
13253 CPU
3378 ClientRead
174 SpinDelay
sample_rate = .25
tps = 1019507.126313 (without initial connection time)
waits
-----
13397 CPU
3423 ClientRead
sample_rate = 0
tps = 1015425.288538 (without initial connection time)
waits
-----
13106 CPU
3502 ClientRead
### 32 connections
pgbench -c32 -j20 -S -Mprepared -T120 --progress 10
sample_rate = 1
tps = 620667.049565 (without initial connection time)
waits
-----
1782 CPU
560 ClientRead
sample_rate = .75
tps = 620663.131347 (without initial connection time)
waits
-----
1736 CPU
554 ClientRead
sample_rate = .5
tps = 624094.688239 (without initial connection time)
waits
-----
1741 CPU
648 ClientRead
sample_rate = .25
tps = 628638.538204 (without initial connection time)
waits
-----
1702 CPU
576 ClientRead
sample_rate = 0
tps = 630483.464912 (without initial connection time)
waits
-----
1638 CPU
574 ClientRead
Regards,
Sami
Thank you so much for benchmarking this on a pretty large machine with
a large number of CPUs. The results look fantastic, and I truly
appreciate your effort.
BTW, I realized that the 'sampling' test needs to be added not only to
the Makefile but also to meson.build. I've included that in the v14
patch.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
In my opinion, if we can't observe a spinlock bottleneck on 32 CPUs, we
should determine the CPU count at which it appears. That would help us
understand the scale of the problem. Does this make sense, or are there
really no real workloads where the same query runs on more than 32 CPUs,
and we've been trying to solve a non-existent problem?
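One way to pin that down would be to keep sample_rate = 1, sweep the
client count, and watch when SpinDelay waits first show up. A rough
sketch reusing Sami's commands (the particular client counts are
arbitrary; run the wait-event sampler from above alongside each run):

# raise the client count until SpinDelay samples start to appear
for c in 32 48 64 96 128 160 192; do
    echo "clients=$c"
    pgbench -c$c -j20 -S -Mprepared -T120 | grep '^tps'
done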
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.