> All the changes mentioned above are included in the v13 patch. Since the
> patch status is 'Ready for Committer,' I believe it is now better for
> upstream inclusion, with improved details in tests and documentation. Do
> you have any further suggestions?

I am not quite clear on sample_1.out. I do like the idea of separating
the sampling tests, but I was thinking of something a bit simpler.
What do you think of the attached sampling.sql test? It tests the sample
rate in both the simple and extended query protocols, at both the top
level and nested levels.

> If anyone has the capability to run this benchmark on machines with more
> CPUs or with different queries, it would be nice. I’d appreciate any
> suggestions or feedback.

I wanted to share some additional benchmarks I ran on an
r8g.48xlarge (192 vCPUs, 1,536 GiB of memory) configured
with 16GB of shared_buffers. I also attached the benchmark.sh
script used to generate the output.
The benchmark runs the select-only pgbench workload,
so there is a single heavily contended entry, which is the
worst case.
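
For anyone reading without the attachment, the runs below could be driven
by a loop along these lines. This is only a sketch and not the attached
benchmark.sh: the GUC name pg_stat_statements.sample_rate, changing it via
ALTER SYSTEM plus a reload, and the DRY_RUN / run_cmd / bench names are all
assumptions made for illustration. The pgbench options are the ones shown
in the results.

```shell
#!/bin/sh
# Sketch of a driver loop; NOT the attached benchmark.sh.
# Assumptions: the GUC is named pg_stat_statements.sample_rate and takes
# effect after ALTER SYSTEM + reload. DRY_RUN, run_cmd and bench are
# hypothetical names. With DRY_RUN=1 (the default) commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run the select-only workload once per sample rate for a given client count.
bench() {
    clients="$1"
    for rate in 1 .75 .5 .25 0; do
        run_cmd psql -c "ALTER SYSTEM SET pg_stat_statements.sample_rate = $rate"
        run_cmd psql -c "SELECT pg_reload_conf()"
        run_cmd pgbench -c"$clients" -j20 -S -Mprepared -T120 --progress 10
    done
}

bench 192
bench 32
```

Setting DRY_RUN=0 would execute the commands against whatever server psql
and pgbench connect to by default; collecting the wait samples is left out
of this sketch.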

The test shows that the spinlock (SpinDelay waits)
becomes an issue at high connection counts and will
become worse on larger machines. Going from a sample_rate of
1 to .75 raises tps by roughly 88% (about 484k to 910k) and cuts
SpinDelay samples by more than half, but this is with a single
contended entry. Most workloads will likely not see this type
of improvement. As expected, I also could not really observe
this type of difference on smaller machines (i.e. 32 vCPUs).

## init
pgbench -i -s500

### 192 connections
pgbench -c192 -j20 -S -Mprepared -T120 --progress 10

sample_rate = 1
tps = 484338.769799 (without initial connection time)
waits
-----
  11107  SpinDelay
   9568  CPU
    929  ClientRead
     13  DataFileRead
      3  BufferMapping

sample_rate = .75
tps = 909547.562124 (without initial connection time)
waits
-----
  12079  CPU
   4781  SpinDelay
   2100  ClientRead

sample_rate = .5
tps = 1028594.555273 (without initial connection time)
waits
-----
  13253  CPU
   3378  ClientRead
    174  SpinDelay

sample_rate = .25
tps = 1019507.126313 (without initial connection time)
waits
-----
  13397  CPU
   3423  ClientRead

sample_rate = 0
tps = 1015425.288538 (without initial connection time)
waits
-----
  13106  CPU
   3502  ClientRead

### 32 connections
pgbench -c32 -j20 -S -Mprepared -T120 --progress 10

sample_rate = 1
tps = 620667.049565 (without initial connection time)
waits
-----
   1782  CPU
    560  ClientRead

sample_rate = .75
tps = 620663.131347 (without initial connection time)
waits
-----
   1736  CPU
    554  ClientRead

sample_rate = .5
tps = 624094.688239 (without initial connection time)
waits
-----
   1741  CPU
    648  ClientRead

sample_rate = .25
tps = 628638.538204 (without initial connection time)
waits
-----
   1702  CPU
    576  ClientRead

sample_rate = 0
tps = 630483.464912 (without initial connection time)
waits
-----
   1638  CPU
    574  ClientRead
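
To put the numbers above side by side, the relative change against
sample_rate = 1 can be computed with a small awk helper. This is just a
sketch: pct_change is a name I made up for illustration, and the inputs
are copied from the results above.

```shell
#!/bin/sh
# Percent change from a baseline value to a new value, one decimal place.
# pct_change is a hypothetical helper name, not part of pgbench.
pct_change() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# 192 connections, sample_rate = 1 versus .75
pct_change 484338.769799 909547.562124   # tps
pct_change 11107 4781                    # SpinDelay samples

# 32 connections, sample_rate = 1 versus 0
pct_change 620667.049565 630483.464912   # tps
```

At 192 connections this works out to roughly +88% tps and -57% SpinDelay
samples for the first step down in sample_rate, while at 32 connections
the gap between sample_rate = 1 and 0 is under 2%.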

Regards,

Sami

Attachment: sampling.sql
Description: Binary data

Attachment: benchmark.sh
Description: Bourne shell script
