On 29.01.2025 21:52, Ilia Evdokimov wrote:
> ... I also attached the benchmark.sh
> script used to generate the output.
In my opinion, if we can't observe a spinlock bottleneck on 32 CPUs,
we should determine the CPU count at which it appears. This will help
us understand the scale of the problem. Does this make sense, or are
there really no real workloads where the same query runs on more than
32 CPUs, meaning we've been trying to solve a non-existent problem?
For objectivity, I ran the same benchmark on 48 CPUs with -c 48 -j 20.
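For reference, a minimal sketch of how such a run can be scripted is below. This is not the attached benchmark.sh; it assumes the GUC is exposed as pg_stat_statements.sample_rate and that wait events are counted by periodically sampling pg_stat_activity, with an active backend that has a NULL wait_event counted as CPU. Database name, file names, and the sampling interval are placeholders.

#!/bin/bash
# Sketch only: loop over sample_rate values, run a read-only pgbench,
# and sample wait events from pg_stat_activity in the background.
set -euo pipefail

DB=postgres
DURATION=120
CLIENTS=48
JOBS=20

for rate in 1 0.75 0.5 0.25 0; do
    # Assumed GUC name from the patch under discussion.
    psql -d "$DB" -c "ALTER SYSTEM SET pg_stat_statements.sample_rate = $rate"
    psql -d "$DB" -c "SELECT pg_reload_conf()"
    psql -d "$DB" -c "SELECT pg_stat_statements_reset()"

    # Background sampler: an active backend with NULL wait_event is on CPU.
    (
        end=$((SECONDS + DURATION))
        while (( SECONDS < end )); do
            psql -At -d "$DB" -c "SELECT coalesce(wait_event, 'CPU')
                                    FROM pg_stat_activity
                                   WHERE state = 'active'
                                     AND pid <> pg_backend_pid()"
            sleep 0.1
        done
    ) > "waits_${rate}.log" &
    sampler=$!

    pgbench -c"$CLIENTS" -j"$JOBS" -S -Mprepared -T"$DURATION" --progress 10 "$DB"

    wait "$sampler"
    echo "sample_rate = $rate"
    sort "waits_${rate}.log" | uniq -c | sort -rn | head
done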
### 48 connections
pgbench -c48 -j20 -S -Mprepared -T120 --progress 10
sample_rate = 1
tps = 643251.640175 (without initial connection time)
waits
-----
932 ClientRead
911 CPU
44 SpinDelay
sample_rate = .75
tps = 653946.777122 (without initial connection time)
waits
-----
939 CPU
875 ClientRead
3 SpinDelay
sample_rate = .5
tps = 651654.348463 (without initial connection time)
waits
-----
932 ClientRead
841 CPU
sample_rate = .25
tps = 652668.807245 (without initial connection time)
waits
-----
910 ClientRead
860 CPU
sample_rate = 0
tps = 659111.347019 (without initial connection time)
waits
-----
882 ClientRead
849 CPU
There is a small amount of SpinDelay, as the user mentioned. However, we
can now identify the threshold at which the problem appears.
To summarize the results of all benchmarks, I compiled them into a table:
CPUs | sample_rate |     tps | CPU waits | ClientRead waits | SpinDelay waits
-----+-------------+---------+-----------+------------------+----------------
 192 |        1.0  |  484338 |      9568 |              929 |           11107
 192 |        0.75 |  909547 |     12079 |             2100 |            4781
 192 |        0.5  | 1028594 |     13253 |             3378 |             174
 192 |        0.25 | 1019507 |     13397 |             3423 |               -
 192 |        0.0  | 1015425 |     13106 |             3502 |               -
  48 |        1.0  |  643251 |       911 |              932 |              44
  48 |        0.75 |  653946 |       939 |              875 |               3
  48 |        0.5  |  651654 |       841 |              932 |               -
  48 |        0.25 |  652668 |       860 |              910 |               -
  48 |        0.0  |  659111 |       849 |              882 |               -
  32 |        1.0  |  620667 |      1782 |              560 |               -
  32 |        0.75 |  620667 |      1736 |              554 |               -
  32 |        0.5  |  624094 |      1741 |              648 |               -
  32 |        0.25 |  628638 |      1702 |              576 |               -
  32 |        0.0  |  630483 |      1638 |              574 |               -

The wait columns are counts of sampled wait events; '-' means the event was not observed.
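Reading the table: at 192 CPUs, lowering sample_rate from 1.0 to 0.5 raises throughput from ~484k to ~1029k tps (roughly 2.1x) while SpinDelay samples drop from 11107 to 174, whereas at 48 and 32 CPUs the tps spread across all sample_rate values stays within roughly 2-3%. So on this hardware the spinlock contention becomes significant somewhere between 48 and 192 CPUs.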
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.