On 04.09.2020 21:53, Andres Freund wrote:

I also used huge_pages=on / configured them on the OS level. Otherwise
TLB misses will be a significant factor.
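For reference, enabling huge pages typically looks something like this (the page count below is only an example and has to be sized to the shared memory actually used; the service name is an assumption):

    # postgresql.conf: require huge pages instead of silently falling back
    huge_pages = on

    # Linux: reserve enough 2MB huge pages for the shared memory segment
    sudo sysctl -w vm.nr_hugepages=4500

    # restart and verify the pages are actually allocated/used
    sudo systemctl restart postgresql
    grep HugePages_ /proc/meminfo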

As far as I understand, there should not be any TLB misses, because the size of the shared buffers (8Mb) is several orders of magnitude smaller than the available physical memory.

Does it change if you initialize the test database using
PGOPTIONS='-c vacuum_freeze_min_age=0' pgbench -i -s 100
or run a manual VACUUM FREEZE; after initialization?
I tried it, but didn't see any improvement.
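For reference, the two initialization variants spelled out (the database name 'pgbench' is an assumption):

    # variant 1: freeze rows during the initial vacuum done by pgbench -i
    PGOPTIONS='-c vacuum_freeze_min_age=0' pgbench -i -s 100 pgbench

    # variant 2: initialize normally, then freeze everything afterwards
    pgbench -i -s 100 pgbench
    psql -c 'VACUUM FREEZE;' pgbench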


Hm, it'd probably be good to compare commits closer to the changes, to
avoid other changes showing up.

Hm - did you verify if all the connections were actually established?
Particularly without the patch applied? With an unmodified pgbench, I
sometimes saw better numbers, but only because only half the connections
were able to be established, due to ProcArrayLock contention.
Yes, that really happens quite often on the IBM Power2 server (a peculiarity of its atomics implementation). I even had to patch pgbench by adding a one second delay after a connection has been established, to make it possible for all clients to connect. But on the Intel server I didn't see unconnected clients. And in any case, it happened only for a large number of connections (> 1000). The best performance was achieved at about 100 connections, and still I cannot reach 2k TPS as in your case.

Did you connect via tcp or unix socket? Was pgbench running on the same
machine? It was locally via unix socket for me (but it's also observable
via two machines, just with lower overall throughput).

Pgbench was launched on the same machine and connected through unix sockets.
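For completeness, the connection method can be forced explicitly via libpq's host parameter (socket directory, port and database name below are assumptions for a default local install):

    # unix socket: -h pointing at the socket directory (path starting with '/')
    pgbench -n -S -c 48 -j 48 -T 15 -h /var/run/postgresql -p 5432 postgres

    # tcp: use a host name / IP address instead
    pgbench -n -S -c 48 -j 48 -T 15 -h 127.0.0.1 -p 5432 postgres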

Did you run a profile to see where the bottleneck is?
Sorry, I do not have root privileges on this server and so cannot use perf.

There's a separate benchmark that I found to be quite revealing, and that's
far less dependent on scheduler behaviour. Run two pgbench instances:

1) With a very simple script '\sleep 1s' or such, and many connections
    (e.g. 100,1000,5000). That's to simulate connections that are
    currently idle.
2) With a normal pgbench read only script, and low client counts.

Before the changes 2) shows a very sharp decline in performance when the
count in 1) increases. Afterwards it's pretty much linear.

I think this benchmark actually is much more real world oriented - due
to latency and client side overheads it's very normal to have a large
fraction of connections idle in read mostly OLTP workloads.
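A sketch of how that two-instance setup might be driven (client/thread counts, durations and database name are assumptions; max_connections has to be raised accordingly):

    # script that keeps a connection open but idle
    cat > idle.sql <<'EOF'
    \sleep 1s
    EOF

    # 1) many idle connections (vary -c: 100, 1000, 2500, 5000, 10000)
    pgbench -n -f idle.sql -c 5000 -j 100 -T 60 postgres &

    # 2) the actual measurement: builtin select-only script, few clients
    pgbench -n -S -M prepared -c 1 -j 1 -T 15 postgres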

Here's the result on my workstation (2x Xeon Gold 5215 CPUs), testing
1f42d35a1d6144a23602b2c0bc7f97f3046cf890 against
07f32fcd23ac81898ed47f88beb569c631a2f223 which are the commits pre/post
connection scalability changes.

I used fairly short pgbench runs (15s), and the numbers are the best of
three runs. I also had emacs and mutt open - some noise to be
expected. But I also gotta work ;)

| Idle Connections | Active Connections | TPS pre | TPS post |
|-----------------:|-------------------:|--------:|---------:|
|                0 |                  1 |   33599 |    33406 |
|              100 |                  1 |   31088 |    33279 |
|             1000 |                  1 |   29377 |    33434 |
|             2500 |                  1 |   27050 |    33149 |
|             5000 |                  1 |   21895 |    33903 |
|            10000 |                  1 |   16034 |    33140 |
|                0 |                 48 | 1042005 |  1125104 |
|              100 |                 48 |  986731 |  1103584 |
|             1000 |                 48 |  854230 |  1119043 |
|             2500 |                 48 |  716624 |  1119353 |
|             5000 |                 48 |  553657 |  1119476 |
|            10000 |                 48 |  369845 |  1115740 |

Yes, there is also a noticeable difference in my case:

| Idle Connections | Active Connections | TPS pre | TPS post |
|-----------------:|-------------------:|--------:|---------:|
|             5000 |                 48 |  758914 |  1184085 |

Think we'll need profiles to know...

I will try to obtain sudo permissions and do profiling.
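In case it helps, a typical way to sample this with perf once the permissions are there (the exact options are an assumption; any sampling setup that catches the backends would do):

    # system-wide sampling with call graphs while the benchmark is running
    sudo perf record -a -g --call-graph dwarf -- sleep 30

    # inspect the hot spots afterwards
    sudo perf report --no-children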

