On Mon, Jul 21, 2025, at 08:16, Joel Jacobson wrote:
> Since there is no point of just doing NOTIFY if nobody is LISTENing,
> a realistic benchmark would also need to do LISTEN.
> What you will then see is that TPS will be severely impacted,
> and the gains from removing the global exclusive lock will
> drown in the huge cost for all the syscalls.

I thought I'd better put my money where my mouth is, so I tried to
replicate Rishu's benchmark results and, in addition, to test my own
hypothesis above: that when doing both LISTEN and NOTIFY, rather than
just NOTIFY, the 2x improvement would be heavily reduced.

The benchmarks I've run below confirm this hypothesis. The observed
~2x improvement (roughly +100% TPS) shrinks to about 1.14x (roughly
+14% TPS) when doing both LISTEN+NOTIFY.

Benchmark from original post:

> Here are the results on a MacBook Air (Apple M2 chip, 8 cores, 16 GB memory):
>
> publish_out_of_order_notifications = off:
>
> • Run 1: 158,190 TPS (latency: 0.051 ms)
> • Run 2: 189,771 TPS (latency: 0.042 ms)
> • Run 3: 189,401 TPS (latency: 0.042 ms)
> • Run 4: 190,288 TPS (latency: 0.042 ms)
> • Run 5: 185,001 TPS (latency: 0.043 ms)
>
> publish_out_of_order_notifications = on:
>
> • Run 1: 298,982 TPS (latency: 0.027 ms)
> • Run 2: 345,162 TPS (latency: 0.023 ms)
> • Run 3: 351,309 TPS (latency: 0.023 ms)
> • Run 4: 333,035 TPS (latency: 0.024 ms)
> • Run 5: 353,834 TPS (latency: 0.023 ms)
>
> This shows roughly a 2x improvement in TPS in this basic benchmark.
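
For anyone who wants to redo the arithmetic, here is a minimal sketch in Python that turns per-run TPS figures into a speedup factor and a percent improvement. The numbers are the ten runs quoted above; the helper name `improvement` is made up for illustration and is not part of pgbench or the patch.

```python
from statistics import mean

# TPS figures copied from the quoted runs above.
tps_off = [158190, 189771, 189401, 190288, 185001]
tps_on = [298982, 345162, 351309, 333035, 353834]


def improvement(baseline, patched):
    """Return (speedup factor, percent improvement) based on mean TPS."""
    factor = mean(patched) / mean(baseline)
    return factor, (factor - 1.0) * 100.0


factor, percent = improvement(tps_off, tps_on)
print(f"{factor:.2f}x speedup, {percent:.0f}% improvement")
```

Note the two ways of stating the same result: a 2x speedup is a 100% improvement, and a 14% improvement is a 1.14x speedup.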

# Benchmarks on my MacBook Pro (Apple M3 Max, 16 cores, 128 GB memory)

## NOTIFY only

~160k TPS master (HEAD)
~340k TPS (0001-allow-out-of-order-notifications.patch)
=> ~110% improvement

### master (HEAD)

% cat notify_common.sql
NOTIFY channel_common;

% for n in `seq 1 5` ; do pgbench -f notify_common.sql -c 8 -t 2000 -n | grep -E '^(latency|tps)' ; done
latency average = 0.052 ms
tps = 154326.941626 (without initial connection time)
latency average = 0.049 ms
tps = 162334.368215 (without initial connection time)
latency average = 0.050 ms
tps = 160703.883008 (without initial connection time)
latency average = 0.048 ms
tps = 165296.086615 (without initial connection time)
latency average = 0.049 ms
tps = 163706.310878 (without initial connection time)

### 0001-allow-out-of-order-notifications.patch

% cat notify_common.sql
NOTIFY channel_common;

% for n in `seq 1 5` ; do PGOPTIONS='-c publish_notifications_out_of_order=true' pgbench -f notify_common.sql -c 8 -t 2000 -n | grep -E '^(latency|tps)' ; done
latency average = 0.026 ms
tps = 310149.647205 (without initial connection time)
latency average = 0.021 ms
tps = 380427.029340 (without initial connection time)
latency average = 0.025 ms
tps = 320108.837005 (without initial connection time)
latency average = 0.024 ms
tps = 333500.083375 (without initial connection time)
latency average = 0.022 ms
tps = 357965.859006 (without initial connection time)

## LISTEN+NOTIFY

~73k TPS master (HEAD)
~83k TPS (0001-allow-out-of-order-notifications.patch)
=> ~14% improvement

### master (HEAD)

% cat listen_notify_common.sql
LISTEN channel_common;
NOTIFY channel_common;

% for n in `seq 1 5` ; do pgbench -f listen_notify_common.sql -c 8 -t 2000 -n | grep -E '^(latency|tps)' ; done
latency average = 0.112 ms
tps = 71677.201722 (without initial connection time)
latency average = 0.109 ms
tps = 73228.220325 (without initial connection time)
latency average = 0.109 ms
tps = 73310.423826 (without initial connection time)
latency average = 0.108 ms
tps = 73995.625009 (without initial connection time)
latency average = 0.113 ms
tps = 70970.431944 (without initial connection time)

### 0001-allow-out-of-order-notifications.patch

% for n in `seq 1 5` ; do PGOPTIONS='-c publish_notifications_out_of_order=true' pgbench -f listen_notify_common.sql -c 8 -t 2000 -n | grep -E '^(latency|tps)' ; done
latency average = 0.098 ms
tps = 81620.992919 (without initial connection time)
latency average = 0.095 ms
tps = 84173.755675 (without initial connection time)
latency average = 0.096 ms
tps = 83634.329802 (without initial connection time)
latency average = 0.095 ms
tps = 84311.700356 (without initial connection time)
latency average = 0.096 ms
tps = 83340.278357 (without initial connection time)

For a normal PostgreSQL installation, with CPU and storage on the same
physical machine, I think the results above clearly demonstrate that
the global exclusive lock is, at the very least, not the bottleneck.
I strongly believe the bottleneck is instead the flood of unnecessary
kill(pid, SIGUSR1) syscalls.

If anyone has access to a cloud environment with compute and storage
separated, as suggested by Rishu, it would be interesting to see what
benchmark results you get.

/Joel