On Mon, Jul 21, 2025, at 08:16, Joel Jacobson wrote:
> Since there is no point of just doing NOTIFY if nobody is LISTENing,
> a realistic benchmark would also need to do LISTEN.
> What you will then see is that TPS will be severely impacted,
> and the gains from removing the global exclusive lock will
> drown in the huge cost for all the syscalls.

I thought I had better put my money where my mouth is,
so I tried to replicate Rishu's benchmark results
and, in addition, to benchmark my own hypothesis above:
that when doing both LISTEN+NOTIFY rather than NOTIFY alone,
the 2x improvement would be heavily reduced.

The benchmarks I've run below confirm this hypothesis.
The observed ~2x improvement shrinks to roughly a 14%
improvement (about 1.14x) when doing both LISTEN+NOTIFY.

Benchmark from original post:

> Here are the results on a MacBook Air (Apple M2 chip, 8 cores, 16 GB memory):
>
> publish_out_of_order_notifications = off:
>
> • Run 1: 158,190 TPS (latency: 0.051 ms)
> • Run 2: 189,771 TPS (latency: 0.042 ms)
> • Run 3: 189,401 TPS (latency: 0.042 ms)
> • Run 4: 190,288 TPS (latency: 0.042 ms)
> • Run 5: 185,001 TPS (latency: 0.043 ms)
>
> publish_out_of_order_notifications = on:
>
> • Run 1: 298,982 TPS (latency: 0.027 ms)
> • Run 2: 345,162 TPS (latency: 0.023 ms)
> • Run 3: 351,309 TPS (latency: 0.023 ms)
> • Run 4: 333,035 TPS (latency: 0.024 ms)
> • Run 5: 353,834 TPS (latency: 0.023 ms)
>
> This shows roughly a 2x improvement in TPS in this basic benchmark.

# Benchmarks on my MacBook Pro (Apple M3 Max, 16 cores, 128 GB memory)

## NOTIFY only

~160k TPS master (HEAD)
~340k TPS (0001-allow-out-of-order-notifications.patch)
=> ~112% improvement (~2.1x)

### master (HEAD)

% cat notify_common.sql
NOTIFY channel_common;

% for n in `seq 1 5` ; do pgbench -f notify_common.sql -c 8 -t 2000 -n | grep -E '^(latency|tps)' ; done
latency average = 0.052 ms
tps = 154326.941626 (without initial connection time)
latency average = 0.049 ms
tps = 162334.368215 (without initial connection time)
latency average = 0.050 ms
tps = 160703.883008 (without initial connection time)
latency average = 0.048 ms
tps = 165296.086615 (without initial connection time)
latency average = 0.049 ms
tps = 163706.310878 (without initial connection time)

### 0001-allow-out-of-order-notifications.patch

% cat notify_common.sql
NOTIFY channel_common;

% for n in `seq 1 5` ; do PGOPTIONS='-c publish_notifications_out_of_order=true' pgbench -f notify_common.sql -c 8 -t 2000 -n | grep -E '^(latency|tps)' ; done

latency average = 0.026 ms
tps = 310149.647205 (without initial connection time)
latency average = 0.021 ms
tps = 380427.029340 (without initial connection time)
latency average = 0.025 ms
tps = 320108.837005 (without initial connection time)
latency average = 0.024 ms
tps = 333500.083375 (without initial connection time)
latency average = 0.022 ms
tps = 357965.859006 (without initial connection time)

## LISTEN+NOTIFY

~73k TPS master (HEAD)
~83k TPS (0001-allow-out-of-order-notifications.patch)
=> ~14% improvement

### master (HEAD)

% cat listen_notify_common.sql
LISTEN channel_common;
NOTIFY channel_common;

% for n in `seq 1 5` ; do pgbench -f listen_notify_common.sql -c 8 -t 2000 -n | grep -E '^(latency|tps)' ; done
latency average = 0.112 ms
tps = 71677.201722 (without initial connection time)
latency average = 0.109 ms
tps = 73228.220325 (without initial connection time)
latency average = 0.109 ms
tps = 73310.423826 (without initial connection time)
latency average = 0.108 ms
tps = 73995.625009 (without initial connection time)
latency average = 0.113 ms
tps = 70970.431944 (without initial connection time)

### 0001-allow-out-of-order-notifications.patch

% for n in `seq 1 5` ; do PGOPTIONS='-c publish_notifications_out_of_order=true' pgbench -f listen_notify_common.sql -c 8 -t 2000 -n | grep -E '^(latency|tps)' ; done
latency average = 0.098 ms
tps = 81620.992919 (without initial connection time)
latency average = 0.095 ms
tps = 84173.755675 (without initial connection time)
latency average = 0.096 ms
tps = 83634.329802 (without initial connection time)
latency average = 0.095 ms
tps = 84311.700356 (without initial connection time)
latency average = 0.096 ms
tps = 83340.278357 (without initial connection time)
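
As a cross-check on the summary figures, here is a minimal sketch that averages the `tps = ...` lines copied from the four runs above and computes the relative gain. The helper names `mean_tps` and `improvement_pct` are made up for this illustration and are not part of pgbench or the patch. On these numbers it should come out at roughly 2.1x for NOTIFY only and roughly 1.15x for LISTEN+NOTIFY.

```python
import re
from statistics import mean

# Matches the throughput lines that pgbench prints, e.g.
# "tps = 154326.941626 (without initial connection time)"
TPS_LINE = re.compile(r"^tps = ([0-9.]+)", re.MULTILINE)


def mean_tps(pgbench_output: str) -> float:
    """Average every 'tps = ...' line in concatenated pgbench output."""
    values = [float(v) for v in TPS_LINE.findall(pgbench_output)]
    if not values:
        raise ValueError("no 'tps = ...' lines found")
    return mean(values)


def improvement_pct(baseline: float, patched: float) -> float:
    """Relative TPS gain of patched over baseline, in percent."""
    return (patched / baseline - 1.0) * 100.0


# TPS lines copied verbatim from the runs above.
NOTIFY_MASTER = """
tps = 154326.941626 (without initial connection time)
tps = 162334.368215 (without initial connection time)
tps = 160703.883008 (without initial connection time)
tps = 165296.086615 (without initial connection time)
tps = 163706.310878 (without initial connection time)
"""
NOTIFY_PATCH = """
tps = 310149.647205 (without initial connection time)
tps = 380427.029340 (without initial connection time)
tps = 320108.837005 (without initial connection time)
tps = 333500.083375 (without initial connection time)
tps = 357965.859006 (without initial connection time)
"""
LISTEN_NOTIFY_MASTER = """
tps = 71677.201722 (without initial connection time)
tps = 73228.220325 (without initial connection time)
tps = 73310.423826 (without initial connection time)
tps = 73995.625009 (without initial connection time)
tps = 70970.431944 (without initial connection time)
"""
LISTEN_NOTIFY_PATCH = """
tps = 81620.992919 (without initial connection time)
tps = 84173.755675 (without initial connection time)
tps = 83634.329802 (without initial connection time)
tps = 84311.700356 (without initial connection time)
tps = 83340.278357 (without initial connection time)
"""

if __name__ == "__main__":
    for label, base, patched in [
        ("NOTIFY only", NOTIFY_MASTER, NOTIFY_PATCH),
        ("LISTEN+NOTIFY", LISTEN_NOTIFY_MASTER, LISTEN_NOTIFY_PATCH),
    ]:
        b, p = mean_tps(base), mean_tps(patched)
        print(f"{label}: {b:.0f} -> {p:.0f} TPS "
              f"({improvement_pct(b, p):+.1f}%)")
```

Five short runs per configuration is a small sample, so these averages are indicative rather than statistically robust.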

For a normal PostgreSQL setup with the CPU and storage on the same physical
machine, I think the results above clearly show that the global exclusive lock
is not the bottleneck there. I strongly believe the bottleneck is instead the
flood of unnecessary kill(pid, SIGUSR1) syscalls.
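
To put the numbers in per-transaction terms, the sketch below converts the rounded TPS summaries above into average latency, using latency = clients / TPS with the 8 clients from `pgbench -c 8`. It is arithmetic on the reported figures only; the function name `latency_ms` is hypothetical, and nothing here measures where the time is actually spent.

```python
CLIENTS = 8  # pgbench -c 8, as in the runs above


def latency_ms(tps: float, clients: int = CLIENTS) -> float:
    """Average per-transaction latency (ms) implied by a TPS figure."""
    return clients / tps * 1000.0


# Rounded TPS summaries from the sections above.
notify_master, notify_patch = 160_000, 340_000
listen_master, listen_patch = 73_000, 83_000

# Time per transaction saved by the patch in each scenario.
saved_notify = latency_ms(notify_master) - latency_ms(notify_patch)
saved_listen = latency_ms(listen_master) - latency_ms(listen_patch)

# Extra time per transaction on master once LISTEN is added to the script.
added_by_listen = latency_ms(listen_master) - latency_ms(notify_master)

if __name__ == "__main__":
    print(f"saved by patch, NOTIFY only:   {saved_notify:.4f} ms")
    print(f"saved by patch, LISTEN+NOTIFY: {saved_listen:.4f} ms")
    print(f"added by LISTEN on master:     {added_by_listen:.4f} ms")
```

The implied latencies are close to the "latency average" values pgbench printed. On master, adding LISTEN costs more time per transaction than the patch saves in the NOTIFY-only case. Note that this extra time also includes the round trip for the additional LISTEN statement, so the comparison is consistent with the signalling hypothesis but does not prove it.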

If anyone has access to a cloud environment with compute and storage
separated, as Rishu suggested, it would be interesting to see
what benchmark results you get.

/Joel

