Hi,

>
> c=1 && \
>   psql -c checkpoint -c 'select pg_switch_wal()' && \
>   pgbench -n -M prepared -c$c -j$c -f <(echo "SELECT 
> pg_logical_emit_message(true, 'test', repeat('0', 8192));";) -P1 -t 10000
>
> wal_init_zero = 1: 885 TPS
> wal_init_zero = 0: 286 TPS.

Your theory is clear and the result is promising. I can reproduce a
similar result on my setup.

wal_init_zero = on: tps = 1588.538378 (without initial connection time)
wal_init_zero = off: tps = 857.755343 (without initial connection time)

> Of course I chose this case to be intentionally extreme - each transaction
> fills a bit more than one page of WAL and immediately flushes it. That
> guarantees that each commit needs a separate filesystem metadata flush and a
> flush of the data for the fdatasync() at commit.

However, if I increase the number of clients from 1 to 64 (which may
break this extreme case because of group commit), then wal_init_zero = on
causes a noticeable regression.

c=64 && \
   psql -c checkpoint -c 'select pg_switch_wal()' && \
   pgbench -n -M prepared -c$c -j$c -f <(echo "SELECT 
pg_logical_emit_message(true, 'test', repeat('0', 8192));";) -P1 -t 10000

wal_init_zero = off:
tps = 12135.110730 (without initial connection time)
tps = 11964.016277 (without initial connection time)
tps = 12078.458724 (without initial connection time)

wal_init_zero = on:
tps = 9392.374563 (without initial connection time)
tps = 9391.916410 (without initial connection time)
tps = 9390.503777 (without initial connection time)
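
To put a number on the gap, here is a minimal sketch that averages the
three runs of each setting above and prints the relative drop. The helper
name mean is only for illustration, and awk is assumed to be available.

```shell
# TPS figures copied from the c=64 runs above.
off="12135.110730 11964.016277 12078.458724"
on="9392.374563 9391.916410 9390.503777"

# mean: average of its numeric arguments (hypothetical helper).
mean() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.2f", s / NR }'
}

# Intentionally unquoted so that each number becomes its own argument.
off_avg=$(mean $off)
on_avg=$(mean $on)

# Relative drop of "on" versus "off", in percent.
drop=$(awk -v a="$off_avg" -v b="$on_avg" \
    'BEGIN { printf "%.1f", (a - b) / a * 100 }')

echo "off avg: $off_avg  on avg: $on_avg  drop: $drop%"
```

With these figures the drop comes out at roughly 22%.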

Currently the zero-filling for wal_init_zero happens in a user backend,
and other backends also have to wait for it, which does not look good to
me. The walwriter doesn't seem to do much work, so I'd like to try
offloading wal_init_zero to the walwriter.
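
As a rough file-level illustration of what is being paid for here: with
wal_init_zero on, the whole new segment is written with zeroes, while
with it off only a single byte at the end of the file is written. The
sketch below mimics both in a scratch directory. It is not what the
server literally runs; the file names, the 16MB default segment size and
the use of GNU dd (for conv=fsync) are assumptions for the demo.

```shell
seg_size=$((16 * 1024 * 1024))    # assumed default WAL segment size (16MB)
dir=$(mktemp -d)

# Like wal_init_zero = on: write zeroes over the whole segment, then sync.
dd if=/dev/zero of="$dir/seg_zeroed" bs=8192 \
    count=$((seg_size / 8192)) conv=fsync 2>/dev/null

# Like wal_init_zero = off: write only the last byte, leaving a sparse file.
dd if=/dev/zero of="$dir/seg_sparse" bs=1 count=1 \
    seek=$((seg_size - 1)) conv=fsync 2>/dev/null

# Both files report the same apparent size ...
zeroed_bytes=$(wc -c < "$dir/seg_zeroed")
sparse_bytes=$(wc -c < "$dir/seg_sparse")
echo "zeroed: $zeroed_bytes bytes, sparse: $sparse_bytes bytes"

# ... but the allocated space usually differs (filesystem dependent).
du -k "$dir/seg_zeroed" "$dir/seg_sparse"

rm -rf "$dir"
```

The first dd is the part that currently runs in whichever backend happens
to need a new segment, which is why moving it elsewhere looks attractive.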

As for wal_recycle, IIUC it can only recycle a WAL file during a
checkpoint, and checkpoints don't happen often.

-- 
Best Regards
Andy Fan


