Hi,
> c=1 && \
> psql -c checkpoint -c 'select pg_switch_wal()' && \
> pgbench -n -M prepared -c$c -j$c -f <(echo "SELECT
> pg_logical_emit_message(true, 'test', repeat('0', 8192));";) -P1 -t 10000
>
> wal_init_zero = 1: 885 TPS
> wal_init_zero = 0: 286 TPS.

Your theory looks clear and the result is promising. I can reproduce a
similar result in my setup:

on:  tps = 1588.538378 (without initial connection time)
off: tps = 857.755343 (without initial connection time)

> Of course I chose this case to be intentionally extreme - each transaction
> fills a bit more than one page of WAL and immediately flushes it. That
> guarantees that each commit needs a separate filesystem metadata flush and a
> flush of the data for the fdatasync() at commit.

However, if I increase the number of clients from 1 to 64 (which may weaken
this extreme case because of group commit), wal_init_zero still causes a
noticeable regression:

c=64 && \
psql -c checkpoint -c 'select pg_switch_wal()' && \
pgbench -n -M prepared -c$c -j$c -f <(echo "SELECT
pg_logical_emit_message(true, 'test', repeat('0', 8192));";) -P1 -t 10000

off:
tps = 12135.110730 (without initial connection time)
tps = 11964.016277 (without initial connection time)
tps = 12078.458724 (without initial connection time)

on:
tps = 9392.374563 (without initial connection time)
tps = 9391.916410 (without initial connection time)
tps = 9390.503777 (without initial connection time)

Right now the wal_init_zero work happens in a user backend, and the other
backends also need to wait for it, which does not look good to me. Since the
walwriter doesn't do much, I'd like to try offloading wal_init_zero to the
walwriter.

About wal_recycle: IIUC, it can only recycle a WAL file during a checkpoint,
but checkpoints don't happen often.
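To make the zero-fill cost and the offloading idea above more concrete, here
is a minimal standalone C sketch. It is not the PostgreSQL code; the file
names, segment size and chunk size are made up, and it only illustrates the
shape of the two initialization strategies and the kind of work a background
process (e.g. the walwriter) could do ahead of time:

/*
 * zerofill_sketch.c
 *
 * Minimal standalone sketch (not PostgreSQL code) of the two WAL segment
 * initialization strategies discussed above.
 *
 *   init_zero = true  : write zeros over the whole segment up front, so the
 *                       filesystem allocates all blocks now and later
 *                       fdatasync() calls don't need metadata flushes.
 *   init_zero = false : only touch the last byte, leaving a mostly sparse
 *                       file whose blocks are allocated as WAL is written,
 *                       which makes each commit's flush more expensive.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define SEG_SIZE   (16 * 1024 * 1024)   /* 16 MB, like a default WAL segment */
#define CHUNK_SIZE (8 * 1024)           /* 8 kB chunks, like XLOG_BLCKSZ */

static int
init_segment(const char *path, bool init_zero)
{
	static char zeros[CHUNK_SIZE];      /* static, so already zero-filled */
	int         fd;

	fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
	{
		perror("open");
		return -1;
	}

	if (init_zero)
	{
		/* Fill the whole segment with zeros, chunk by chunk. */
		for (off_t off = 0; off < SEG_SIZE; off += CHUNK_SIZE)
		{
			if (pwrite(fd, zeros, CHUNK_SIZE, off) != CHUNK_SIZE)
			{
				perror("pwrite");
				close(fd);
				return -1;
			}
		}
	}
	else
	{
		/* Only set the file size by writing the last byte. */
		if (pwrite(fd, zeros, 1, SEG_SIZE - 1) != 1)
		{
			perror("pwrite");
			close(fd);
			return -1;
		}
	}

	/* Make the new segment durable before anyone relies on it. */
	if (fsync(fd) != 0)
	{
		perror("fsync");
		close(fd);
		return -1;
	}
	return close(fd);
}

int
main(void)
{
	/*
	 * The offloading idea is essentially: have a background process run the
	 * init_zero == true path for the next segment ahead of time, so no user
	 * backend has to pay for the 16 MB of zero writes on its own critical
	 * path.
	 */
	if (init_segment("segment_zeroed", true) != 0 ||
		init_segment("segment_sparse", false) != 0)
		return EXIT_FAILURE;
	return EXIT_SUCCESS;
}

--
Best Regards
Andy Fan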