Hi hackers,
I raised this topic a while back [1] but didn't get much traction, so
I went ahead and implemented it: a doublewrite buffer (DWB) mechanism
for PostgreSQL as an alternative to full_page_writes.
The core argument is straightforward. FPW and checkpoint frequency are
fundamentally at odds:
- FPW wants fewer checkpoints -- each checkpoint triggers a wave of
full-page WAL writes for every page dirtied for the first time,
bloating WAL and tanking write throughput.
- Fast crash recovery wants more checkpoints -- less WAL to replay
means the database comes back sooner.
DWB resolves this tension by moving torn page protection out of the
WAL path entirely. Instead of writing full pages into WAL (foreground,
latency-sensitive), dirty pages are sequentially written to a
dedicated doublewrite buffer area on disk before being flushed to
their actual locations. The buffer is fsync'd once when full, then
pages are scatter-written to their final positions. On crash recovery,
intact copies from the DWB repair any torn pages.
Key design differences:
- FPW: 1 WAL write (foreground) + 1 page write = directly impacts SQL latency
- DWB: 2 page writes (background flush path) = minimal user-visible impact
- DWB batches fsync() across multiple pages; WAL fsync batching is
limited by foreground latency constraints
- DWB decouples torn page protection from checkpoint frequency, so you
can checkpoint as often as you want without write amplification
I ran sysbench benchmarks (io-bound, --tables=10
--table_size=10000000) with checkpoint_timeout=30s,
shared_buffers=4GB, synchronous_commit=on. Each scenario uses a fresh
database, VACUUM FULL, 60s warmup, 300s run.
Results (TPS):
FPW OFF FPW ON DWB ON
read_write/32 18,038 7,943 13,009
read_write/64 24,249 9,533 15,387
read_write/128 27,801 9,715 15,387
write_only/32 53,146 18,116 31,460
write_only/64 57,628 19,589 32,875
write_only/128 59,454 14,857 33,814
Avg latency (ms):
FPW OFF FPW ON DWB ON
read_write/32 1.77 4.03 2.46
read_write/64 2.64 6.71 4.16
read_write/128 4.60 13.17 9.81
write_only/32 0.60 1.77 1.02
write_only/64 1.11 3.27 1.95
write_only/128 2.15 8.61 3.78
FPW ON drops to ~25% of baseline (FPW OFF). DWB ON holds at ~57%. In
write-heavy scenarios DWB delivers over 2x the throughput of FPW with
significantly better latency.
The implementation is here: https://github.com/baotiao/postgres
I'd appreciate any feedback on the approach. Would be great if the
community could take a look and see if this direction is worth
pursuing upstream.
Thanks,
Baotiao
[1]
https://www.postgresql.org/message-id/CAGbZs7hbJeUe7xY4QD25QW6VSnNFk1e3cwbCa8_R%2B2%2BYnoYRKw%40mail.gmail.com