Here is a patch for the TODO item "Consider sorting writes during checkpoint".
It writes dirty buffers in block-number order during a checkpoint,
so that the buffers are written sequentially.

I proposed this patch before, but it was rejected because 8.3 was already
in feature freeze at that time.

I rewrote it to apply cleanly against current HEAD, but the concept
has not changed at all: at the start of a checkpoint, it records a pair of
(buf_id, BufferTag) for each dirty buffer in a palloc'ed array, sorts
the array in BufferTag order, and writes the buffers in that order.
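
To illustrate the idea only (this is not code from the patch), here is a
minimal standalone sketch in C of the collect-and-sort step. All type and
function names in it are hypothetical. The tag is a simplified stand-in for
BufferTag, which identifies the relation file with more fields than the
single rel_id used here, and malloc stands in for palloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for PostgreSQL's BufferTag. */
typedef struct
{
    unsigned int rel_id;     /* assumption: one id per relation file */
    unsigned int block_num;  /* block number within that file */
} SketchBufferTag;

/* One entry per dirty buffer: the (buf_id, tag) pair described above. */
typedef struct
{
    int             buf_id;
    SketchBufferTag tag;
} SketchSortItem;

/* Hypothetical buffer header: just the tag and a dirty flag. */
typedef struct
{
    SketchBufferTag tag;
    bool            dirty;
} SketchBuffer;

/* Order by relation first, then by block number within the relation. */
static int
compare_items(const void *a, const void *b)
{
    const SketchSortItem *x = (const SketchSortItem *) a;
    const SketchSortItem *y = (const SketchSortItem *) b;

    if (x->tag.rel_id != y->tag.rel_id)
        return (x->tag.rel_id < y->tag.rel_id) ? -1 : 1;
    if (x->tag.block_num != y->tag.block_num)
        return (x->tag.block_num < y->tag.block_num) ? -1 : 1;
    return 0;
}

/*
 * Scan the pool once, remember every dirty buffer, and sort the result
 * in tag order. Returns the number of items, or -1 if allocation fails.
 * The caller frees *out.
 */
static int
collect_sorted_dirty(const SketchBuffer *pool, int nbuffers,
                     SketchSortItem **out)
{
    SketchSortItem *items;
    int             n = 0;

    assert(nbuffers > 0);
    items = malloc(sizeof(SketchSortItem) * (size_t) nbuffers);
    if (items == NULL)
    {
        *out = NULL;
        return -1;
    }

    for (int i = 0; i < nbuffers; i++)
    {
        if (pool[i].dirty)
        {
            items[n].buf_id = i;
            items[n].tag = pool[i].tag;
            n++;
        }
    }

    qsort(items, (size_t) n, sizeof(SketchSortItem), compare_items);
    *out = items;
    return n;
}
```

A checkpoint loop would then walk items[0..n-1] and write each buf_id in
turn, so that blocks of the same file are written in ascending order. The
array is sized for the worst case in which every buffer is dirty, which
keeps the scan to a single pass at the cost of some extra memory.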

There is a performance win of about 10% in pgbench on my machine with RAID-0
disks. There may be more benefit on RAID-5 disks, because random writes
are slower than sequential writes there.

[HEAD]
  tps = 1134.233955 (excluding connections establishing)
[HEAD with patch]
  tps = 1267.446249 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 32
number of transactions per client: 100000
number of transactions actually processed: 3200000/3200000

2x Quad core Xeon, 16GB RAM, 4x HDD (RAID-0)

shared_buffers = 2GB
wal_buffers = 4MB
checkpoint_segments = 64
checkpoint_timeout = 5min
checkpoint_completion_target = 0.5

ITAGAKI Takahiro
NTT Open Source Software Center

Attachment: sorted-ckpt-84.patch

Sent via pgsql-patches mailing list