On Thu, 2007-06-14 at 16:39 +0900, ITAGAKI Takahiro wrote:
> Greg Smith <[EMAIL PROTECTED]> wrote:
>
> > On Mon, 11 Jun 2007, ITAGAKI Takahiro wrote:
> > > If the kernel can treat sequential writes better than random writes, is
> > > it worth sorting dirty buffers in block order per file at the start of
> > > checkpoints?
>
> I wrote and tested the attached sorted-writes patch base on Heikki's
> ldc-justwrites-1.patch. There was obvious performance win on OLTP workload.
>
>  tests                     | pgbench | DBT-2 response time (avg/90%/max)
> ---------------------------+---------+-----------------------------------
>  LDC only                  | 181 tps | 1.12 / 4.38 / 12.13 s
>  + BM_CHECKPOINT_NEEDED(*) | 187 tps | 0.83 / 2.68 /  9.26 s
>  + Sorted writes           | 224 tps | 0.36 / 0.80 /  8.11 s
>
> (*) Don't write buffers that were dirtied after starting the checkpoint.
>
> machine : 2GB-ram, SCSI*4 RAID-5
> pgbench : -s400 -t40000 -c10 (about 5GB of database)
> DBT-2   : 60WH (about 6GB of database)
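To make the quoted proposal concrete, here is a rough sketch of what "sorting dirty buffers in block order per file" could look like. It is only a toy model in Python, not code from the attached patch: BufferTag, relfile, block and checkpoint_write_order are names I have made up for illustration.

```python
from typing import NamedTuple


class BufferTag(NamedTuple):
    relfile: int  # hypothetical identifier of the underlying file
    block: int    # block number within that file


def checkpoint_write_order(dirty):
    """Return the dirty buffer tags ordered by file, then by block number.

    Writing in this order hands the kernel runs of ascending block
    numbers per file instead of an effectively random sequence.
    """
    return sorted(dirty, key=lambda tag: (tag.relfile, tag.block))


# Dirty buffers as they might be found when scanning the buffer pool.
dirty = [BufferTag(2, 7), BufferTag(1, 40), BufferTag(2, 3), BufferTag(1, 5)]
ordered = checkpoint_write_order(dirty)
```

The sort happens once, over the set of buffers collected at the start of the checkpoint; buffers dirtied later are not part of that set in this sketch.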

I'm very surprised by the BM_CHECKPOINT_NEEDED results. What percentage
of writes has been saved by doing that? We would expect a small
percentage of blocks only and so that shouldn't make a significant
difference. I thought we discussed this before, about a year ago. It
would be easy to get that wrong and to avoid writing a block that had
been re-dirtied after the start of checkpoint, but was already dirty
beforehand. How long was the write phase of the checkpoint, how long
between checkpoints?

I can see the sorted writes having an effect because the OS may not
receive blocks within a sufficient time window to fully optimise them.
That effect would grow with increasing sizes of shared_buffers and
decrease with size of controller cache. How big was the shared buffers
setting? What OS scheduler are you using? The effect would be greatest
when using Deadline.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
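
To spell out the BM_CHECKPOINT_NEEDED concern above: a buffer that was already dirty when the checkpoint started must still be written, even if it is dirtied again afterwards. The following is a minimal Python model of that rule, assuming a per-buffer flag that is set once at checkpoint start. Buffer, begin_checkpoint, redirty and buffers_to_write are hypothetical names, not the patch's actual functions.

```python
class Buffer:
    """Toy stand-in for a shared buffer header."""

    def __init__(self, dirty=False):
        self.dirty = dirty
        self.checkpoint_needed = False  # models the BM_CHECKPOINT_NEEDED flag


def begin_checkpoint(buffers):
    # Mark exactly the buffers that are dirty at checkpoint start.
    for buf in buffers:
        buf.checkpoint_needed = buf.dirty


def redirty(buf):
    # Re-dirtying during the checkpoint must NOT clear checkpoint_needed;
    # clearing it here would be the mistake described in the text.
    buf.dirty = True


def buffers_to_write(buffers):
    # The checkpoint writes only the buffers marked at its start.
    return [buf for buf in buffers if buf.checkpoint_needed]


a = Buffer(dirty=True)    # dirty before the checkpoint
b = Buffer(dirty=False)   # clean before the checkpoint
begin_checkpoint([a, b])
redirty(a)                # dirtied again after checkpoint start
redirty(b)                # first dirtied after checkpoint start
to_write = buffers_to_write([a, b])
```

In this model the checkpoint still writes a and skips b, which is the behaviour I would expect the patch to preserve.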