Well, no, I'm not so paranoid that I expect checkpoints to take place that often. I do find it interesting that controlling checkpoint frequency is a function of the number of WAL files available. To get checkpoints roughly 60 seconds apart I had to set checkpoint_segments to 25, which resulted in the creation of 52 log files in data/pg_xlog. So for 120-second checkpoints I will need up to 104 log files available in data/pg_xlog? That's nearly 2 GB of space for WAL logs, and on average it only buys a 2-minute checkpoint interval. That appears to be the pattern (2 * checkpoint_segments files at the busiest time):
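The pattern above can be sketched as a quick back-of-the-envelope calculation. This is only an extrapolation of the observed "2 * checkpoint_segments at peak" behavior, not a documented formula, and it assumes the default 16 MB WAL segment size:

```python
SEGMENT_MB = 16  # default PostgreSQL WAL segment size, in MB

def pg_xlog_peak_mb(checkpoint_segments):
    """Rough peak pg_xlog footprint, assuming the observed pattern of
    about 2 * checkpoint_segments segment files at the busiest time."""
    return 2 * checkpoint_segments * SEGMENT_MB

for segs in (25, 50, 100):
    print(f"checkpoint_segments = {segs:3d} -> ~{pg_xlog_peak_mb(segs)} MB in pg_xlog")
```

For checkpoint_segments = 25 that comes to roughly 800 MB at peak, and doubling the setting roughly doubles the disk footprint.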
# - Checkpoints -
#3  -  8 seconds  -  3 log files
#6  - 16 seconds  - 13 log files
#9  - 29 seconds  - 19 log files
#20 - 57 seconds  - 41 log files
#25 - > 60 sec    - 52 log files

checkpoint_segments = 25        # in logfile segments, min 1, 16MB each
checkpoint_timeout = 300        # range 30-3600, in seconds
#checkpoint_warning = 60        # in seconds, 0 is off

So, if I set checkpoint_segments to 50, 75, or even 100, I will have a considerable number of WAL files in data/pg_xlog, but they will be used more efficiently, so I will be archiving less? That's what I understand from the comments received so far.

For PITR it makes sense, given the nature of the WAL file, that full pages are required, so I won't be able to set full_page_writes to false in that case.

-Keaton

On Wed, 2007-05-16 at 11:35 -0500, Jim C. Nasby wrote:
> On Wed, May 16, 2007 at 10:28:29AM -0600, Keaton Adams wrote:
> > On most database systems I am used to, there is a physical log and a
> > separate logical log. The physical log holds any "before images" of data
> > pages that have been modified since the last checkpoint. The logical log
> > holds the individual rows of data that have been inserted, updated, or
> > deleted, as well as checkpoint records and the like. If the server
> > crashes, fast recovery is initiated where (1) any pages in the physical
> > log are put back in the data files (tablespaces/dbspaces) on disk to get
> > back to a state of physical consistency, and (2) individual transactions
> > in the logical log since the last checkpoint are rolled forward / rolled
> > back to get to a point of logical consistency.
>
> PostgreSQL combines the two, though there has been recent discussion
> about changing that. There may be some improvements in this regard in
> 8.3 (I don't remember if the patches were accepted or not).
> > Even with full_page_writes set to false and checkpoints taking place
> > every 60 seconds or so, the amount of WAL log data generated per minute
> > seems to be significant.
>
> Wow, do you really want to be checkpointing every 60 seconds? That's
> going to greatly increase your WAL volume, as well as the size of WAL.
>
> > So my question is this: If I kick off a transaction that loads records
> > with a size of 100 bytes, does the insert for that record take 100 bytes
> > in the WAL file, or is the data archived in the WAL log in page-size
> > (8K) portions? So with piggyback commits, if I can only stack up 2K worth
> > of data before the next LogFlush, will 2K be written to the WAL file, or
> > will 8K be written each time regardless of the amount of actual
> > transaction data that is available to flush?
>
> (Generally) WAL only records differences. The exception is that the
> first time a page is modified after a checkpoint, the entire page is
> written out to WAL.
>
> > Since there is no separate physical log to keep track of dirty/modified
> > pages since the last checkpoint, I would assume that the WAL log is
> > serving the dual purpose of being able to get back to a point of
> > physical and logical database consistency, but I need to know for
> > certain that there is not a way to reduce the amount of WAL data being
> > written for the amount of transaction information we are actually
> > writing to the database at any given point in time.
>
> The only way to do it right now is to reduce the frequency of your
> checkpoints. IIRC you can't actually disable full page writes if you're
> using PITR.
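Jim's point about full-page images is easy to see with a rough sketch. The rows-per-page figure below is an assumption (100-byte rows plus per-tuple overhead on an 8K page), and WAL record headers are ignored, so treat the numbers as illustrative only:

```python
PAGE_BYTES = 8192   # PostgreSQL block size
ROW_BYTES = 100     # row size from the example above
ROWS_PER_PAGE = 70  # rough fill for 100-byte rows (assumption, ignores exact tuple overhead)

def wal_for_one_page(checkpoints_while_filling):
    """Approximate WAL generated while filling one heap page with inserts.

    The first modification of a page after each checkpoint logs the full
    8K page image; the remaining inserts log only roughly the row data.
    """
    full_page_images = 1 + checkpoints_while_filling
    row_records = ROWS_PER_PAGE - full_page_images
    return full_page_images * PAGE_BYTES + row_records * ROW_BYTES

print(wal_for_one_page(0))  # page filled between checkpoints: one 8K image
print(wal_for_one_page(1))  # a checkpoint mid-fill costs an extra 8K image
```

Under these assumptions, each extra checkpoint that lands while a page is still being filled adds another full 8K image to WAL, which is why checkpointing every 60 seconds inflates WAL volume so much.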
