On Fri, Jun 8, 2012 at 5:02 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On 8 June 2012 09:14, Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote:
>
>> The requirement for this patch is as follows.
>>
>> - What I want to get is similarity of the behaviors between
>>   master and (hot-)standby concerning checkpoint
>>   progression. Specifically, checkpoints for streaming
>>   replication running at the speed governed with
>>   checkpoint_segments. The work of this patch is avoiding to get
>>   unexpectedly large number of WAL segments stay on standby
>>   side. (Plus, increasing the chance to skip recovery-end
>>   checkpoint by my another patch.)
>
> Since we want wal_keep_segments number of WAL files on master (and
> because of cascading, on standby also), I don't see any purpose to
> triggering more frequent checkpoints just so we can hit a magic number
> that is most often set wrong.

This is a good point. Right now, if you set checkpoint_segments to a
large value, we retain lots of old WAL segments even when the system
is idle (cf. XLOGfileslop). I think we could be smarter about that.
I'm not sure what the exact algorithm should be, but right now users
are forced to choose between setting checkpoint_segments very large to
achieve optimum write performance and setting it small to conserve
disk space. What would be much better, IMHO, is if the number of
retained segments could ratchet down when the system is idle,
eventually reaching a state where we keep only one segment beyond the
one currently in use.

For example, suppose I have checkpoint_timeout=10min and
checkpoint_segments=300. If, five minutes into the ten-minute
checkpoint interval, I've only used 10 WAL segments, then I probably
am not going to need another 290 of them in the remaining five
minutes. We ought to keep, say, 20 in that case (number we expect to
need * 2, similar to bgwriter_lru_multiplier) and delete the rest. If
we did that, people could set checkpoint_segments much higher to
handle periods of peak load without continuously consuming large
amounts of space with old, useless WAL segments. Keeping them around
doesn't end up working very well anyway, because the old WAL segments
are no longer in cache by the time we go to overwrite them.

> ISTM that we should avoid triggering a checkpoint on the master if
> checkpoint_segments is less than wal_keep_segments. Such checkpoints
> serve no purpose because we don't actually limit and recycle the WAL
> files and all it does is slow people down.

On the other hand, I emphatically disagree with this, for the same
reasons as on the other thread. Getting data down to disk provides a
greater measure of safety than having it in memory. Making
checkpoint_segments not force a checkpoint is no better than making
checkpoint_timeout not force a checkpoint.

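To make the ratchet-down idea above concrete, here is a minimal sketch of the arithmetic, not PostgreSQL code and not a worked-out algorithm. The function name `segments_to_retain` and all of its parameters are hypothetical. It assumes WAL consumption continues at the rate seen so far in the current checkpoint interval, multiplies the expected need by 2 as suggested, and clamps the result between one segment and checkpoint_segments.

```python
import math


def segments_to_retain(used_so_far, elapsed_s, timeout_s,
                       checkpoint_segments, multiplier=2.0, floor=1):
    """Estimate how many old WAL segments to keep for reuse.

    Hypothetical helper for illustration only; none of these names
    exist in PostgreSQL.
    """
    if elapsed_s <= 0:
        # No history yet in this interval: stay conservative, keep the maximum.
        return checkpoint_segments

    remaining_s = max(timeout_s - elapsed_s, 0)

    # Segments we expect to need before the next timed checkpoint,
    # assuming the consumption rate observed so far continues.
    expected = used_so_far * remaining_s / elapsed_s

    # Keep a safety margin (expected need * multiplier), rounded up.
    keep = math.ceil(expected * multiplier)

    # Never drop below `floor` (one segment beyond the one in use)
    # and never exceed checkpoint_segments.
    return max(floor, min(keep, checkpoint_segments))


if __name__ == "__main__":
    # The example from the text: 10 segments used, 5 minutes into a
    # 10-minute interval, checkpoint_segments=300.
    print(segments_to_retain(10, 300, 600, 300))
```

With the numbers from the example this yields 20; on an idle system (no segments used) it falls to the floor of 1, and under heavy load it is capped at checkpoint_segments. A real implementation would presumably want to smooth the rate over more than one interval, which this sketch does not attempt.
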
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers