On Fri, Aug 8, 2014 at 4:08 PM, Guillaume Lelarge <guilla...@lelarge.info> wrote: > Hi, > > As part of our monitoring work for our customers, we stumbled upon an issue > with our customers' servers who have a wal_keep_segments setting higher than > 0. > > We have a monitoring script that checks the number of WAL files in the > pg_xlog directory, according to the setting of three parameters > (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments). > We usually add a percentage to the usual formula: > > greatest( > (2 + checkpoint_completion_target) * checkpoint_segments + 1, > checkpoint_segments + wal_keep_segments + 1 > ) > > And we have lots of alerts from the script for customers who set their > wal_keep_segments setting higher than 0. > > So we started to question this sentence of the documentation: > > There will always be at least one WAL segment file, and will normally not be > more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 or > checkpoint_segments + wal_keep_segments + 1 files. > > (http://www.postgresql.org/docs/9.3/static/wal-configuration.html) > > While doing some tests, it appears it would be more something like: > > wal_keep_segments + (2 + checkpoint_completion_target) * checkpoint_segments > + 1 > > But after reading the source code (src/backend/access/transam/xlog.c), the > right formula seems to be: > > wal_keep_segments + 2 * checkpoint_segments + 1 > > Here is how we went to this formula... > > CreateCheckPoint(..) is responsible, among other things, for deleting and > recycling old WAL files. From src/backend/access/transam/xlog.c, master > branch, line 8363: > > /* > * Delete old log files (those no longer needed even for previous > * checkpoint or the standbys in XLOG streaming). > */ > if (_logSegNo) > { > KeepLogSeg(recptr, &_logSegNo); > _logSegNo--; > RemoveOldXlogFiles(_logSegNo, recptr); > } > > KeepLogSeg(...) function takes care of wal_keep_segments. From > src/backend/access/transam/xlog.c, master branch, line 8792: > > /* compute limit for wal_keep_segments first */ > if (wal_keep_segments > 0) > { > /* avoid underflow, don't go below 1 */ > if (segno <= wal_keep_segments) > segno = 1; > else > segno = segno - wal_keep_segments; > } > > IOW, the segment number (segno) is decremented according to the setting of > wal_keep_segments. segno is then sent back to CreateCheckPoint(...) via > _logSegNo. The RemoveOldXlogFiles() gets this segment number so that it can > remove or recycle all files before this segment number. This function gets > the number of WAL files to recycle with the XLOGfileslop constant, which is > defined as: > > /* > * XLOGfileslop is the maximum number of preallocated future XLOG segments. > * When we are done with an old XLOG segment file, we will recycle it as a > * future XLOG segment as long as there aren't already XLOGfileslop future > * segments; else we'll delete it. This could be made a separate GUC > * variable, but at present I think it's sufficient to hardwire it as > * 2*CheckPointSegments+1. Under normal conditions, a checkpoint will free > * no more than 2*CheckPointSegments log segments, and we want to recycle > all > * of them; the +1 allows boundary cases to happen without wasting a > * delete/create-segment cycle. > */ > #define XLOGfileslop (2*CheckPointSegments + 1) > > (in src/backend/access/transam/xlog.c, master branch, line 100) > > IOW, PostgreSQL will keep wal_keep_segments WAL files before the current WAL > file, and then there may be 2*CheckPointSegments + 1 recycled ones. Hence > the formula: > > wal_keep_segments + 2 * checkpoint_segments + 1 > > And this is what we usually find in our customers' servers. We may find more > WAL files, depending on the write activity of the cluster, but in average, > we get this number of WAL files. > > AFAICT, the documentation is wrong about the usual number of WAL files in > the pg_xlog directory. But I may be wrong, in which case, the documentation > isn't clear enough for me, and should be fixed so that others can't > misinterpret it like I may have done. > > Any comments? did I miss something, or should we fix the documentation?
I think you're right. The correct formula of the number of WAL files in pg_xlog seems to be (3 + checkpoint_completion_target) * checkpoint_segments + 1 or wal_keep_segments + 2 * checkpoint_segments + 1 Why? At the end of checkpoint, the WAL files which were generated since the start of previous checkpoint cannot be removed and must remain in pg_xlog. The number of them is (1 + checkpoint_completion_target) * checkpoint_segments or wal_keep_segments Also, at the end of checkpoint, as you pointed out, if there are *many* enough old WAL files, 2 * checkpoint_segments + 1 WAL files will be recycled. Then checkpoint_segments WAL files will be consumed till the end of next checkpoint. But since there are already 2 * checkpoint_segments + 1 recycled WAL files, no more files are increased. So, WAL files that we cannot remove and can recycle at the end of checkpoint can exist in pg_xlog, and the num of them can be calculated by the above formula. If my understanding is right, we need to change the formula at the document. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers