On Fri, Aug 8, 2014 at 12:08 AM, Guillaume Lelarge <guilla...@lelarge.info>
wrote:

> Hi,
>
> As part of our monitoring work for our customers, we stumbled upon an
> issue with our customers' servers who have a wal_keep_segments setting
> higher than 0.
>
> We have a monitoring script that checks the number of WAL files in the
> pg_xlog directory, according to the setting of three parameters
> (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments).
> We usually add a percentage to the usual formula:
>
> greatest(
>   (2 + checkpoint_completion_target) * checkpoint_segments + 1,
>   checkpoint_segments + wal_keep_segments + 1
> )
>

I think the first bug is having this formula in the documentation in the
first place, and trying to use it.

"and will normally not be more than..."

This may be "normal" for a toy system.  I think that the normal state for
any system worth monitoring is that it has had load spikes at some point in
the past.

So the important part is the next part of the docs, which describes how
many segments it climbs back down to after recovering from a spike.  And
that part doesn't mention wal_keep_segments at all, which surely cannot be
correct.

I will try to derive the correct formula from the code independently, as
you did, without looking too closely at your derivation first, and see if
we get the same answer.

Cheers,

Jeff
