On Sat, Oct 10, 2020 at 3:41 AM Peter J. Holzer <hjp-pg...@hjp.at> wrote:
> On 2020-10-05 11:29:04 -0600, Michael Lewis wrote: > > > > I suggest that in PG12 you can monitor the > > "lag" of a standby server more directly by looking at columns > > write_lag, > > flush_lag, replay_lag in the pg_stat_replication view. > > > > > > And are those things updated when there are no changes to the master > > database? > > Probably not, as there is nothing to replicate, so no new data it > provided. > > > > If you setup a scripted process to update a single row with a > timestamptz on > > the source/primary every minute, then you have a very simple consistent > change > > and also a way to check on the replica what is current time vs > > last_scripted_update_time if you will and know the approx lag. It would > seem > > like a simple albeit hacky solution to you wanting a file every X minutes > > regardless of server activity. > > It also has the advantage that you don't have to wait for the WAL file > to be written. You can just check whether the change appears on the > replicas. About 2 years ago I wrote a Nagios/Icinga check that does > that: Update a timestamp in a table on the master, then connect to all > the replicas and wait for the change to show up on them. It then reports > the lag for each replica and a final status (OK, WARNING, CRITICAL) > based on the maximal lag. > > I think I wrote it because the PostgreSQL version we were using at the > time didn't have the lag columns yet, but it does have the advantage of > providing an end to end check (do I really get the correct value?), not > the database's idea of whether replication is working. > > (The check is written in Go and buried in a svn repo at work, but I > could publish it if there is interest) > > hp > > -- > _ | Peter J. Holzer | Story must make more sense than reality. > |_|_) | | > | | | h...@hjp.at | -- Charles Stross, "Creative writing > __/ | http://www.hjp.at/ | challenge!" > I would be interested in the Nagios/Icinga check you wrote.