On Wed, Mar 30, 2016 at 2:22 AM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> On Wed, Mar 30, 2016 at 2:36 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> OK, I committed this, with a few tweaks.  In particular, I added a
>> flag variable instead of relying on "latch set" == "need to send
>> reply"; the other changes were cosmetic.
>>
>> I'm not sure how much more of this we can realistically get into 9.6;
>> the latter patches haven't had much review yet.  But I'll set this
>> back to Needs Review in the CommitFest and we'll see where we end up.
>> But even if we don't get anything more than this, it's still rather
>> nice: remote_apply turns out to be only slightly slower than remote
>> flush, and it's a guarantee that a lot of people are looking for.
>
> Thank you Michael and Robert!
>
> Please find attached the rest of the patch series, rebased against
> master.  The goal of the 0002 patch is to provide an accurate
> indication of the current replay lag on each standby, visible to users
> like this:
>
> postgres=# select application_name, replay_lag from pg_stat_replication;
>  application_name │   replay_lag
> ──────────────────┼─────────────────
>  replica1         │ 00:00:00.000299
>  replica2         │ 00:00:00.000323
>  replica3         │ 00:00:00.000319
>  replica4         │ 00:00:00.000303
> (4 rows)
>
> It works by maintaining a buffer of (end of WAL, time now) samples
> received from the primary, and then eventually feeding those times
> back to the primary when the recovery process replays the
> corresponding locations.
>
> Compared to approaches based on commit timestamps, this approach has
> the advantage of providing non-misleading information between commits.
> For example, if you run a batch load job that takes 1 minute to insert
> the whole phonebook and no other transactions run, you will see
> replay_lag updating regularly throughout that minute, whereas typical
> commit timestamp-only approaches will show an increasing lag time
> until a commit record is eventually applied.  Compared to simple LSN
> location comparisons, it reports in time rather than bytes of WAL,
> which can be more meaningful for DBAs.
>
> When the standby is entirely caught up and there is no write activity,
> the reported time effectively represents the ping time between the
> servers, and is updated every wal_sender_timeout / 2, when keepalive
> messages are sent.  While new WAL traffic is arriving, the walreceiver
> records timestamps at most once per second in a circular buffer, and
> then sends back replies containing the recorded timestamps as fast as
> the recovery process can apply the corresponding xlog.  The lag number
> you see is computed by the primary server comparing two timestamps
> generated by its own system clock, one of which has been on a journey
> to the standby and back.
>
> Accurate lag estimates are a prerequisite for the 0004 patch (about
> which more later), but I believe users would find this valuable as a
> feature on its own.
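[For readers following the thread: the sampling scheme Thomas describes could be sketched roughly as below.  This is a standalone illustration with invented names (LagSample, lag_buffer_record, lag_buffer_read) and a made-up buffer size, not the patch's actual code.]

```c
#include <assert.h>
#include <stdint.h>

#define LAG_BUFFER_SIZE 8192          /* illustration only, not the patch's size */
#define SAMPLE_INTERVAL_USEC 1000000  /* record at most one sample per second */

typedef uint64_t XLogRecPtr;
typedef int64_t TimestampTz;          /* microseconds, as in PostgreSQL */

typedef struct
{
    XLogRecPtr  lsn;                  /* end of WAL when the sample was taken */
    TimestampTz time;                 /* primary's clock at that moment */
} LagSample;

static LagSample buffer[LAG_BUFFER_SIZE];
static int  buf_head = 0;             /* next slot to write */
static int  buf_tail = 0;             /* oldest unreplayed sample */
static TimestampTz last_sample_time = 0;

/* Walreceiver side: record an (end-of-WAL, time) sample, rate-limited. */
static void
lag_buffer_record(XLogRecPtr lsn, TimestampTz now)
{
    if (now - last_sample_time < SAMPLE_INTERVAL_USEC)
        return;                       /* at most one sample per second */
    buffer[buf_head].lsn = lsn;
    buffer[buf_head].time = now;
    buf_head = (buf_head + 1) % LAG_BUFFER_SIZE;
    if (buf_head == buf_tail)         /* overflow: drop the oldest entry */
        buf_tail = (buf_tail + 1) % LAG_BUFFER_SIZE;
    last_sample_time = now;
}

/*
 * Recovery side: when replay reaches 'replayed_lsn', consume every sample at
 * or below it and return the newest such timestamp, to be echoed back to the
 * primary in the next reply.  Returns 0 if no sample has been reached yet.
 */
static TimestampTz
lag_buffer_read(XLogRecPtr replayed_lsn)
{
    TimestampTz result = 0;

    while (buf_tail != buf_head && buffer[buf_tail].lsn <= replayed_lsn)
    {
        result = buffer[buf_tail].time;
        buf_tail = (buf_tail + 1) % LAG_BUFFER_SIZE;
    }
    return result;
}
```

The primary would then compute replay_lag as its current clock reading minus the echoed timestamp, so both timestamps come from the same system clock and no clock synchronization between servers is required.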
Well, one problem with this is that you can't put a loop inside of a
spinlock-protected critical section.

In general, I think this is a pretty reasonable way of attacking this
problem, but I'd say it's significantly under-commented.  Where should
someone go to get a general overview of this mechanism?  The answer is
not "at place XXX within the patch".  (I think it might merit some more
extensive documentation, too, although I'm not exactly sure what that
should look like.)

When you overflow the buffer, you could thin it out in a smarter way,
like by throwing away every other entry instead of the oldest one.  I
guess you'd need to be careful how you coded that, though, because
replaying an entry with a timestamp invalidates some of the saved
entries without formally throwing them out.

Conceivably, 0002 could be split into two patches, one of which
computes "stupid replay lag" considering only records that naturally
carry timestamps, and a second adding the circular buffer to handle
the case where much time passes without finding such a record.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
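[Editor's note for readers: the "throw away every other entry" thinning Robert suggests could look roughly like the sketch below.  Names (LagBuffer, lag_buffer_thin) and the flat-array layout are invented for illustration; this is not patch code, and it ignores the invalidation subtlety Robert points out.]

```c
#define LAG_BUFFER_SIZE 8             /* small, for illustration */

/* A flat buffer of samples; 'count' entries are valid, oldest first. */
typedef struct
{
    long samples[LAG_BUFFER_SIZE];    /* stand-in for (LSN, timestamp) pairs */
    int  count;
} LagBuffer;

/*
 * On overflow, compact the buffer by keeping every other entry rather than
 * discarding only the oldest one.  This halves the sampling resolution
 * uniformly across the whole window instead of truncating its start, so lag
 * estimates for older WAL positions stay available, just at coarser grain.
 */
static void
lag_buffer_thin(LagBuffer *buf)
{
    int src, dst = 0;

    for (src = 0; src < buf->count; src += 2)
        buf->samples[dst++] = buf->samples[src];
    buf->count = dst;
}
```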