On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
> I agree that the capability to measure the remote_apply lag is very useful.
> Also I want to measure the remote_write and remote_flush lags, for example,
> in order to diagnose the cause of replication lag.
Good idea. I will think about how to make that work. There was a proposal to make writing and flushing independent, and I'd like that to go in. Then write_lag and flush_lag could diverge significantly, and it would be nice to be able to see that effect as time (though you can already see it with LSN positions).

> For that, what about maintaining the pairs of send-timestamp and LSN in
> *sender side* instead of receiver side? That is, walsender adds the pairs
> of send-timestamp and LSN into the buffer every sampling period.
> Whenever walsender receives the write, flush and apply locations from
> walreceiver, it calculates the write, flush and apply lags by comparing
> the received and stored LSN and comparing the current timestamp and
> stored send-timestamp.

I thought about that too, but I couldn't figure out how to make the sampling work. If the primary chooses (LSN, time) pairs to store in a buffer, and the standby sends replies at times of its own choosing (when wal_receiver_status_interval has been exceeded), then you can't accurately measure anything.

You could fix that by making the standby send a reply *every time* it applies some WAL (as it does for transactions committing with synchronous_commit = remote_apply, though that covers only commit records), but then we'd be generating a lot of recovery->walreceiver communication and standby->primary network traffic, even for people who don't otherwise need it. That seems unacceptable.

Or you could fix that by setting the XACT_COMPLETION_APPLY_FEEDBACK bit in xl_xinfo.xinfo for selected transactions, as a way to ask the standby to send a reply when that commit record is applied, but that only works for commit records. One of my goals was to be able to report lag accurately even between commits (very large data load transactions etc).

Or you could fix that by sending a list of 'interesting LSNs' to the standby, as a way to ask it to send a reply when those LSNs are applied.
Then you'd need a circular buffer of (LSN, time) pairs on the primary AND a circular buffer of LSNs on the standby to remember which locations should generate a reply. That doesn't seem like an improvement.

That's why I thought the standby should own the (LSN, time) buffer: it decides which samples to record, using the LSN and time provided by the sending server, and can then send replies at exactly the right times. The LSNs don't have to be commit records; they're just arbitrary points in the WAL stream to which we attach timestamps. IPC and network overhead are minimised, and accuracy is maximised.

> As a bonus of this approach, we don't need to add the field into the replay
> message that walreceiver can very frequently send back. Which might be
> helpful in terms of networking overhead.

For the record, these replies are sent only approximately every replay_lag_sample_interval (with variation depending on replay speed), and are only 42 bytes with the new field added.

https://www.postgresql.org/message-id/CA%2BU5nMJifauXvVbx%3Dv3UbYbHO3Jw2rdT4haL6CCooEDM5%3D4ASQ%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers