On Sun, Oct 2, 2011 at 8:21 AM, Simon Riggs <si...@2ndquadrant.com> wrote: > The problem is to find the replication delay, even when the system is quiet. > > What I have proposed finds the replication delay more accurately even > than looking at the last commit, since often there are writes but no > commits. > > If we focus on the problem, rather than the first suggested solution > to that problem, we'll come out on top.
Sorry, but I still don't really think it's fair to say that you've proposed a solution to this problem. Or if you have, neither I nor Fujii Masao understand that proposal well enough to decide whether we like it. You said "maybe we could WAL log something once per checkpoint cycle" or "maybe we could add a new protocol message". We've both replied with various emails saying "we don't understand how that would solve the problem". If you want to add some detail to your proposal, then we can weigh the pros and cons as compared with what the patch does - but right now all you've provided is a theory that there might be a better solution to this problem out there, not any details about how it would work. Or if you have, then please post a link to the message where those details are written out, because I cannot find them on the thread. I do, however, agree that that the case where the system is quiet is the problem case for computing replication delay. It seems to me that, even without this patch, if the system has a continuous stream of commits, you can easily find the replication delay by differencing the current time on the master with the value returned by pg_last_xact_replay_timestamp(). However, if the master goes quiet, then the slave will appear to be progressively farther behind. With the addition of this patch, that problem goes away: you can now difference the return value of pg_last_xact_insert_timestamp() on the master with the return value of pg_last_xact_replay_timestamp() on the slave. If the master goes quiet, then pg_last_xact_insert_timestamp() will stop advancing, and so the two values you are comparing will be equal once the slave has caught up, and remain equal until activity resumes on the master. Now, there is a more subtle remaining problem, which is that when activity *does* resume on the master, there will be a (probably quite short) window of time during which the slave will have a much earlier timestamp than the one on the master. When the master has a commit after a long idle period but the slave has not yet replayed the commit record, the replication delay will appear to be equal to the length of the idle period. But that doesn't seem like a sufficient reason to reject the whole approach, because there are several ways around it. First, you could simply decide that the large computed lag value, although counterintuitive, is accurate under some definition, because, well, that really is the lag between the last transaction committed on the master and the last transaction committed on the standby, and if you don't like the fact that timestamps behave that way, you should compare using WAL positions instead. If you don't like that approach, then a second, also viable approach is to teach your monitoring software that the replication delay can never increase faster than the rate at which clock time is passing. So if you were caught up a minute ago, then you can't be more than a minute behind now. Another point I want to make here is that there's probably more than one useful definition of replication delay. The previous question presupposes that you're trying to answer the question "if I have a transaction that committed N seconds ago on the master, will it be visible on the standby?". It's also a reasonable time-based substitute for measuring the difference in master and standby WAL positions, although certainly it's going to work better if the rate of WAL generation is relatively even. But for a lot of people, it may be that what they really want to know is "what is the expected time for the standby to replay all generated but not yet applied WAL?" - or maybe some third thing that I'm not thinking of - and this function won't provide that. I think we can ultimately afford to provide more than one mechanism here, so I don't see doing this as foreclosing any other also-useful calculation that someone may wish to add in the future. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers