On Wed, Sep 16, 2015 at 12:29 PM, Alexander Korotkov <aekorot...@gmail.com> wrote:
> Yes, the major question is cost. But I think we should validate our thoughts
> by experiments, assuming there are more possible synchronization protocols.
> Ildus posted an implementation of the double buffering approach that showed
> quite low cost.
I'm not sure exactly which email you are referring to, but I don't believe
that anyone has done experiments that are anywhere near comprehensive
enough to convince ourselves that this won't be a problem. If a particular
benchmark doesn't show an issue, that may only mean the benchmark isn't
hitting the case where the problem occurs. For example, EDB has had
customers with severe contention, apparently on the buffer content lwlocks,
resulting in big slowdowns. You don't see that in, say, a pgbench run, but
for people who have certain kinds of queries, it's really bad. Those sorts
of loads, where the lwlock system really gets stressed, are the cases where
adding overhead seems likely to pinch.

> Yes, but some competing products also provide comprehensive waits
> monitoring. That makes me think it should be possible for us too.

I agree, but keep in mind that some of those products may use techniques to
reduce the overhead that we don't have available. I have a strong suspicion
that one of those products in particular has done something clever to make
measuring the time cheap on all platforms. Whatever that clever thing is,
we haven't done it. So that matters.

> I think the reason for hooks could be not only disagreements about design,
> but platform-dependent issues too.
> The next step, after we have a view of current wait events, will be
> gathering statistics on them. We can compare at least two approaches here:
> 1) Periodic sampling of current wait events.
> 2) Measuring each wait event's duration. We could collect statistics
> locally for a short period and update a shared memory structure
> periodically (using some synchronization protocol).
>
> In the previous attempt to gather lwlock statistics, you predicted that
> sampling could have a significant overhead. In contrast, on many systems
> time measurements are cheap.
> We have implemented both approaches, and sampling every 1 millisecond
> produces higher overhead than individually measuring the duration of each
> wait event. We can share another version of waits monitoring based on
> sampling to make these results reproducible for everybody. However, cheap
> time measurements are not available on every platform. For instance, ISTM
> that on Windows time measurements are too expensive.
>
> That makes me think that we need a pluggable solution, at least for
> statistics: direct measurement of event durations for the majority of
> systems, and sampling for the others as the least harmful option.

To me, those seem like arguments for making it configurable, but not
necessarily for having hooks.

>> I think it's reasonable to consider reporting this data in the PGPROC
>> using a 4-byte integer rather than reporting it through a single byte
>> in the backend status structure. I believe that addresses the
>> concerns about reporting from auxiliary processes, and it also allows
>> a little more data to be reported. For anything in excess of that, I
>> think we should think rather harder. Most likely, such additional
>> detail should be reported only for certain types of wait events, or on
>> a delay, or something like that, so that the core mechanism remains
>> really, really fast.
>
> That sounds reasonable. There are many pending questions, but it seems
> like a step forward to me.

Great, let's do it. I think we should probably do the work to separate the
non-individual lwlocks into tranches first, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers