On Mon, Dec 12, 2016 at 12:16 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On 12 December 2016 at 16:52, Robert Haas <robertmh...@gmail.com> wrote:
>> On Mon, Dec 12, 2016 at 11:33 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>>> Last week I noticed that the Wait Event/Locks system doesn't correctly
>>> describe waits for tuple locks because in some cases that happens in
>>> two stages.
>> Well, I replied to that email to say that I didn't agree with your
>> analysis.  I think if something happens in two stages, those wait
>> events should be distinguished.  The whole point here is to get
>> clarity on what the system is waiting for, and we lose that if we
>> start trying to merge together things which are at the code level
>> separate.
> Clarity is what we are both looking for then.


> I know I am waiting for a tuple lock. You want information about all
> the lower levels. I'm good with that as long as the lower information
> is somehow recorded against the higher level task, which it wouldn't
> be in either of the cases I mention, hence why I bring it up again.

So, I think that this may be a case where I built an apple and you are
complaining that it's not an orange.  I had very clearly in mind from
the beginning of the wait event work that we were trying to expose
low-level information about what the system was doing, and I advocated
for this design as a way of doing that, I think, reasonably well.  The
statement that you want information about what is going on at a higher
level is fair, but IMHO it's NOT fair to present that as a defect in
what's been committed.  It was never intended to do that, at least not
by me, and I committed all of the relevant patches and had a fair
amount of involvement with the design.  You may think I should have
been trying to solve a different problem and you may even be right,
but that is a separate issue from how well I did at solving the
problem I was attempting to solve.

There was quite a lot of discussion 9-12 months ago (IIRC) about
wanting additional detail to be associated with wait events.  From
what I understand, Oracle will not only report that it waited for a
block to be read but also tell you which block it was waiting for,
and some of the folks at Postgres Pro were advocating for the wait
event facility to do something similar.  I strongly resisted that kind
of additional detail, because what makes the current system fast and
low-impact, and therefore able to be on by default, is that all it
does is one unsynchronized 4-byte write into shared memory.  If we do
anything more than that -- say 8 bytes, let alone the extra 20 bytes
we'd need to store a relfilenode -- we're going to need to insert
memory barriers in the path that updates the data in order to make
sure that it can be read without tearing, and I'm afraid that's going
to have a noticeable performance impact.  Certainly, we'd need to
check into that very carefully before doing it.  Operations like
reading a block or blocking on an LWLock are heavier than a couple of
memory barriers, but they're not necessarily so much heavier that we
can afford to throw extra memory barriers in those paths without any
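
To make the cost difference concrete, here is a minimal sketch in C11, not the actual PostgreSQL code. The WaitSlot struct and all function names are hypothetical, a relaxed atomic store stands in for the unsynchronized 4-byte write, and a seqlock-style counter is just one possible way to publish a wider record without tearing. The point is only that the wide path needs two ordering points on the writer side, where the cheap path needs none.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-backend slot; every name here is illustrative only. */
typedef struct
{
    _Atomic uint32_t wait_event_info;  /* cheap path: one 4-byte word */
    atomic_uint      seq;              /* odd while a wide write is in progress */
    _Atomic uint32_t detail[5];        /* stand-in for 20 bytes of extra detail */
} WaitSlot;

/* Cheap path: a single 4-byte store with no ordering requirements. */
static inline void
report_wait_cheap(WaitSlot *slot, uint32_t info)
{
    atomic_store_explicit(&slot->wait_event_info, info, memory_order_relaxed);
}

/* Wide path, writer side: bump the counter, fence, write, publish. */
static inline void
report_wait_detailed(WaitSlot *slot, const uint32_t detail[5])
{
    unsigned s = atomic_load_explicit(&slot->seq, memory_order_relaxed);

    assert((s & 1) == 0);       /* assumes exactly one writer per slot */
    atomic_store_explicit(&slot->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);          /* ordering point 1 */
    for (int i = 0; i < 5; i++)
        atomic_store_explicit(&slot->detail[i], detail[i],
                              memory_order_relaxed);
    atomic_store_explicit(&slot->seq, s + 2,
                          memory_order_release);        /* ordering point 2 */
}

/* Wide path, reader side: returns 1 if out[] is an untorn snapshot. */
static inline int
read_wait_detailed(WaitSlot *slot, uint32_t out[5])
{
    unsigned before = atomic_load_explicit(&slot->seq, memory_order_acquire);

    for (int i = 0; i < 5; i++)
        out[i] = atomic_load_explicit(&slot->detail[i], memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    unsigned after = atomic_load_explicit(&slot->seq, memory_order_relaxed);

    return before == after && (before & 1) == 0;
}
```

A reader that gets 0 back from read_wait_detailed() would simply retry. Whether the extra ordering on the writer side is measurable in a hot path is exactly the thing that would need benchmarking.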

Now, some of what you want to do here may be able to be done without
making wait_event_info any wider than uint32, and to the extent that's
possible without too much contortion I am fine with it.  If you want
to know that a tuple lock was being sought for an update rather than a
delete, that could probably be exposed.  But if you want to know WHICH
tuple or even WHICH relation was affected, this mechanism isn't
well-suited to that task.  I think we may well want to add some new
mechanism that reports those sorts of things, but THIS mechanism
doesn't have the bit-space for it and isn't designed to do it.  It's
designed to give basic information and be so cheap that we can use it
practically everywhere.  For more detailed reporting, we should
probably have facilities that are not turned on by default, or else
facilities that are limited to cases where the volume can never be
very high.  You don't have to add a lot of overhead to cause a problem
in a code path that executes tens of thousands of times per second per
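
As a rough illustration of what does and does not fit in a uint32, here is a small C sketch. The bit layout, the "kind" field, the enum values, and the DetailTag struct are all assumptions made up for this example, not a description of how wait_event_info is actually laid out. It shows that a small qualifier such as "update versus delete" can share the word with a class and an event id, while a 20-byte identifier cannot.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only:
 *   bits 24-31  wait class
 *   bits 16-23  sub-kind (e.g. which operation wanted the tuple lock)
 *   bits  0-15  event id
 */
#define WAIT_CLASS_SHIFT 24
#define WAIT_KIND_SHIFT  16

/* Hypothetical sub-kinds for a tuple lock wait. */
enum { TUPLE_LOCK_FOR_UPDATE = 1, TUPLE_LOCK_FOR_DELETE = 2 };

/* Pack all three fields into the single 4-byte word. */
static inline uint32_t
pack_wait_event(uint8_t class_id, uint8_t kind, uint16_t event_id)
{
    return ((uint32_t) class_id << WAIT_CLASS_SHIFT) |
           ((uint32_t) kind << WAIT_KIND_SHIFT) |
           (uint32_t) event_id;
}

static inline uint8_t
wait_event_class(uint32_t info)
{
    return (uint8_t) (info >> WAIT_CLASS_SHIFT);
}

static inline uint8_t
wait_event_kind(uint32_t info)
{
    return (uint8_t) ((info >> WAIT_KIND_SHIFT) & 0xFF);
}

static inline uint16_t
wait_event_id(uint32_t info)
{
    return (uint16_t) (info & 0xFFFF);
}

/* Stand-in for a 20-byte "which relation/block" identifier (hypothetical). */
typedef struct { uint32_t words[5]; } DetailTag;

/* There is no way to squeeze this into the 4-byte word. */
static_assert(sizeof(DetailTag) > sizeof(uint32_t),
              "detail tag is wider than wait_event_info");
```

With this layout the whole report is still one 4-byte store, so the cheap path is unchanged; anything the size of DetailTag would need a separate mechanism.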

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)