Re: [HACKERS] Nested Wait Events?

2016-12-12 Thread Robert Haas
On Mon, Dec 12, 2016 at 2:42 PM, Simon Riggs  wrote:
> There are too many "I"s in that paragraph. I've not presented this as
> a defect, nor is there any reason to believe this post is aimed at you
> personally.

Well, actually, there is.  You said in your original post that
something was "not correct" and something else was "not handled".
That sounds like a description of a defect to me.  If that wasn't how
you meant it, fine.

> I'm letting Hackers know that I've come across two problems and I see
> more. I'm good with accepting reduced scope in return for performance,
> but we should be allowed to discuss what limitations that imposes
> without rancour.

I'm not mad.  I thought you were.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Nested Wait Events?

2016-12-12 Thread Simon Riggs
On 12 December 2016 at 18:05, Robert Haas  wrote:
> On Mon, Dec 12, 2016 at 12:16 PM, Simon Riggs  wrote:
>> On 12 December 2016 at 16:52, Robert Haas  wrote:
>>> On Mon, Dec 12, 2016 at 11:33 AM, Simon Riggs  wrote:
>>>> Last week I noticed that the Wait Event/Locks system doesn't correctly
>>>> describe waits for tuple locks because in some cases that happens in
>>>> two stages.
>>>
>>> Well, I replied to that email to say that I didn't agree with your
>>> analysis.  I think if something happens in two stages, those wait
>>> events should be distinguished.  The whole point here is to get
>>> clarity on what the system is waiting for, and we lose that if we
>>> start trying to merge together things which are at the code level
>>> separate.
>>
>> Clarity is what we are both looking for then.
>
> Granted.
>
>> I know I am waiting for a tuple lock. You want information about all
>> the lower levels. I'm good with that as long as the lower information
>> is somehow recorded against the higher level task, which it wouldn't
>> be in either of the cases I mention, which is why I bring it up again.
>
> So, I think that this may be a case where I built an apple and you are
> complaining that it's not an orange.  I had very clearly in mind from
> the beginning of the wait event work that we were trying to expose
> low-level information about what the system was doing, and I advocated
> for this design as a way of doing that, I think, reasonably well.  The
> statement that you want information about what is going on at a higher
> level is fair, but IMHO it's NOT fair to present that as a defect in
> what's been committed.  It was never intended to do that, at least not
> by me, and I committed all of the relevant patches and had a fair
> amount of involvement with the design.  You may think I should have
> been trying to solve a different problem and you may even be right,
> but that is a separate issue from how well I did at solving the
> problem I was attempting to solve.

There are too many "I"s in that paragraph. I've not presented this as
a defect, nor is there any reason to believe this post is aimed at you
personally.

I'm letting Hackers know that I've come across two problems and I see
more. I'm good with accepting reduced scope in return for performance,
but we should be allowed to discuss what limitations that imposes
without rancour.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Nested Wait Events?

2016-12-12 Thread Robert Haas
On Mon, Dec 12, 2016 at 12:16 PM, Simon Riggs  wrote:
> On 12 December 2016 at 16:52, Robert Haas  wrote:
>> On Mon, Dec 12, 2016 at 11:33 AM, Simon Riggs  wrote:
>>> Last week I noticed that the Wait Event/Locks system doesn't correctly
>>> describe waits for tuple locks because in some cases that happens in
>>> two stages.
>>
>> Well, I replied to that email to say that I didn't agree with your
>> analysis.  I think if something happens in two stages, those wait
>> events should be distinguished.  The whole point here is to get
>> clarity on what the system is waiting for, and we lose that if we
>> start trying to merge together things which are at the code level
>> separate.
>
> Clarity is what we are both looking for then.

Granted.

> I know I am waiting for a tuple lock. You want information about all
> the lower levels. I'm good with that as long as the lower information
> is somehow recorded against the higher level task, which it wouldn't
> be in either of the cases I mention, which is why I bring it up again.

So, I think that this may be a case where I built an apple and you are
complaining that it's not an orange.  I had very clearly in mind from
the beginning of the wait event work that we were trying to expose
low-level information about what the system was doing, and I advocated
for this design as a way of doing that, I think, reasonably well.  The
statement that you want information about what is going on at a higher
level is fair, but IMHO it's NOT fair to present that as a defect in
what's been committed.  It was never intended to do that, at least not
by me, and I committed all of the relevant patches and had a fair
amount of involvement with the design.  You may think I should have
been trying to solve a different problem and you may even be right,
but that is a separate issue from how well I did at solving the
problem I was attempting to solve.

There was quite a lot of discussion 9-12 months ago (IIRC) about
wanting additional detail to be associated with wait events.  From
what I understand, Oracle not only reports that it waited for a block
to be read but also tells you which block it was waiting for,
and some of the folks at Postgres Pro were advocating for the wait
event facility to do something similar.  I strongly resisted that kind
of additional detail, because what makes the current system fast and
low-impact, and therefore able to be on by default, is that all it
does is one unsynchronized 4-byte write into shared memory.  If we do
anything more than that -- say 8 bytes, let alone the extra 20 bytes
we'd need to store a relfilenode -- we're going to need to insert
memory barriers in the path that updates the data in order to make
sure that it can be read without tearing, and I'm afraid that's going
to have a noticeable performance impact.  Certainly, we'd need to
check into that very carefully before doing it.  Operations like
reading a block or blocking on an LWLock are heavier than a couple of
memory barriers, but they're not necessarily so much heavier that we
can afford to throw extra memory barriers in those paths without any
impact.
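
To make that concrete, here is a minimal, self-contained C sketch of
the trade-off.  It is illustrative only -- the type and function names
are invented and this is not code from the tree -- but it shows why a
single aligned 4-byte store can be read by a monitoring process
without locks, while publishing a wider record needs a seqlock-style
sequence counter and extra atomic operations on every report:

    /* wait_event_sketch.c -- illustrative only, not PostgreSQL source.
     * Contrasts a single 4-byte wait_event_info store with a
     * hypothetical wider record that needs a seqlock-style protocol.
     */
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Cheap scheme: one aligned 4-byte slot per backend.  A reader may
     * see a slightly stale value but never a torn one. */
    typedef struct CheapSlot
    {
        _Atomic uint32_t wait_event_info;   /* 0 = not waiting */
    } CheapSlot;

    static inline void
    cheap_report(CheapSlot *slot, uint32_t wait_event_info)
    {
        atomic_store_explicit(&slot->wait_event_info, wait_event_info,
                              memory_order_relaxed);
    }

    /* Hypothetical wider scheme: several fields must be published as a
     * group, so the writer needs a sequence counter and ordering
     * guarantees -- extra work on every single report. */
    typedef struct WideSlot
    {
        _Atomic uint32_t seq;       /* odd while an update is in flight */
        uint32_t wait_event_info;
        uint32_t relid;             /* extra detail we might want */
        uint32_t blocknum;
    } WideSlot;

    static void
    wide_report(WideSlot *slot, uint32_t info, uint32_t relid, uint32_t blkno)
    {
        /* acq_rel keeps the data stores below from moving above this. */
        atomic_fetch_add_explicit(&slot->seq, 1, memory_order_acq_rel);
        slot->wait_event_info = info;
        slot->relid = relid;
        slot->blocknum = blkno;
        /* release keeps the data stores above from moving below this. */
        atomic_fetch_add_explicit(&slot->seq, 1, memory_order_release);
    }

    int
    main(void)
    {
        CheapSlot cheap = { 0 };
        WideSlot  wide  = { 0, 0, 0, 0 };

        cheap_report(&cheap, 42);          /* one store, done         */
        wide_report(&wide, 42, 16384, 7);  /* two RMWs plus 3 stores  */

        printf("cheap=%u wide seq=%u\n",
               (unsigned) atomic_load(&cheap.wait_event_info),
               (unsigned) atomic_load(&wide.seq));
        return 0;
    }

The two extra read-modify-write operations in wide_report() are
exactly the kind of per-wait overhead I'm worried about in hot paths.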

Now, some of what you want to do here may be possible without making
wait_event_info any wider than uint32, and to the extent that can be
done without too much contortion I am fine with it.  If you want
to know that a tuple lock was being sought for an update rather than a
delete, that could probably be exposed.  But if you want to know WHICH
tuple or even WHICH relation was affected, this mechanism isn't
well-suited to that task.  I think we may well want to add some new
mechanism that reports those sorts of things, but THIS mechanism
doesn't have the bit-space for it and isn't designed to do it.  It's
designed to give basic information and be so cheap that we can use it
practically everywhere.  For more detailed reporting, we should
probably have facilities that are not turned on by default, or else
facilities that are limited to cases where the volume can never be
very high.  You don't have to add a lot of overhead to cause a problem
in a code path that executes tens of thousands of times per second per
backend.
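
As a sketch of what "within the existing uint32" could mean, here is
an illustrative bit layout (the constants and layout are invented,
not the tree's actual encoding) showing that a small "detail" code
such as update-vs-delete can ride along in spare bits without
widening wait_event_info:

    /* Illustrative only: a made-up layout for squeezing an extra
     * "detail" field (e.g. update vs. delete) into a 32-bit
     * wait_event_info.  The point is just that the bit-space exists.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define WE_CLASS_SHIFT   24           /* top byte: wait event class     */
    #define WE_DETAIL_SHIFT  16           /* next byte: spare "detail" bits */
    #define WE_EVENT_MASK    0x0000FFFFu  /* low 16 bits: event id          */

    #define WE_CLASS_LOCK    0x03u        /* hypothetical class code        */
    #define WE_DETAIL_UPDATE 0x01u
    #define WE_DETAIL_DELETE 0x02u

    static inline uint32_t
    make_wait_event(uint32_t class_, uint32_t detail, uint32_t event)
    {
        return (class_ << WE_CLASS_SHIFT) |
               (detail << WE_DETAIL_SHIFT) |
               (event & WE_EVENT_MASK);
    }

    int
    main(void)
    {
        /* "waiting for a tuple lock, in order to perform an UPDATE" */
        uint32_t info = make_wait_event(WE_CLASS_LOCK, WE_DETAIL_UPDATE, 7);

        printf("class=%u detail=%u event=%u\n",
               (unsigned) (info >> WE_CLASS_SHIFT),
               (unsigned) ((info >> WE_DETAIL_SHIFT) & 0xFFu),
               (unsigned) (info & WE_EVENT_MASK));
        return 0;
    }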

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Nested Wait Events?

2016-12-12 Thread Simon Riggs
On 12 December 2016 at 16:52, Robert Haas  wrote:
> On Mon, Dec 12, 2016 at 11:33 AM, Simon Riggs  wrote:
>> Last week I noticed that the Wait Event/Locks system doesn't correctly
>> describe waits for tuple locks because in some cases that happens in
>> two stages.
>
> Well, I replied to that email to say that I didn't agree with your
> analysis.  I think if something happens in two stages, those wait
> events should be distinguished.  The whole point here is to get
> clarity on what the system is waiting for, and we lose that if we
> start trying to merge together things which are at the code level
> separate.

Clarity is what we are both looking for then.

I know I am waiting for a tuple lock. You want information about all
the lower levels. I'm good with that as long as the lower information
is somehow recorded against the higher level task, which it wouldn't
be in either of the cases I mention, which is why I bring it up again.

Same thing occurs in any case where we wait for multiple lwlocks.

"I had to buy a mop so I could clean the toilets" is potentially
important information, but I would prefer to start at the intention
side. So that "cleaning the toilets" shows up as the intent, which
might consist of multiple sub-tasks. We can then investigate why
sometimes cleaning the toilet takes one flush and other times it
involves a shopping trip to get a mop. If "mop purchase" is not
correctly associated with cleaning then we don't notice what is going
on and cannot do anything useful with the info.

Regrettably, it's an accounting problem, not a database problem, and
we need a chart-of-accounts hierarchy (e.g. a bill of materials) to
solve it.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Nested Wait Events?

2016-12-12 Thread Robert Haas
On Mon, Dec 12, 2016 at 11:33 AM, Simon Riggs  wrote:
> Last week I noticed that the Wait Event/Locks system doesn't correctly
> describe waits for tuple locks because in some cases that happens in
> two stages.

Well, I replied to that email to say that I didn't agree with your
analysis.  I think if something happens in two stages, those wait
events should be distinguished.  The whole point here is to get
clarity on what the system is waiting for, and we lose that if we
start trying to merge together things which are at the code level
separate.

> Now I notice that the Wait Event system doesn't handle waiting for
> recovery conflicts at all, though it does access ProcArrayLock
> multiple times.

This isn't a very clear statement.  Every place in the system that can
provoke a wait on a latch or a process semaphore displays some kind of
wait event in pg_stat_activity.  Some of those displays may not be as
clear or detailed as you would like, and that's fine, but saying they
are not handled is not exactly true.

> I don't have a concrete proposal, but I think we need a more complex
> model for how we record wait event data: something that separates
> intention (e.g. "Travelling to St. Ives") from current event (e.g.
> "Waiting for LWLock").

That's not a bad thought.  We need to be careful to keep this very
lightweight so that it doesn't affect performance, but the general
concept of separating intention from current event might have some
legs.  We just need to be careful that it doesn't evolve into
something that involves a lot of complicated bookkeeping, because
these wait events can occur very frequently and in hot code-paths.
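
To sketch one shape that might stay lightweight (purely illustrative
-- none of these names exist in the tree): leave the existing
per-backend wait_event_info slot alone and add a second 4-byte slot
that is written only at the start and end of a high-level operation,
so the hot wait paths themselves pay nothing extra:

    /* Purely illustrative -- these names do not exist in PostgreSQL.
     * One cheap slot for the leaf-level wait (as today) plus one for
     * the enclosing intention, e.g. "acquiring a tuple lock" or
     * "resolving a recovery conflict". */
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    #define INTENT_NONE              0u
    #define INTENT_TUPLE_LOCK        1u   /* hypothetical codes */
    #define INTENT_RECOVERY_CONFLICT 2u

    typedef struct BackendWaitState
    {
        _Atomic uint32_t wait_event_info;  /* written in hot wait paths     */
        _Atomic uint32_t wait_intent;      /* written once per top-level op */
    } BackendWaitState;

    static inline void
    intent_start(BackendWaitState *st, uint32_t intent)
    {
        /* One extra 4-byte store per high-level operation, not per wait. */
        atomic_store_explicit(&st->wait_intent, intent, memory_order_relaxed);
    }

    static inline void
    intent_end(BackendWaitState *st)
    {
        atomic_store_explicit(&st->wait_intent, INTENT_NONE,
                              memory_order_relaxed);
    }

    int
    main(void)
    {
        BackendWaitState st = { 0 };

        intent_start(&st, INTENT_TUPLE_LOCK);

        /* The existing code would keep reporting LWLock and
         * heavyweight-lock waits into wait_event_info as it does today. */
        atomic_store_explicit(&st.wait_event_info, 42, memory_order_relaxed);

        printf("intent=%u leaf_event=%u\n",
               (unsigned) atomic_load(&st.wait_intent),
               (unsigned) atomic_load(&st.wait_event_info));

        intent_end(&st);
        return 0;
    }

A monitoring view could then show both columns side by side; the open
question is whether even one extra store per top-level operation is
acceptable.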

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Nested Wait Events?

2016-12-12 Thread Simon Riggs
Last week I noticed that the Wait Event/Locks system doesn't correctly
describe waits for tuple locks because in some cases that happens in
two stages.

Now I notice that the Wait Event system doesn't handle waiting for
recovery conflicts at all, though it does access ProcArrayLock
multiple times.

In both cases I tried to fix the problem before mentioning it here.

We can't add waits for either of those in a simple way because the
current system doesn't allow us to report multiple levels of wait. In
both of these cases there is a single "top-level wait", i.e. tuple locking
or recovery conflicts, even if there are other waits that form part of
the total wait.

I'm guessing that there are other situations like this also.

I don't have a concrete proposal, but I think we need a more complex
model for how we record wait event data: something that separates
intention (e.g. "Travelling to St. Ives") from current event (e.g.
"Waiting for LWLock").

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

