On 2016-03-12 02:24:33 +0300, Alexander Korotkov wrote:
> Idea of individual time measurement of every wait event met criticism
> because it might have high overhead [1].

Right. And that's actually one of the points I meant with "didn't
listen to criticism". There have been a lot of examples, on and off list,
where taking timings triggers significant slowdowns.  Yes, in some
bare-metal environments with a coherent TSC, the overhead can be
low. But that doesn't make it ok to have a high overhead on a lot of
other systems.

Just claiming that that's not a problem will only lead to your position
not being taken seriously.

> This is really so at least for Windows [2].

Measuring timing overhead for a simplistic workload on a single system
doesn't show that.  Try doing such a test on a VMware ESX virtualized
Windows machine, on a multi-socket server; in a lot of instances you'll
see average times that are two to three orders of magnitude longer, with
peaks going to four or five orders of magnitude.  And, as sad as it is,
realistically most postgres instances will run in virtualized environments.
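
For anyone wanting to repeat such a test on their own hardware, here is a
minimal sketch of the kind of measurement I mean, similar in spirit to
what pg_test_timing does.  The names now_ns, TimingStats and
measure_clock_overhead are made up for this example, and it assumes a
POSIX system with clock_gettime(); a Windows version would have to use a
different clock source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the monotonic clock and return it as nanoseconds. */
static int64_t
now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Illustrative result type, not part of any existing API. */
typedef struct TimingStats
{
    double      avg_ns;     /* mean cost of one clock read */
    int64_t     max_ns;     /* largest gap between two consecutive reads */
} TimingStats;

/*
 * Read the clock back to back 'iterations' times and report the average
 * and the worst-case gap between two consecutive reads.  The worst case
 * also includes preemption and, in a VM, time stolen by the hypervisor.
 */
static TimingStats
measure_clock_overhead(long iterations)
{
    TimingStats stats = {0.0, 0};
    int64_t     start;
    int64_t     prev;
    int64_t     cur;
    long        i;

    assert(iterations > 0);

    start = prev = now_ns();
    for (i = 0; i < iterations; i++)
    {
        cur = now_ns();
        if (cur - prev > stats.max_ns)
            stats.max_ns = cur - prev;
        prev = cur;
    }
    stats.avg_ns = (double) (prev - start) / (double) iterations;
    return stats;
}
```

Calling measure_clock_overhead() with a few million iterations on
bare metal and again inside a virtualized guest, and comparing both
avg_ns and max_ns, is the comparison that matters here.  A single
number from one machine says little about the others.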

> But accessing only current values wouldn't be very useful.  We
> anyway need to gather some statistics.  Gathering it by sampling would be
> both more expensive and less accurate for majority of systems.  This is why
> I proposed hooks to make possible platform dependent extensions.  Robert
> rejects hook because he is "not a big fan of hooks as a way of resolving
> disagreements about the design" [3].

I think I agree with Robert here. Providing hooks into very low level
places tends to lead to problems in my experience; tight control over
what happens is often important - I certainly don't want any external
code to run while we're waiting for an lwlock.

> Besides that is actually not design issues but platform issues...

I don't see how that's the case.

> Another question is wait parameters.  We want to expose wait event with
> some parameters.  Robert rejects that because it *might* add additional
> overhead [3]. When I proposed to fit something useful into hard-won
> 4-bytes, Roberts claims that it is "too clever" [4].

I think that no longer treating this as "Robert/EDB vs. pgpro" would be
a good first step towards making progress here.

It seems entirely possible to extend the current API in an incremental
fashion, either allowing the individual pieces to be disabled, or
providing sufficient measurements to show that disabling them isn't
needed.
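
As a rough illustration of the "allow disabling" variant, the sketch
below always counts waits but only reads the clock when a switch is on,
much like track_io_timing does for I/O.  Everything in it
(track_wait_timing, WaitCounters, WaitTimer, wait_begin, wait_end) is a
hypothetical name invented for this example; none of it is existing
PostgreSQL API or a concrete proposal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical on/off switch; off by default so the clock is never read. */
static bool track_wait_timing = false;

/* Hypothetical per-event counters. */
typedef struct WaitCounters
{
    uint64_t    count;      /* always maintained: number of waits */
    uint64_t    total_ns;   /* only advanced while timing is enabled */
} WaitCounters;

/* Hypothetical per-wait state, kept on the caller's stack. */
typedef struct WaitTimer
{
    bool            timed;  /* was the clock read when the wait began? */
    struct timespec start;
} WaitTimer;

static void
wait_begin(WaitTimer *t)
{
    assert(t != NULL);

    /* Remember the decision so a later toggle can't leave 'start' unset. */
    t->timed = track_wait_timing;
    if (t->timed)
        clock_gettime(CLOCK_MONOTONIC, &t->start);
}

static void
wait_end(WaitCounters *c, const WaitTimer *t)
{
    assert(c != NULL && t != NULL);

    c->count++;
    if (t->timed)
    {
        struct timespec end;
        int64_t     elapsed;

        clock_gettime(CLOCK_MONOTONIC, &end);
        elapsed = (int64_t) (end.tv_sec - t->start.tv_sec) * 1000000000
            + (end.tv_nsec - t->start.tv_nsec);
        c->total_ns += (uint64_t) elapsed;
    }
}
```

With the switch off, the only cost per wait is one branch and one
increment, so systems with slow clocks pay nothing for the timing part.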

> So, situation looks like dead-end.  I have no idea how to convince Robert
> about any kind of advanced functionality of wait monitoring to PostgreSQL.
> I'm thinking about implementing sampling extension over current
> infrastructure just to make community see that it sucks. Andres, it would
> be very nice if you have any idea how to move this situation forward.

I've had my share of conflicts with Robert. But if I were in his shoes,
targeted by this kind of rhetoric, I'd be very tempted to just ignore
any further arguments from the same source.  So I think the way forward is
for everyone to cool off, and to see how we can incrementally make
progress from here on.

> Another aspect is that EnterpriseDB offers waits monitoring in proprietary
> fork [5].



Andres Freund

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)