Re: [HACKERS] Next steps on pg_stat_statements normalisation

2012-01-23 Thread Peter Geoghegan
On 22 January 2012 05:30, Peter Geoghegan wrote:
> The syntax for constants is sufficiently simple that I think that a
> good set of regression tests should make this entirely practicable,
> covering all permutations of relevant factors affecting how the
> implementation should act, including for example
> standard_conforming_strings. There's no reason to think that the SQL
> syntax rules for constants should need to change very frequently, or
> even at all, so we should be fine with just knowing the starting
> position. It's quicker and easier to do it this way than to argue the
> case for my original implementation, so that's what I'll do. Whatever
> overhead this independent, pg_stat_statements-internal const parsing
> may impose, it will at least only be paid once per query, when we
> first require a canonicalised representation of the query for the
> pg_stat_statements view.

I've decided that a better approach to this problem is to use the
low-level scanner API (declared in scanner.h), which is currently used
outside the core parser only by plpgsql. This seems to work well, as
I'm using the authoritative scanner to scan constants, and since the
extra scanning happens only within pg_stat_statements, nobody else
pays any overhead - the best of both worlds.
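
Roughly, the re-scan might look like the following (an untested sketch
against the 9.2-era scanner API - the helper name is mine, it skips
edge cases like negative numeric constants, and it leans on flex
leaving a NUL after the current token's text in scanbuf):

#include "postgres.h"

#include "parser/keywords.h"
#include "parser/scanner.h"

/*
 * Hypothetical helper: given the original query string and the parse
 * location of a Const node, re-scan the string with the core scanner
 * and return the length of the token starting there, or -1 if no such
 * token is found.
 */
static int
const_token_length(const char *query, int const_location)
{
    core_yyscan_t       yyscanner;
    core_yy_extra_type  yyextra;
    core_YYSTYPE        yylval;
    YYLTYPE             yylloc;
    int                 length = -1;

    /* Initialize the scanner just as raw_parser() itself does */
    yyscanner = scanner_init(query, &yyextra,
                             ScanKeywords, NumScanKeywords);

    for (;;)
    {
        int     tok = core_yylex(&yylval, &yylloc, yyscanner);

        if (tok == 0)
            break;              /* end of input; token not found */

        if (yylloc == const_location)
        {
            /*
             * Assumption: flex terminates the current token's text
             * with '\0' in its working copy of the string, so the
             * token length is the distance to that terminator.
             */
            length = (int) strlen(yyextra.scanbuf + yylloc);
            break;
        }
    }

    scanner_finish(yyscanner);
    return length;
}

The appeal is that scanner_init() here is exactly what raw_parser()
uses, so constants are scanned by the authoritative rules rather than
by a reimplementation.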

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Next steps on pg_stat_statements normalisation

2012-01-22 Thread Robert Haas
On Sun, Jan 22, 2012 at 5:58 PM, Simon Riggs wrote:
> On Sun, Jan 22, 2012 at 5:30 AM, Peter Geoghegan wrote:
>> So, having received feedback from Tom and others in relation to this
>> patch, I would like to state how I think I should go about addressing
>> various concerns to ensure that a revision of the patch gets into the
>> 9.2 release. As I've said time and again, I think that it is very
>> important that we have this, sooner rather than later.
>
> Nothing can be ensured completely, but I would add this is a very
> important feature. Without it, large systems without prepared
> statements are mostly untunable and therefore untuned, which is a bad
> thing.

I, too, am pretty excited about the potential for this feature.
Having not read the patch, I'm not able to comment on code quality or
design at this point, but I'm definitely +1 on the concept.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Next steps on pg_stat_statements normalisation

2012-01-22 Thread Simon Riggs
On Sun, Jan 22, 2012 at 5:30 AM, Peter Geoghegan wrote:

> So, having received feedback from Tom and others in relation to this
> patch, I would like to state how I think I should go about addressing
> various concerns to ensure that a revision of the patch gets into the
> 9.2 release. As I've said time and again, I think that it is very
> important that we have this, sooner rather than later.

Nothing can be ensured completely, but I would add this is a very
important feature. Without it, large systems without prepared
statements are mostly untunable and therefore untuned, which is a bad
thing.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] Next steps on pg_stat_statements normalisation

2012-01-21 Thread Peter Geoghegan
So, having received feedback from Tom and others in relation to this
patch, I would like to state how I think I should go about addressing
various concerns to ensure that a revision of the patch gets into the
9.2 release. As I've said time and again, I think that it is very
important that we have this, sooner rather than later.

The main objection to my patch was the overhead of the admittedly
invasive (though purely mechanical) changes to the grammar to record
the length of all tokens, so that the length of Const tokens could
subsequently be known in a principled fashion. While I believe that
you'd be extremely hard pushed to demonstrate any regression at all
from those extra cycles, it's also true that the starting position of
each token is already exposed on various nodes, for the purpose of
producing error messages outside the parser that point a caret at the
token's position in the query string. Given that starting position,
there's no reason to think that pg_stat_statements couldn't calculate
the total length on the fly by itself.
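
To illustrate what "already exposed" means here: those starting
positions can be pulled straight out of the parse tree, something like
the sketch below (the walker and its context struct are names of mine;
expression_tree_walker() and the location field on Const are real):

#include "postgres.h"

#include "nodes/nodeFuncs.h"
#include "nodes/primnodes.h"

/* Hypothetical context for accumulating constant locations */
typedef struct ConstLocationContext
{
    int     locations[64];  /* byte offsets of Consts in the query text */
    int     count;
} ConstLocationContext;

/*
 * Record the parse location of every Const node in an expression
 * tree.  A location of -1 means the Const has no textual origin
 * (e.g. it was injected during rewriting), so those are skipped.
 */
static bool
record_const_locations(Node *node, ConstLocationContext *context)
{
    if (node == NULL)
        return false;

    if (IsA(node, Const))
    {
        Const   *c = (Const *) node;

        if (c->location >= 0 && context->count < 64)
            context->locations[context->count++] = c->location;
        return false;           /* Const nodes have no children */
    }

    return expression_tree_walker(node, record_const_locations,
                                  (void *) context);
}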

The syntax for constants is sufficiently simple that I think that a
good set of regression tests should make this entirely practicable,
covering all permutations of relevant factors affecting how the
implementation should act, including for example
standard_conforming_strings. There's no reason to think that the SQL
syntax rules for constants should need to change very frequently, or
even at all, so we should be fine with just knowing the starting
position. It's quicker and easier to do it this way than to argue the
case for my original implementation, so that's what I'll do. Whatever
overhead this independent, pg_stat_statements-internal const parsing
may impose, it will at least only be paid once per query, when we
first require a canonicalised representation of the query for the
pg_stat_statements view.
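
To give a flavour of the permutations I mean (illustrative strings
only, not an actual test list):

/*
 * Illustrative constant-syntax cases; each exercises a different
 * scanner rule that the regression tests would need to cover.
 */
static const char *const const_syntax_cases[] = {
    "SELECT 42",                        /* integer constant */
    "SELECT 3.14, 1e-3",                /* numeric constants */
    "SELECT 'it''s'",                   /* doubled quote in a string */
    "SELECT E'it\\'s'",                 /* escape string syntax */
    "SELECT $tag$quoted 'text'$tag$",   /* dollar quoting */
    "SELECT B'1010', X'1FF'",           /* bit and hex constants */
    "SELECT 'a\\'b'",   /* legal only with standard_conforming_strings=off */
};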

I have, in the past, spoken against the idea of parsing query strings
on the fly (an objection proven well-founded by various third-party
tools), but this is something quite different: by starting from a
position that is known to be where a Const token starts, we need only
concern ourselves with constant syntax, a much more limited and stable
thing to pin down.

I had imagined that by plugging into planner_hook and hashing the
post-rewrite query tree immediately before it was passed to the
planner, there was a certain amount of merit in recognising as
equivalent those queries that would definitely result in the same
plan, all other things (including the selectivity estimates of
constants) being equal. However, since some of that logic, such as
the logic through which different though equivalent join syntaxes are
normalised, actually occurs in the planner, it was judged that this
introduced an undesirable inconsistency. I also felt that it was
useful for the hashing to occur on the post-rewrite "query proper",
but various examples illustrated that that might be problematic, such
as the implementation's inability to differentiate constants in the
original query string from, for example, constants appearing in view
definitions.
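
For reference, what I had amounted to something like the sketch below
(hash_query_tree() is stubbed as a stand-in for the tree-hashing
logic; the hook and standard_planner() signatures are the current
in-core ones):

#include "postgres.h"

#include "fmgr.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
#include "optimizer/planner.h"

PG_MODULE_MAGIC;

void    _PG_init(void);

static planner_hook_type prev_planner_hook = NULL;

/*
 * Stand-in for the real logic: walk the post-rewrite Query tree and
 * hash the fields considered essential to query identity.
 */
static uint32
hash_query_tree(Query *parse)
{
    (void) parse;
    return 0;                   /* placeholder */
}

/*
 * Hash the post-rewrite query tree immediately before it is planned.
 */
static PlannedStmt *
pgss_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
    uint32  queryId = hash_query_tree(parse);

    /* ... look up / create the stats entry keyed on queryId ... */
    (void) queryId;

    if (prev_planner_hook)
        return prev_planner_hook(parse, cursorOptions, boundParams);
    return standard_planner(parse, cursorOptions, boundParams);
}

void
_PG_init(void)
{
    prev_planner_hook = planner_hook;
    planner_hook = pgss_planner;
}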

Tom wanted to hash plans. I think that this view was partially
informed by concerns about the co-ordination of hooks. While I had
worked around those problems, the solution that I'd proposed wasn't
very pretty, and it wasn't hard to imagine someone finding a reason to
need to break it. While there might be a lot to be said for hashing
plans, it isn't a natural adjunct to pg_stat_statements: the user
expects to see exactly one entry for what they consider to be the same
query. You'd have a hard time convincing many people that it's okay to
have a large number of entries for what they thought was the same
query, merely reflecting that a different plan was used for each,
based on factors they didn't consider essential to the query and that
may be quite transitory. Some of the more marginal plans for a given
query could be missing entirely from the fixed-size hash table.
Hashing plans wouldn't be a very useful way of normalising queries,
even if it did more accurately reflect where execution costs came
from, so I am dead set against doing it for normalisation. However, we
may well be able to add even more value in the 9.3 release by hashing
plans as well, to do other, more interesting things - but we need a
canonicalised, normalised representation of queries first. That is
obviously how people expect a tool like this to work, and it's
difficult to imagine a practical work-flow for isolating poorly
performing queries and plans that doesn't stem from it.

Balancing all of these concerns, here's a basic outline of what I
think the way forward is:

* Add a hook to core that allows modules to hook into the call to
parse_analyze() and parse_analyze_varparams(). There'd probably be two
single-line plugin functions for each, so