Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-04-09 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 Having taken another look at the code, I wonder if we wouldn't have
 been better off just fastpathing out of pgss_store in the first call
 (in a pair of calls made by a backend as part of an execution of some
 non-prepared query) iff there is already an entry in the hashtable -
 after all, we're now going to the trouble of acquiring the spinlock
 just to increment the usage for the entry by 0 (likewise, every other
 field), which is obviously superfluous. I apologise for not having
 spotted this before submitting my last patch.

On reflection, we can actually make the code a good bit simpler if
we push the responsibility for initializing the usage count correctly
into entry_alloc(), instead of having to fix it up later.  Then we
can just skip the entire adjust-the-stats step in pgss_store when
building a sticky entry.  See my commit just now.
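
For illustration, the shape this takes (a simplified, self-contained
sketch with hypothetical names, not the committed code):

#include <stdbool.h>

#define USAGE_INIT 1.0          /* usage for a normal new entry */

typedef struct Entry
{
    long        calls;
    double      usage;
} Entry;

/* assumed server-wide estimate, maintained at entry_dealloc() time */
static double cur_median_usage = 10.0;

/*
 * The allocator chooses the starting usage itself: a sticky entry
 * begins at the median usage, so pgss_store() no longer has to adjust
 * the stats afterwards when it is merely reserving a slot.
 */
static void
entry_init(Entry *entry, bool sticky)
{
    entry->calls = 0;
    entry->usage = sticky ? cur_median_usage : USAGE_INIT;
}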

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-04-08 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 On 29 March 2012 21:05, Tom Lane t...@sss.pgh.pa.us wrote:
 Barring objections I'll go fix this, and then this patch can be
 considered closed except for possible future tweaking of the
 sticky-entry decay rule.

 Attached patch fixes a bug, and tweaks sticky-entry decay.

Applied with some cosmetic adjustments.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-04-08 Thread Peter Geoghegan
On 8 April 2012 20:51, Tom Lane t...@sss.pgh.pa.us wrote:
 Applied with some cosmetic adjustments.

Thanks.

Having taken another look at the code, I wonder if we wouldn't have
been better off just fastpathing out of pgss_store in the first call
(in a pair of calls made by a backend as part of an execution of some
non-prepared query) iff there is already an entry in the hashtable -
after all, we're now going to the trouble of acquiring the spinlock
just to increment the usage for the entry by 0 (likewise, every other
field), which is obviously superfluous. I apologise for not having
spotted this before submitting my last patch.

I have attached a patch with the modifications described.

This is more than a micro-optimisation, since it will cut the number
of spinlock acquisitions approximately in half for non-prepared
queries.
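
To make the shape of that concrete, a toy model of the fast path (the
array lookup stands in for the shared hash table; all names here are
illustrative):

#include <stddef.h>

typedef struct Entry
{
    unsigned    queryid;
    long        calls;
    double      usage;
} Entry;

static Entry table[16];         /* eviction elided for brevity */
static size_t nentries;

static Entry *
lookup(unsigned queryid)
{
    size_t      i;

    for (i = 0; i < nentries; i++)
        if (table[i].queryid == queryid)
            return &table[i];
    return NULL;
}

/*
 * The first of the two pgss_store-style calls for a simple query only
 * exists to create the entry (and record the normalized text).  If the
 * entry already exists, every counter would be incremented by zero, so
 * return before any lock would be taken.
 */
static void
store_first_call(unsigned queryid)
{
    if (lookup(queryid) != NULL)
        return;                 /* fast path: nothing to update */

    if (nentries < 16)
        table[nentries++] = (Entry) {queryid, 0, 0.0};
}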

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


pg_stat_statements_optimization_2012_04_08.patch
Description: Binary data



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-04-06 Thread Peter Geoghegan
On 29 March 2012 21:05, Tom Lane t...@sss.pgh.pa.us wrote:
 Barring objections I'll go fix this, and then this patch can be
 considered closed except for possible future tweaking of the
 sticky-entry decay rule.

Attached patch fixes a bug, and tweaks sticky-entry decay.

The extant code bumps usage (though not call counts) in two hooks
(pgss_post_parse_analyze() and pgss_ExecutorEnd()), so prepared
queries will always have about half the usage of an equivalent simple
query, which is clearly not desirable. With the proposed patch,
usage should be similar to calls until the first call of
entry_dealloc(), rather than usually having a value that's about twice
as high. With the patch, a run of pgbench with and without -M
prepared results in a usage of calls + 1 for each query from both
runs.

The approach I've taken with decay is to maintain a server-wide median
usage value (well, a convenient approximation), which is assigned to
sticky entries. This makes it hard to evict the entries in the first
couple of calls to entry_dealloc(). On the other hand, if there really
is contention for entries, it will soon become really easy to evict
sticky entries, because we use a much more aggressive multiplier of
0.5 for their decay.

Initially, I rather conservatively assume that the median usage is 10,
which is a very low value considering the use of the multiplier trick.
In any case, in the real world it won't take long for entry_dealloc()
to be called and set the median value, if in fact it actually
matters.
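
To make the rule concrete, a simplified sketch of the decay step (the
constants mirror the ones described above but are otherwise
illustrative):

#define USAGE_DECREASE_FACTOR   0.99    /* normal per-dealloc dampening */
#define STICKY_DECREASE_FACTOR  0.50    /* aggressive dampening, sticky only */

typedef struct Entry
{
    long        calls;
    double      usage;
} Entry;

static double cur_median_usage = 10.0;  /* initial conservative guess */

/*
 * Called from an entry_dealloc()-like routine with the entries sorted
 * by usage.  Sticky entries (calls == 0) decay twice as fast as real
 * ones, and the observed median is remembered so future sticky entries
 * start out hard to evict.
 */
static void
decay_usage(Entry *entries, int n)
{
    int         i;

    for (i = 0; i < n; i++)
        entries[i].usage *= (entries[i].calls == 0) ?
            STICKY_DECREASE_FACTOR : USAGE_DECREASE_FACTOR;

    if (n > 0)
        cur_median_usage = entries[n / 2].usage;    /* cheap approximation */
}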

You described entries as precious. This isn't quite the full picture;
you might think that pg_stat_statements will malthusianistically burn
through pretty much as many entries as you care to give it, but I
believe that in the real world, the rate at which the module burns
through them would frequently look logarithmic. In other words, just
after an entry_dealloc() call the hashtable is 95% full, but it might
take rather a long time to reach 100% again - the first 5% is consumed
dramatically faster than the last. The user might not actually care if
we need to cache a sticky value for a few hours in one of their slots
while they run an epic reporting query, even though the hashtable is
over 95% full.

The idea is to avoid evicting a sticky entry just because an
infrequent entry_dealloc() happened to occur at the wrong time, while
the least marginal of the most marginal 5% of non-sticky entries (that
is, the 5% up for eviction) happened to have a call count/usage higher
than the magic value of 3 - a scenario I find quite plausible.

If I apply your test for dead sticky entries after running the
regression tests (serial schedule), my approach compares very
favourably (granted, usage values were presumably double-counted in
your test, making our results less than completely comparable).

For the purposes of this experiment, I've just commented out "if
(calls == 0) continue;" within the pg_stat_statements() function,
obviously:

postgres=# select calls = 0, count(*) from pg_stat_statements() group
by calls = 0;
-[ RECORD 1 ]-
?column? | f
count    | 959
-[ RECORD 2 ]-
?column? | t
count    | 3    <--- this includes the above query itself

postgres=# select calls = 0, count(*) from pg_stat_statements() group
by calls = 0;
-[ RECORD 1 ]-
?column? | f
count    | 960  <--- now it's counted here...
-[ RECORD 2 ]-
?column? | t
count    | 2    <--- ...not here

I've also attached some elogs, in their original chronological order,
that trace the median usage when recorded at entry_dealloc() for the
regression tests. As you'd expect given that this is the regression
tests, the median is very low, consistently between 1.9 and 2.5. An
additional factor that makes this work well is that the standard
deviation is low, and as such it is much easier to evict sticky
entries, which is what you want here.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
DEBUG:  cur_med_usage: 1.980000
DEBUG:  cur_med_usage: 1.960200
DEBUG:  cur_med_usage: 1.940598
DEBUG:  cur_med_usage: 1.921192
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.882960
DEBUG:  cur_med_usage: 1.882960
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.921192
DEBUG:  cur_med_usage: 1.921192
DEBUG:  cur_med_usage: 1.921192
DEBUG:  cur_med_usage: 1.960200
DEBUG:  cur_med_usage: 1.960200
DEBUG:  cur_med_usage: 1.960200
DEBUG:  cur_med_usage: 1.960200
DEBUG:  cur_med_usage: 1.960200
DEBUG:  cur_med_usage: 1.980000
DEBUG:  cur_med_usage: 1.980000
DEBUG:  cur_med_usage: 1.980000
DEBUG:  cur_med_usage: 1.980000
DEBUG:  cur_med_usage: 

Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-29 Thread Robert Haas
On Wed, Mar 28, 2012 at 10:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The SELECT INTO tests all fail, but we know the reason why (the testbed
 isn't expecting them to result in creating separate entries for the
 utility statement and the underlying plannable SELECT).

This might be a dumb idea, but for a quick hack, could we just rig
SELECT INTO, CREATE TABLE AS, and EXPLAIN not to create entries for
themselves at all, without suppressing creation of an entry for the
underlying query?  The output might be slightly misleading but it
wouldn't be broken, and I'm disinclined to want to spend a lot of time
doing fine-tuning right now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-29 Thread Peter Geoghegan
On 29 March 2012 02:09, Tom Lane t...@sss.pgh.pa.us wrote:
 Thanks.  I've committed the patch along with the docs, after rather
 heavy editorialization.

Thank you.

 1. What to do with EXPLAIN, SELECT INTO, etc.  We had talked about
 tweaking the behavior of statement nesting and some other possibilities.
 I think clearly this could use improvement but I'm not sure just how.
 (Note: I left out the part of your docs patch that attempted to explain
 the current behavior, since I think we should fix it not document it.)

Yeah, this is kind of unsatisfactory. Nobody would expect the module
to behave this way. On the other hand, users probably aren't hugely
interested in this information.

I'm still kind of attached to the idea of exposing the hash value in
the view. It could be handy in replication situations, to be able to
aggregate statistics across the cluster (assuming you're using
streaming replication and not a trigger based system). You need a
relatively stable identifier to be able to do that. You've already
sort-of promised to not break the format in point releases, because it
is serialised to disk, and may have to persist for months or years.
Also, it will drive home the reality of what's going on in situations
like this (from the docs):

In some cases, queries with visibly different texts might get merged
into a single pg_stat_statements entry. Normally this will happen only
for semantically equivalent queries, but there is a small chance of
hash collisions causing unrelated queries to be merged into one entry.
(This cannot happen for queries belonging to different users or
databases, however.)

Since the hash value is computed on the post-parse-analysis
representation of the queries, the opposite is also possible: queries
with identical texts might appear as separate entries, if they have
different meanings as a result of factors such as different
search_path settings.

 2. Whether and how to adjust the aging-out of sticky entries.  This
 seems like a research project, but the code impact should be quite
 localized.

As I said, I'll try and give it some thought, and do some experiments.

 BTW, I eventually concluded that the parameterization testing we were
 worried about before was a red herring.  As committed, the patch tries
 to store a normalized string if it found any deletable constants, full
 stop.  This seems to me to be correct behavior because the presence of
 constants is exactly what makes the string normalizable, and such
 constants *will* be ignored in the hash calculation no matter whether
 there are other parameters or not.

Yeah, that aspect of not canonicalising parametrised entries had
bothered me. I guess we're better off gracefully handling the problem
across the board, rather than attempting to band-aid the problem by
just not having speculative hashtable entries in cases where they
arguably are not so useful. Avoiding canonicalising those constants
was somewhat misleading.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Mar 28, 2012 at 10:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The SELECT INTO tests all fail, but we know the reason why (the testbed
 isn't expecting them to result in creating separate entries for the
 utility statement and the underlying plannable SELECT).

 This might be a dumb idea, but for a quick hack, could we just rig
 SELECT INTO, CREATE TABLE AS, and EXPLAIN not to create entries for
 themselves at all, without suppressing creation of an entry for the
 underlying query?

It would make more sense to me to go the other way, that is suppress
creation of a separate entry for the contained optimizable statement.
The stats will still be correctly accumulated into the surrounding
statement (or at least, if they are not, that's a separate pre-existing
bug).  If we do it in the direction you suggest, we'll fail to capture
costs incurred outside execution of the contained statement.

Right now, we already have logic in there to track nesting of statements
in a primitive way, that is just count the nesting depth.  My first idea
about fixing this was to tweak that logic so that it stacks a flag
saying we're in a utility statement that contains an optimizable
statement, and then the first layer of Executor hooks that sees that
flag set would know to not do anything.  However this isn't quite good
enough because that first layer might not be for the same statement.
As an example, in an EXPLAIN ANALYZE the planner might pre-execute
immutable or stable SQL functions before we reach the executor.  We
would prefer that any statements embedded in such a function still be
seen as independent nested statements.

However, I think there is a solution for that, though it may sound a bit
ugly.  Rather than just stacking a flag, let's stack the query source
text pointer for the utility statement.  Then in the executor hooks,
if that pointer is *pointer* equal (not strcmp equal) to the optimizable
statement's source-text pointer, we know we are executing the same
statement as the surrounding utility command, and we do nothing.

This looks like it would work for the SELECT INTO and EXPLAIN cases,
and for DECLARE CURSOR whenever that gets changed to a less bizarre
structure.  It would not work for EXECUTE, because in that case we
pass the query string saved from PREPARE to the executor.  However,
we could possibly do a special-case hack for EXECUTE; maybe ask
prepare.c for the statement's string and stack that instead of the
outer EXECUTE query string.
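
In sketch form, the stacked-pointer scheme might look like this
(hypothetical names; the real hook signatures and error recovery are
omitted):

#define MAX_NESTING 64

static const char *source_text_stack[MAX_NESTING];
static int  stack_depth;

/* ProcessUtility hook: push the utility statement's source text */
static void
utility_enter(const char *query_string)
{
    if (stack_depth < MAX_NESTING)
        source_text_stack[stack_depth] = query_string;
    stack_depth++;
}

static void
utility_exit(void)
{
    stack_depth--;
}

/*
 * Executor hook: if the statement's source-text pointer is the very
 * pointer on top of the stack, we are executing the statement
 * contained in the surrounding utility command, whose costs are
 * already being measured, so do nothing.
 */
static int
should_track(const char *stmt_source_text)
{
    if (stack_depth > 0 && stack_depth <= MAX_NESTING &&
        source_text_stack[stack_depth - 1] == stmt_source_text)
        return 0;
    return 1;
}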

Thoughts?

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-29 Thread Robert Haas
On Thu, Mar 29, 2012 at 11:23 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 It would make more sense to me to go the other way, that is suppress
 creation of a separate entry for the contained optimizable statement.
 The stats will still be correctly accumulated into the surrounding
 statement (or at least, if they are not, that's a separate pre-existing
 bug).  If we do it in the direction you suggest, we'll fail to capture
 costs incurred outside execution of the contained statement.

All things being equal, I completely agree.  However, ISTM that the
difficulty of implementation might be higher for your proposal, for
the reasons you go on to state.  If getting it right means that other
significant features are not going to get committed at all for 9.2, I
think we could leave this as a TODO.

 Right now, we already have logic in there to track nesting of statements
 in a primitive way, that is just count the nesting depth.  My first idea
 about fixing this was to tweak that logic so that it stacks a flag
 saying we're in a utility statement that contains an optimizable
 statement, and then the first layer of Executor hooks that sees that
 flag set would know to not do anything.  However this isn't quite good
 enough because that first layer might not be for the same statement.
 As an example, in an EXPLAIN ANALYZE the planner might pre-execute
 immutable or stable SQL functions before we reach the executor.  We
 would prefer that any statements embedded in such a function still be
 seen as independent nested statements.

 However, I think there is a solution for that, though it may sound a bit
 ugly.  Rather than just stacking a flag, let's stack the query source
 text pointer for the utility statement.  Then in the executor hooks,
 if that pointer is *pointer* equal (not strcmp equal) to the optimizable
 statement's source-text pointer, we know we are executing the same
 statement as the surrounding utility command, and we do nothing.

Without wishing to tick you off, that sounds both ugly and fragile.
Can't we find a way to set the stacked flag (on the top stack frame)
after planning and before execution?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Mar 29, 2012 at 11:23 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, I think there is a solution for that, though it may sound a bit
 ugly.  Rather than just stacking a flag, let's stack the query source
 text pointer for the utility statement.  Then in the executor hooks,
 if that pointer is *pointer* equal (not strcmp equal) to the optimizable
 statement's source-text pointer, we know we are executing the same
 statement as the surrounding utility command, and we do nothing.

 Without wishing to tick you off, that sounds both ugly and fragile.

What do you object to --- the pointer-equality part?  We could do strcmp
comparison instead, on the assumption that a utility command could not
look the same as an optimizable statement except in the case we care
about.  I think that's probably unnecessary though.

 Can't we find a way to set the stacked flag (on the top stack frame)
 after planning and before execution?

That would require a way for pg_stat_statements to get control at rather
random places in several different types of utility statements.  And
if we did add hook functions in those places, you'd still need to have
sufficient stacked context for those hooks to know what to do, which
leads you right back to this I think.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-29 Thread Robert Haas
On Thu, Mar 29, 2012 at 11:42 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Mar 29, 2012 at 11:23 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, I think there is a solution for that, though it may sound a bit
 ugly.  Rather than just stacking a flag, let's stack the query source
 text pointer for the utility statement.  Then in the executor hooks,
 if that pointer is *pointer* equal (not strcmp equal) to the optimizable
 statement's source-text pointer, we know we are executing the same
 statement as the surrounding utility command, and we do nothing.

 Without wishing to tick you off, that sounds both ugly and fragile.

 What do you object to --- the pointer-equality part?  We could do strcmp
 comparison instead, on the assumption that a utility command could not
 look the same as an optimizable statement except in the case we care
 about.  I think that's probably unnecessary though.

The pointer equality part seems like the worst ugliness, yes.

 Can't we find a way to set the stacked flag (on the top stack frame)
 after planning and before execution?

 That would require a way for pg_stat_statements to get control at rather
 random places in several different types of utility statements.  And
 if we did add hook functions in those places, you'd still need to have
 sufficient stacked context for those hooks to know what to do, which
 leads you right back to this I think.

What I'm imagining is that instead of just having a global for
nested_level, you'd have a global variable pointing to a linked list.
The length of the list would be equal to what we currently call
nested_level + 1.  Something like this:

struct pgss_nesting_info
{
    struct pgss_nesting_info *next;
    int     flag;   /* bad name */
};

static struct pgss_nesting_info *pgss_stack_top;

So any test for nesting_depth == 0 would instead test
pgss_stack_top->next == NULL.

Then, when you get control at the relevant spots, you set
pgss_stack_top->flag = 1 and that's it.  Now, maybe it's too ugly to
think about passing control at those spots; I'm surprised there's not
a central point they all go through...

Another thought is: if we simply treated these as nested queries for
all purposes, would that really be so bad?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 What I'm imagining is that instead of just having a global for
 nested_level, you'd have a global variable pointing to a linked list.

This is more or less what I have in mind, too, except I do not believe
that a mere boolean flag is sufficient to tell the difference between
an executor call that you want to suppress logging for and one that
you do not.  You need some more positive way of identifying the target
statement than that, and what I propose that be is the statement's query
string.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-29 Thread Tom Lane
[ forgot to respond to this bit ]

Robert Haas robertmh...@gmail.com writes:
 Another thought is: if we simply treated these as nested queries for
 all purposes, would that really be so bad?

That was actually what I suggested first, and now that I look at the
code, that's exactly what's happening right now.  However, Peter pointed
out --- I think rightly --- that this fails to satisfy the principle
of least astonishment, because the user doesn't think of these
statements as being utility statements with other statements wrapped
inside.  Users are going to expect these cases to be treated as single
statements.

The issue existed before this patch, BTW, but was partially masked by
the fact that we grouped pg_stat_statements view entries strictly by
query text, so that both the utility statement and the contained
optimizable statement got matched to the same table entry.  The
execution costs were getting double-counted, but apparently nobody
noticed that.  As things now stand, the utility statement and contained
statement show up as distinct table entries (if you have
nested-statement tracking enabled) because of the different hashing
methods used.  And it's those multiple table entries that seem
confusing, especially since they are counting mostly the same costs.

[ time passes ... ]

Hm ... I just had a different idea.  I need to go look at the code
again, but I believe that in the problematic cases, the post-analyze
hook does not compute a queryId for the optimizable statement.  This
means that it will arrive at the executor with queryId zero.  What if
we simply made the executor hooks do nothing when queryId is zero?
(Note that this also means that in the problematic cases, the behavior
is already pretty wrong because executor costs for *all* statements of
this sort are getting merged into one hashtable entry for hash zero.)

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-29 Thread Tom Lane
I wrote:
 Hm ... I just had a different idea.  I need to go look at the code
 again, but I believe that in the problematic cases, the post-analyze
 hook does not compute a queryId for the optimizable statement.  This
 means that it will arrive at the executor with queryId zero.  What if
 we simply made the executor hooks do nothing when queryId is zero?
 (Note that this also means that in the problematic cases, the behavior
 is already pretty wrong because executor costs for *all* statements of
 this sort are getting merged into one hashtable entry for hash zero.)

The attached proposed patch does it that way.  It makes the EXPLAIN,
SELECT INTO, and DECLARE CURSOR cases behave as expected for utility
statements.  PREPARE/EXECUTE work a bit funny though: if you have
track = all then you get EXECUTE cycles reported against both the
EXECUTE statement and the underlying PREPARE.  This is because when
PREPARE calls parse_analyze_varparams the post-analyze hook doesn't know
that this isn't a top-level statement, so it marks the query with a
queryId.  I don't see any way around that part without something like
what I suggested before.  However, this behavior seems to me to be
considerably less of a POLA violation than the cases involving two
identical-looking entries for self-contained statements, and it might
even be thought to be a feature not a bug (since the PREPARE entry will
accumulate totals for all uses of the prepared statement).  So I'm
satisfied with it for now.

regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 137b242..bebfff1 100644
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** pgss_post_parse_analyze(ParseState *psta
*** 602,610 
  	if (!pgss || !pgss_hash)
  		return;
  
! 	/* We do nothing with utility statements at this stage */
  	if (query->utilityStmt)
  		return;
  
  	/* Set up workspace for query jumbling */
  	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
--- 602,620 
  	if (!pgss || !pgss_hash)
  		return;
  
! 	/*
! 	 * Utility statements get queryId zero.  We do this even in cases where
! 	 * the statement contains an optimizable statement for which a queryId
! 	 * could be derived (such as EXPLAIN or DECLARE CURSOR).  For such cases,
! 	 * runtime control will first go through ProcessUtility and then the
! 	 * executor, and we don't want the executor hooks to do anything, since
! 	 * we are already measuring the statement's costs at the utility level.
! 	 */
  	if (query->utilityStmt)
+ 	{
+ 		query->queryId = 0;
  		return;
+ 	}
  
  	/* Set up workspace for query jumbling */
  	jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
*** pgss_post_parse_analyze(ParseState *psta
*** 619,624 
--- 629,641 
  	query->queryId = hash_any(jstate.jumble, jstate.jumble_len);
  
  	/*
+ 	 * If we are unlucky enough to get a hash of zero, use 1 instead, to
+ 	 * prevent confusion with the utility-statement case.
+ 	 */
+ 	if (query->queryId == 0)
+ 		query->queryId = 1;
+ 
+ 	/*
  	 * If we were able to identify any ignorable constants, we immediately
  	 * create a hash table entry for the query, so that we can record the
  	 * normalized form of the query string.  If there were no such constants,
*** pgss_ExecutorStart(QueryDesc *queryDesc,
*** 649,655 
  	else
  		standard_ExecutorStart(queryDesc, eflags);
  
! 	if (pgss_enabled())
  	{
  		/*
  		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
--- 666,677 
  	else
  		standard_ExecutorStart(queryDesc, eflags);
  
! 	/*
! 	 * If query has queryId zero, don't track it.  This prevents double
! 	 * counting of optimizable statements that are directly contained in
! 	 * utility statements.
! 	 */
! 	if (pgss_enabled() && queryDesc->plannedstmt->queryId != 0)
  	{
  		/*
  		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
*** pgss_ExecutorFinish(QueryDesc *queryDesc
*** 719,731 
  static void
  pgss_ExecutorEnd(QueryDesc *queryDesc)
  {
! 	if (queryDesc->totaltime && pgss_enabled())
! 	{
! 		uint32		queryId;
! 
! 		/* Query's ID should have been filled in by post-analyze hook */
! 		queryId = queryDesc->plannedstmt->queryId;
  
  		/*
  		 * Make sure stats accumulation is done.  (Note: it's okay if several
  		 * levels of hook all do this.)
--- 741,750 
  static void
  pgss_ExecutorEnd(QueryDesc *queryDesc)
  {
! 	uint32		queryId = queryDesc->plannedstmt->queryId;
  
+ 	if (queryId != 0 && queryDesc->totaltime && pgss_enabled())
+ 	{
  		/*
  		 * Make sure stats accumulation is done.  (Note: it's okay if several
  		 * levels of hook all do this.)



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-29 Thread Tom Lane
I wrote:
 ... PREPARE/EXECUTE work a bit funny though: if you have
 track = all then you get EXECUTE cycles reported against both the
 EXECUTE statement and the underlying PREPARE.  This is because when
 PREPARE calls parse_analyze_varparams the post-analyze hook doesn't know
 that this isn't a top-level statement, so it marks the query with a
 queryId.  I don't see any way around that part without something like
 what I suggested before.  However, this behavior seems to me to be
 considerably less of a POLA violation than the cases involving two
 identical-looking entries for self-contained statements, and it might
 even be thought to be a feature not a bug (since the PREPARE entry will
 accumulate totals for all uses of the prepared statement).  So I'm
 satisfied with it for now.

Actually, there's an easy hack for that too: we can teach the
ProcessUtility hook to do nothing (and in particular not increment the
nesting level) when the statement is an ExecuteStmt.  This will result
in the executor time being blamed on the original PREPARE, whether or
not you have enabled tracking of nested statements.  That seems like a
substantial win to me, because right now you get a distinct EXECUTE
entry for each textually-different set of parameter values, which seems
pretty useless.  This change would make use of PREPARE/EXECUTE behave
very nearly the same in pg_stat_statement as use of protocol-level
prepared statements.  About the only downside I can see is that the
cycles expended on evaluating the EXECUTE's parameters will not be
charged to any pg_stat_statement entry.  Since those can be expressions,
in principle this might be a non-negligible amount of execution time,
but in practice it hardly seems likely that anyone would care about it.
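
In sketch form (simplified types; the real hook would also wrap the
call in PG_TRY-style cleanup):

typedef enum
{
    STMT_EXECUTE,               /* an ExecuteStmt node */
    STMT_OTHER_UTILITY
} UtilityKind;

static int  nested_level;

/*
 * EXECUTE is passed straight through: it is neither timed and stored
 * here nor does it bump the nesting level, so the executor hooks see a
 * top-level statement and its cycles land on the entry made for the
 * PREPARE.
 */
static void
utility_hook(UtilityKind kind, void (*next_hook) (UtilityKind))
{
    if (kind == STMT_EXECUTE)
    {
        next_hook(kind);
        return;
    }

    nested_level++;
    /* ... time the statement and accumulate its costs ... */
    next_hook(kind);
    nested_level--;
}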

Barring objections I'll go fix this, and then this patch can be
considered closed except for possible future tweaking of the
sticky-entry decay rule.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-29 Thread Robert Haas
On Thu, Mar 29, 2012 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 ... PREPARE/EXECUTE work a bit funny though: if you have
 track = all then you get EXECUTE cycles reported against both the
 EXECUTE statement and the underlying PREPARE.  This is because when
 PREPARE calls parse_analyze_varparams the post-analyze hook doesn't know
 that this isn't a top-level statement, so it marks the query with a
 queryId.  I don't see any way around that part without something like
 what I suggested before.  However, this behavior seems to me to be
 considerably less of a POLA violation than the cases involving two
 identical-looking entries for self-contained statements, and it might
 even be thought to be a feature not a bug (since the PREPARE entry will
 accumulate totals for all uses of the prepared statement).  So I'm
 satisfied with it for now.

 Actually, there's an easy hack for that too: we can teach the
 ProcessUtility hook to do nothing (and in particular not increment the
 nesting level) when the statement is an ExecuteStmt.  This will result
 in the executor time being blamed on the original PREPARE, whether or
 not you have enabled tracking of nested statements.  That seems like a
 substantial win to me, because right now you get a distinct EXECUTE
 entry for each textually-different set of parameter values, which seems
 pretty useless.  This change would make use of PREPARE/EXECUTE behave
 very nearly the same in pg_stat_statement as use of protocol-level
 prepared statements.  About the only downside I can see is that the
 cycles expended on evaluating the EXECUTE's parameters will not be
 charged to any pg_stat_statement entry.  Since those can be expressions,
 in principle this might be a non-negligible amount of execution time,
 but in practice it hardly seems likely that anyone would care about it.

 Barring objections I'll go fix this, and then this patch can be
 considered closed except for possible future tweaking of the
 sticky-entry decay rule.

After reading your last commit message, I was wondering if something
like this might be possible, so +1 from me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Peter Geoghegan
[ also for the archives' sake ]

On 27 March 2012 22:05, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, testing function pointers for null is certainly OK --- note that
 all our hook function call sites do that.  It's true that testing for
 equality to a particular function's name can fail on some platforms
 because of jump table hacks.

I was actually talking about stylistic iffiness. This seems contrary
to the standard, which states:

 (ISO C 99 section 6.5.9.6)
   Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (...) or function,
both are pointers to one past the last element of the same array object,
or one is a pointer to one past the end of one array object and the
other is a pointer to the start of a different array object that happens
to immediately follow the first array object in the address space.

However, the fly in the ointment is IA-64 (Itanic), which apparently
at least at one stage had broken function pointer comparisons, at
least when code was built using some version(s) of GCC.

I found it a bit difficult to square your contention that comparing
function pointers against function addresses could misbehave (it
sounded like undefined behaviour) with the fact that neither GCC nor
Clang complained. However, in light of what I've learned about IA-64,
I can certainly see why we as a project would avoid the practice.

Source: http://gcc.gnu.org/ml/gcc/2003-06/msg01283.html
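
For concreteness, the two styles in question (a null test, which every
hook call site uses, versus equality against a particular function's
address, which is valid per C99 6.5.9p6 but is what the jump-table
hacks can break; names here are illustrative):

typedef void (*hook_type) (void);

static hook_type prev_hook;     /* illustrative */

extern void some_function(void);

static void
call_site(void)
{
    if (prev_hook != NULL)      /* null test: safe everywhere */
        prev_hook();

    /*
     * Equality against a function's address: the standard guarantees
     * this compares equal iff both point to the same function, but
     * some IA-64 toolchains reportedly got it wrong.
     */
    if (prev_hook == some_function)
        prev_hook = NULL;
}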

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Peter Geoghegan
On 27 March 2012 20:26, Tom Lane t...@sss.pgh.pa.us wrote:
 Have yet to look at the pg_stat_statements code itself.

I merged upstream changes with the intention of providing a new patch
for you to review. I found a problem that I'd guess was introduced by
commit 9dbf2b7d75de5af38d087cbe2b1147dd0fd10f0a, Restructure SELECT
INTO's parsetree representation into CreateTableAsStmt. This has
nothing to do with my patch in particular.

I noticed that my tests broke, on queries like "select * into
orders_recent FROM orders WHERE orderdate = '2002-01-01';". Since
commands like that are now utility statements, I thought it best to
just hash the query string directly, along with all other utility
statements - richer functionality would be unlikely to be missed all
that much, and that's a fairly clean demarcation that I don't want to
make blurry.

In the existing pg_stat_statements code in HEAD, there are 2
pgss_store call sites - one in pgss_ProcessUtility, and the other in
pgss_ExecutorFinish. There is an implicit assumption in the extant
code (and my patch too) that there will be exactly one pgss_store call
per query execution. However, that assumption appears to now fall
down, as illustrated by the GDB session below. What's more, our new
hook is called twice, which is arguably redundant.

(gdb) break pgss_parse_analyze
Breakpoint 1 at 0x7fbd17b96790: file pg_stat_statements.c, line 640.
(gdb) break pgss_ProcessUtility
Breakpoint 2 at 0x7fbd17b962b4: file pg_stat_statements.c, line 1710.
(gdb) break pgss_ExecutorEnd
Breakpoint 3 at 0x7fbd17b9618c: file pg_stat_statements.c, line 1674.
(gdb) c
Continuing.

<<< I execute the command "select * into orders_recent FROM orders
WHERE orderdate = '2002-01-01';" >>>

Breakpoint 1, pgss_parse_analyze (pstate=0x2473bc0,
post_analysis_tree=0x2474010) at pg_stat_statements.c:640
640		if (post_analysis_tree->commandType != CMD_UTILITY)
(gdb) c
Continuing.

Breakpoint 1, pgss_parse_analyze (pstate=0x2473bc0,
post_analysis_tree=0x2474010) at pg_stat_statements.c:640
640		if (post_analysis_tree->commandType != CMD_UTILITY)
(gdb) c
Continuing.

Breakpoint 2, pgss_ProcessUtility (parsetree=0x2473cd8,
queryString=0x2472a88 "select * into orders_recent FROM orders WHERE
orderdate = '2002-01-01';", params=0x0,
isTopLevel=1 '\001', dest=0x2474278, completionTag=0x7fff74e481e0
"") at pg_stat_statements.c:1710
1710		if (pgss_track_utility && pgss_enabled())
(gdb) c
Continuing.

Breakpoint 3, pgss_ExecutorEnd (queryDesc=0x24c9660) at
pg_stat_statements.c:1674
1674		if (queryDesc->totaltime && pgss_enabled())
(gdb) c
Continuing.

What do you think we should do about this?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 I merged upstream changes with the intention of providing a new patch
 for you to review. I found a problem that I'd guess was introduced by
 commit 9dbf2b7d75de5af38d087cbe2b1147dd0fd10f0a, Restructure SELECT
 INTO's parsetree representation into CreateTableAsStmt. This has
 nothing to do with my patch in particular.

Yeah, I already deleted the intoClause chunk from the patch.  I think
treating SELECT INTO as a utility statement is probably fine, at least
for now.

 In the existing pg_stat_statements code in HEAD, there are 2
 pgss_store call sites - one in pgss_ProcessUtility, and the other in
 pgss_ExecutorFinish. There is an implicit assumption in the extant
 code (and my patch too) that there will be exactly one pgss_store call
 per query execution. However, that assumption appears to now fall
 down, as illustrated by the GDB session below. What's more, our new
 hook is called twice, which is arguably redundant.

That's been an issue right along for cases such as EXPLAIN and EXECUTE,
I believe.  Perhaps the right thing is to consider such executor calls
as nested statements --- that is, the ProcessUtility hook ought to
bump the nesting depth too.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Peter Geoghegan
On 28 March 2012 15:25, Tom Lane t...@sss.pgh.pa.us wrote:
 That's been an issue right along for cases such as EXPLAIN and EXECUTE,
 I believe.

Possible, since I didn't have test coverage for either of those 2 commands.

 Perhaps the right thing is to consider such executor calls
 as nested statements --- that is, the ProcessUtility hook ought to
 bump the nesting depth too.

That makes a lot of sense, but it might spoil things for the
pg_stat_statements.track = 'top' + pg_stat_statements.track_utility =
'on' case. At the very least, it's a POLA violation, to the extent
that if you were going to do this, you might mandate that nested
statements be tracked along with utility statements (probably while
defaulting to having both off, which would be a change).

Since you've already removed the intoClause chunk, I'm not sure how
far underway the review effort is - would you like me to produce a new
revision, or is that unnecessary?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Tom Lane
A couple other issues about this patch ...

Is there any actual benefit in providing the
pg_stat_statements.string_key GUC?  It looks to me more like something
that was thrown in because it was easy than because anybody would want
it.  I'd just as soon leave it out and avoid the incremental API
complexity increase.  (While on that subject, I see no documentation
updates in the patch...)

Also, I'm not terribly happy with the sticky entries hack.  The way
you had it set up with a 1e10 bias for a sticky entry was completely
unreasonable IMO, because if the later pgss_store call never happens
(which is quite possible if the statement contains an error detected
during planning or execution), that entry is basically never going to
age out, and will just uselessly consume a precious table slot for a
long time.  In the extreme, somebody could effectively disable query
tracking by filling the hashtable with variants of SELECT 1/0.
The best quick answer I can think of is to reduce USAGE_NON_EXEC_STICK
to maybe 10 or so, but I wonder whether there's some less klugy way to
get the result in the first place.  I thought about keeping the
canonicalized query string around locally in the backend rather than
having the early pgss_store call at all, but am not sure it's worth
the complexity of an additional local hashtable or some such to hold
such pending entries.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Peter Geoghegan
On 28 March 2012 15:57, Tom Lane t...@sss.pgh.pa.us wrote:
 Is there any actual benefit in providing the
 pg_stat_statements.string_key GUC?  It looks to me more like something
 that was thrown in because it was easy than because anybody would want
 it.  I'd just as soon leave it out and avoid the incremental API
 complexity increase.  (While on that subject, I see no documentation
 updates in the patch...)

Personally, I don't care for it, and I'm sure most users wouldn't
either, but I thought that someone somewhere might be relying on the
existing behaviour.

I will produce a doc-patch. It would have been premature to do so
until quite recently.

 Also, I'm not terribly happy with the sticky entries hack.  The way
 you had it set up with a 1e10 bias for a sticky entry was completely
 unreasonable IMO, because if the later pgss_store call never happens
 (which is quite possible if the statement contains an error detected
 during planning or execution), that entry is basically never going to
 age out, and will just uselessly consume a precious table slot for a
 long time.  In the extreme, somebody could effectively disable query
 tracking by filling the hashtable with variants of SELECT 1/0.
 The best quick answer I can think of is to reduce USAGE_NON_EXEC_STICK
 to maybe 10 or so, but I wonder whether there's some less klugy way to
 get the result in the first place.  I thought about keeping the
 canonicalized query string around locally in the backend rather than
 having the early pgss_store call at all, but am not sure it's worth
 the complexity of an additional local hashtable or some such to hold
 such pending entries.

I was troubled by that too, and had considered various ways of at
least polishing the kludge. Maybe a better approach would be to start
with a usage of 1e10 (or something rather high, anyway), but apply a
much more aggressive multiplier than USAGE_DECREASE_FACTOR for sticky
entries only? That way, in earlier calls of entry_dealloc() the sticky
entries, easily identifiable as having 0 calls, are almost impossible
to evict, but after a relatively small number of calls they soon
become more readily evictable.

This seems better than simply having some much lower usage that is
only a few times the value of USAGE_INIT.

Let's suppose we set sticky entries to have a usage value of 10. If
all other queries have more than 10 calls, which is not unlikely
(under the current usage format, 1.0 usage = 1 call, at least until
entry_dealloc() dampens usage), then at the next entry_dealloc() the
sticky entry might as well have a usage of 1, and has no way of
increasing its usage short of becoming a real entry.

On the other hand, with the multiplier trick, how close the sticky
entry is to eviction is, importantly, far more strongly influenced by
the number of entry_dealloc() calls, which in turn is influenced by
churn in the system, rather than being largely influenced by how the
magic sticky usage value happens to compare to those usage values of
some random set of real entries on some random database. If entries
really are precious, then the sticky entry is freed much sooner. If
not, then why not allow the sticky entry to stick around pending its
execution/promotion to a real entry?

It would probably be pretty inexpensive to maintain what is currently
the largest usage value in the hash table at entry_dealloc() time -
that would likely be far more suitable than 1e10, and might even work
well. We could perhaps cut that in half every entry_dealloc().

--
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 On 28 March 2012 15:57, Tom Lane t...@sss.pgh.pa.us wrote:
 Is there any actual benefit in providing the
 pg_stat_statements.string_key GUC?  It looks to me more like something
 that was thrown in because it was easy than because anybody would want
 it.  I'd just as soon leave it out and avoid the incremental API
 complexity increase.  (While on that subject, I see no documentation
 updates in the patch...)

 Personally, I don't care for it, and I'm sure most users wouldn't
 either, but I thought that someone somewhere might be relying on the
 existing behaviour.

Hearing no squawks, I will remove it from the committed patch; one
less thing to document.  Easy enough to put back later, if someone
makes a case for it.

 Also, I'm not terribly happy with the sticky entries hack.

 I was troubled by that too, and had considered various ways of at
 least polishing the kludge. Maybe a better approach would be to start
 with a usage of 1e10 (or something rather high, anyway), but apply a
 much more aggressive multiplier than USAGE_DECREASE_FACTOR for sticky
 entries only? That way, in earlier calls of entry_dealloc() the sticky
 entries, easily identifiable as having 0 calls, are almost impossible
 to evict, but after a relatively small number of calls they soon
 become more readily evictable.

I did some simple experiments with the regression tests.  Now, those
tests are by far a worst case for this sort of thing, since (a) they
probably generate many more unique queries than a typical production
application, and (b) they almost certainly provoke many more errors
and hence more dead sticky entries than a typical production app.
Nonetheless, the results look pretty bad.  Using various values of
USAGE_NON_EXEC_STICK, the numbers of useful and dead entries in the hash
table after completing one round of regression tests was:

STICK   live entries    dead sticky entries

10.0    780             190
 5.0    858             112
 4.0    874              96
 3.0    911              62
 2.0    918              43

I did not bother measuring 1e10 ;-).  It's clear that sticky entries
are forcing useful entries out of the table in this scenario.
I think wasting more than about 10% of the table in this way is not
acceptable.

I'm planning to commit the patch with a USAGE_NON_EXEC_STICK value
of 3.0, which is the largest value that stays below 10% wastage.
We can twiddle that logic later, so if you want to experiment with an
alternate decay rule, feel free.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Peter Geoghegan
On 29 March 2012 00:14, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm planning to commit the patch with a USAGE_NON_EXEC_STICK value
 of 3.0, which is the largest value that stays below 10% wastage.
 We can twiddle that logic later, so if you want to experiment with an
 alternate decay rule, feel free.

I think I may well end up doing so when I get a chance. This seems
like the kind of problem that will be solved only when we get some
practical experience (i.e. use the tool on something closer to a
production system than the regression tests).

doc-patch is attached. I'm not sure if I got the balance right - it
may be on the verbose side.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


pg_stat_statements_norm_docs.patch
Description: Binary data



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 doc-patch is attached. I'm not sure if I got the balance right - it
 may be on the verbose side.

Thanks.  I've committed the patch along with the docs, after rather
heavy editorialization.

There remain some loose ends that should be worked on but didn't seem
like commit-blockers:

1. What to do with EXPLAIN, SELECT INTO, etc.  We had talked about
tweaking the behavior of statement nesting and some other possibilities.
I think clearly this could use improvement but I'm not sure just how.
(Note: I left out the part of your docs patch that attempted to explain
the current behavior, since I think we should fix it not document it.)

2. Whether and how to adjust the aging-out of sticky entries.  This
seems like a research project, but the code impact should be quite
localized.

BTW, I eventually concluded that the parameterization testing we were
worried about before was a red herring.  As committed, the patch tries
to store a normalized string if it found any deletable constants, full
stop.  This seems to me to be correct behavior because the presence of
constants is exactly what makes the string normalizable, and such
constants *will* be ignored in the hash calculation no matter whether
there are other parameters or not.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Tom Lane
BTW, I forgot to mention that I did experiment with your python-based
test script for pg_stat_statements, but decided not to commit it.
There are just too many external dependencies for my taste:

1. python
2. psycopg2
3. dellstore2 test database

That coupled with the apparent impossibility of running the script
without manual preconfiguration makes it look not terribly useful.

Also, as of the committed patch there are several individual tests that
fail or need adjustment:


The SELECT INTO tests all fail, but we know the reason why (the testbed
isn't expecting them to result in creating separate entries for the
utility statement and the underlying plannable SELECT).


verify_statement_equivalency(
    "select a.orderid from orders a join orders b on a.orderid = b.orderid",
    "select b.orderid from orders a join orders b on a.orderid = b.orderid",
    conn)

These are not equivalent statements, or at least would not be if the
join condition were anything else than what it is, so the fact that the
original coding failed to distinguish the targetlist entries is a bug.


The test
"# temporary column name within recursive CTEs doesn't differentiate"
fails, not because of the change of column name, but because of the
change of CTE name.  This is a consequence of my having used the CTE
name here:

        case RTE_CTE:

            /*
             * Depending on the CTE name here isn't ideal, but it's the
             * only info we have to identify the referenced WITH item.
             */
            APP_JUMB_STRING(rte->ctename);
            APP_JUMB(rte->ctelevelsup);
            break;

We could avoid the name dependency by omitting ctename from the jumble
but I think that cure is worse than the disease.


Anyway, not too important, but I just thought I'd document this in case
you were wondering about the discrepancies.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-27 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 On 22 March 2012 17:19, Tom Lane t...@sss.pgh.pa.us wrote:
 Either way, I think we'd be a lot better advised to define a single
 hook post_parse_analysis_hook and make the core code responsible
 for calling it at the appropriate places, rather than supposing that
 the contrib module knows exactly which core functions ought to be
 the places to do it.

 I have done this too.

The canonicalize argument to the proposed hook seems like a bit of a
crock.  You've got the core code magically setting that in a way that
responds to extremely pg_stat_statements-specific concerns, and I am not
very sure it's right even for those concerns.

I am thinking that perhaps a reasonable signature for the hook function
would be

void post_parse_analyze (ParseState *pstate, Query *query);

with the expectation that it could dig whatever it wants to know out
of the ParseState (in particular the sourceText is available there,
and in general this should provide everything that's known at parse
time).

Now, if what it wants to know about is the parameterization status
of the query, things aren't ideal because most of the info is hidden
in parse-callback fields that aren't of globally exposed types.  However
we could at least duplicate the behavior you have here, because you're
only passing canonicalize = true in cases where no parse callback will
be registered at all, so pg_stat_statements could equivalently test for
pstate->p_paramref_hook == NULL.
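
For what it's worth, here is a minimal sketch of what the module side
might then look like (the hook variable, its typedef, and the helper
pgss_fingerprint_query() are assumptions for illustration, not settled
API):

static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;

static void
pgss_post_parse_analyze(ParseState *pstate, Query *query)
{
    /* Chain to any previously installed hook. */
    if (prev_post_parse_analyze_hook)
        prev_post_parse_analyze_hook(pstate, query);

    /*
     * Everything known at parse time is reachable from the ParseState:
     * the query text via pstate->p_sourcetext, and the absence of a
     * parameter hook telling us that no out-of-line parameters are in
     * play, hence that constants may be canonicalized away.
     */
    pgss_fingerprint_query(query, pstate->p_sourcetext,
                           pstate->p_paramref_hook == NULL);
}

void
_PG_init(void)
{
    prev_post_parse_analyze_hook = post_parse_analyze_hook;
    post_parse_analyze_hook = pgss_post_parse_analyze;
}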

Thoughts, other ideas?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-27 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 I've attached a patch with the required modifications.

I've committed the core-backend parts of this, just to get them out of
the way.  Have yet to look at the pg_stat_statements code itself.

 I restored the location field to the ParamCoerceHook signature, but
 the removal of code to modify the param location remains (again, not
 because I need it, but because I happen to think that it ought to be
 consistent with Const).

I ended up choosing not to apply that bit.  I remain of the opinion that
this behavior is fundamentally inconsistent with the general rules for
assigning parse locations to analyzed constructs, and I see no reason to
propagate that inconsistency further than we absolutely have to.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-27 Thread Peter Geoghegan
On 27 March 2012 20:26, Tom Lane t...@sss.pgh.pa.us wrote:
 I've committed the core-backend parts of this, just to get them out of
 the way.  Have yet to look at the pg_stat_statements code itself.

Thanks. I'm glad that we have that out of the way.

 I ended up choosing not to apply that bit.  I remain of the opinion that
 this behavior is fundamentally inconsistent with the general rules for
 assigning parse locations to analyzed constructs, and I see no reason to
 propagate that inconsistency further than we absolutely have to.

Fair enough.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-27 Thread Tom Lane
[ just for the archives' sake ]

Peter Geoghegan pe...@2ndquadrant.com writes:
 On 27 March 2012 18:15, Tom Lane t...@sss.pgh.pa.us wrote:
 Now, if what it wants to know about is the parameterization status
 of the query, things aren't ideal because most of the info is hidden
 in parse-callback fields that aren't of globally exposed types.  However
 we could at least duplicate the behavior you have here, because you're
 only passing canonicalize = true in cases where no parse callback will
 be registered at all, so pg_stat_statements could equivalently test for
 pstate->p_paramref_hook == NULL.

 It has been suggested to me before that comparisons with function
 pointers - using them as a flag, in effect - is generally iffy, but
 that particular usage seems reasonable to me.

Well, testing function pointers for null is certainly OK --- note that
all our hook function call sites do that.  It's true that testing for
equality to a particular function's name can fail on some platforms
because of jump table hacks.  Thus for example, if you had a need to
know that parse_variable_parameters parameter management was in use,
it wouldn't do to test whether p_coerce_param_hook ==
variable_coerce_param_hook.  (Not that you could anyway, what with that
being a static function, but exposing it as global wouldn't offer a safe
solution.)
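
To spell out the distinction in code (a sketch; the two flag variables
are invented for the example):

/*
 * Always safe: a null test merely asks whether any hook is installed,
 * which is what the hook call sites in the backend do.
 */
if (pstate->p_coerce_param_hook != NULL)
    have_coerce_hook = true;

/*
 * Not portable: on platforms that route references through jump
 * tables, two references to the same function need not compare equal,
 * so this can give the wrong answer.
 */
if (pstate->p_coerce_param_hook == variable_coerce_param_hook)  /* WRONG */
    using_variable_params = true;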

If we had a need to make this information available, I think what we'd
want to do is insist that p_ref_hook_state entries be subclasses of
Node, so that plugins could apply IsA tests on the node tag to figure
out what style of parameter management was in use.  This would also mean
exposing the struct definitions globally, which you'd need anyway else
the plugins couldn't safely access the struct contents.
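
For illustration, such an arrangement might look like this (entirely
hypothetical; neither the exposed struct nor the T_VarParamState tag
exists today):

/* parse_variable_parameters state, reimagined as a Node subclass */
typedef struct VarParamState
{
    NodeTag     type;           /* hypothetically T_VarParamState */
    Oid       **paramTypes;     /* array of parameter type OIDs */
    int        *numParams;      /* number of array entries */
} VarParamState;

/* A plugin could then discriminate safely on the node tag: */
if (pstate->p_ref_hook_state != NULL &&
    IsA(pstate->p_ref_hook_state, VarParamState))
{
    /* parse_variable_parameters management is in use */
}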

I don't particularly want to go there without very compelling reasons,
but that would be the direction to head in if we had to.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-22 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 [ pg_stat_statements_norm_2012_02_29.patch ]

I started to look at this patch (just the core-code modifications so
far).  There are some things that seem not terribly well thought out:

* It doesn't look to me like it will behave very sanely with rules.
The patch doesn't store queryId in a stored rule tree, so a Query
retrieved from a stored rule will have a zero queryId, and that's
what will get pushed through to the resulting plan tree as well.
So basically all DO ALSO or DO INSTEAD operations are going to get
lumped together by pg_stat_statements, and separated from the queries
that triggered them, which seems pretty darn unhelpful.

I don't know that storing queryId would be better, since after a restart
that'd mean there are query IDs running around in the system that the
current instance of pg_stat_statements has never heard of.  Permanently
stored query IDs would also be a headache if you needed to change the
fingerprint algorithm, or if there were more than one add-on trying to
use the query ID support.

I'm inclined to think that the most useful behavior is to teach the
rewriter to copy queryId from the original query into all the Queries
generated by rewrite.  Then, all rules fired by a source query would
be lumped into that query for tracking purposes.  This might not be
the ideal behavior either, but I don't see a better solution.
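
Concretely, something on this order in the rewriter (a sketch only;
"querylist" and "parsetree" are placeholder names for the rewrite
product and the original query):

ListCell   *lc;

/*
 * Stamp every rewritten Query with the source query's fingerprint, so
 * that DO ALSO / DO INSTEAD actions are tracked under the query that
 * fired them rather than under a zero queryId.
 */
foreach(lc, querylist)
{
    Query      *q = (Query *) lfirst(lc);

    q->queryId = parsetree->queryId;
}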

* The patch injects the query ID calculation code by redefining
parse_analyze and parse_analyze_varparams as hookable functions and
then getting into those hooks.  I don't find this terribly sane either.
pg_stat_statements has no interest in the distinction between those two
methods of getting into parse analysis.  Perhaps more to the point,
those are not the only two ways of getting into parse analysis: some
places call transformTopLevelStmt directly, for instance
pg_analyze_and_rewrite_params.  While it might be that the code paths
that do that are not of interest for fingerprinting queries, it's far
from obvious that these two are the correct and only places to do such
fingerprinting.

I think that if we are going to take the attitude that we only care
about fingerprinting queries that come in from the client, then we
ought to call the fingerprinting code in the client-message-processing
routines in postgres.c.  But in that case we need to be a little clearer
about what we are doing with unfingerprinted queries.  Alternatively,
we might take the position that we want to fingerprint every Query
struct, but in that case the existing hooks are clearly insufficient.
This seems to boil down to what you want to have happen with queries
created/executed inside functions, which is something I don't recall
being discussed.

Either way, I think we'd be a lot better advised to define a single
hook post_parse_analysis_hook and make the core code responsible
for calling it at the appropriate places, rather than supposing that
the contrib module knows exactly which core functions ought to be
the places to do it.

Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-22 Thread Peter Geoghegan
On 22 March 2012 17:19, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm inclined to think that the most useful behavior is to teach the
 rewriter to copy queryId from the original query into all the Queries
 generated by rewrite.  Then, all rules fired by a source query would
 be lumped into that query for tracking purposes.  This might not be
 the ideal behavior either, but I don't see a better solution.

+1. This behaviour seems fairly sane. The lumping together of DO ALSO
and DO INSTEAD operations was a simple oversight.

 This seems to boil down to what you want to have happen with queries
 created/executed inside functions, which is something I don't recall
 being discussed.

Uh, well, pg_stat_statements is clearly supposed to monitor execution
of queries from within functions - there is a GUC,
pg_stat_statements.track, which can be set to 'all' to track nested
queries. That being the case, we should clearly be fingerprinting
those query trees too.

The fact that we'll fingerprint these queries even though we usually
don't care about them doesn't seem like a problem, since in practice
the vast majority will be prepared.

 Either way, I think we'd be a lot better advised to define a single
 hook post_parse_analysis_hook and make the core code responsible
 for calling it at the appropriate places, rather than supposing that
 the contrib module knows exactly which core functions ought to be
 the places to do it.

I agree.

Since you haven't mentioned the removal of parse-analysis Const
location alterations, I take it that you do not object to that, which
is something I'm glad of.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-22 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 Since you haven't mentioned the removal of parse-analysis Const
 location alterations, I take it that you do not object to that, which
 is something I'm glad of.

I remain un-thrilled about it, but apparently nobody else cares, so
I'll yield the point.  (I do however object to your removal of the
cast location value from the param_coerce_hook signature.  The fact
that one current user of the hook won't need it anymore doesn't mean
no others would.  Consider throwing a "can't coerce" error from within
the hook function, for instance.)

Will you adjust the patch for the other issues?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-22 Thread Peter Geoghegan
On 22 March 2012 19:07, Tom Lane t...@sss.pgh.pa.us wrote:
 Will you adjust the patch for the other issues?

Sure. I'll take a look at it now.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-20 Thread Bruce Momjian
On Mon, Mar 19, 2012 at 08:48:07PM +0000, Peter Geoghegan wrote:
 On 19 March 2012 19:55, Peter Eisentraut pete...@gmx.net wrote:
  If someone wanted to bite the bullet and do the work, I think we could
  move to a Perl/TAP-based test suite (not pgTAP, but Perl and some fairly
  standard Test::* modules) and reduce that useless reformatting work and
  test more interesting things.  Just a thought ...
 
 I think that's a good idea. However, I am not a Perl hacker, and
 this is the second time that has left me at a disadvantage when
 working on Postgres, so I think it's probably time to learn a
 certain amount.

My blog entry on this topic might be helpful:

http://momjian.us/main/blogs/pgblog/2008.html#October_4_2008_2

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Peter Eisentraut
On Mon, 2012-03-19 at 02:35 +0000, Peter Geoghegan wrote:
 I see your point of view. I suppose I can privately hold onto the test
 suite, since it might prove useful again.

I would still like to have those tests checked in, but not run by
default, in case someone wants to hack on this particular feature again.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Peter Eisentraut
On Mon, 2012-03-19 at 08:59 +0000, Greg Stark wrote:
 The other problem with this approach is that it's hard to keep a huge
 test suite 100% clean. Changes inevitably introduce behaviour changes
 that cause some of the tests to fail.

I think we are used to that because of the way pg_regress works.  When
you have a better test infrastructure that tests actual functionality
rather than output formatting, this shouldn't be the case (nearly as
much).

If someone wanted to bite the bullet and do the work, I think we could
move to a Perl/TAP-based test suite (not pgTAP, but Perl and some fairly
standard Test::* modules) and reduce that useless reformatting work and
test more interesting things.  Just a thought ...


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Peter Geoghegan
On 19 March 2012 19:55, Peter Eisentraut pete...@gmx.net wrote:
 If someone wanted to bite the bullet and do the work, I think we could
 move to a Perl/TAP-based test suite (not pgTAP, but Perl and some fairly
 standard Test::* modules) and reduce that useless reformatting work and
 test more interesting things.  Just a thought ...

I think that's a good idea. However, I am not a Perl hacker, and
this is the second time that has left me at a disadvantage when
working on Postgres, so I think it's probably time to learn a
certain amount.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Daniel Farina
On Sun, Mar 18, 2012 at 7:35 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 19 March 2012 01:50, Tom Lane t...@sss.pgh.pa.us wrote:
 I am *not* a fan of regression tests that try to microscopically test
 every feature in the system.

 I see your point of view. I suppose I can privately hold onto the test
 suite, since it might prove useful again.

 I will work on a pg_regress based approach with a reasonably-sized
 random subset of about 20 of my existing tests, to provide some basic
 smoke testing.

This may sound rather tortured, but in the main regression suite there
is a .c file that links some stuff into the backend that is then
accessed via CREATE FUNCTION to do some special fiddly bits.  Could a
creative hook be used here to avoid the repetition you are avoiding
via Python? (e.g. constant resetting of pg_stat_statements or
whatnot).  It might sound too much like changing the system under
test, but I think it would still retain most of the value.

I also do like the pg_regress workflow in general, although clearly it
cannot do absolutely everything.  Running and interpreting the results
of your tests was not hard, but it was definitely *different* which
could be a headache if one-off testing frameworks proliferate.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-18 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 Is there anything that I could be doing to help bring this patch
 closer to a committable state?

Sorry, I've not actually looked at that patch yet.  I felt I should
push on Andres' CTAS patch first, since that's blocking progress on
the command triggers patch.

 I'm thinking of the tests in particular
 - do you suppose it's acceptable to commit them more or less as-is?

If they rely on having python, that's a 100% guaranteed rejection
in my opinion.  It's difficult enough to sell people on incremental
additions of perl dependencies to the build/test process.  Bringing
in an entire new scripting language seems like a nonstarter.

I suppose we could commit such a thing as an appendage that doesn't
get run in standard builds, but then I see little point in it at all.
Tests that don't get run regularly are next door to useless.

Is there a really strong reason why adequate regression testing isn't
possible in a plain-vanilla pg_regress script?  A quick look at the
script says that it's just doing some SQL commands and then checking the
results of queries on the pg_stat_statements views.  Admittedly the
output would be bulkier in pg_regress, which would mean that we'd not
likely want several hundred test cases.  But IMO the objective of a
regression test is not to memorialize every single case the code author
thought about during development.  ISTM it would not take very many
cases to have reasonable code coverage.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-18 Thread Peter Geoghegan
On 18 March 2012 16:13, Tom Lane t...@sss.pgh.pa.us wrote:
 Is there a really strong reason why adequate regression testing isn't
 possible in a plain-vanilla pg_regress script?  A quick look at the
 script says that it's just doing some SQL commands and then checking the
 results of queries on the pg_stat_statements views.  Admittedly the
 output would be bulkier in pg_regress, which would mean that we'd not
 likely want several hundred test cases.  But IMO the objective of a
 regression test is not to memorialize every single case the code author
 thought about during development.  ISTM it would not take very many
 cases to have reasonable code coverage.

Hmm. It's difficult to have much confidence that a greatly reduced
number of test cases ought to provide sufficient coverage. I don't
disagree with your contention; I just don't know how to judge this
matter. Given that there isn't really a maintenance burden with
regression tests, I imagine that makes it compelling to be much
more inclusive.

The fact that we rely on there being no concurrent queries might have
to be worked around for parallel scheduled regression tests, such as
by doing everything using a separate database, with that database oid
always in the predicate of the query checking the pg_stat_statements
view.

I probably would have written the tests in Perl in the first place,
but I don't know Perl. These tests existed in some form from day 1, as
I followed a test-driven development methodology, and needed to use a
language that I could be productive in immediately. There is probably
no reason why they cannot be re-written in Perl to spit out
pg_regress tests, compacting the otherwise-verbose pg_regress input.
Should I cut my teeth on Perl by writing the tests to do so? How might
this be integrated with the standard regression tests, if that's
something that is important?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-18 Thread Andrew Dunstan



On 03/18/2012 06:12 PM, Peter Geoghegan wrote:

On 18 March 2012 16:13, Tom Lane t...@sss.pgh.pa.us wrote:

Is there a really strong reason why adequate regression testing isn't
possible in a plain-vanilla pg_regress script?  A quick look at the
script says that it's just doing some SQL commands and then checking the
results of queries on the pg_stat_statements views.  Admittedly the
output would be bulkier in pg_regress, which would mean that we'd not
likely want several hundred test cases.  But IMO the objective of a
regression test is not to memorialize every single case the code author
thought about during development.  ISTM it would not take very many
cases to have reasonable code coverage.

Hmm. It's difficult to have much confidence that a greatly reduced
number of test cases ought to provide sufficient coverage. I don't
disagree with your contention; I just don't know how to judge this
matter. Given that there isn't really a maintenance burden with
regression tests, I imagine that makes it compelling to be much
more inclusive.

The fact that we rely on there being no concurrent queries might have
to be worked around for parallel scheduled regression tests, such as
by doing everything using a separate database, with that database oid
always in the predicate of the query checking the pg_stat_statements
view.

I probably would have written the tests in Perl in the first place,
but I don't know Perl. These tests existed in some form from day 1, as
I followed a test-driven development methodology, and needed to use a
language that I could be productive in immediately. There is probably
no reason why they cannot be re-written in Perl to spit out
pg_regress tests, compacting the otherwise-verbose pg_regress input.
Should I cut my teeth on Perl by writing the tests to do so? How might
this be integrated with the standard regression tests, if that's
something that is important?


A pg_regress script doesn't require any perl. It's pure SQL.

It is perfectly possible to make a single test its own group in a 
parallel schedule, and this is done now for a number of cases. See 
src/test/regress/parallel_schedule. Regression tests run in their own 
database set up for the purpose. You should be able to restrict the 
regression queries to only the current database.


If you want to generate the tests using some tool, then use whatever 
works for you, be it Python or Perl or Valgol, but ideally what is 
committed (and this is what should be in your patch) will be the SQL output
of that, not the generator plus input. Tests built that way get 
automatically run by the buildfarm. Tests that don't use the standard 
testing framework don't. You need a *really* good reason, therefore, not 
to do it that way.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-18 Thread Peter Geoghegan
On 18 March 2012 22:46, Andrew Dunstan and...@dunslane.net wrote:
 If you want to generate the tests using some tool, then use whatever works
 for you, be it Python or Perl or Valgol, but ideally what is committed (and
 this is what should be in your patch) will be the SQL output of that, not the
 generator plus input.

The reason that I'd prefer to use Perl or even Python to generate
pg_regress input, and then have that infrastructure committed is
because it's a lot more natural and succinct to deal with the problem
that way. I would have imagined that a patch that repeats the same
boilerplate again and again, to test almost every minor facet of
normalisation would be frowned upon. However, if you prefer that, it
can easily be accommodated.

The best approach might be to commit the output of the Python script
as well as the python script itself, with some clean-up work. That
way, no one is actually required to run the Python script themselves
as part of a standard build, and so they have no basis to complain
about additional dependencies. We can run the regression tests from
the buildfarm without any additional infrastructure to invoke the
python script to generate the pg_regress tests each time. When time
comes to change the representation of the query tree, which is not
going to be that frequent an event, but will occur every once in a
while, the author of the relevant patch should think to add some tests
to my existing set, and verify that they pass. That's going to be made
a lot easier by having them edit a file that expresses the problem in
terms whether two queries should be equivalent or distinct, or what a
given query's final canonicalised representation should look like, all
with minimal boilerplate. I'm only concerned with making the patch as
easy as possible to maintain.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-18 Thread Andrew Dunstan



On 03/18/2012 07:46 PM, Peter Geoghegan wrote:

On 18 March 2012 22:46, Andrew Dunstan and...@dunslane.net wrote:

If you want to generate the tests using some tool, then use whatever works
for you, be it Python or Perl or Valgol, but ideally what is committed (and
this is what should be in your patch) will be the SQL output of that, not the
generator plus input.

The reason that I'd prefer to use Perl or even Python to generate
pg_regress input, and then have that infrastructure committed is
because it's a lot more natural and succinct to deal with the problem
that way. I would have imagined that a patch that repeats the same
boilerplate again and again, to test almost every minor facet of
normalisation would be frowned upon. However, if you prefer that, it
can easily be accommodated.



If your tests are that voluminous then maybe they are not what we're 
looking for anyway. As Tom noted:


   IMO the objective of a regression test is not to memorialize every single 
case the code author thought about during development.  ISTM it would not take 
very many cases to have reasonable code coverage.


Why exactly does this feature particularly need script-driven
regression test generation when others don't?


If this is a general pattern that people want to follow, then maybe we 
need to plan and support it rather than just add a random test 
generation script here and there.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-18 Thread Peter Geoghegan
On 19 March 2012 00:10, Andrew Dunstan and...@dunslane.net wrote:
 If your tests are that voluminous then maybe they are not what we're looking
 for anyway. As Tom noted:


   IMO the objective of a regression test is not to memorialize every single
 case the code author thought about during development.  ISTM it would not
 take very many cases to have reasonable code coverage.

Fair enough.

 Why exactly does this feature need particularly to have script-driven
 regression test generation when others don't?

It's not that it needs it, so much as that it is possible to provide
coverage for much of the code with black-box testing. In the case of
most of the hundreds of tests, I can point to a particular piece of
code that is being tested, that was written *after* the test was.
Doing this with pg_regress the old-fashioned way is going to be
incredibly verbose. I'm all for doing script-generation of pg_regress
tests in a well-principled way, and I'm happy to take direction from
others as to what that should look like.

I know that for the most part the tests provide coverage for discrete
units of functionality, and so add value. If they add value, why not
include them? Tests are supposed to be comprehensive. If that
inconveniences you by slowing down the buildfarm for questionable
benefits, maybe it would be okay to have some tests not run
automatically, even if that did make them next door to useless in
Tom's estimation. There could be a more limited set of conventional
pg_regress tests that are run automatically, plus more comprehensive
tests that are run less frequently, typically only as it becomes
necessary to alter pg_stat_statements to take account of those
infrequent changes (typically additions) to the query tree.

We have tests that ensure that header files don't contain C++
keywords, and nominally promise to not do so, and they are not run
automatically. I don't see the sense in requiring that tests should be
easy to run, while also aspiring to have tests that are as useful and
comprehensive as possible. It seems like the code should dictate the
testing infrastructure, and not the other way around.

Part of the reason why I'm resistant to reducing the number of tests
is that it seems to me that excluding some tests but not others would
be quite arbitrary. It is not the case that some tests are clearly
more useful than others (except for the fuzz testing stuff, which
probably isn't all that useful).

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-18 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 On 19 March 2012 00:10, Andrew Dunstan and...@dunslane.net wrote:
 Why exactly does this feature particularly need script-driven
 regression test generation when others don't?

 It's not that it needs it, so much as that it is possible to provide
 coverage for much of the code with black-box testing. In the case of
 most of the hundreds of tests, I can point to a particular piece of
 code that is being tested, that was written *after* the test was.

Well, basically what you're saying is that you did test-driven
development, which is fine.  However, that does not mean that those
same tests are ideal for ongoing regression testing.  What we want from
a regression test these days is primarily (a) portability testing, ie
does the feature work on platforms other than yours?, and (b) early
warning if someone breaks it down the road.  In most cases, fairly
coarse testing is enough to catch drive-by breakage; and when it's not
enough, like as not the breakage is due to something you never thought
about originally and thus never tested for, so you'd not have caught it
anyway.

I am *not* a fan of regression tests that try to microscopically test
every feature in the system.  Sure you should do that when initially
developing a feature, but it serves little purpose to do it over again
every time any other developer runs the regression tests for the
foreseeable future.  That road leads to a regression suite that's so
voluminous that it takes too long to run and developers start to avoid
running it, which is counterproductive.  For an example in our own
problem space, look at mysql, whose regression tests take well over an
hour to run on a fast box.  So they must be damn near bug-free, right?
Uh, not so much, and I think the fact that developers can't easily run
their test suite is not unrelated to that.

So what I'd rather see is a small set of tests that are designed to do a
smoke-test of functionality and then exercise any interfaces to the rest
of the system that seem likely to break.  Over time we might augment
that, when we find particular soft spots as a result of previously
undetected bugs.  But sheer volume of tests is not a positive IMO.

As for the scripted vs raw-SQL-in-pg_regress question, I'm making the
same point as Andrew: only the pg_regress method is likely to get run
nearly everywhere, which means that the scripted approach is a FAIL
so far as the portability-testing aspect is concerned.

Lastly, even given that we were willing to accept a scripted set of
tests, I'd want to see it in perl not python.  Perl is the project
standard; I see no reason to expect developers to learn two different
scripting languages to work on PG.  (There might be a case for
accepting python-scripted infrastructure for pl/python, say, but not
for components that are 100% unrelated to python.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-18 Thread Peter Geoghegan
On 19 March 2012 01:50, Tom Lane t...@sss.pgh.pa.us wrote:
 I am *not* a fan of regression tests that try to microscopically test
 every feature in the system.

I see your point of view. I suppose I can privately hold onto the test
suite, since it might prove useful again.

I will work on a pg_regress based approach with a reasonably-sized
random subset of about 20 of my existing tests, to provide some basic
smoke testing.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-17 Thread Peter Geoghegan
Is there anything that I could be doing to help bring this patch
closer to a committable state? I'm thinking of the tests in particular
- do you suppose it's acceptable to commit them more or less as-is?

The standard for testing contrib modules seems to be a bit different,
as there are a number of other cases where an impedance mismatch with
pg_regress necessitates doing things differently. So, the sepgsql
tests, which I understand are mainly to test the environment that the
module is being built for rather than the code itself, are written as
a shell script that uses various SELinux tools. There is also a Perl
script that uses DBD::Pg to benchmark intarray, for example.

Now that we have a de facto standard Python driver, something that we
didn't have a couple of years ago, it probably isn't terribly
unreasonable to keep the tests in Python. They'll still probably need
some level of clean-up, to cut back on some of the tests that are
redundant. Some of the tests are merely fuzz tests, which are perhaps
a bit questionable.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-13 Thread Peter Geoghegan
On 2 March 2012 20:10, Tom Lane t...@sss.pgh.pa.us wrote:
 I do intend to take this one up in due course

I probably should have exposed the query_id directly in the
pg_stat_statements view, perhaps as query_hash. The idea of that
would be to advertise the potential non-uniqueness of the value - a
collision is *extremely* unlikely (as I've previously calculated), but
we cannot preclude the possibility, and as such it isn't *really*
usable as a primary key. BTW, even if there is a collision, we at
least know that there can't be a situation where one user's query
entry gets spurious statistics from the execution of some other
user's, or one database gets statistics from another, since their
corresponding oid values separately form part of the dynahash key,
alongside query_id.
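
To illustrate, the key is conceptually a composite along these lines
(field names are illustrative, not the exact code):

typedef struct pgssHashKey
{
    Oid         userid;         /* user OID */
    Oid         dbid;           /* database OID */
    uint32      queryid;        /* query fingerprint */
} pgssHashKey;

Two entries can only be confused if all three fields match, so a bare
query_id collision never crosses a user or database boundary.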

The other reason why I'd like to do this is that I'd like to build on
this work for 9.3, and add a new column - plan_hash. When a new mode,
pg_stat_statements.plan_hash (or somesuch), is disabled (as it is by
default), the column is always null, and we get the same 9.2 behaviour. When
it is enabled, however, all existing entries are invalidated, for a
clean slate. We then start hashing both the query tree *and* the query
plan. It's a whole lot less useful if we only hash the latter. Now,
entries within the view use the plan_hash as their key (or maybe a
composite of query_hash and plan_hash). This often results in entries
with duplicate query_hash values, as the planner generates different
plans for equivalent queries, but that doesn't matter; you can easily
write an aggregate query with a GROUP BY query_hash clause if that's
what you happen to want to see.

When this optional mode is enabled, at that point we'd probably also
separately instrument planning time, as recently proposed by Fujii.

Does that seem like an interesting idea?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-13 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 I probably should have exposed the query_id directly in the
 pg_stat_statements view, perhaps as query_hash.

FWIW, I think that's a pretty bad idea; the hash seems to me to be
strictly an internal matter.  Given the sponginess of its definition
I don't really want it exposed to users.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-05 Thread Simon Riggs
On Sat, Mar 3, 2012 at 12:01 AM, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Mar 2, 2012 at 4:56 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Checksums patch isn't sucking much attention at all but admittedly
 there are some people opposed to the patch that want to draw out the
 conversation until the patch is rejected,

 Wow.  Sounds like a really shitty thing for those people to do -
 torpedoing a perfectly good patch for no reason.

You've explained to me how you think I do that elsewhere and how that
annoyed you, so I think that topic deserves discussion at the
developers meeting to help us understand one another rather than
perpetuate this.


 I have an alternative theory, though: they have sincere objections and
 don't accept your reasons for discounting those objections.


That's exactly the problem, though, and the discussion of it is relevant here.

Nobody thinks objections on this patch, checksums, or others are made
insincerely. It's what happens next that matters. The question should
be about acceptance criteria. What do we need to do to get something
useful committed? Without a clear set of criteria for resolution we
cannot move forward swiftly enough to do useful things. My thoughts
are always about salvaging what we can, trying to find a way through
the maze of objections and constraints not just black/white decisions
based upon the existence of an objection, as if that single point
trumps any other consideration and blocks all possibilities.

So there is a clear difference between an objection to any progress on
a topic ("I sincerely object to the checksum patch"), and a technical
objection to taking a particular course of action ("We shouldn't use
bits x1..x3 because ..."). The first is not viable, however sincerely
it is made, because it leaves the author with no way of resolving
things and it also presumes that the patch only exists in one version
and that the author is somehow refusing to make agreed changes.
Discussion started *here* because it was said "Person X is trying to
force patch Y thru", which is true - but that doesn't necessarily mean
the version of the patch that current objections apply to, only that
the author has an equally sincere wish to do something useful.

The way forwards here and elsewhere is to list out the things we can't
do and list out the things that must change - a clear list of
acceptance criteria. If we do that as early as possible we give the
author a good shot at being able to make those changes in time to
commit something useful. Again, only *something* useful: the full
original vision is not always possible.

In summary: What can be done in this release, given the constraints discussed?

So for Peter's patch - what do we need to do to allow some/all of this
to be committed?

And for the checksum patch please go back to the checksum thread and
list out all the things you consider unresolved. In some cases,
resolutions have been suggested but not yet implemented so it would
help if those are either discounted now before they are written, or
accepted in principle to allow work to proceed.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-05 Thread Andrew Dunstan



On 03/05/2012 05:12 AM, Simon Riggs wrote:

On Sat, Mar 3, 2012 at 12:01 AM, Robert Haas robertmh...@gmail.com wrote:


On Fri, Mar 2, 2012 at 4:56 PM, Simon Riggs si...@2ndquadrant.com wrote:

Checksums patch isn't sucking much attention at all but admittedly
there are some people opposed to the patch that want to draw out the
conversation until the patch is rejected,

Wow.  Sounds like a really shitty thing for those people to do -
torpedoing a perfectly good patch for no reason.

You've explained to me how you think I do that elsewhere and how that
annoyed you, so I think that topic deserves discussion at the
developers meeting to help us understand one another rather than
perpetuate this.




No matter how much we occasionally annoy each other, I think we all need 
to accept that we're all dealing in good faith. Suggestions to the 
contrary are ugly, have no foundation in fact that I'm aware of, and 
reflect badly on our community.


Postgres has a well deserved reputation for not having the sort of 
public bickering that has caused people to avoid certain other projects. 
Please keep it that way.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-02 Thread Josh Berkus

 It would probably be prudent to concentrate on getting the core
 infrastructure committed first. That way, we at least know that if
 this doesn't get into 9.2, we can work on getting it into 9.3 knowing
 that once committed, people won't have to wait over a year at the very

I don't see why we can't commit the whole thing.  This is way more ready
for prime-time than checksums.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-02 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 It would probably be prudent to concentrate on getting the core
 infrastructure committed first. That way, we at least know that if
 this doesn't get into 9.2, we can work on getting it into 9.3 knowing
 that once committed, people won't have to wait over a year at the very

 I don't see why we can't commit the whole thing.  This is way more ready
 for prime-time than checksums.

We'll get to it in due time.  In case you haven't noticed, there's a lot
of stuff in this commitfest.  And I don't follow the logic that says
that because Simon is trying to push through a not-ready-for-commit
patch we should drop our standards for other patches.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-02 Thread Robert Haas
On Fri, Mar 2, 2012 at 12:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Josh Berkus j...@agliodbs.com writes:
 It would probably be prudent to concentrate on getting the core
 infrastructure committed first. That way, we at least know that if
 this doesn't get into 9.2, we can work on getting it into 9.3 knowing
 that once committed, people won't have to wait over a year at the very

 I don't see why we can't commit the whole thing.  This is way more ready
 for prime-time than checksums.

 We'll get to it in due time.  In case you haven't noticed, there's a lot
 of stuff in this commitfest.  And I don't follow the logic that says
 that because Simon is trying to push through a not-ready-for-commit
 patch we should drop our standards for other patches.

I don't follow that logic either, but I also feel like this CommitFest
is dragging on and on.  Unless you -- or someone -- are prepared to
devote a lot more time to this, due time is not going to arrive any
time in the foreseeable future.  We're currently making progress at a
rate of maybe 4 patches a week, at which rate we're going to finish
this CommitFest in May.  And that might be generous, because we've
been disproportionately knocking off the easy ones.  Do we have any
kind of a plan for, I don't know, bringing this to closure on some
reasonable time frame?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-02 Thread Simon Riggs
On Fri, Mar 2, 2012 at 5:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Josh Berkus j...@agliodbs.com writes:
 It would probably be prudent to concentrate on getting the core
 infrastructure committed first. That way, we at least know that if
 this doesn't get into 9.2, we can work on getting it into 9.3 knowing
 that once committed, people won't have to wait over a year at the very

 I don't see why we can't commit the whole thing.  This is way more ready
 for prime-time than checksums.

 We'll get to it in due time.  In case you haven't noticed, there's a lot
 of stuff in this commitfest.  And I don't follow the logic that says
 that because Simon is trying to push through a not-ready-for-commit
 patch we should drop our standards for other patches.

Hmm, not deaf, you know. I would never push through a patch that isn't
ready for commit. If I back something, it is because it is ready for
use in production by PostgreSQL users, in my opinion. I get burned
just as much as, if not more than, others if that's a bad decision, so
it's not given lightly.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-02 Thread Josh Berkus

 We'll get to it in due time.  In case you haven't noticed, there's a lot
 of stuff in this commitfest.  And I don't follow the logic that says
 that because Simon is trying to push through a not-ready-for-commit
 patch we should drop our standards for other patches.

What I'm pointing out is that Peter shouldn't even be talking about
cutting functionality from an apparently-ready-for-committer patch in
order to make way for a patch whose specification people are still
arguing about.

This is exactly why I'm not keen on checksums for 9.2.  We've reached
the point where the attention on the checksum patch is pushing aside
other patches which are more ready and have had more work.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Mar 2, 2012 at 12:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We'll get to it in due time.  In case you haven't noticed, there's a lot
 of stuff in this commitfest.  And I don't follow the logic that says
 that because Simon is trying to push through a not-ready-for-commit
 patch we should drop our standards for other patches.

 I don't follow that logic either, but I also feel like this CommitFest
 is dragging on and on.  Unless you -- or someone -- are prepared to
 devote a lot more time to this, due time is not going to arrive any
 time in the foreseeable future.  We're currently making progress at a
 rate of maybe 4 patches a week, at which rate we're going to finish
 this CommitFest in May.  And that might be generous, because we've
 been disproportionately knocking off the easy ones.  Do we have any
 kind of a plan for, I don't know, bringing this to closure on some
 reasonable time frame?

Well, personally I was paying approximately zero attention to the
commitfest for most of February, because I was occupied with trying to
get back-branch releases out, as well as some non-Postgres matters.
CF items are now back to the head of my to-do queue; you may have
noticed that I'm busy with Korotkov's array stats patch.  I do intend to
take this one up in due course (although considering it's not marked
Ready For Committer yet, I don't see that it deserves time ahead of
those that are).

As for when we'll be done with the CF, I dunno, but since it's the last
one for this release cycle I didn't think that we'd be arbitrarily
closing it on any particular schedule.  It'll be done when it's done.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-02 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 This is exactly why I'm not keen on checksums for 9.2.  We've reached
 the point where the attention on the checksum patch is pushing aside
 other patches which are more ready and have had more work.

IMO the reason why it's sucking so much attention is precisely that it's
not close to being ready to commit.  But this is well off topic for the
thread we're on.  If you want to propose booting checksums from
consideration for 9.2, let's have that discussion on the checksum
thread.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-02 Thread Simon Riggs
On Fri, Mar 2, 2012 at 8:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Josh Berkus j...@agliodbs.com writes:
 This is exactly why I'm not keen on checksums for 9.2.  We've reached
 the point where the attention on the checksum patch is pushing aside
 other patches which are more ready and have had more work.

 IMO the reason why it's sucking so much attention is precisely that it's
 not close to being ready to commit.  But this is well off topic for the
 thread we're on.  If you want to propose booting checksums from
 consideration for 9.2, let's have that discussion on the checksum
 thread.

Checksums patch isn't sucking much attention at all but admittedly
there are some people opposed to the patch that want to draw out the
conversation until the patch is rejected, but that's not the same
thing. The main elements of the patch have been working for around 7
weeks by now.

I'm not sure how this topic is even raised here, since the patches are
wholly and completely separate, apart from the minor and irrelevant
point that the patch authors both work for 2ndQuadrant. If that
matters at all, I'll be asking how and why.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-02 Thread Robert Haas
On Fri, Mar 2, 2012 at 4:56 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Checksums patch isn't sucking much attention at all but admittedly
 there are some people opposed to the patch that want to draw out the
 conversation until the patch is rejected,

Wow.  Sounds like a really shitty thing for those people to do -
torpedoing a perfectly good patch for no reason.

I have an alternative theory, though: they have sincere objections and
don't accept your reasons for discounting those objections.

 I'm not sure how this topic is even raised here, since the patches are
 wholly and completely separate, apart from the minor and irrelevant
 point that the patch authors both work for 2ndQuadrant. If that
 matters at all, I'll be asking how and why.

It came up because Josh pointed out that this patch is, in his
opinion, in better shape than the checksum patch.  I don't believe
anyone's employment situation comes into it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-01 Thread Peter Geoghegan
On 1 March 2012 00:48, Alvaro Herrera alvhe...@commandprompt.com wrote:
 I'm curious about the LeafNode stuff.  Is this something that could be
 done by expression_tree_walker?  I'm not completely familiar with it so
 maybe there's some showstopper such as some node tags not being
 supported, or maybe it just doesn't help.  But I'm curious.

Good question. The LeafNode function (which is a bit of a misnomer, as
noted in a comment) looks rather like a walker function, resembling the
example given in a comment in nodeFuncs.c:

* expression_tree_walker() is designed to support routines that traverse
 * a tree in a read-only fashion (although it will also work for routines
 * that modify nodes in-place but never add/delete/replace nodes).
 * A walker routine should look like this:
 *
 * bool my_walker (Node *node, my_struct *context)
 * {
 *  if (node == NULL)
 *  return false;
 *  // check for nodes that special work is required for, eg:
 *  if (IsA(node, Var))
 *  {
 *  ... do special actions for Var nodes
 *  }
 *  else if (IsA(node, ...))
 *  {
 *  ... do special actions for other node types
 *  }
 *  // for any node type not specially processed, do:
 *  return expression_tree_walker(node, my_walker, (void *) context);
 * }

My understanding is that the expression-tree walking support is mostly
useful for the majority of walker code, which only cares about a small
subset of nodes, and hopes to avoid including boilerplate code just to
walk those other nodes that it's actually uninterested in.

This code, unlike most clients of expression_tree_walker(), is pretty
much interested in everything, since its express purpose is to
fingerprint all possible query trees.

Another benefit of expression_tree_walker is that if you miss a
certain node being added (say, a FuncExpr-like node), that node is
still automatically walked over, so the walk reaches the nodes that you
do in fact care about (such as those within the new node's args List).
That makes perfect sense in the majority of cases, but here you've
already missed the fields within this new node that FuncExpr itself
lacks, so you're already fingerprinting inaccurately. I suppose you
could still at least get the nodetag, and still have a warning about
the fingerprinting being inadequate, by going down the
expression_tree_walker path, but I'm inclined to wonder whether you
aren't just better off directly walking the tree, if only to encourage
the idea that this code needs to be maintained over time, and to cut
down on the little extra bit of indirection that this imposes.

It's not going to be any sort of burden to maintain it - it currently
stands at a relatively meagre 800 lines of code for everything to do
with tree walking - and the code that will have to be added with new
nodes or refactored along with the existing tree structure is going to
be totally trivial.

All of that said, I wouldn't mind making LeafNode into a walker, if
that approach is judged to be better, and you don't mind documenting
the order in which the tree is walked as deterministic, because the
order now matters in a way apparently not really anticipated by
expression_tree_walker, though that's probably not a problem.
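
For concreteness, here is a minimal sketch of what LeafNode recast as a
walker might look like; APP_JUMB, JumbleState and RecordConstLocation
are hypothetical stand-ins for the patch's hashing machinery, not its
actual code:

static bool
fingerprint_walker(Node *node, JumbleState *jstate)
{
    if (node == NULL)
        return false;

    /* Hash the node tag itself, so that tree structure is significant */
    APP_JUMB(jstate, nodeTag(node));

    if (IsA(node, Const))
    {
        Const   *c = (Const *) node;

        /* Hash the type, but not the value: constants normalise away */
        APP_JUMB(jstate, c->consttype);
        RecordConstLocation(jstate, c->location);
        return false;       /* Const has no children */
    }

    /* Any node type not specially processed is walked generically */
    return expression_tree_walker(node, fingerprint_walker, (void *) jstate);
}

Since the resulting hash is sensitive to the order in which nodes are
visited, the deterministic walk order mentioned above is what makes
textually equivalent queries produce the same fingerprint.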

My real concern now is that it's March 1st, and the last commitfest
may end soon. Even though this patch has extensive regression tests,
has been floating around for months, and, I believe, will end up being
a timely and important feature, a committer has yet to step forward to
work towards this patch getting committed. Can someone volunteer,
please? My expectation is that this feature will make life a lot
easier for a lot of Postgres users.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-01 Thread Daniel Farina
On Thu, Mar 1, 2012 at 1:27 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 My expectation is that this feature will make life a lot
 easier for a lot of Postgres users.

Yes.  It's hard to overstate the apparent utility of this feature in
the general category of visibility and profiling.

-- 
fdr



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-01 Thread Josh Berkus
On 3/1/12 1:57 PM, Daniel Farina wrote:
 On Thu, Mar 1, 2012 at 1:27 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 My expectation is that this feature will make life a lot
 easier for a lot of Postgres users.
 
 Yes.  It's hard to overstate the apparent utility of this feature in
 the general category of visibility and profiling.

More importantly, this is what pg_stat_statements *should* have been in
8.4, and wasn't.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-01 Thread Peter Geoghegan
On 1 March 2012 22:09, Josh Berkus j...@agliodbs.com wrote:
 On 3/1/12 1:57 PM, Daniel Farina wrote:
 On Thu, Mar 1, 2012 at 1:27 PM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 My expectation is that this feature will make life a lot
 easier for a lot of Postgres users.

 Yes.  It's hard to overstate the apparent utility of this feature in
 the general category of visibility and profiling.

 More importantly, this is what pg_stat_statements *should* have been in
 8.4, and wasn't.

It would probably be prudent to concentrate on getting the core
infrastructure committed first. That way, we at least know that if
this doesn't get into 9.2, we can work on getting it into 9.3 knowing
that once committed, people won't have to wait over a year at the very
least to be able to use the feature. The footprint of such a change is
quite small:

 src/backend/nodes/copyfuncs.c          |    2 +
 src/backend/nodes/equalfuncs.c         |    4 +
 src/backend/nodes/outfuncs.c           |    6 +
 src/backend/nodes/readfuncs.c          |    5 +
 src/backend/optimizer/plan/planner.c   |    1 +
 src/backend/parser/analyze.c           |   37 +-
 src/backend/parser/parse_coerce.c      |   12 +-
 src/backend/parser/parse_param.c       |   11 +-
 src/include/nodes/parsenodes.h         |    3 +
 src/include/nodes/plannodes.h          |    2 +
 src/include/parser/analyze.h           |   12 +
 src/include/parser/parse_node.h        |    3 +-

That said, I believe that the patch is pretty close to a committable
state as things stand, and that all that is really needed is for a
committer familiar with the problem space to conclude the work started
by Daniel and others, adding:

contrib/pg_stat_statements/pg_stat_statements.c    | 1420 ++-

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-02-29 Thread Daniel Farina
On Mon, Feb 27, 2012 at 4:26 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 This does not appear to have any user-visible effect on caret position
 for all variations in coercion syntax, while giving me everything that
 I need. I had assumed that we were relying on things being this way,
 but apparently this is not the case. The system is correctly blaming
 the coercion token when it finds the coercion is at fault, and the
 const token when it finds the Const node at fault, just as it did
 before. So this looks like a case of removing what amounts to dead
 code.

To shed some light on that hypothesis, attached is a patch whereby I
use 'semantic analysis by compiler error' to show the reach of the
changes by renaming (codebase-wide) the Const node's location symbol.
The scope within which the error token will change position is small
and amenable to analysis.  I don't see a problem, nor wide-reaching
consequences.  As Peter says: probably dead code.
Note that the cancellation of the error position happens very soon,
after an invocation of stringTypeDatum (on two sides of a branch).
Pre- and post-patch, it puts the caret at the beginning of the constant
string, even in the event that there is a failure to parse it properly
to the destined type.
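
As a purely illustrative example of the technique: the straw-man rename
might look like the following in parsenodes.h (fields abbreviated, name
hypothetical), after which the compiler enumerates every site that
touches a Const's location:

typedef struct Const
{
    Expr        xpr;
    Oid         consttype;      /* pg_type OID of the constant's datatype */
    Datum       constvalue;     /* the constant's value */
    bool        constisnull;    /* whether the constant is null */
    int         constlocation;  /* was "location"; renamed so that every
                                 * remaining reference to the old name
                                 * fails to compile and can be inspected */
} Const;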

-- 
fdr


Straw-man-to-show-the-effects-of-the-change-to-const.patch
Description: Binary data



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-02-29 Thread Alvaro Herrera


I'm curious about the LeafNode stuff.  Is this something that could be
done by expression_tree_walker?  I'm not completely familiar with it so
maybe there's some showstopper such as some node tags not being
supported, or maybe it just doesn't help.  But I'm curious.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-02-27 Thread Peter Geoghegan
On 27 February 2012 06:23, Tom Lane t...@sss.pgh.pa.us wrote:
 I think that what Peter is on about in
 http://archives.postgresql.org/pgsql-hackers/2012-02/msg01152.php
 is the question of what location to use for the *result* of
 'literal string'::typename, assuming that the type's input function
 doesn't complain.  Generally we consider that we should use the
 leftmost token's location for the location of any expression composed
 of more than one input token.  This is of course the same place for
 'literal string'::typename, but not for the alternate syntaxes
 typename 'literal string' and cast('literal string' as typename).
 I'm not terribly impressed by the proposal to put in an arbitrary
 exception to that general rule for the convenience of this patch.

Now, you don't have to be. It was a mistake on my part to bring the
current user-visible behaviour into this. I don't see that there is
necessarily a tension between your position that we should blame the
leftmost token's location, and my contention that the Const location
field shouldn't misrepresent the location of certain Consts involved
in coercion post-analysis.

Let me put that in concrete terms. In my working copy of the patch, I
have made some more changes to the core system (mostly reverting
things that turned out to be unnecessary), but I have also made the
following change:

*** a/src/backend/parser/parse_coerce.c
--- b/src/backend/parser/parse_coerce.c
*** coerce_type(ParseState *pstate, Node *no
*** 280,293 ****
  		newcon->constlen = typeLen(targetType);
  		newcon->constbyval = typeByVal(targetType);
  		newcon->constisnull = con->constisnull;
! 		/* Use the leftmost of the constant's and coercion's locations */
! 		if (location < 0)
! 			newcon->location = con->location;
! 		else if (con->location >= 0 && con->location < location)
! 			newcon->location = con->location;
! 		else
! 			newcon->location = location;
!
  		/*
  		 * Set up to point at the constant's text if the input routine throws
  		 * an error.
--- 280,286 ----
  		newcon->constlen = typeLen(targetType);
  		newcon->constbyval = typeByVal(targetType);
  		newcon->constisnull = con->constisnull;
! 		newcon->location = con->location;
  		/*
  		 * Set up to point at the constant's text if the input routine throws
  		 * an error.

This does not appear to have any user-visible effect on caret position
for all variations in coercion syntax, while giving me everything that
I need. I had assumed that we were relying on things being this way,
but apparently this is not the case. The system is correctly blaming
the coercion token when it finds the coercion is at fault, and the
const token when it finds the Const node at fault, just as it did
before. So this looks like a case of removing what amounts to dead
code.

 Especially not when the only reason it's needed is that Peter is
 doing the fingerprinting at what is IMO the wrong place anyway.
 If he were working on the raw grammar output it wouldn't matter
 what parse_coerce chooses to do afterwards.

Well, I believe that your reason for preferring to do it at that stage
was that we could not capture all of the system's normalisation smarts
- like the fact that the omission of noise words isn't a differentiator
- because much of that normalisation, such as the recognition of the
equivalence of explicit joins and queries with join conditions in the
where clause, occurs within the planner. We can't have it all, so we
might as well not have any.
My solution here is that we be sufficiently vague about the behaviour
of normalisation that the user has no reasonable basis to count on
that kind of more advanced reduction occurring.

I did very seriously consider hashing the raw parse tree, but I have
several practical reasons for not doing so. Whichever way you look at
it, hashing there is going to result in more code, and uglier code.
There is no uniform parent node that I can tag with a query_id. There
has to be more modifications to the core system so that queryId value
is carried around more places and persists for longer. The fact that
I'd actually be hashing different structs at different times (that
tree is accessed through a Node pointer) would necessitate lots of
redundant code that operated on each of the very similar structs in an
analogous way. The fact is that waiting until after parse analysis has
plenty of things to recommend it, and yes, the fact that we already
have working code with extensive regression tests is one of them.
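
For illustration, here is a minimal sketch of the post-analysis
approach, assuming the queryId field on Query and a parse-analysis hook
of the kind discussed here; jumble_init, jumble_query and jumble_hash
are hypothetical stand-ins for the patch's hashing machinery:

static void
pgss_post_parse_analyze(ParseState *pstate, Query *query)
{
    JumbleState jstate;

    /* Walk the post-analysis Query tree, accumulating a fingerprint */
    jumble_init(&jstate);
    jumble_query(&jstate, query);

    /*
     * Tag the Query itself.  The value is carried through to planning
     * and execution, so the planner and executor hooks can find the
     * hashtable entry that it names.
     */
    query->queryId = jumble_hash(&jstate);
}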

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-02-26 Thread Robert Haas
On Fri, Feb 24, 2012 at 9:43 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Tom's point example does not seem to be problematic to me - the cast
 *should* blame the 42 const token, as the cast doesn't work as a
 result of its representation, which is in point of fact why the core
 system blames the Const node and not the coercion one.

I think I agree with Tom's position upthread: blaming the coercion seems to
me to make more sense.  But if that's what we're trying to do, then
why does parse_coerce() say this?

/*
 * Set up to point at the constant's text if the input routine throws
 * an error.
 */

/me is confused.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-02-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I think I agree with Tom's position upthread: blaming the coercion seems to
 me to make more sense.  But if that's what we're trying to do, then
 why does parse_coerce() say this?

 /*
  * Set up to point at the constant's text if the input routine throws
  * an error.
  */

 /me is confused.

There are two cases that are fundamentally different in the eyes of the
system:

'literal string'::typename defines a constant of the named type.
The string is fed to the type's input routine de novo, that is, it never
really had any other type.  (Under the hood, it had type UNKNOWN for a
short time, but that's an implementation detail.)  In this situation it
seems appropriate to point at the text string if the input routine
doesn't like it, because it is the input string and nothing else that is
wrong.

On the other hand, when you cast something that already had a known type
to some other type, any failure seems reasonable to blame on the cast
operator.

So in these terms there's a very real difference between what
'42'::bigint means and what 42::bigint means --- the latter implies
forming an int4 constant and then converting it to int8.
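
A condensed sketch of that distinction, loosely following the shape of
coerce_type() in parse_coerce.c; the two helper functions here are
illustrative stand-ins, not the real API:

	if (inputTypeId == UNKNOWNOID && IsA(node, Const))
	{
		/*
		 * 'literal string'::typename: the string never really had any
		 * other type, so feed it directly to the target type's input
		 * routine.  A failure here is the string's own fault, so the
		 * error position points at the literal.
		 */
		result = make_const_from_string(pstate, (Const *) node,
										targetTypeId, targetTypmod);
	}
	else
	{
		/*
		 * 42::bigint: the value already has a real type (int4), so look
		 * up a cast pathway from the source to the target type and apply
		 * it.  A failure here is the cast's fault, so the error position
		 * points at the cast.
		 */
		result = apply_cast_pathway(pstate, node, inputTypeId,
									targetTypeId, targetTypmod);
	}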

I think that what Peter is on about in
http://archives.postgresql.org/pgsql-hackers/2012-02/msg01152.php
is the question of what location to use for the *result* of
'literal string'::typename, assuming that the type's input function
doesn't complain.  Generally we consider that we should use the
leftmost token's location for the location of any expression composed
of more than one input token.  This is of course the same place for
'literal string'::typename, but not for the alternate syntaxes
typename 'literal string' and cast('literal string' as typename).
I'm not terribly impressed by the proposal to put in an arbitrary
exception to that general rule for the convenience of this patch.

Especially not when the only reason it's needed is that Peter is
doing the fingerprinting at what is IMO the wrong place anyway.
If he were working on the raw grammar output it wouldn't matter
what parse_coerce chooses to do afterwards.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-02-24 Thread Peter Geoghegan
On 21 February 2012 01:48, Tom Lane t...@sss.pgh.pa.us wrote:
 you're proposing to move the error pointer to the 42, and that does
 not seem like an improvement, especially not if it only happens when the
 cast subject is a simple constant rather than an expression.

2008's commit a2794623d292f7bbfe3134d1407281055acce584 [1] added the
following code to parse_coerce.c [2]:

/* Use the leftmost of the constant's and coercion's locations */
if (location < 0)
	newcon->location = con->location;
else if (con->location >= 0 && con->location < location)
	newcon->location = con->location;
else
	newcon->location = location;

With that commit, Tom made a special case of both Const and Param
nodes, and had them take the leftmost of the original Const location
and the coercion location. Clearly, he judged that the current exact
set of behaviours with regard to caret position was optimal. It is my
contention that:

A. They may not actually be optimal, at least not according to my
taste. At the very least, it is a hack to misrepresent the location of
Const nodes just so the core system can blame things on Const nodes
and have the user see the coercion being at fault. I appreciate that
it wouldn't have seemed to matter at the time, but the fact remains.

B. The question of where the caret goes in relevant cases - the
location of the coercion, or the location of the constant - is
inconsequential to the vast majority of Postgres users, if not all,
even if the existing behaviour is technically superior according to
the prevailing aesthetic.  On the other hand, it matters a lot to me
that I be able to trust the Const location under all circumstances -
I'd really like not to have to engineer a way around this behaviour,
because the only way to do that is with tricks with the low-level
scanner API, which would be quite brittle. The fact that select
integer '5' is canonicalised to select ? isn't very pretty. That's
not the only issue though, as even to get that more limited behaviour
lots more code is required, which is more difficult to verify as
correct. Canonicalising one token at each Const location (see the
sketch below) is a simple and robust approach, if only the core
system could be tweaked to make this assumption hold in all
circumstances, rather than just the vast majority.
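
To make that concrete, here is a minimal, self-contained sketch of the
canonicalisation step, assuming the caller already knows each constant
token's offset and length (in the real patch those would come from the
recorded Const locations plus a re-scan of the query text); none of
this is the patch's actual code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    int         off;            /* byte offset of the constant token */
    int         len;            /* length of the constant token */
} ConstLoc;

/* Replace exactly one token at each recorded location with '?' */
static char *
normalize_query(const char *query, const ConstLoc *locs, int nlocs)
{
    char       *result = malloc(strlen(query) + 1);
    size_t      src = 0,
                dst = 0;

    for (int i = 0; i < nlocs; i++)
    {
        /* copy the text up to the constant, then substitute '?' */
        memcpy(result + dst, query + src, locs[i].off - src);
        dst += locs[i].off - src;
        result[dst++] = '?';
        src = locs[i].off + locs[i].len;
    }
    strcpy(result + dst, query + src);  /* copy the tail */
    return result;
}

int
main(void)
{
    ConstLoc    locs[] = {{28, 2}, {39, 7}};
    char       *norm = normalize_query(
        "select * from foo where a = 42 and b = 'abc42'", locs, 2);

    puts(norm);     /* prints: select * from foo where a = ? and b = ? */
    free(norm);
    return 0;
}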

Tom's point example does not seem to be problematic to me - the cast
*should* blame the 42 const token, as the cast doesn't work as a
result of its representation, which is in point of fact why the core
system blames the Const node and not the coercion one. For that
reason, the constant vs expression thing strikes me as a false
equivalency. All of that said, I must reiterate that the differences
in behaviour strike me as very unimportant, or they would if they were
not so important to what I'm trying to do with pg_stat_statements.

Can this be accommodated? It might be a matter of changing the core
system to blame the coercion node rather than the Const node, if
you're determined to preserve the existing behaviour.

[1] 
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2794623d292f7bbfe3134d1407281055acce584

[2] 
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/backend/parser/parse_coerce.c;h=cd9b7b0cfbed03ec74f2cf295e4a7113627d7f72;hp=1244498ffb291b67d35917a6fdddb54b0d8d759d;hb=a2794623d292f7bbfe3134d1407281055acce584;hpb=6734182c169a1ecb74dd8495004e896ee4519adb

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-02-21 Thread Peter Geoghegan
On 21 February 2012 01:48, Tom Lane t...@sss.pgh.pa.us wrote:
 you're proposing to move the error pointer to the 42, and that does
 not seem like an improvement, especially not if it only happens when the
 cast subject is a simple constant rather than an expression.

I'm not actually proposing that though. What I'm proposing, quite
simply, is that the Const location actually be correct in all
circumstances. Now, I can understand why the Coercion node for this
query would have its current location starting from the CAST part in
your second example or would happen to be the same as the Constant in
your first, and I'm not questioning that. I'm questioning why the
Const node's location needs to *always* be the same as that of the
Coercion node when pg_stat_statements walks the tree, since I'd have
imagined that Postgres has no business blaming the error that you've
shown on the Const node.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-02-20 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 Here is the single, hacky change I've made just for now to the core
 parser to quickly see if it all works as expected:

*** transformTypeCast(ParseState *pstate, Ty
*** 2108,2113 ****
--- 2108,2116 ----
  	if (location < 0)
  		location = tc->typeName->location;

+ 	if (IsA(expr, Const))
+ 		location = ((Const *) expr)->location;
+
  	result = coerce_to_target_type(pstate, expr, inputType,
  								   targetType, targetTypmod,
  								   COERCION_EXPLICIT,

This does not look terribly sane to me.  AFAICS, the main effect of this
would be that if you have an error in coercing a literal to some
specified type, the error message would point at the literal and not
at the cast operator.  That is, in examples like these:

regression=# select 42::point;
ERROR:  cannot cast type integer to point
LINE 1: select 42::point;
                 ^
regression=# select cast (42 as point);
ERROR:  cannot cast type integer to point
LINE 1: select cast (42 as point);
               ^

you're proposing to move the error pointer to the 42, and that does
not seem like an improvement, especially not if it only happens when the
cast subject is a simple constant rather than an expression.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-02-20 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 Another look around shows that the CoerceToDomain struct takes its
 location from the new Const location in turn, so my dirty little hack
 will break the location of the CoerceToDomain struct, giving an
 arguably incorrect caret position in some error messages. It would
 suit me if MyCoerceToDomain->arg (or the arg of a similar node
 related to coercion, like CoerceViaIO) pointed to a Const node with,
 potentially, and certainly in the case of my original CoerceToDomain
 test case, a distinct location to the coercion node itself.

Sorry, I'm not following.  What about that isn't true already?

regards, tom lane
