Re: [HACKERS] Transient plans versus the SPI API

2011-08-19 Thread Tom Lane
[ getting back to the planner finally ]

Simon Riggs si...@2ndquadrant.com writes:
 On Wed, Aug 3, 2011 at 8:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 I think its possible to tell automatically whether we need to replan
 always or not based upon the path we take through selectivity
 functions.

 I don't really believe that, or at least I think it would only detect a
 few cases.

 The problem there is which executions we build custom plans for. That
 turns the problem into a sampling issue and you'll only fix the
 problems that occur with a frequency to match your sampling pattern
 and rate. Examples of situations where it won't help.

Sure, this is not going to solve every problem we have with the
planner.  What it is intended to solve is cases where someone is trying
to use the prepared-plan mechanisms but he would be a lot better off
with parameter-value-specific plans.  In particular:

 * plans that vary by table size will be about the same in the first 5
 executions. After large number of executions, things go bad.

This is a red herring.  The plancache code already arranges to replan
every time the relevant table stats are updated by autovacuum, which
should certainly happen from time to time if the table's contents are
changing materially.  If you're thinking in terms of plans being stale
then you're worrying about a long-since-solved problem.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-19 Thread Simon Riggs
On Fri, Aug 19, 2011 at 6:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ getting back to the planner finally ]

 Simon Riggs si...@2ndquadrant.com writes:
 On Wed, Aug 3, 2011 at 8:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 I think its possible to tell automatically whether we need to replan
 always or not based upon the path we take through selectivity
 functions.

 I don't really believe that, or at least I think it would only detect a
 few cases.

 The problem there is which executions we build custom plans for. That
 turns the problem into a sampling issue and you'll only fix the
 problems that occur with a frequency to match your sampling pattern
 and rate. Examples of situations where it won't help.

 Sure, this is not going to solve every problem we have with the
 planner.  What it is intended to solve is cases where someone is trying
 to use the prepared-plan mechanisms but he would be a lot better off
 with parameter-value-specific plans.  In particular:

I just realised this is exactly the same strategy as the
no-longer-used JDBC parameter prepareThreshold.

If we treat this in a similar way. prepare_threshold currently = 0 and
you are suggesting we move the value to 5. OK.

Will this be an actual parameter? If so, it removes my objection
because I can turn it off. What would be even better would be some
other controls, like a plugin that allows us to control the mechanism
or at least experiment with it.

Maybe we can assemble enough evidence to remove it before release.

I've been arguing it won't solve all problems. It won't. But if it
solves some, so its worth having.


On another point, I'd still like a one-shot plan flag, so that we
can act on that knowledge and have various pieces of code take
decisions that override the plan cache. i.e. if the plan screws up
during execution we can mark the plan as a one shot so it isn't
reused.


 * plans that vary by table size will be about the same in the first 5
 executions. After large number of executions, things go bad.

 This is a red herring.  The plancache code already arranges to replan
 every time the relevant table stats are updated by autovacuum, which
 should certainly happen from time to time if the table's contents are
 changing materially.  If you're thinking in terms of plans being stale
 then you're worrying about a long-since-solved problem.

Fair enough.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-12 Thread Bruce Momjian
Tom Lane wrote:
  Note that the SPI functions are more or less directly exposed in PL/Perl
  and PL/Python, and there are a number of existing idioms there that make
  use of prepared plans.  Changing the semantics of those functions might
  upset a lot of code.
 
 Right, but by the same token, if we don't change the default behavior,
 there is going to be a heck of a lot of code requiring manual adjustment
 before it can make use of the (hoped-to-be) improvements.  To me it
 makes more sense to change the default and then provide ways for people
 to lock down the behavior if the heuristic doesn't work for them.

Agreed.  I think the big sticking point is that without logic on how the
replanning will happen, users are having to guess how much impact this
new default behavior will have.  I also agree that this will harm some
uses but improve a larger pool of users.  Remember, the people on this
email list are probably using this feature in a much more sophisticated
way than the average user.

Also, there is a TODO idea that the results found by executing the query
(e.g. number of rows returned at each stage) could be fed back and
affect the replanning of queries.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-11 Thread Dimitri Fontaine
Hannu Krosing ha...@krosing.net writes:
 Hm, you mean reverse-engineering the parameterization of the query?

 Yes, basically re-generate the query after (or while) parsing, replacing
 constants and arguments with another set of generated arguments and
 printing the list of these arguments at the end. It may be easiest to do
 This in parallel with parsing.

 Interesting thought, but I really don't see a way to make it practical.

 Another place where this could be really useful is logging  monitoring

Another big use case is full support for materialized views: we could
then optimize a query to automatically use a matview even when written
against the “usual” schema.  Now matviews are another kind of indexes.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-08 Thread Anssi Kääriäinen

On 08/07/2011 12:25 PM, Hannu Krosing wrote:

On Sun, 2011-08-07 at 11:15 +0200, Hannu Krosing wrote:

On Wed, 2011-08-03 at 15:19 -0400, Tom Lane wrote:

Hm, you mean reverse-engineering the parameterization of the query?

Yes, basically re-generate the query after (or while) parsing, replacing
constants and arguments with another set of generated arguments and
printing the list of these arguments at the end. It may be easiest to do
This in parallel with parsing.


Interesting thought, but I really don't see a way to make it practical.

Another place where this could be really useful is logging  monitoring

If there were an option to log the above queries as

SELECT a, b FROM foo WHERE c = $1, (123)
SELECT a, b FROM foo WHERE c = $1, (97)
SELECT a, b FROM foo WHERE c = $1, (236)

The main monitoring use_case would be pg_stat_statements,
http://developer.postgresql.org/pgdocs/postgres/pgstatstatements.html
which is currently pretty useless for queries without parameters


I was trying to implement something similar for pgpool-II. The user 
could configure queries for which cached plans are wanted. The 
configuration would have been a file containing lines in format SELECT 
* FROM foo WHERE id = ?. I did not get anything implemented, as there 
were some problems. The problems were mainly with DEALLOCATE ALL called 
without pgpool-II knowing it, issues with search_path and the amount of 
work needed to implement parse tree matching.


It would be interesting if pg_stat_statements would be globally 
available with queries using generic arguments. First, there would be an 
obvious heuristic for when to cache the plan: If the average runtime of 
the query is much larger than the average planning time, there is no 
point in caching the plan. This would also give one option for cache hit 
estimation. The hit_percent is directly available. On the other hand 
pg_stat_statements could easily become a choke-point.


I would love to work on this, but I lack the needed skills. Maybe I 
could take another try for writing a proof-of-concept parse tree 
transformer and matcher, but I doubt I can produce anything useful.


 - Anssi

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-08 Thread Hannu Krosing
On Mon, 2011-08-08 at 11:39 +0300, Anssi Kääriäinen wrote:
 On 08/07/2011 12:25 PM, Hannu Krosing wrote:
  On Sun, 2011-08-07 at 11:15 +0200, Hannu Krosing wrote:
  On Wed, 2011-08-03 at 15:19 -0400, Tom Lane wrote:
  Hm, you mean reverse-engineering the parameterization of the query?
  Yes, basically re-generate the query after (or while) parsing, replacing
  constants and arguments with another set of generated arguments and
  printing the list of these arguments at the end. It may be easiest to do
  This in parallel with parsing.
 
  Interesting thought, but I really don't see a way to make it practical.
  Another place where this could be really useful is logging  monitoring
 
  If there were an option to log the above queries as
 
  SELECT a, b FROM foo WHERE c = $1, (123)
  SELECT a, b FROM foo WHERE c = $1, (97)
  SELECT a, b FROM foo WHERE c = $1, (236)
  The main monitoring use_case would be pg_stat_statements,
  http://developer.postgresql.org/pgdocs/postgres/pgstatstatements.html
  which is currently pretty useless for queries without parameters
 
 I was trying to implement something similar for pgpool-II. The user 
 could configure queries for which cached plans are wanted. The 
 configuration would have been a file containing lines in format SELECT 
 * FROM foo WHERE id = ?. I did not get anything implemented, as there 
 were some problems. The problems were mainly with DEALLOCATE ALL called 
 without pgpool-II knowing it, issues with search_path and the amount of 
 work needed to implement parse tree matching.
 
 It would be interesting if pg_stat_statements would be globally 
 available with queries using generic arguments. First, there would be an 
 obvious heuristic for when to cache the plan: If the average runtime of 
 the query is much larger than the average planning time, there is no 
 point in caching the plan. This would also give one option for cache hit 
 estimation. The hit_percent is directly available. On the other hand 
 pg_stat_statements could easily become a choke-point.
 
 I would love to work on this, but I lack the needed skills. Maybe I 
 could take another try for writing a proof-of-concept parse tree 
 transformer and matcher, but I doubt I can produce anything useful.

That is why I think it is best done in the main parser - it has to parse
and analyse the query anyway and likely knows which constants are
arguments to the query.

If doing it outside the main backend parser, it would be best to still
use the postgreSQL lex/bison files as much as possible for this.


-- 
---
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-08 Thread Anssi Kääriäinen

On 08/08/2011 01:07 PM, Hannu Krosing wrote:

That is why I think it is best done in the main parser - it has to parse
and analyse the query anyway and likely knows which constants are
arguments to the query
As far as I understand the problem, the parsing must transform table 
references to schema-qualified references. The table_foobar in SELECT * 
FROM table_foobar WHERE id = ? is not enough to identify a table. Using 
search_path, query_str as a key is one possibility, but the search_path 
is likely to be different for each user, and this could result in the 
same query being cached multiple times.


By the way, I checked current Git HEAD and pg_stat_statements seems to 
not handle search_path correctly. For pg_stat_statements this is not 
critical, but if the raw query string is used as plan cache key things 
will obviously break...


 - Anssi

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-07 Thread Hannu Krosing
On Wed, 2011-08-03 at 15:19 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  This seems like a good design.  Now what would be really cool is if
  you could observe a stream of queries like this:
 
  SELECT a, b FROM foo WHERE c = 123
  SELECT a, b FROM foo WHERE c = 97
  SELECT a, b FROM foo WHERE c = 236
 
  ...and say, hey, I could just make a generic plan and use it every
  time I see one of these.  It's not too clear to me how you'd make
  recognition of such queries cheap enough to be practical, but maybe
  someone will think of a way...
 
 Hm, you mean reverse-engineering the parameterization of the query?

Yes, basically re-generate the query after (or while) parsing, replacing
constants and arguments with another set of generated arguments and
printing the list of these arguments at the end. It may be easiest to do
This in parallel with parsing.

 Interesting thought, but I really don't see a way to make it practical.

Another place where this could be really useful is logging  monitoring

If there were an option to log the above queries as 

SELECT a, b FROM foo WHERE c = $1, (123)
SELECT a, b FROM foo WHERE c = $1, (97)
SELECT a, b FROM foo WHERE c = $1, (236)

it would make all kinds of general performance monitoring tasks also
much easier, not to mention that this forw would actually be something
that kan be cached internally.

For some users this might even be worth to use this feature alone,
without it providing Repeating Plan Recognition.

 In any case, it would amount to making up for a bad decision on the
 application side, ie, not transmitting the query in the parameterized
 form that presumably exists somewhere in the application.  I think
 we'd be better served all around by encouraging app developers to rely
 more heavily on parameterized queries ... but first we have to fix the
 performance risks there.
 
   regards, tom lane
 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-07 Thread Hannu Krosing
On Sun, 2011-08-07 at 11:15 +0200, Hannu Krosing wrote:
 On Wed, 2011-08-03 at 15:19 -0400, Tom Lane wrote:

  Hm, you mean reverse-engineering the parameterization of the query?
 
 Yes, basically re-generate the query after (or while) parsing, replacing
 constants and arguments with another set of generated arguments and
 printing the list of these arguments at the end. It may be easiest to do
 This in parallel with parsing.
 
  Interesting thought, but I really don't see a way to make it practical.
 
 Another place where this could be really useful is logging  monitoring
 
 If there were an option to log the above queries as 
 
 SELECT a, b FROM foo WHERE c = $1, (123)
 SELECT a, b FROM foo WHERE c = $1, (97)
 SELECT a, b FROM foo WHERE c = $1, (236)

The main monitoring use_case would be pg_stat_statements,
http://developer.postgresql.org/pgdocs/postgres/pgstatstatements.html
which is currently pretty useless for queries without parameters 

 it would make all kinds of general performance monitoring tasks also
 much easier, not to mention that this forw would actually be something
 that kan be cached internally.
 
 For some users this might even be worth to use this feature alone,
 without it providing Repeating Plan Recognition.
 
  In any case, it would amount to making up for a bad decision on the
  application side, ie, not transmitting the query in the parameterized
  form that presumably exists somewhere in the application.  I think
  we'd be better served all around by encouraging app developers to rely
  more heavily on parameterized queries ... but first we have to fix the
  performance risks there.
  
  regards, tom lane
  
 
 
 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-07 Thread Simon Riggs
On Sat, Aug 6, 2011 at 7:29 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Tom Lane t...@sss.pgh.pa.us writes:
 I think we'll be a lot better off with the framework discussed last
 year: build a generic plan, as well as custom plans for the first few
 sets of parameter values, and then observe whether there's a significant
 reduction in estimated costs for the custom plans.

 Another way here would be to cache more than a single plan and to keep
 execution time samples or some other relevant runtime characteristics.
 Then what we need would be a way to switch from a plan to another at run
 time on some conditions, like realizing that the reason why the planner
 thought a nestloop would be perfect is obviously wrong, or maybe just
 based on runtime characteristics.

Tom and I discussed storing multiple sub-plans on a node back in '05
IIRC, and Tom later put in support for that.

That wasn't followed up on.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-06 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 I think we'll be a lot better off with the framework discussed last
 year: build a generic plan, as well as custom plans for the first few
 sets of parameter values, and then observe whether there's a significant
 reduction in estimated costs for the custom plans.

Another way here would be to cache more than a single plan and to keep
execution time samples or some other relevant runtime characteristics.
Then what we need would be a way to switch from a plan to another at run
time on some conditions, like realizing that the reason why the planner
thought a nestloop would be perfect is obviously wrong, or maybe just
based on runtime characteristics.

 But in any case, it's way premature to be debating this until we have
 the infrastructure in which we can experiment with different policies.

That too.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-06 Thread Dimitri Fontaine
Jeff Davis pg...@j-davis.com writes:
 A control knob sounds limited. For instance, what if the application
 knows that some parameters will be constant over the time that the plan
 is saved? It would be nice to be able to bind some parameters to come up
 with a generic (but less generic) plan, and then execute it many times.
 Right now that can only be done by inlining such constants in the SQL,
 which is what we want to avoid.

+1

I was already thinking in those term at the application level for the
example I've been using before in this thread, and only reading your
mail I realize that maybe the backend should be able to do that itself.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-04 Thread Simon Riggs
On Wed, Aug 3, 2011 at 8:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 I think its possible to tell automatically whether we need to replan
 always or not based upon the path we take through selectivity
 functions.

 I don't really believe that, or at least I think it would only detect a
 few cases.  Examples of parameter-value-sensitive decisions that are
 made nowhere near the selectivity functions are constraint exclusion and
 LIKE pattern to index-qual conversion.  And in none of these cases do we
 really know at the bottom level whether a different parameter value will
 lead to a significant change in the finished plan.  For instance, if
 there's no index for column foo, it is a waste of time to force
 replanning just because we have varying selectivity estimates for
 WHERE foo  $1.

 I think we'll be a lot better off with the framework discussed last
 year: build a generic plan, as well as custom plans for the first few
 sets of parameter values, and then observe whether there's a significant
 reduction in estimated costs for the custom plans.

The problem there is which executions we build custom plans for. That
turns the problem into a sampling issue and you'll only fix the
problems that occur with a frequency to match your sampling pattern
and rate. Examples of situations where it won't help.

* plans that vary by table size will be about the same in the first 5
executions. After large number of executions, things go bad.

* text search using parameter is provided by user input - sensible
requests have low selectivities; some users put in space or e and
then we try to retrieve whole table by index scan. Almost impossible
to prevent all potentially high selectivity inputs from user. We could
add LIMIT but frequently ORM generated queries do not do that.

This isn't my-way-or-your-way - I think we need to look at some form
of safety barriers so we generate a plan but also know when the plan
has outlived its usefulness and force a re-plan.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-04 Thread Yeb Havinga

On 2011-08-03 21:19, Tom Lane wrote:

Robert Haasrobertmh...@gmail.com  writes:

This seems like a good design.  Now what would be really cool is if
you could observe a stream of queries like this:
SELECT a, b FROM foo WHERE c = 123
SELECT a, b FROM foo WHERE c = 97
SELECT a, b FROM foo WHERE c = 236
...and say, hey, I could just make a generic plan and use it every
time I see one of these.  It's not too clear to me how you'd make
recognition of such queries cheap enough to be practical, but maybe
someone will think of a way...

Hm, you mean reverse-engineering the parameterization of the query?
Interesting thought, but I really don't see a way to make it practical.


See also http://archives.postgresql.org/pgsql-hackers/2010-11/msg00617.php

I don't know if any implementation can be practical - maybe the parser 
could be coerced into emitting some kind of number that's based on 
everything in the query, except constants (and whitespace), so it would 
be the same for all the queries Robert described. That could be low cost 
enough to detect of for a query's id a cached plan exists and do more 
work only in those cases.


--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-04 Thread Peter Eisentraut
On tis, 2011-08-02 at 16:47 -0400, Tom Lane wrote:
 The most straightforward way to reimplement things within spi.c would
 be to redefine SPI_prepare as just doing the parse-and-rewrite steps,
 with planning always postponed to SPI_execute.  In the case where you
 just prepare and then execute a SPIPlan, this would come out the same
 or better, since we'd still just do one planning cycle, but the
 planner could be given the actual parameter values to use.  However,
 if you SPI_prepare, SPI_saveplan, and then SPI_execute many times, you
 might come out behind.  This is of course the same tradeoff we are
 going to impose at the SQL level anyway, but I wonder whether there
 needs to be a control knob available to C code to retain the old
 plan-once-and-always-use-that-plan approach. 

How about a new function like SPI_parse that has the new semantics?

Note that the SPI functions are more or less directly exposed in PL/Perl
and PL/Python, and there are a number of existing idioms there that make
use of prepared plans.  Changing the semantics of those functions might
upset a lot of code.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-04 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 How about a new function like SPI_parse that has the new semantics?

Yeah, I'd considered that idea (and even exactly that name for it).
Howver, the disadvantage of inventing a separate entry point is that
it isn't going to be nice for multi-level call chains, of which there
are several inside the core code and probably plenty elsewhere.  The
bottom level would have to do something like

if (new-behavior-wanted)
SPI_parse(args...);
else
SPI_prepare(args...);

and then invent some way for its callers to signal new-behavior-wanted,
and it won't be pretty if they all pick different ways to do that.

Plus we've already got SPI_prepare_cursor and SPI_prepare_params, each
of which would need a matching SPI_parse_foo entry point.

So if we want a knob here, I think that the sanest way to install it is
to add a couple more flag bits to the existing int cursorOptions
bitmask arguments of the latter two functions, perhaps

CURSOR_OPT_USE_GENERIC_PLAN
CURSOR_OPT_USE_CUSTOM_PLAN

to force generic-plan-always or custom-plan-always respectively.
(The cursor naming of those flag bits is starting to look a bit
unfortunate, but I'm not inclined to rename them now.)

If we set it up like that, then the default behavior with flags == 0
would be to use the heuristic plan-selection approach, and presumably
that is what you would also get from SPI_prepare (which is both coded
and documented as matching SPI_prepare_cursor with flags == 0).

So the question is whether it's okay to change the default behavior...

 Note that the SPI functions are more or less directly exposed in PL/Perl
 and PL/Python, and there are a number of existing idioms there that make
 use of prepared plans.  Changing the semantics of those functions might
 upset a lot of code.

Right, but by the same token, if we don't change the default behavior,
there is going to be a heck of a lot of code requiring manual adjustment
before it can make use of the (hoped-to-be) improvements.  To me it
makes more sense to change the default and then provide ways for people
to lock down the behavior if the heuristic doesn't work for them.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-04 Thread Jeff Davis
On Tue, 2011-08-02 at 16:47 -0400, Tom Lane wrote:
 The most straightforward way to reimplement things within spi.c would be
 to redefine SPI_prepare as just doing the parse-and-rewrite steps, with
 planning always postponed to SPI_execute.  In the case where you just
 prepare and then execute a SPIPlan, this would come out the same or
 better, since we'd still just do one planning cycle, but the planner could
 be given the actual parameter values to use.  However, if you SPI_prepare,
 SPI_saveplan, and then SPI_execute many times, you might come out behind.
 This is of course the same tradeoff we are going to impose at the SQL level
 anyway, but I wonder whether there needs to be a control knob available to
 C code to retain the old plan-once-and-always-use-that-plan approach.

Would there ultimately be a difference between the way SPI_prepare and
PQprepare work? It seems like the needs would be about the same, so I
think we should be consistent.

Also, I assume that SPI_execute and PQexecParams would always force a
custom plan, just like always, right?

A control knob sounds limited. For instance, what if the application
knows that some parameters will be constant over the time that the plan
is saved? It would be nice to be able to bind some parameters to come up
with a generic (but less generic) plan, and then execute it many times.
Right now that can only be done by inlining such constants in the SQL,
which is what we want to avoid.

I'm a little bothered by prepare sometimes planning and sometimes not
(and, by implication, execute_plan sometimes planning and sometimes
not). It seems cleaner to just separate the steps into parse+rewrite,
bind parameters, plan (with whatever parameters are present, giving a
more generic plan when some aren't specified), and execute (which would
require you to specify any parameters not bound yet). Maybe we don't
need to expose all of those steps (although maybe we do), but it would
be nice if the API we do offer resembles those steps.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-04 Thread Jeff Davis
On Wed, 2011-08-03 at 12:19 -0400, Tom Lane wrote:
 Of course we could address the worst cases by providing some mechanism
 to tell the plancache code always use a generic plan for this query
 or always use a custom plan.  I'm not entirely thrilled with that,
 because it's effectively a planner hint and has got the same problems
 as all planner hints, namely that users are likely to get it wrong.

I'm not entirely convinced by that. It's fairly challenging for a human
to choose a good plan for a moderately complex SQL query, and its much
more likely that the plan will become a bad one over time. But, in many
cases, a developer knows if they simply don't care about planning time,
and are willing to always replan.

Also, we have a fairly reasonable model for planning SQL queries, but
I'm not sure that the model for determining whether to replan a SQL
query is quite as clear. Simon brought up some useful points along these
lines.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-04 Thread Jeff Davis
On Wed, 2011-08-03 at 13:07 -0400, Robert Haas wrote:
 A little OT here, but (as I think Simon said elsewhere) I think we
 really ought to be considering the table statistics when deciding
 whether or not to replan.  It seems to me that the overwhelmingly
 common case where this is going to come up is when (some subset of)
 the MCVs require a different plan than run-of-the-mill values.  It
 would be nice to somehow work that out.

That blurs the line a little bit. It sounds like this might be described
as incremental planning, and perhaps that's a good way to think about
it.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Anyone have an opinion about that?

I still have this application where PREPARE takes between 50ms and 300ms
and EXECUTE 5ms to 10ms, and I can handle 1 PREPARE for 1 EXECUTE
quite easily.  (Yes the database fits in RAM, and yes when that's no
longer the case we just upgrade the hardware)

What does your proposal mean for such a use case?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 Anyone have an opinion about that?

 I still have this application where PREPARE takes between 50ms and 300ms
 and EXECUTE 5ms to 10ms, and I can handle 1 PREPARE for 1 EXECUTE
 quite easily.  (Yes the database fits in RAM, and yes when that's no
 longer the case we just upgrade the hardware)

 What does your proposal mean for such a use case?

Well, the policy for when to replan or not remains to be worked out in
detail, but what is likely to happen for such cases is that we'll waste
a few planning cycles before determining that there's no benefit in a
custom plan.  So, using the worst-case ends of your ranges above and
assuming that a few means 10, we'd go from 300 + 5 * 1 = 50300
ms to execute the query 1 times, to 10 * 300 + 5 * 1 = 53000 ms.
So yes, it'd get a little worse for that use-case.  But you have to
weigh that against the likelihood that other use-cases will get better.
If our requirement for a transient-plan mechanism is that no individual
case can ever be worse than before, then we might as well abandon the
entire project right now, because the only way to meet that requirement
is to change nothing.

Of course we could address the worst cases by providing some mechanism
to tell the plancache code always use a generic plan for this query
or always use a custom plan.  I'm not entirely thrilled with that,
because it's effectively a planner hint and has got the same problems
as all planner hints, namely that users are likely to get it wrong.
But it would be relatively painless to supply such a hint at the SPI
level, which is why I asked whether we should.  It'd be much harder to
do something equivalent at higher levels, which is why I'm not that
eager to do it for SPI.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 12:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 Anyone have an opinion about that?

 I still have this application where PREPARE takes between 50ms and 300ms
 and EXECUTE 5ms to 10ms, and I can handle 1 PREPARE for 1 EXECUTE
 quite easily.  (Yes the database fits in RAM, and yes when that's no
 longer the case we just upgrade the hardware)

 What does your proposal mean for such a use case?

 Well, the policy for when to replan or not remains to be worked out in
 detail, but what is likely to happen for such cases is that we'll waste
 a few planning cycles before determining that there's no benefit in a
 custom plan.  So, using the worst-case ends of your ranges above and
 assuming that a few means 10, we'd go from 300 + 5 * 1 = 50300
 ms to execute the query 1 times, to 10 * 300 + 5 * 1 = 53000 ms.

A little OT here, but (as I think Simon said elsewhere) I think we
really ought to be considering the table statistics when deciding
whether or not to replan.  It seems to me that the overwhelmingly
common case where this is going to come up is when (some subset of)
the MCVs require a different plan than run-of-the-mill values.  It
would be nice to somehow work that out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 So yes, it'd get a little worse for that use-case.  But you have to
 weigh that against the likelihood that other use-cases will get better.
 If our requirement for a transient-plan mechanism is that no individual
 case can ever be worse than before, then we might as well abandon the
 entire project right now, because the only way to meet that requirement
 is to change nothing.

That is not were I wanted to drift.  It's just that I don't have as much
time as I would like to those days, and so it helps me a lot seeing a
worked out example rather than make sure I parse your proposal
correctly.  Thanks a lot for your answer, I have a very clear
confirmation on how I read your previous email.

I will have to do some testing, but it could well be that this
application will benefit from locking reductions enough that it buys
this effect back.

 Of course we could address the worst cases by providing some mechanism
 to tell the plancache code always use a generic plan for this query
 or always use a custom plan.  I'm not entirely thrilled with that,
 because it's effectively a planner hint and has got the same problems
 as all planner hints, namely that users are likely to get it wrong.

Yeah.

 But it would be relatively painless to supply such a hint at the SPI
 level, which is why I asked whether we should.  It'd be much harder to
 do something equivalent at higher levels, which is why I'm not that
 eager to do it for SPI.

Given the SLA of those prepared queries in my case, I think I could
accept to have to switch from SQL statements to C coded SRF to guarantee
the planning behavior.  It will not make the upgrade cheaper, but I
realize it's a very narrow and specific use case.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 This seems like a good design.  Now what would be really cool is if
 you could observe a stream of queries like this:

 SELECT a, b FROM foo WHERE c = 123
 SELECT a, b FROM foo WHERE c = 97
 SELECT a, b FROM foo WHERE c = 236

 ...and say, hey, I could just make a generic plan and use it every
 time I see one of these.  It's not too clear to me how you'd make
 recognition of such queries cheap enough to be practical, but maybe
 someone will think of a way...

Hm, you mean reverse-engineering the parameterization of the query?
Interesting thought, but I really don't see a way to make it practical.

In any case, it would amount to making up for a bad decision on the
application side, ie, not transmitting the query in the parameterized
form that presumably exists somewhere in the application.  I think
we'd be better served all around by encouraging app developers to rely
more heavily on parameterized queries ... but first we have to fix the
performance risks there.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 3:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 This seems like a good design.  Now what would be really cool is if
 you could observe a stream of queries like this:

 SELECT a, b FROM foo WHERE c = 123
 SELECT a, b FROM foo WHERE c = 97
 SELECT a, b FROM foo WHERE c = 236

 ...and say, hey, I could just make a generic plan and use it every
 time I see one of these.  It's not too clear to me how you'd make
 recognition of such queries cheap enough to be practical, but maybe
 someone will think of a way...

 Hm, you mean reverse-engineering the parameterization of the query?
 Interesting thought, but I really don't see a way to make it practical.

 In any case, it would amount to making up for a bad decision on the
 application side, ie, not transmitting the query in the parameterized
 form that presumably exists somewhere in the application.  I think
 we'd be better served all around by encouraging app developers to rely
 more heavily on parameterized queries ... but first we have to fix the
 performance risks there.

Fair enough.  I have to admit I'm afraid of them right now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-03 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 I think its possible to tell automatically whether we need to replan
 always or not based upon the path we take through selectivity
 functions.

I don't really believe that, or at least I think it would only detect a
few cases.  Examples of parameter-value-sensitive decisions that are
made nowhere near the selectivity functions are constraint exclusion and
LIKE pattern to index-qual conversion.  And in none of these cases do we
really know at the bottom level whether a different parameter value will
lead to a significant change in the finished plan.  For instance, if
there's no index for column foo, it is a waste of time to force
replanning just because we have varying selectivity estimates for
WHERE foo  $1.

I think we'll be a lot better off with the framework discussed last
year: build a generic plan, as well as custom plans for the first few
sets of parameter values, and then observe whether there's a significant
reduction in estimated costs for the custom plans.

But in any case, it's way premature to be debating this until we have
the infrastructure in which we can experiment with different policies.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Transient plans versus the SPI API

2011-08-02 Thread Tom Lane
I've been thinking about how to redesign the plancache infrastructure to
better support use of transient (one-shot) plans, as we've talked about
various times such as in this thread:
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00607.php
(Note: that thread sorta went off into the weeds arguing about exactly
what heuristics to use for when to re-plan.  I don't want to re-open that
issue today, since there's no way to experiment with policy until we have
some mechanism in place.)

I think that what we need to do is get rid of the assumption that a cached
plan normally includes a plan per se.  The initial creation of the cache
entry should just supply a raw query plus its analyzed-and-rewritten form.
(plancache.c can actually operate that way today, via its not fully
planned flag, but it's a wart rather than the normal philosophy.)  Then
RevalidateCachedPlan should be replaced by something with the semantics
of get me a plan to use, and here's the parameter values I'm going to use
it with.  The choice between using a pre-cached generic plan and building
a one-off plan would then be localized in this new function.

There are not that many places that call plancache.c directly, and so this
change in API won't cause much code churn --- but one place that does
depend on this is spi.c, and there is *lots* of both core and third-party
code that calls SPI_prepare for example.  So we need to tread carefully in
redefining SPI's behavior.

The most straightforward way to reimplement things within spi.c would be
to redefine SPI_prepare as just doing the parse-and-rewrite steps, with
planning always postponed to SPI_execute.  In the case where you just
prepare and then execute a SPIPlan, this would come out the same or
better, since we'd still just do one planning cycle, but the planner could
be given the actual parameter values to use.  However, if you SPI_prepare,
SPI_saveplan, and then SPI_execute many times, you might come out behind.
This is of course the same tradeoff we are going to impose at the SQL level
anyway, but I wonder whether there needs to be a control knob available to
C code to retain the old plan-once-and-always-use-that-plan approach.

Anyone have an opinion about that?  And if we do need to expose some
control, should the default (if you don't change your source code) be that
you still get the old behavior, or that you get the new behavior?  I'm
inclined to think that if we believe this'll be a win at the SQL level,
it should be a win at the SPI-caller level too, but maybe someone thinks
otherwise.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-02 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 The most straightforward way to reimplement things within spi.c
 would be to redefine SPI_prepare as just doing the
 parse-and-rewrite steps, with planning always postponed to
 SPI_execute.  In the case where you just prepare and then execute
 a SPIPlan, this would come out the same or better, since we'd
 still just do one planning cycle, but the planner could be given
 the actual parameter values to use.  However, if you SPI_prepare,
 SPI_saveplan, and then SPI_execute many times, you might come out
 behind.  This is of course the same tradeoff we are going to
 impose at the SQL level anyway, but I wonder whether there needs
 to be a control knob available to C code to retain the old
 plan-once-and-always-use-that-plan approach.
 
 Anyone have an opinion about that?
 
I have a few places I've used SPI_saveplan where there is really
only one sensible plan, so I'm pretty sure it would be a loss to use
the new technique in those places.  Now, whether that would be a
loss that would be big enough for anyone to notice (or even to
reliably measure) is another question.  It wouldn't surprise me if
the difference was insignificant, but it would be reassuring to have
an easy way to check
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-02 Thread Robert Haas
On Tue, Aug 2, 2011 at 4:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I've been thinking about how to redesign the plancache infrastructure to
 better support use of transient (one-shot) plans, as we've talked about
 various times such as in this thread:
 http://archives.postgresql.org/pgsql-hackers/2010-02/msg00607.php
 (Note: that thread sorta went off into the weeds arguing about exactly
 what heuristics to use for when to re-plan.  I don't want to re-open that
 issue today, since there's no way to experiment with policy until we have
 some mechanism in place.)

 I think that what we need to do is get rid of the assumption that a cached
 plan normally includes a plan per se.  The initial creation of the cache
 entry should just supply a raw query plus its analyzed-and-rewritten form.
 (plancache.c can actually operate that way today, via its not fully
 planned flag, but it's a wart rather than the normal philosophy.)  Then
 RevalidateCachedPlan should be replaced by something with the semantics
 of get me a plan to use, and here's the parameter values I'm going to use
 it with.  The choice between using a pre-cached generic plan and building
 a one-off plan would then be localized in this new function.

This seems like a good design.  Now what would be really cool is if
you could observe a stream of queries like this:

SELECT a, b FROM foo WHERE c = 123
SELECT a, b FROM foo WHERE c = 97
SELECT a, b FROM foo WHERE c = 236

...and say, hey, I could just make a generic plan and use it every
time I see one of these.  It's not too clear to me how you'd make
recognition of such queries cheap enough to be practical, but maybe
someone will think of a way...

 There are not that many places that call plancache.c directly, and so this
 change in API won't cause much code churn --- but one place that does
 depend on this is spi.c, and there is *lots* of both core and third-party
 code that calls SPI_prepare for example.  So we need to tread carefully in
 redefining SPI's behavior.

 The most straightforward way to reimplement things within spi.c would be
 to redefine SPI_prepare as just doing the parse-and-rewrite steps, with
 planning always postponed to SPI_execute.  In the case where you just
 prepare and then execute a SPIPlan, this would come out the same or
 better, since we'd still just do one planning cycle, but the planner could
 be given the actual parameter values to use.  However, if you SPI_prepare,
 SPI_saveplan, and then SPI_execute many times, you might come out behind.
 This is of course the same tradeoff we are going to impose at the SQL level
 anyway, but I wonder whether there needs to be a control knob available to
 C code to retain the old plan-once-and-always-use-that-plan approach.

 Anyone have an opinion about that?  And if we do need to expose some
 control, should the default (if you don't change your source code) be that
 you still get the old behavior, or that you get the new behavior?  I'm
 inclined to think that if we believe this'll be a win at the SQL level,
 it should be a win at the SPI-caller level too, but maybe someone thinks
 otherwise.

I am not sure about this one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transient plans versus the SPI API

2011-08-02 Thread Simon Riggs
On Tue, Aug 2, 2011 at 9:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 The most straightforward way to reimplement things within spi.c would be
 to redefine SPI_prepare as just doing the parse-and-rewrite steps, with
 planning always postponed to SPI_execute.  In the case where you just
 prepare and then execute a SPIPlan, this would come out the same or
 better, since we'd still just do one planning cycle, but the planner could
 be given the actual parameter values to use.  However, if you SPI_prepare,
 SPI_saveplan, and then SPI_execute many times, you might come out behind.
 This is of course the same tradeoff we are going to impose at the SQL level
 anyway, but I wonder whether there needs to be a control knob available to
 C code to retain the old plan-once-and-always-use-that-plan approach.

The problems only occur 1% of the time, so this penalises everyone to
avoid real but rare problems.

This will cause a massive loss of performance in most apps, though I
understand the annoyance and why you make the suggestion.

http://www.db2ude.com/?q=node/73 for some more background on how this
is handled elsewhere
Control knob == hint, so I've avoided suggesting such an approach myself.

I think its possible to tell automatically whether we need to replan
always or not based upon the path we take through selectivity
functions.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers