Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4

2014-11-19 Thread Robert Haas
On Mon, Nov 17, 2014 at 4:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 One thing that occurs to me is that if the generic plan estimate comes
 out much cheaper than the custom one, maybe we should assume that the
 generic's cost estimate is bogus.  Right offhand I can't think of a reason
 for a custom plan to look worse than a generic one, unless there's a
 statistical quirk like this one.

 That's an interesting idea, but what do we do after deciding that it's
 bogus?

 Keep using custom plans.  It's possible that the estimate that's in error
 is the custom one, but that's not the way to bet IMO, since the custom
 plan estimate is based on better information.

 The generic plan really can't be cheaper than the custom plan,
 but it could be the same price, or as close as makes no difference.

 Right, and what we want to do is use the generic plan as long as it's
 close to the same cost (close enough to not justify replanning effort).
 The trick here is to not be fooled by estimation errors.  Can we assume
 that generic cost < custom cost is always an estimation error?

Maybe.  It seems like kind of a fragile bet to me.  There's going to
be some qual selectivity below which an index scan on a particular
table outperforms a sequential scan, but the selectivity estimated for
a generic plan can be either higher or lower than the selectivity we'd
estimate for some particular value.  And once one of the two plans
decides on an index scan while the other one decides on a sequential
scan, it can cascade through and change the whole plan - e.g. because
it affects whether the tuples emerge with usable pathkeys.  I don't
feel very confident about assuming that applying < to the result of
all that is going to tell us anything useful.

I think what's really going on here is that the generic plan will be
optimal for some range of possible qual selectivities.  Typically, the
qual is of the form col = val, so the plan will be optimal for all
values where the estimated frequency is between some values A and B.
What would be nice is to notice when we see a value that is outside
that range, and switch to a custom plan in that case.  I realize that
the planner isn't really well set up to deliver the information we'd
need to test that for each new supplied value, but that's really what
we need to do.  The current system wastes CPU cycles by replanning up
to 5 times even when there is no benefit to be gained by it, but can
also cause big performance problems when it settles into a generic
plan and then a value with different characteristics shows up later
on.
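
The crossover described above can be seen by comparing custom plans for a
common value and a rare one against the test table from the original report
(a sketch; exact plans and costs will vary with the data and indexes):

```sql
-- Custom plan for a common value (~40% of rows): with LIMIT 1, a seqscan is
-- expected to hit a matching row almost immediately, so it wins.
EXPLAIN SELECT * FROM topics WHERE archetype = 'regular' LIMIT 1;

-- Custom plan for a value that is absent (or very rare): a seqscan would read
-- the whole table before giving up, so an index scan wins instead.
EXPLAIN SELECT * FROM topics WHERE archetype = 'banner' LIMIT 1;
```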

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4

2014-11-18 Thread Alban Hertroys
On 17 November 2014 22:27, Tom Lane t...@sss.pgh.pa.us wrote:
 Another idea that occurred to me is to run a planning cycle in which the
 actual parameter values are made available to the planner, but as
 estimates not hard constants (this facility already exists, it's just not
 being used by plancache.c).  This would yield cost estimates that are more
 safely comparable to the custom plan.  But I'm not sure that we'd want to
 expend yet another planning cycle to do this, nor am I sure that we'd want
 to use such a plan as The Generic Plan.

 regards, tom lane

Perhaps a somewhat naive idea, I only have the broad picture of how
the query planner works, but...
What if prepared statements did not store an entirely pinned-down
version of the query plan, but instead stored a condensed version
that still leaves room for choosing some different paths based on
key decision criteria?

For example, if an input parameter value matches the most common
values, choose the sequential scan path (as in the OP's case, IIRC)
and if it's not, attempt an index scan.
Of course, one implication of doing this is likely a decrease in
planning performance (which matters for simple queries), but if it
results in better plan choices for complex queries, that may be a net
gain.

I recently followed an introductory class about neural networks and
the decision logic seems to look like the neuron/perceptron pattern.

I'm just throwing this out here in case it's a viable option and
nobody else in the world thought of this, however unlikely ;)

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.




Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4

2014-11-17 Thread Robert Haas
On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 One thing that occurs to me is that if the generic plan estimate comes
 out much cheaper than the custom one, maybe we should assume that the
 generic's cost estimate is bogus.  Right offhand I can't think of a reason
 for a custom plan to look worse than a generic one, unless there's a
 statistical quirk like this one.

That's an interesting idea, but what do we do after deciding that it's
bogus?  The generic plan really can't be cheaper than the custom plan,
but it could be the same price, or as close as makes no difference.

I have long thought that maybe the custom vs. generic plan decision
should have something to do with whether the parameters are MCVs of
the table columns they're being compared against.  This case is
interesting because it demonstrates that querying for a *non*-MCV can
require a switch to a custom plan, which is something I don't think
I've seen before.
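
A check along these lines can be sketched at the SQL level using pg_stats
(illustrative only; a real implementation would live in the plancache code,
and the double cast is the usual workaround for the anyarray column type):

```sql
-- Is a candidate parameter value among the column's recorded MCVs?
-- 'banner' is not, which is exactly the case that needs a custom plan.
SELECT 'banner' = ANY (most_common_vals::text::text[]) AS is_mcv
  FROM pg_stats
 WHERE tablename = 'topics' AND attname = 'archetype';
```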

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4

2014-11-17 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 One thing that occurs to me is that if the generic plan estimate comes
 out much cheaper than the custom one, maybe we should assume that the
 generic's cost estimate is bogus.  Right offhand I can't think of a reason
 for a custom plan to look worse than a generic one, unless there's a
 statistical quirk like this one.

 That's an interesting idea, but what do we do after deciding that it's
 bogus?

Keep using custom plans.  It's possible that the estimate that's in error
is the custom one, but that's not the way to bet IMO, since the custom
plan estimate is based on better information.

 The generic plan really can't be cheaper than the custom plan,
 but it could be the same price, or as close as makes no difference.

Right, and what we want to do is use the generic plan as long as it's
close to the same cost (close enough to not justify replanning effort).
The trick here is to not be fooled by estimation errors.  Can we assume
that generic cost < custom cost is always an estimation error?

Another idea that occurred to me is to run a planning cycle in which the
actual parameter values are made available to the planner, but as
estimates not hard constants (this facility already exists, it's just not
being used by plancache.c).  This would yield cost estimates that are more
safely comparable to the custom plan.  But I'm not sure that we'd want to
expend yet another planning cycle to do this, nor am I sure that we'd want
to use such a plan as The Generic Plan.

regards, tom lane




Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4

2014-11-17 Thread Sam Saffron
One interesting option would be kicking in an extra, more expensive
planning cycle after the Nth run of the query. In general, a lot of
these prepared queries run thousands of times; if you add some extra
planning cost around run 100, it may not be prohibitive cost-wise.

On Tue, Nov 18, 2014 at 8:27 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 One thing that occurs to me is that if the generic plan estimate comes
 out much cheaper than the custom one, maybe we should assume that the
 generic's cost estimate is bogus.  Right offhand I can't think of a reason
 for a custom plan to look worse than a generic one, unless there's a
 statistical quirk like this one.

 That's an interesting idea, but what do we do after deciding that it's
 bogus?

 Keep using custom plans.  It's possible that the estimate that's in error
 is the custom one, but that's not the way to bet IMO, since the custom
 plan estimate is based on better information.

 The generic plan really can't be cheaper than the custom plan,
 but it could be the same price, or as close as makes no difference.

 Right, and what we want to do is use the generic plan as long as it's
 close to the same cost (close enough to not justify replanning effort).
 The trick here is to not be fooled by estimation errors.  Can we assume
 that generic cost < custom cost is always an estimation error?

 Another idea that occurred to me is to run a planning cycle in which the
 actual parameter values are made available to the planner, but as
 estimates not hard constants (this facility already exists, it's just not
 being used by plancache.c).  This would yield cost estimates that are more
 safely comparable to the custom plan.  But I'm not sure that we'd want to
 expend yet another planning cycle to do this, nor am I sure that we'd want
 to use such a plan as The Generic Plan.

 regards, tom lane




Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4

2014-11-13 Thread Tom Lane
Sam Saffron sam.saff...@gmail.com writes:
 I have hit a rather odd issue with prepared queries on both pg 9.3 and 9.4 
 beta.
 I have this table (copy at http://samsaffron.com/testing.db.gz) with a
 very odd performance profile:

Interesting case.  The issue seems to be that your statistics look like
this:

 select * from pg_stats where tablename = 'topics' and attname = 'archetype';
 -[ RECORD 1 ]----------+--------------------------
 schemaname             | public
 tablename              | topics
 attname                | archetype
 inherited              | f
 null_frac              | 0
 avg_width              | 12
 n_distinct             | 2
 most_common_vals       | {private_message,regular}
 most_common_freqs      | {0.604957,0.395043}
 histogram_bounds       |
 correlation            | 0.612985
 most_common_elems      |
 most_common_elem_freqs |
 elem_count_histogram   |

That is, archetype consists of 60% 'private_message', 40% 'regular', and
absolutely nothing else.  So the condition archetype = 'banner' is very
selective, and a plan built knowing that that is the parameter value will
use the index:

# explain select * from topics where archetype = 'banner' limit 1;
  QUERY PLAN  
--
 Limit  (cost=0.29..651.49 rows=1 width=520)
   ->  Index Scan using idx11 on topics  (cost=0.29..651.49 rows=1 width=520)
 Index Cond: ((archetype)::text = 'banner'::text)
(3 rows)

However, that's still a pretty darn expensive indexscan, mainly because
archetype is not the leading key ... if you care about the performance
of this query, why don't you have an index to match?

# create index on topics(archetype);
CREATE INDEX
# explain select * from topics where archetype = 'banner' limit 1;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Limit  (cost=0.29..6.80 rows=1 width=520)
   ->  Index Scan using topics_archetype_idx on topics  (cost=0.29..6.80 rows=1 width=520)
         Index Cond: ((archetype)::text = 'banner'::text)
(3 rows)

However, just fixing the index availability actually makes the performance
ratio even worse, because the prepared query still doesn't use the index:

# explain execute foo('banner');
 QUERY PLAN  
-
 Limit  (cost=0.00..0.11 rows=1 width=520)
   ->  Seq Scan on topics  (cost=0.00..1158.19 rows=10088 width=520)
 Filter: ((archetype)::text = $1)
(3 rows)

(Yes, you can get this result in psql, you just need to repeat the EXECUTE
half a dozen times until it shifts to a generic plan.)
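
(For reference, the prepared statement foo used above was presumably created
along these lines; its exact definition isn't shown in this message:)

```sql
-- Assumed definition of the prepared statement used in the EXPLAIN above:
PREPARE foo (text) AS
  SELECT * FROM topics WHERE archetype = $1 LIMIT 1;
```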

The problem here is that without knowledge of the comparison value, the
planner assumes that it will probably be one of the two values that make
up the table content.  (After all, why would you query for something
else?)  On that basis, a seqscan will probably hit a matching row in no
time, and so (with the LIMIT 1) it looks like a better deal than the
indexscan.

We've talked about this type of problem before.  Part of the issue is
that costing of LIMIT doesn't apply any penalty for a bad worst-case
scenario, and part of it is that the heuristics for choosing between
custom and generic plans don't consider the possibility that the generic
plan's estimated cost is way wrong for lack of knowledge of the comparison
value.  It's not real obvious how to improve either heuristic without
probably making some cases worse.

One thing that occurs to me is that if the generic plan estimate comes
out much cheaper than the custom one, maybe we should assume that the
generic's cost estimate is bogus.  Right offhand I can't think of a reason
for a custom plan to look worse than a generic one, unless there's a
statistical quirk like this one.

In the meantime, I assume that your real data contains a small percentage
of values other than these two?  If so, maybe cranking up the statistics
target would help.  If the planner knows that there are more than two
values in the column, I think it would be less optimistic about assuming
that the comparison value is one of the big two.
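
That suggestion might look like this (the target value 1000 is only an
example; the default statistics target is 100):

```sql
-- Collect more detailed statistics for the skewed column, then re-analyze
-- so pg_stats can record values beyond the two dominant ones:
ALTER TABLE topics ALTER COLUMN archetype SET STATISTICS 1000;
ANALYZE topics;
```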

regards, tom lane

