Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4
On Mon, Nov 17, 2014 at 4:27 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> One thing that occurs to me is that if the generic plan estimate comes
>>> out much cheaper than the custom one, maybe we should assume that the
>>> generic's cost estimate is bogus.  Right offhand I can't think of a
>>> reason for a custom plan to look worse than a generic one, unless
>>> there's a statistical quirk like this one.
>>
>> That's an interesting idea, but what do we do after deciding that it's
>> bogus?
>
> Keep using custom plans.  It's possible that the estimate that's in
> error is the custom one, but that's not the way to bet IMO, since the
> custom plan estimate is based on better information.
>
>> The generic plan really can't be cheaper than the custom plan, but it
>> could be the same price, or as close as makes no difference.
>
> Right, and what we want to do is use the generic plan as long as it's
> close to the same cost (close enough to not justify replanning effort).
> The trick here is to not be fooled by estimation errors.  Can we assume
> that generic cost << custom cost is always an estimation error?

Maybe.  It seems like kind of a fragile bet to me.  There's going to be some qual selectivity below which an index scan on a particular table outperforms a sequential scan, but the selectivity estimated for a generic plan can be either higher or lower than the selectivity we'd estimate for some particular value.  And once one of the two plans decides on an index scan while the other one decides on a sequential scan, it can cascade through and change the whole plan - e.g. because it affects whether the tuples emerge with usable pathkeys.  I don't feel very confident that applying << to the result of all that is going to tell us anything useful.

I think what's really going on here is that the generic plan will be optimal for some range of possible qual selectivities.
Typically, the qual is of the form col = val, so the plan will be optimal for all values where the estimated frequency is between some values A and B.  What would be nice is to notice when we see a value that is outside that range, and switch to a custom plan in that case.  I realize that the planner isn't really well set up to deliver the information we'd need to test that for each new supplied value, but that's really what we need to do.  The current system wastes CPU cycles by replanning up to 5 times even when there is no benefit to be gained by it, but can also cause big performance problems when it settles into a generic plan and then a value with different characteristics shows up later on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
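[Editor's note: the "replanning up to 5 times" behavior being criticized above can be sketched as follows.  This is an illustrative Python model of the heuristic, loosely patterned on plancache.c's choose_custom_plan(); the class name, the fudge factor, and the threshold handling are simplifications, not the actual source.]

```python
# Rough model of the cached-plan choice being discussed: plan with the
# real parameter values for the first five executions, then settle on
# the generic plan if its estimated cost looks competitive with the
# average custom-plan cost -- even when the generic estimate is bogus.

class PlanSource:
    def __init__(self):
        self.num_custom_plans = 0
        self.total_custom_cost = 0.0

    def record_custom_plan(self, cost):
        self.num_custom_plans += 1
        self.total_custom_cost += cost

    def choose_custom_plan(self, generic_cost, replan_fudge=1.1):
        # First five executions always replan with actual values.
        if self.num_custom_plans < 5:
            return True
        avg_custom = self.total_custom_cost / self.num_custom_plans
        # Afterwards, keep building custom plans only while the generic
        # estimate is meaningfully worse than the custom average.
        return generic_cost > avg_custom * replan_fudge

ps = PlanSource()
for cost in [100.0, 100.0, 100.0, 100.0, 100.0]:
    ps.record_custom_plan(cost)
# A too-cheap generic estimate (as in the OP's case) wins from the
# sixth execution on, even if the plan is far slower to actually run.
print(ps.choose_custom_plan(generic_cost=0.11))   # False -> use generic
```

Note how the decision never looks at the new parameter value itself, which is exactly the gap the message above points out.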
Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4
On 17 November 2014 22:27, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Another idea that occurred to me is to run a planning cycle in which the
> actual parameter values are made available to the planner, but as
> estimates not hard constants (this facility already exists, it's just
> not being used by plancache.c).  This would yield cost estimates that
> are more safely comparable to the custom plan.  But I'm not sure that
> we'd want to expend yet another planning cycle to do this, nor am I
> sure that we'd want to use such a plan as The Generic Plan.
>
> regards, tom lane

Perhaps a somewhat naive idea - I only have the broad picture of how the query planner works - but what if prepared statements did not store an entirely pinned-down version of the query plan, but instead stored a smashed-down version of the query plan that still leaves room for choosing some different paths based on key decision criteria?  For example, if an input parameter value matches the most common values, choose the sequential scan path (as in the OP's case, IIRC), and if it doesn't, attempt an index scan.

Of course, one implication of doing this is likely a decrease in planning performance (which matters for simple queries), but if it results in better plan choices for complex queries that may be a net gain.

I recently followed an introductory class about neural networks, and the decision logic seems to resemble the neuron/perceptron pattern.  I'm just throwing this out here in case it's a viable option and nobody else in the world thought of this, however unlikely ;)

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
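[Editor's note: the "plan with decision points" idea above can be sketched like this.  Everything here is hypothetical - nothing like this structure exists in PostgreSQL - and the plan strings and values are taken from the OP's case purely for illustration.]

```python
# Sketch of a cached plan that keeps two sub-plans plus a cheap runtime
# test choosing between them, instead of one pinned-down plan.

from dataclasses import dataclass

@dataclass
class SwitchPlan:
    mcv_values: set   # most common values, from the column's statistics
    mcv_plan: str     # sub-plan to use when the qual matches an MCV
    rare_plan: str    # sub-plan to use for any other value

    def pick(self, param):
        # The runtime "decision node": one set lookup per execution
        # instead of a full replanning cycle for each new value.
        return self.mcv_plan if param in self.mcv_values else self.rare_plan

plan = SwitchPlan(
    mcv_values={"private_message", "regular"},   # from the OP's pg_stats
    mcv_plan="Seq Scan on topics",
    rare_plan="Index Scan using idx11 on topics",
)
print(plan.pick("regular"))   # Seq Scan on topics
print(plan.pick("banner"))    # Index Scan using idx11 on topics
```

The planning-time cost the message worries about would come from having to build and cache both sub-plans up front.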
Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4
On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> One thing that occurs to me is that if the generic plan estimate comes
> out much cheaper than the custom one, maybe we should assume that the
> generic's cost estimate is bogus.  Right offhand I can't think of a
> reason for a custom plan to look worse than a generic one, unless
> there's a statistical quirk like this one.

That's an interesting idea, but what do we do after deciding that it's bogus?  The generic plan really can't be cheaper than the custom plan, but it could be the same price, or as close as makes no difference.

I have long thought that maybe the custom vs. generic plan decision should have something to do with whether the parameters are MCVs of the table columns they're being compared against.  This case is interesting because it demonstrates that querying for a *non*-MCV can require a switch to a custom plan, which is something I don't think I've seen before.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
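[Editor's note: the MCV-based decision rule suggested above could look roughly like this.  The helper name and the single boolean of "what the generic plan assumed" are invented for illustration; they are not PostgreSQL code.]

```python
# Hypothetical sketch: replan with a custom plan whenever the bound
# parameter falls on the other side of the MCV boundary than the cached
# generic plan assumed.

def should_replan(param, mcv_vals, generic_planned_for_mcv=True):
    """Return True when the parameter's MCV-ness disagrees with the
    assumption baked into the cached generic plan."""
    is_mcv = param in mcv_vals
    return is_mcv != generic_planned_for_mcv

mcvs = {"private_message", "regular"}     # from the OP's pg_stats row
# The generic plan assumed a common value, so a seqscan looked cheap.
print(should_replan("regular", mcvs))     # False: reuse the generic plan
print(should_replan("banner", mcvs))      # True: non-MCV, build a custom plan
```

This is exactly the inversion Robert notes: here it is the *non*-MCV lookup that needs the custom plan.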
Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4
Robert Haas <robertmh...@gmail.com> writes:
> On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> One thing that occurs to me is that if the generic plan estimate comes
>> out much cheaper than the custom one, maybe we should assume that the
>> generic's cost estimate is bogus.  Right offhand I can't think of a
>> reason for a custom plan to look worse than a generic one, unless
>> there's a statistical quirk like this one.
>
> That's an interesting idea, but what do we do after deciding that it's
> bogus?

Keep using custom plans.  It's possible that the estimate that's in error is the custom one, but that's not the way to bet IMO, since the custom plan estimate is based on better information.

> The generic plan really can't be cheaper than the custom plan, but it
> could be the same price, or as close as makes no difference.

Right, and what we want to do is use the generic plan as long as it's close to the same cost (close enough to not justify replanning effort).  The trick here is to not be fooled by estimation errors.  Can we assume that generic cost << custom cost is always an estimation error?

Another idea that occurred to me is to run a planning cycle in which the actual parameter values are made available to the planner, but as estimates not hard constants (this facility already exists, it's just not being used by plancache.c).  This would yield cost estimates that are more safely comparable to the custom plan.  But I'm not sure that we'd want to expend yet another planning cycle to do this, nor am I sure that we'd want to use such a plan as The Generic Plan.

regards, tom lane
Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4
One interesting option would be kicking in an extra, more expensive planning cycle after the Nth run of the query.  In general a lot of these planned queries run 1000s of times; if you add some extra planning cost at run 100 it may not be prohibitive cost-wise.

On Tue, Nov 18, 2014 at 8:27 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> One thing that occurs to me is that if the generic plan estimate comes
>>> out much cheaper than the custom one, maybe we should assume that the
>>> generic's cost estimate is bogus.  Right offhand I can't think of a
>>> reason for a custom plan to look worse than a generic one, unless
>>> there's a statistical quirk like this one.
>>
>> That's an interesting idea, but what do we do after deciding that it's
>> bogus?
>
> Keep using custom plans.  It's possible that the estimate that's in
> error is the custom one, but that's not the way to bet IMO, since the
> custom plan estimate is based on better information.
>
>> The generic plan really can't be cheaper than the custom plan, but it
>> could be the same price, or as close as makes no difference.
>
> Right, and what we want to do is use the generic plan as long as it's
> close to the same cost (close enough to not justify replanning effort).
> The trick here is to not be fooled by estimation errors.  Can we assume
> that generic cost << custom cost is always an estimation error?
>
> Another idea that occurred to me is to run a planning cycle in which the
> actual parameter values are made available to the planner, but as
> estimates not hard constants (this facility already exists, it's just
> not being used by plancache.c).  This would yield cost estimates that
> are more safely comparable to the custom plan.  But I'm not sure that
> we'd want to expend yet another planning cycle to do this, nor am I
> sure that we'd want to use such a plan as The Generic Plan.
>
> regards, tom lane
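[Editor's note: the "extra planning cycle at run N" suggestion above amounts to something like the following.  The class, the threshold of 100, and the two planner callbacks are all illustrative assumptions, not PostgreSQL behavior.]

```python
# Sketch: after the Nth execution of a cached query, spend one extra,
# more expensive planning pass and adopt its result.  Amortized over
# thousands of executions, that one pass costs almost nothing.

class CachedQuery:
    RECHECK_AT = 100   # hypothetical: revisit the plan at the 100th run

    def __init__(self, cheap_planner, expensive_planner):
        self.runs = 0
        self.plan = None
        self.cheap_planner = cheap_planner
        self.expensive_planner = expensive_planner

    def execute(self, param):
        self.runs += 1
        if self.plan is None:
            self.plan = self.cheap_planner(param)
        elif self.runs == self.RECHECK_AT:
            # The one-time expensive replan with actual values.
            self.plan = self.expensive_planner(param)
        return self.plan

q = CachedQuery(lambda p: "generic plan",
                lambda p: "custom plan for %s" % p)
plans = [q.execute("banner") for _ in range(150)]
print(plans[0])     # generic plan
print(plans[120])   # custom plan for banner
```

The trade-off the message names is visible here: queries executed fewer than N times never get the benefit of the recheck.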
Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4
Sam Saffron <sam.saff...@gmail.com> writes:
> I have hit a rather odd issue with prepared queries on both pg 9.3 and
> 9.4 beta.  I have this table (copy at http://samsaffron.com/testing.db.gz)
> with a very odd performance profile:

Interesting case.  The issue seems to be that your statistics look like this:

# select * from pg_stats where tablename = 'topics' and attname = 'archetype';
-[ RECORD 1 ]----------+--------------------------
schemaname             | public
tablename              | topics
attname                | archetype
inherited              | f
null_frac              | 0
avg_width              | 12
n_distinct             | 2
most_common_vals       | {private_message,regular}
most_common_freqs      | {0.604957,0.395043}
histogram_bounds       |
correlation            | 0.612985
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |
(1 row)

That is, archetype consists of 60% 'private_message', 40% 'regular', and absolutely nothing else.  So the condition archetype = 'banner' is very selective, and a plan built knowing that that is the parameter value will use the index:

# explain select * from topics where archetype = 'banner' limit 1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.29..651.49 rows=1 width=520)
   ->  Index Scan using idx11 on topics  (cost=0.29..651.49 rows=1 width=520)
         Index Cond: ((archetype)::text = 'banner'::text)
(3 rows)

However, that's still a pretty darn expensive indexscan, mainly because archetype is not the leading key ... if you care about the performance of this query, why don't you have an index to match?
# create index on topics(archetype);
CREATE INDEX
# explain select * from topics where archetype = 'banner' limit 1;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Limit  (cost=0.29..6.80 rows=1 width=520)
   ->  Index Scan using topics_archetype_idx on topics  (cost=0.29..6.80 rows=1 width=520)
         Index Cond: ((archetype)::text = 'banner'::text)
(3 rows)

However, just fixing the index availability actually makes the performance ratio even worse, because the prepared query still doesn't use the index:

# explain execute foo('banner');
                            QUERY PLAN
------------------------------------------------------------------
 Limit  (cost=0.00..0.11 rows=1 width=520)
   ->  Seq Scan on topics  (cost=0.00..1158.19 rows=10088 width=520)
         Filter: ((archetype)::text = $1)
(3 rows)

(Yes, you can get this result in psql, you just need to repeat the EXECUTE half a dozen times until it shifts to a generic plan.)

The problem here is that without knowledge of the comparison value, the planner assumes that it will probably be one of the two values that make up the table content.  (After all, why would you query for something else?)  On that basis, a seqscan will probably hit a matching row in no time, and so (with the LIMIT 1) it looks like a better deal than the indexscan.

We've talked about this type of problem before.  Part of the issue is that costing of LIMIT doesn't apply any penalty for a bad worst-case scenario, and part of it is that the heuristics for choosing between custom and generic plans don't consider the possibility that the generic plan's estimated cost is way wrong for lack of knowledge of the comparison value.  It's not real obvious how to improve either heuristic without probably making some cases worse.

One thing that occurs to me is that if the generic plan estimate comes out much cheaper than the custom one, maybe we should assume that the generic's cost estimate is bogus.  Right offhand I can't think of a reason for a custom plan to look worse than a generic one, unless there's a statistical quirk like this one.
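[Editor's note: the generic plan's rows=10088 estimate can be reproduced with back-of-the-envelope arithmetic.  The formula below is a simplification of what the planner's selectivity functions actually do for an equality qual with an unknown comparison value, and the total row count is inferred from the plan output, not taken from the thread.]

```python
# Why the generic plan above expects ~10088 matching rows: with the
# parameter value unknown, a common fallback for an equality qual is a
# selectivity of roughly 1/n_distinct of the table.

n_distinct = 2            # from the pg_stats row shown earlier
est_total_rows = 20176    # assumption: implied by the seqscan estimate

selectivity = 1.0 / n_distinct
print(round(est_total_rows * selectivity))   # 10088

# With LIMIT 1, a seqscan expected to match half the table looks almost
# free: it should find a match after ~2 tuples.  The real value 'banner'
# matches nothing, so the scan actually reads the entire table.
```

That mismatch between the estimated 50% selectivity and the actual 0% is the whole performance cliff in the OP's report.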
In the meantime, I assume that your real data contains a small percentage of values other than these two?  If so, maybe cranking up the statistics target would help.  If the planner knows that there are more than two values in the column, I think it would be less optimistic about assuming that the comparison value is one of the big two.

regards, tom lane