On Mon, Mar 19, 2012 at 6:19 PM, Jeff Davis <pg...@j-davis.com> wrote: > On Sat, 2012-03-17 at 12:48 +0000, Simon Riggs wrote: >> The problems are as I described them >> >> (1) no account made for sparsity, and other factors leading to an >> overestimate of rows (N) >> >> (2) inappropriate assumption of the effect of LIMIT m, which causes a >> costly SeqScan to appear better than an IndexScan for low m/N, when in >> fact that is seldom the case. >> >> Overestimating N in (1) inverts the problem, so that an overestimate >> isn't the safe thing at all. > > I think the actual problem has more to do with risk. The planner doesn't > know how uniform the distribution of the table is, which introduces risk > for the table scan. > > I would tend to agree that for low selectivity fraction and a very low > limit (e.g. 1-3 in your example) and a large table, it doesn't seem like > a good risk to use a table scan. I don't know how that should be modeled > or implemented though.
FWIW, we have been bitten by the uniformity assumption a number of times, but we kind of know what to look for. It also has an interesting interpretation as applied to garbage (insomuch as queries all carry a conceptual "WHERE xmin...xmax... predicate). Sometimes a table is mostly garbage for a particular value, and the index scan constantly has to seek the next tuple in its search to find non-garbage. I don't know how many times the uniformity assumption has held up fine (guess: many), but cases of where an index cond has that little "Filter:" footnote has been a ticking timebomb for a number of people I've interacted with. It can be subtle because the amount of scanned data may not be enormous, so upon re-running the query it is not nearly as pathological as the original run given cache effects, the only way to make it very visible is to look at the EXPLAIN with BUFFERS option and note how many pages are being discarded by the filter/how many pages of the underlying relation and indexes are being processed only to be discarded. Were I writing a OLTP-query performance linting tool, queries with partially-matched index conditions might even be on my list of things to warn about. That is unfortunate for cases where the unmatched part of the index condition may only qualify, say, five records in all situations, but the database has no construct to know and enforce this AFAIK. -- fdr -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers