I looked into the problem reported in http://www.postgresql.org/message-id/flat/56830ea5.7080...@squeakycode.net which, briefly, is that gincostestimate can produce ridiculously large index scan cost estimates for partial-match queries.

It appears that there are two basic problems contributing to this:

1. gincostestimate will scale up the statistics found in the index metapage, no matter what, as long as they're not zero (but it doesn't insist on nEntries being > 0).  If the stats date back to the index being empty or nearly so, we can come out with some page counts that are pretty silly, and we can also come out with numEntries = 0, which we then clamp to 1; but that is still orders of magnitude off from reality.

2. The partial-match logic in gincost_pattern is pretty bogus and can produce partialEntries values that are not very realistic.  The direct cause of the bad cost estimate is that some of the cost factors get multiplied by partialEntries / numEntries, which can be quite a lot larger than one, a scaling that was certainly never intended.

The attached proposed patch deals with this, first, by not applying the scaling logic if it would scale the index size up more than 4X; in that case we fall back to the rule-of-thumb estimates I introduced recently in commit 7fb008c5ee59b040.  Second, we clamp the partialEntries / numEntries ratio to not more than 1.

This is obviously all a bit ad hoc, but it seems less likely to produce insane estimates than what's there now.  On the other hand, those rule-of-thumb estimates are too new to have any field experience behind them, so maybe doubling down on our dependence on them isn't bright.

Comments?

			regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 15121bc..8d109aa 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*************** gincostestimate(PG_FUNCTION_ARGS)
*** 7246,7251 ****
--- 7246,7252 ----
  				numEntries;
  	GinQualCounts counts;
  	bool		matchPossible;
+ 	double		partialScale;
  	double		entryPagesFetched,
  				dataPagesFetched,
  				dataPagesFetchedBySel;
*************** gincostestimate(PG_FUNCTION_ARGS)
*** 7274,7315 ****
  		memset(&ginStats, 0, sizeof(ginStats));
  	}
  
! 	if (ginStats.nTotalPages > 0 && ginStats.nEntryPages > 0 && numPages > 0)
  	{
  		/*
! 		 * We got valid stats.  nPendingPages can be trusted, but the other
! 		 * fields are data as of the last VACUUM.  Scale them by the ratio
! 		 * numPages / nTotalPages to account for growth since then.
  		 */
  		double		scale = numPages / ginStats.nTotalPages;
  
! 		numEntryPages = ginStats.nEntryPages;
! 		numDataPages = ginStats.nDataPages;
! 		numPendingPages = ginStats.nPendingPages;
! 		numEntries = ginStats.nEntries;
! 
! 		numEntryPages = ceil(numEntryPages * scale);
! 		numDataPages = ceil(numDataPages * scale);
! 		numEntries = ceil(numEntries * scale);
  		/* ensure we didn't round up too much */
! 		numEntryPages = Min(numEntryPages, numPages);
! 		numDataPages = Min(numDataPages, numPages - numEntryPages);
  	}
  	else
  	{
  		/*
! 		 * It's a hypothetical index, or perhaps an index created pre-9.1 and
! 		 * never vacuumed since upgrading.  Invent some plausible internal
! 		 * statistics based on the index page count.  We estimate that 90% of
! 		 * the index is entry pages, and the rest is data pages.  Estimate 100
! 		 * entries per entry page; this is rather bogus since it'll depend on
! 		 * the size of the keys, but it's more robust than trying to predict
! 		 * the number of entries per heap tuple.
  		 */
  		numPages = Max(numPages, 10);
! 		numEntryPages = floor(numPages * 0.90);
! 		numDataPages = numPages - numEntryPages;
! 		numPendingPages = 0;
  		numEntries = floor(numEntryPages * 100);
  	}
--- 7275,7333 ----
  		memset(&ginStats, 0, sizeof(ginStats));
  	}
  
! 	/*
! 	 * Assuming we got valid (nonzero) stats at all, nPendingPages can be
! 	 * trusted, but the other fields are data as of the last VACUUM.  We can
! 	 * scale them up to account for growth since then, but that method only
! 	 * goes so far; in the worst case, the stats might be for a completely
! 	 * empty index, and scaling them will produce pretty bogus numbers.
! 	 * Somewhat arbitrarily, set the cutoff for doing scaling at 4X growth; if
! 	 * it's grown more than that, fall back to estimating things only from the
! 	 * assumed-accurate index size.  But we'll trust nPendingPages in any case
! 	 * so long as it's not clearly insane, ie, more than the index size.
! 	 */
! 	if (ginStats.nPendingPages < numPages)
! 		numPendingPages = ginStats.nPendingPages;
! 	else
! 		numPendingPages = 0;
! 
! 	if (numPages > 0 && ginStats.nTotalPages <= numPages &&
! 		ginStats.nTotalPages > numPages / 4 &&
! 		ginStats.nEntryPages > 0 && ginStats.nEntries > 0)
  	{
  		/*
! 		 * OK, the stats seem close enough to sane to be trusted.  But we
! 		 * still need to scale them by the ratio numPages / nTotalPages to
! 		 * account for growth since the last VACUUM.
  		 */
  		double		scale = numPages / ginStats.nTotalPages;
  
! 		numEntryPages = ceil(ginStats.nEntryPages * scale);
! 		numDataPages = ceil(ginStats.nDataPages * scale);
! 		numEntries = ceil(ginStats.nEntries * scale);
  		/* ensure we didn't round up too much */
! 		numEntryPages = Min(numEntryPages, numPages - numPendingPages);
! 		numDataPages = Min(numDataPages,
! 						   numPages - numPendingPages - numEntryPages);
  	}
  	else
  	{
  		/*
! 		 * We might get here because it's a hypothetical index, or an index
! 		 * created pre-9.1 and never vacuumed since upgrading (in which case
! 		 * its stats would read as zeroes), or just because it's grown too
! 		 * much since the last VACUUM for us to put our faith in scaling.
! 		 *
! 		 * Invent some plausible internal statistics based on the index page
! 		 * count (and clamp that to at least 10 pages, just in case).  We
! 		 * estimate that 90% of the index is entry pages, and the rest is data
! 		 * pages.  Estimate 100 entries per entry page; this is rather bogus
! 		 * since it'll depend on the size of the keys, but it's more robust
! 		 * than trying to predict the number of entries per heap tuple.
  		 */
  		numPages = Max(numPages, 10);
! 		numEntryPages = floor((numPages - numPendingPages) * 0.90);
! 		numDataPages = numPages - numPendingPages - numEntryPages;
  		numEntries = floor(numEntryPages * 100);
  	}
*************** gincostestimate(PG_FUNCTION_ARGS)
*** 7435,7450 ****
  	/*
  	 * Add an estimate of entry pages read by partial match algorithm.  It's a
  	 * scan over leaf pages in entry tree.  We haven't any useful stats here,
! 	 * so estimate it as proportion.
  	 */
! 	entryPagesFetched += ceil(numEntryPages * counts.partialEntries / numEntries);
  
  	/*
  	 * Partial match algorithm reads all data pages before doing actual scan,
! 	 * so it's a startup cost.  Again, we haven't any useful stats here, so,
! 	 * estimate it as proportion
  	 */
! 	dataPagesFetched = ceil(numDataPages * counts.partialEntries / numEntries);
  
  	/*
  	 * Calculate cache effects if more than one scan due to nestloops or array
--- 7453,7473 ----
  	/*
  	 * Add an estimate of entry pages read by partial match algorithm.  It's a
  	 * scan over leaf pages in entry tree.  We haven't any useful stats here,
! 	 * so estimate it as proportion.  Because counts.partialEntries is really
! 	 * pretty bogus (see code above), it's possible that it is more than
! 	 * numEntries; clamp the proportion to ensure sanity.
  	 */
! 	partialScale = counts.partialEntries / numEntries;
! 	partialScale = Min(partialScale, 1.0);
! 
! 	entryPagesFetched += ceil(numEntryPages * partialScale);
  
  	/*
  	 * Partial match algorithm reads all data pages before doing actual scan,
! 	 * so it's a startup cost.  Again, we haven't any useful stats here, so
! 	 * estimate it as proportion.
  	 */
! 	dataPagesFetched = ceil(numDataPages * partialScale);
  
  	/*
  	 * Calculate cache effects if more than one scan due to nestloops or array

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers