Peter complained awhile back about unrealistically small selectivity estimates for LIKE with a fixed-prefix pattern: http://archives.postgresql.org/pgsql-hackers/2007-12/msg00939.php I had suspected that this was related to the locale-specific problems we fixed a few months ago, but having finally had a chance to look at the data off-list, there doesn't seem to be any such component to the problem. What it boils down to is that when we generate a range constraint based on the prefix, such as col >= 'prefix' AND col < 'prefiy' if the prefix is more than a few characters long then the two endpoint values are indistinguishable as far as comparison to the histogram is concerned, and so we come out with a selectivity estimate that is zero to within roundoff error. This is unreasonably optimistic and can lead to bad plan choices.
What I propose doing about this is a small variant on Peter's original suggestion: compute the estimated selectivity for col = 'prefix' and clamp the result of prefix_selectivity to be at least that. This is plausible on intuitive grounds since the range constraint must surely include at least these values. Furthermore, it eliminates what had been an entirely ad-hoc choice of a lower bound (the code was clamping to at least 1e-10, which is surely unreasonably optimistic). The end result of this, for the case Peter is interested in where there are no especially common values, is that the minimum selectivity estimate for LIKE 'prefix%' will be essentially 1/ndistinct where ndistinct is the estimated number of distinct values in the column, because that's what eqsel() does in such cases. I attach a proposed patch against HEAD for this. It's a bit long but most of the bulk is refactoring eqsel() to make it easy to use from prefix_selectivity(). The intellectual content is just in the last diff hunk. To help out Peter's client, who's running 8.1.x, we'd have to backpatch at least that far. We backpatched the last round of LIKE selectivity fixes to 8.1, so I don't have too much hesitation about doing the same here. Comments? regards, tom lane
binResgmyMI0F.bin
Description: like_minimum_estimate.patch.gz
-- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-patches