Gregory Stark wrote:
"Richard Huxton" <[EMAIL PROTECTED]> writes:
Now you and I can look at a substring and probably make a good guess how common
it is (assuming we know the targets are British surnames or Japanese towns). PG
needs one number - or rather, it picks one number for each length of
search-string (afaik).
I don't think that's true. Postgres calculates the lower and upper bound
implied by the search pattern and then uses the histogram to estimate how
selective that range is. It's sometimes surprisingly good but obviously it's
not perfect.
Sorry - I'm obviously picking my words badly today.
I meant for the "contains" substring match. It gives different (goes
away and checks...yes) predictions based on string length. So it guesses
that LIKE '%aaa%' will match more than LIKE '%aaaa%'. Of course, if we
were matching surnames you and I could say that this is very unlikely,
but without some big statistics table I guess there's not much more PG
can do.
For a trailing wildcard LIKE 'aaa%' it can and does as you say convert
this into something along the lines of (>= 'aaa' AND < 'aab'). Although
IIRC that depends if your locale allows such (not sure, I don't really
use non-C/non-English locales enough).
--
Richard Huxton
Archonet Ltd
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster