Oleg Bartunov wrote:
> Jan, the problem is known and well requested. From your proposal it's
> not clear what the idea is.
>
> Tom Lane wrote:
>> Jan Urbański <[EMAIL PROTECTED]> writes:
>>> 2. Implement better selectivity estimates for FTS.
OK, after reading through some of the code, the idea is to write a custom typanalyze function for tsvector columns. It could look inside the tsvectors, compute the most commonly appearing lexemes and store that information in pg_statistic. Then there would be a custom selectivity function for @@ and friends that would look at the lexemes in pg_statistic, see if the tsquery it got matches some/any of them, and return a result based on that.
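Roughly, the ANALYZE side could hook in like this. This is only a sketch loosely modelled on std_typanalyze; tsvector_typanalyze and compute_tsvector_stats are names I made up for illustration, and the actual lexeme counting is left as a comment:

#include "postgres.h"
#include "fmgr.h"
#include "commands/vacuum.h"

static void compute_tsvector_stats(VacAttrStats *stats,
                                   AnalyzeAttrFetchFunc fetchfunc,
                                   int samplerows, double totalrows);

Datum
tsvector_typanalyze(PG_FUNCTION_ARGS)
{
    VacAttrStats *stats = (VacAttrStats *) PG_GETARG_POINTER(0);

    /* ANALYZE calls compute_stats back with a sample of the column */
    stats->compute_stats = compute_tsvector_stats;
    /* sample size, cf. std_typanalyze (ignoring attstattarget == -1) */
    stats->minrows = 300 * stats->attr->attstattarget;

    PG_RETURN_BOOL(true);
}

static void
compute_tsvector_stats(VacAttrStats *stats,
                       AnalyzeAttrFetchFunc fetchfunc,
                       int samplerows, double totalrows)
{
    /*
     * Walk the sampled tsvectors, count lexeme occurrences (a hash
     * table keyed by lexeme would do), keep the top-N, and put them
     * into stats->stavalues/stats->stanumbers under a new statistics
     * kind.  Left unimplemented in this sketch.
     */
    stats->stats_valid = false;
}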
I have a feeling that in many cases identifying the top 50 to 300 lexemes would be enough to talk about text search selectivity with a degree of confidence. At least we wouldn't give overly low estimates for queries looking for very popular words, which I believe is worse than giving an overly high estimate for an obscure query (am I wrong here?).
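To make the popular-vs-obscure point concrete, the per-lexeme lookup inside such a selectivity function could go something like this (again just a sketch: lexeme_selec, DEFAULT_TS_SEL and the flattened most-common-lexeme arrays are my assumptions, and I'm glossing over how they get pulled out of pg_statistic):

#include "postgres.h"
#include <string.h>

#define DEFAULT_TS_SEL  0.005   /* made-up fallback when we have no stats */

/*
 * mc_lexemes/mc_freqs are assumed to hold the top-N lexemes from
 * pg_statistic, sorted by descending frequency.
 */
static double
lexeme_selec(const char *lexeme,
             char **mc_lexemes, float4 *mc_freqs, int nmc)
{
    int     i;

    for (i = 0; i < nmc; i++)
    {
        if (strcmp(lexeme, mc_lexemes[i]) == 0)
            return mc_freqs[i]; /* a tracked, popular word */
    }

    /*
     * Not among the top-N: it can't be more frequent than the least
     * common tracked lexeme, so cap the estimate there.
     */
    return nmc > 0 ? mc_freqs[nmc - 1] / 2 : DEFAULT_TS_SEL;
}

For a whole tsquery the per-lexeme numbers would still have to be combined across & and | nodes, presumably assuming independence (multiply for &, p1 + p2 - p1*p2 for |), but even that naive combination should beat a flat default.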
Regards,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2
ouden estin