OK, here's a more detailed description of the FTS selectivity
improvement idea:
=== Write a typanalyze function for column type tsvector
The function would go through the tuples returned by the BlockSampler
and compute the number of times each distinct lexeme appears inside the
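A minimal sketch of the counting step described above. It is written in Python rather than PostgreSQL's C and makes a few assumptions. Each sampled tsvector is represented as a plain list of lexeme strings. Each lexeme is counted once per row, because the fraction of rows containing a lexeme is what an `@@` selectivity estimate would need. The function name `lexeme_frequencies` is hypothetical and not part of any PostgreSQL API. So is its `top_n` cutoff, which echoes the "top 50 to 300 lexemes" suggestion in this thread.

```python
from __future__ import annotations

from collections import Counter


def lexeme_frequencies(sample: list[list[str]], top_n: int = 100) -> list[tuple[str, float]]:
    """Return up to top_n lexemes, most frequent first, each paired with the
    fraction of sampled rows whose tsvector contains it (hypothetical helper)."""
    if not sample:
        return []
    rows_containing: Counter[str] = Counter()
    for row_lexemes in sample:
        # A tsvector holds each distinct lexeme once, so a row contributes at
        # most one occurrence per lexeme; set() enforces that for plain lists.
        rows_containing.update(set(row_lexemes))
    total_rows = len(sample)
    # Keep only the most common lexemes, as a bounded statistics entry would.
    return [(lexeme, count / total_rows)
            for lexeme, count in rows_containing.most_common(top_n)]


if __name__ == "__main__":
    sampled_rows = [["cat", "dog"], ["cat"], ["cat", "fish"], ["dog"]]
    print(lexeme_frequencies(sampled_rows, top_n=2))  # [('cat', 0.75), ('dog', 0.5)]
```

This is a plain exact count over the sample. It is not necessarily how a real typanalyze function would bound memory on large samples.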
Oleg Bartunov wrote:
Jan,
the problem is known and much requested. From your proposal it's not
clear what the idea is.
Tom Lane wrote:
Jan Urbański [EMAIL PROTECTED] writes:
2. Implement better selectivity estimates for FTS.
OK, after reading through some of the
On Sat, 8 Mar 2008, Jan Urbański wrote:
OK,
Oleg Bartunov [EMAIL PROTECTED] writes:
On Sat, 8 Mar 2008, Jan Urbański wrote:
I have a feeling that in many cases identifying the top 50 to 300 lexemes
would be enough to talk about text search selectivity with a degree of
confidence. At least we wouldn't give overly low estimates for
Oleg Bartunov wrote:
On Sat, 8 Mar 2008, Jan Urbański wrote:
OK, after reading through some of the code, the idea is to write a
custom typanalyze function for tsvector columns. It could look inside
Such a function already exists: it's ts_stat(). The problem with ts_stat() is
its
On Sat, 8 Mar 2008, Tom Lane wrote:
Oleg Bartunov [EMAIL PROTECTED] writes:
On Sat, 8 Mar 2008, Jan Urbański wrote:
I have a feeling that in many cases identifying the top 50 to 300 lexemes
would be enough to talk about text search selectivity with a degree of
confidence. At least we wouldn't
On Sat, 8 Mar 2008, Jan Urbański wrote:
Unfortunately, selectivity estimation for a query is much more difficult than
just estimating the frequency of individual words.
Sure, given something like to_tsquery('english', 'cats & dogs'), the frequencies
of 'cat' and 'dog' won't suffice. But at least it's a starting point and
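As a rough illustration of why per-lexeme frequencies are only a starting point, here is a minimal sketch of how they could be combined for an AND query such as 'cat' & 'dog'. It is written in Python and assumes the lexemes occur independently of one another. The function name `and_query_selectivity` and the `default_freq` fallback for lexemes missing from the statistics are hypothetical.

```python
from __future__ import annotations


def and_query_selectivity(lexemes: list[str], freqs: dict[str, float],
                          default_freq: float = 0.001) -> float:
    """Estimate the fraction of rows matching lexeme_1 & lexeme_2 & ...,
    treating every lexeme as statistically independent of the others."""
    selectivity = 1.0
    for lexeme in lexemes:
        # Lexemes absent from the collected statistics fall back to a small,
        # arbitrary guess (default_freq is a hypothetical tuning knob).
        selectivity *= freqs.get(lexeme, default_freq)
    return selectivity


if __name__ == "__main__":
    stats = {"cat": 0.75, "dog": 0.5}
    print(and_query_selectivity(["cat", "dog"], stats))  # 0.375
    print(and_query_selectivity(["cat", "yak"], stats))
```

Multiplying frequencies ignores any correlation between lexemes. That is one reason why estimating a whole query is harder than estimating the frequency of individual words.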
Tom Lane wrote:
Jan Urbański [EMAIL PROTECTED] writes:
2. Implement better selectivity estimates for FTS.
+1 for that one ...
OK, this one might very well be the one that'd be more useful. And I can
always reuse the other idea for my thesis, after expanding it a bit.
On Tue, Mar 4, 2008 at 4:47 PM, Jan Urbański
[EMAIL PROTECTED] wrote:
Tom Lane wrote:
Jan Urbański [EMAIL PROTECTED] writes:
2. Implement better selectivity estimates for FTS.
+1 for that one ...
OK, this one might very well be the one that'd be more useful. And
Jan,
OK, this one might very well be the one that'd be more useful.
Well, you should submit *both* once SoC opens for applications. The mentors
will decide which.
--
Josh Berkus
PostgreSQL @ Sun
San Francisco
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make
Jan,
the problem is known and much requested. From your proposal it's not
clear what the idea is.
Oleg
On Tue, 4 Mar 2008, Jan Urbański wrote:
Tom Lane wrote:
Jan Urbański [EMAIL PROTECTED] writes:
2. Implement better selectivity estimates for FTS.
+1 for that one ...
Oleg Bartunov wrote:
Jan,
the problem is known and much requested. From your proposal it's not
clear what the idea is.
I guess the first approach could be to populate some more columns in
pg_statistic for tables with tsvectors. I see there are some statistics
already being gathered
Hi PostgreSQL!
Although this year's GSoC is just starting, I thought getting in touch a bit
earlier would only be of benefit.
I study Computer Science at the Faculty of Mathematics, Informatics
and Mechanics of Warsaw University. I'm currently in my fourth year of
studies. Having chosen Databases
Jan Urbański [EMAIL PROTECTED] writes:
2. Implement better selectivity estimates for FTS.
+1 for that one ...
regards, tom lane