Hello Scott Hess,
>I've lined up some time to work on fts, again, which means fts3. One
>thing I'd like to include would be to order doclists by some baked-in
>ranking. The idea is to sort the most important items to the front of
>the list, and then you can do queries which limit the number of hits
>and can thus be significantly faster for popular terms. [Note that
>"limit the number of hits" cannot currently be done at the fts layer,
>but I'm thinking on that problem, too.]
Maybe I am missing something here, but I can already rank and limit an FTS2
search with the help of a simple join:
create table if not exists r (fts_id, rank);
create virtual table fts using fts2 (text);
insert into fts (text) values ('abc 1');
insert into r values (last_insert_rowid(), 1);
insert into fts (text) values ('abc 2');
insert into r values (last_insert_rowid(), 2);
insert into fts (text) values ('abc 3');
insert into r values (last_insert_rowid(), 3);
select text from fts, r
where +fts.rowid = r.fts_id and text match 'abc'
order by rank desc
limit 2;
This query works well, even if the '+' prefixing the RowID is a little awkward.
However, the '+' is necessary to avoid an SQLite error, and I do not know
whether that error is rooted in the virtual table layer or in the FTS
implementation. I would certainly appreciate it if FTS queries could be freely
joined with other tables without adding '+' prefixes. I believe that bringing
FTS closer to full SQL integration will, in the end, add far more possibilities
than just adding a single RANK column.
Talking about ranking, I would really be pleased to see, instead of a baked-in
value, a flexible ranking system based on the frequency and position of matches
in the text (similar to what search engines do, even if based just on a single
document).
The offsets() function, IMO, asks for unnecessary work on the application side.
It currently returns the offsets as decimal text, which must be fed to a text
parser for analysis. Would it not be easier (and faster!?) on both sides
(generation and extraction) if the offsets were passed as a blob holding an
array of integers instead? Example:
int Start_1
int Length_1
int Start_2
int Length_2
etc.
Applications could then quickly retrieve the number of matches (length of blob
divided by 8, assuming 4-byte integers) and access individual matches without
parsing text.
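To illustrate the application side, here is a minimal sketch of reading such a
blob. It assumes the layout proposed above (pairs of 4-byte signed integers,
Start then Length) and little-endian byte order. FTS2 does not return offsets
in this form today, and decode_offsets is a hypothetical helper name.

```python
import struct

def decode_offsets(blob):
    """Decode a blob of (start, length) pairs into a list of tuples.

    Assumed layout (not the current FTS2 behavior): consecutive
    little-endian 4-byte signed integers, Start_1, Length_1, Start_2, ...
    """
    if len(blob) % 8 != 0:
        raise ValueError("blob length must be a multiple of 8 bytes")
    count = len(blob) // 8  # number of matches, as described above
    values = struct.unpack("<%di" % (2 * count), blob)
    # Group the flat integer sequence into (start, length) pairs.
    return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]

# Hypothetical blob describing two matches: (0, 3) and (10, 3).
example = struct.pack("<4i", 0, 3, 10, 3)
matches = decode_offsets(example)
```

The match count comes directly from the blob length, and each match is
available by index without any text parsing.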
Please understand the above as suggestions and not as criticism. FTS2 is an
excellent module, and I am excited about your commitment to make it even
better!
Ralf