Re: [sqlite] Ranking in fts.

Ralf Junker Sun, 10 Jun 2007 06:38:21 -0700

Hello Scott Hess,

>I've lined up some time to work on fts, again, which means fts3.  One
>thing I'd like to include would be to order doclists by some baked-in
>ranking.  The idea is to sort the most important items to the front of
>the list, and then you can do queries which limit the number of hits
>and can thus be significantly faster for popular terms.  [Note that
>"limit the number of hits" cannot currently be done at the fts layer,
>but I'm thinking on that problem, too.]


Maybe I am missing something here, but I can already rank and limit an FTS2 
search with the help of a simple join:

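-- Side table holding one precomputed rank per FTS row.
create table if not exists r (fts_id, rank);
create virtual table fts using fts2 (text);

-- Store each new FTS row's rowid together with its rank.
insert into fts (text) values ('abc 1');
insert into r values (last_insert_rowid(), 1);

insert into fts (text) values ('abc 2');
insert into r values (last_insert_rowid(), 2);

insert into fts (text) values ('abc 3');
insert into r values (last_insert_rowid(), 3);

-- Join the matches to their ranks, best first, and keep the top two.
-- The unary '+' keeps fts.rowid = r.fts_id from being used as a rowid
-- lookup into fts, so the MATCH constraint drives the fts side.
select text from fts, r
  where +fts.rowid = r.fts_id and text match 'abc'
  order by rank desc
  limit 2;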

This query works well, even if the '+' prefixing the RowID is a little awkward. 
However, the '+' is necessary to avoid an SQLite error, and I do not know whether 
that error is rooted in the virtual table or in the FTS implementation. I would 
certainly appreciate it if FTS queries could be freely joined with other tables 
without adding '+' prefixes. I believe that bringing FTS closer to full SQL 
integration will, in the end, add far more possibilities than just adding a 
single RANK column.

Speaking of ranking, I would really be pleased to see, instead of a baked-in 
value, a flexible ranking system based on the frequency and position of matches 
in the text (similar to what search engines do, even if based on just a single 
document).
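To make the frequency-and-position idea a little more concrete, here is a minimal Python sketch of one possible scoring scheme. Nothing like it exists in FTS2; the function names `rank_score` and `top_n`, the (start, length) input pairs and the weighting (each match counts between 0.5 and 1.0, more when it occurs earlier in the text) are all hypothetical choices for illustration only.

```python
def rank_score(matches, doc_len):
    """Hypothetical frequency-and-position score for one document.

    matches: list of (start, length) byte pairs for the query hits.
    doc_len: document length in bytes, used to normalize positions.
    More matches raise the score (frequency); earlier matches count
    more than later ones (position).
    """
    if doc_len <= 0:
        return 0.0
    score = 0.0
    for start, _length in matches:
        # Hypothetical position weight: 1.0 at the start, 0.5 at the end.
        score += 1.0 - 0.5 * min(start / doc_len, 1.0)
    return score


def top_n(docs, n):
    """Return the ids of the n best-scoring documents, best first.

    docs maps a document id to (doc_len, matches). This mimics
    ORDER BY rank DESC LIMIT n, but with the rank computed from the
    matches instead of stored in a side table.
    """
    ranked = sorted(
        docs.items(),
        key=lambda item: rank_score(item[1][1], item[1][0]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:n]]


# Usage: three 20-byte documents with different hit patterns.
docs = {
    1: (20, [(15, 3)]),          # one late match
    2: (20, [(0, 3), (10, 3)]),  # two matches
    3: (20, [(0, 3)]),           # one early match
}
print(top_n(docs, 2))  # prints [2, 3]
```

Here the document with two hits outranks the one with a single early hit, which in turn outranks the one with a single late hit. Any real weighting would of course need tuning against actual data.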

The offsets() function, IMO, asks for unnecessary work on the application side. 
It currently returns offsets as decimal numbers in a text string, which must be 
fed to a text parser for analysis. Would it not be easier (and faster!?) on both 
sides (generation and extraction) if the offsets were passed as a blob 
containing an array of integers instead? Example:

 int Start_1
 int Length_1
 int Start_2
 int Length_2
 etc.

Applications could then quickly retrieve the number of matches (the blob length 
divided by 8, given two 4-byte integers per match) and access individual matches 
without parsing text.
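The difference between the two approaches can be sketched in Python. The text parser assumes the four-integers-per-match layout that the SQLite FTS documentation describes for offsets() (column number, query-term number, byte offset, size in bytes); whether FTS2 matches it exactly is an assumption here. The blob side is purely hypothetical: the helper names `encode_offsets_blob`, `match_count` and `match_at` are invented for illustration, and the little-endian signed 32-bit byte order is an assumed choice, since the proposal above does not fix one.

```python
import struct


def parse_offsets_text(text):
    """Parse offsets()-style text into (start, length) pairs.

    Assumes four integers per match: column, term number,
    byte offset, size in bytes.
    """
    nums = [int(tok) for tok in text.split()]
    if len(nums) % 4 != 0:
        raise ValueError("expected groups of four integers")
    return [(nums[i + 2], nums[i + 3]) for i in range(0, len(nums), 4)]


def encode_offsets_blob(matches):
    """Pack (start, length) pairs into the proposed blob layout.

    Hypothetical layout: Start_1, Length_1, Start_2, Length_2, ...
    as consecutive little-endian signed 32-bit integers.
    """
    flat = [value for pair in matches for value in pair]
    return struct.pack("<%di" % len(flat), *flat)


def match_count(blob):
    # Two 4-byte integers per match, so counting needs no parsing.
    return len(blob) // 8


def match_at(blob, i):
    # Read match i directly from byte offset 8 * i.
    return struct.unpack_from("<ii", blob, 8 * i)


# Usage: two matches of a 3-byte term at byte offsets 0 and 10.
matches = parse_offsets_text("0 0 0 3 0 0 10 3")
blob = encode_offsets_blob(matches)
print(match_count(blob), match_at(blob, 1))  # prints 2 (10, 3)
```

Counting and random access on the blob are constant-time slices, whereas the text form must be tokenized and converted in full before even the number of matches is known.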

Please understand the above as suggestions and not as criticism. FTS2 is an 
excellent module, and I am excited about your commitment to make it even 
better!

Ralf 
