Hello Scott Hess,

>I've lined up some time to work on fts, again, which means fts3. One
>thing I'd like to include would be to order doclists by some baked-in
>ranking. The idea is to sort to most important items to the front of
>the list, and then you can do queries which limit the number of hits
>and can thus be significantly faster for popular terms. [Note that
>"limit the number of hits" cannot currently be done at the fts layer,
>but I'm thinking on that problem, too.]

Maybe I am missing something here, but I can already rank and limit an FTS2 search with the help of a simple join:

  create table if not exists r (fts_id, rank);
  create virtual table fts using fts2 (text);

  insert into fts (text) values ('abc 1');
  insert into r values (last_insert_rowid(), 1);
  insert into fts (text) values ('abc 2');
  insert into r values (last_insert_rowid(), 2);
  insert into fts (text) values ('abc 3');
  insert into r values (last_insert_rowid(), 3);

  select text from fts, r
    where +fts.rowid = r.fts_id and text match 'abc'
    order by rank desc limit 2;

This query works well, even if the '+' prefixing the rowid is a little awkward. It is necessary, however, to avoid an SQLite error, and I do not know whether that error is rooted in the virtual table mechanism or in the FTS implementation.

I would certainly appreciate it if FTS queries could be freely joined with other tables without adding '+' prefixes. I believe that bringing FTS closer to full SQL integration will, in the end, add far more possibilities than just adding a single RANK column.

Talking about ranking, I would really be pleased to see, instead of a baked-in value, a flexible ranking system based on the frequency and position of matches in the text (similar to what search engines do, even if based on just a single document).

The offsets() function, IMO, asks for unnecessary work on the application side. It currently returns the offsets as decimal text, which must be fed to a text parser for analysis. Would it not be easier (and faster!?) on both sides (generation and extraction) if the offsets were passed as a blob holding an array of integers instead? Example:

  int Start_1
  int Length_1
  int Start_2
  int Length_2
  etc.

Applications could then quickly retrieve the number of matches (length of the blob divided by 8) and access individual matches without parsing text.

Please understand the above as suggestions and not as criticism. FTS2 is an excellent module, and I am excited about your commitment to make it even better!

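To make the blob suggestion concrete, here is a minimal Python sketch of how an application might read such a value. The format is purely hypothetical: FTS2 does not return anything like this today. The 32-bit little-endian integer layout and the helper names (pack_offsets, match_count, unpack_offsets) are my own assumptions for illustration. Only the "two ints per match, 8 bytes in total" idea comes from the proposal above.

```python
import struct

# Assumed layout: each match is two 32-bit little-endian signed ints,
# (start, length), stored back to back. This is NOT an existing FTS format.
PAIR = struct.Struct("<ii")


def pack_offsets(matches):
    """Build the hypothetical blob from a list of (start, length) pairs."""
    return b"".join(PAIR.pack(start, length) for start, length in matches)


def match_count(blob):
    """Number of matches: blob length divided by 8 bytes per pair."""
    return len(blob) // PAIR.size


def unpack_offsets(blob):
    """Decode the blob back into a list of (start, length) tuples."""
    return list(PAIR.iter_unpack(blob))


if __name__ == "__main__":
    blob = pack_offsets([(0, 3), (10, 3)])
    print(match_count(blob))
    print(unpack_offsets(blob))
```

The application never has to split or convert text here: the match count falls out of the blob length, and any single match can be read directly at byte offset 8 * i.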
Ralf