hi,

On Sunday 04 December 2011 14:23:09 Black, Michael (IS) wrote:
> It says "here's token 'hal'" and if you return the pointer to "h" it points
> to the same place so it returns "hal" right back to you....ergo the loop.
I have read through the ext/fts3/fts3_expr.c code and found out the following:
piEndOffset must point to the zero byte just after the returned token, and fts3
expects the tokenizer to generate exactly one token for each search string.

The first call to my xNext always returned the prefix of length 1 with
piStartOffset = piEndOffset = 0. Therefore fts3 advanced its internal pointer
by 0 bytes on every iteration and then called xNext on the same string again,
which is the endless loop.
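A simplified, hypothetical model of that loop (not the actual fts3 source)
shows why an end offset of 0 hangs the parser: each iteration asks the
tokenizer for one token and then skips iEnd bytes. The callback type
FirstTokenFn, the two tokenizer stand-ins and the maxIter guard are invented
for illustration; the guard returns -1 where the real parser would just spin.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: returns 1 and fills *pnTok / *piEnd if a token was found. */
typedef int (*FirstTokenFn)(const char *z, int n, int *pnTok, int *piEnd);

/* Mimics the buggy tokenizer: 1-byte prefix, end offset 0. */
static int buggy_first_token(const char *z, int n, int *pnTok, int *piEnd) {
  (void)z;
  if (n <= 0) return 0;
  *pnTok = 1;
  *piEnd = 0;  /* bug: the caller advances by 0 bytes */
  return 1;
}

/* Mimics the fixed tokenizer: whole word, end offset just past it. */
static int fixed_first_token(const char *z, int n, int *pnTok, int *piEnd) {
  int i = 0;
  while (i < n && z[i] == ' ') i++;  /* skip leading spaces */
  int start = i;
  while (i < n && z[i] != ' ') i++;  /* scan the word */
  if (i == start) return 0;          /* nothing left */
  *pnTok = i - start;
  *piEnd = i;
  return 1;
}

/* Parser loop model: one token per step, then skip iEnd bytes.
 * Returns the number of terms, or -1 if no progress would be made. */
static int count_query_terms(const char *z, FirstTokenFn next, int maxIter) {
  int n = (int)strlen(z), nTerms = 0;
  for (int iter = 0; iter < maxIter; iter++) {
    int nTok, iEnd;
    if (!next(z, n, &nTok, &iEnd)) return nTerms;
    if (iEnd <= 0) return -1;  /* same input would be tokenized again */
    nTerms++;
    z += iEnd;
    n -= iEnd;
  }
  return -1;
}
```

With the fixed stand-in, "hal lo" parses into 2 terms; with the buggy one,
"hal" never advances and the guard reports -1.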

I fixed this by returning the longest prefix (the given word itself) first and
pointing piEndOffset just past the returned string. Now it works.
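A minimal sketch of such a prefix-emitting cursor, assuming words are maximal
runs of alphanumeric bytes. PrefixCursor, prefix_cursor_open,
prefix_cursor_next and the PREFIX_OK / PREFIX_DONE codes are hypothetical; the
out-parameters only mirror those of the fts3 xNext callback (token, length,
start offset, end offset, position) so the example compiles without the
SQLite headers. Giving every prefix of a word the same position is an
assumption, not necessarily what the tokenizer above does.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef struct PrefixCursor {
  const char *zInput;  /* text being tokenized */
  int nInput;          /* length of zInput in bytes */
  int iOffset;         /* where to look for the next word */
  int iWordStart;      /* start offset of the current word */
  int iWordEnd;        /* offset just past the current word */
  int nPrefix;         /* length of next prefix to emit; 0 = word done */
  int iPosition;       /* index of the current word */
} PrefixCursor;

enum { PREFIX_OK = 0, PREFIX_DONE = 1 };  /* stand-ins for SQLITE_OK/DONE */

void prefix_cursor_open(PrefixCursor *p, const char *zInput, int nInput) {
  assert(zInput != NULL);
  p->zInput = zInput;
  p->nInput = nInput < 0 ? (int)strlen(zInput) : nInput;
  p->iOffset = 0;
  p->iWordStart = 0;
  p->iWordEnd = 0;
  p->nPrefix = 0;
  p->iPosition = -1;
}

/* Longest prefix (the word itself) first, then shorter ones. *piEnd always
 * points just past the whole word, so a caller that reads only the first
 * token and advances by *piEnd still makes progress. */
int prefix_cursor_next(PrefixCursor *p, const char **ppToken, int *pnBytes,
                       int *piStart, int *piEnd, int *piPosition) {
  if (p->nPrefix == 0) {
    int i = p->iOffset;
    while (i < p->nInput && !isalnum((unsigned char)p->zInput[i])) i++;
    if (i >= p->nInput) return PREFIX_DONE;
    p->iWordStart = i;
    while (i < p->nInput && isalnum((unsigned char)p->zInput[i])) i++;
    p->iWordEnd = i;
    p->iOffset = i;
    p->nPrefix = p->iWordEnd - p->iWordStart;  /* longest prefix first */
    p->iPosition++;
  }
  *ppToken = p->zInput + p->iWordStart;  /* not NUL-terminated; use *pnBytes */
  *pnBytes = p->nPrefix;
  *piStart = p->iWordStart;
  *piEnd = p->iWordEnd;                  /* end of the word, never 0 */
  *piPosition = p->iPosition;            /* assumption: shared per word */
  p->nPrefix--;
  return PREFIX_OK;
}
```

For "hal" this yields "hal", "ha", "h", each with start 0 and end 3. A real
xNext would typically also fold case and copy the token into a buffer owned
by the cursor, as the built-in simple tokenizer does; both are omitted here.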

> You don't say why you're doing this.  FTS already supports prefix queries.
The fts documentation states that if I want to search for prefixes
efficiently, I should specify the maximum size of such prefixes so that fts
can optimize for them. I want to search efficiently for prefixes of any length.

The drawback of my tokenizer is that it consumes a lot of space: for 56 MB of
strings I get a 1.2 GB file. Since everything is stored in B-trees, I assume a
search with my tokenizer is O(log(n)), where n is the number of tokens in the
table. Is this still O(log(n)) if I write a tokenizer for which input = output
and use the fts prefix search instead?
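Some rough arithmetic on where the space goes: a word of k bytes becomes k
prefix tokens whose text totals k(k+1)/2 bytes, against one k-byte token with
a plain tokenizer. The hypothetical helper below counts both for a string,
assuming words are maximal runs of alphanumeric bytes. It ignores doclists
and B-tree overhead, so it only indicates how the term data grows, not the
real file size.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef struct TermStats {
  size_t nPlainTokens;   /* one token per word */
  size_t nPlainBytes;    /* sum of word lengths */
  size_t nPrefixTokens;  /* k tokens per word of length k */
  size_t nPrefixBytes;   /* k(k+1)/2 bytes per word of length k */
} TermStats;

/* Hypothetical helper: compares plain vs all-prefixes tokenization. */
TermStats count_terms(const char *z) {
  TermStats s = {0, 0, 0, 0};
  assert(z != NULL);
  while (*z) {
    while (*z && !isalnum((unsigned char)*z)) z++;  /* skip separators */
    size_t k = 0;
    while (z[k] && isalnum((unsigned char)z[k])) k++;  /* word length */
    if (k == 0) break;  /* only separators were left */
    s.nPlainTokens += 1;
    s.nPlainBytes += k;
    s.nPrefixTokens += k;
    s.nPrefixBytes += k * (k + 1) / 2;
    z += k;
  }
  return s;
}
```

For "hallo welt" this gives 2 tokens / 9 bytes with a plain tokenizer and
9 tokens / 25 bytes when every prefix is indexed; longer words grow
quadratically.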

Greetings johannes
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
