Since performance is going to depend on your data distribution, why don't you just try the default way and see what happens to your query performance?
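For reference, the default way is just a plain FTS table plus a prefix query, something like this (the table and column names here are placeholders):

  CREATE VIRTUAL TABLE docs USING fts4(body);
  -- matches every row containing a term that starts with 'lin'
  SELECT * FROM docs WHERE docs MATCH 'lin*';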
Also take a look at adding prefix=1, prefix=2, prefix=3, etc. to an FTS4 table to add those indexes. That's probably a lot more space efficient, and faster for queries too, since it's more compact; a sketch follows after the quoted message below. Then let the rest of us know what the performance difference is.

Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems

________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Johannes Krude [johan...@krude.de]
Sent: Sunday, December 04, 2011 8:46 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] RE Infinite Loop in MATCH on self written fts3 tokenizer

hi,

On Sunday 04 December 2011 14:23:09 Black, Michael (IS) wrote:
> It says "here's token 'hal'" and if you return the pointer to "h" it points
> to the same place so it returns "hal" right back to you....ergo the loop.

I have read through the ext/fts3/fts3_expr.c code and found the following: piEndOffset must point to the zero byte after the returned token, and fts3 expects the tokenizer to generate exactly one token for each search string. The first call to my xNext always returned the prefix of length 1 with piStartOffset=piEndOffset=0. Therefore fts3 incremented its internal pointer by 0 after each loop and then called xNext on the same string again. I fixed this by returning the longest prefix (the given word itself) first and pointing piEndOffset after the returned string. Now it works.

> You don't say why you're doing this. FTS already supports prefix queries.

The FTS documentation states that if I want to search efficiently for prefixes, I should give the maximum size of such prefixes so that FTS can optimize for them. I want to search efficiently for prefixes of any length.

The drawback of my tokenizer is that it consumes a lot of space: for 56 MB of strings I get a 1.2 GB file. Since everything is done in trees, I assume a search with my tokenizer is O(log(n)), where n is the number of tokens in the table. Is this still O(log(n)) if I write a tokenizer for which input=output and use the FTS prefix search?

Greetings
johannes
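A sketch of the prefix-index suggestion from Michael's reply above. Note that per the SQLite FTS4 documentation the prefix= option takes a single comma-separated list of prefix lengths; the table and column names here are placeholders:

  -- FTS4 table with extra indexes for prefixes of 1, 2 and 3 bytes
  CREATE VIRTUAL TABLE docs USING fts4(body, prefix="1,2,3");

Prefix queries of other lengths still work; they just fall back to scanning the ordinary full-term index instead of a prefix index.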
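And a sketch of the xNext fix Johannes describes, assuming the tokenizer interface from ext/fts3/fts3_tokenizer.h; the cursor struct and all names below are illustrative, not taken from his actual code. The key points are emitting the longest prefix of each word first and setting *piEndOffset one past the emitted text:

  /*
  ** Sketch of a corrected xNext for a prefix-emitting tokenizer. For each
  ** alphanumeric word it emits the longest prefix (the word itself) first,
  ** then each shorter prefix down to length 1.
  */
  #include <ctype.h>
  #include "sqlite3.h"
  #include "fts3_tokenizer.h"

  typedef struct prefix_cursor {
    sqlite3_tokenizer_cursor base;  /* base class, must be first member */
    const char *zInput;             /* input text passed to xOpen */
    int nInput;                     /* bytes in zInput */
    int iWord;                      /* byte offset of the current word */
    int nWord;                      /* length of the current word */
    int nPrefix;                    /* length of the prefix to emit next */
    int iToken;                     /* tokens emitted so far */
  } prefix_cursor;                  /* xOpen zeroes iWord, nWord, nPrefix, iToken */

  static int prefixXNext(
    sqlite3_tokenizer_cursor *pCursor,
    const char **ppToken, int *pnBytes,     /* OUT: token text and length */
    int *piStartOffset, int *piEndOffset,   /* OUT: byte offsets in the input */
    int *piPosition                         /* OUT: position of this token */
  ){
    prefix_cursor *c = (prefix_cursor *)pCursor;

    if( c->nPrefix==0 ){
      /* All prefixes of the current word emitted; find the next word. */
      int i = c->iWord + c->nWord;
      while( i<c->nInput && !isalnum((unsigned char)c->zInput[i]) ) i++;
      if( i>=c->nInput ) return SQLITE_DONE;
      c->iWord = i;
      while( i<c->nInput && isalnum((unsigned char)c->zInput[i]) ) i++;
      c->nWord = i - c->iWord;
      c->nPrefix = c->nWord;   /* longest prefix first: this is the fix */
    }

    *ppToken = &c->zInput[c->iWord];
    *pnBytes = c->nPrefix;
    *piStartOffset = c->iWord;
    /* One past the emitted text. For the first (longest) prefix this is the
    ** end of the word, so fts3 advances its pointer instead of looping. */
    *piEndOffset = c->iWord + c->nPrefix;
    *piPosition = c->iToken++;

    c->nPrefix--;              /* next call emits a one-byte-shorter prefix */
    return SQLITE_OK;
  }

During query parsing fts3 tokenizes each search word separately and advances its input by *piEndOffset, which is why the original version, returning the length-1 prefix with piStartOffset=piEndOffset=0, looped forever.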