[sqlite] Some FTS5 guidance

Mario M. Westphal Thu, 7 Jan 2016 19:31:07 +0100

Hello,



I recently looked into FTS5.

The documentation is clear and I was able to get it running with a small
test database quickly. And the response times are awesome :-)



My question: 



At least as I understand it at this point, FTS can only do prefix queries.



If my database contains the words



moon

moonlight

moonshine

shine

sunshine



An FTS query like "moon*" will find all three terms starting with "moon" -
very fast.



But is there really no way to find "moonshine" or "sunshine" by running a
query for "shine" or "shine*"?



Currently I search using LIKE, where such 'contains' queries are easy. My
users of course don't understand all this and want to find all words
containing "shine", wherever the term appears in the word.
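
For reference, the LIKE-based 'contains' search described above might look
roughly like the following sketch. It uses Python's built-in sqlite3 module;
the table name "words", the column name "word" and the helper
find_containing() are made up for illustration and are not taken from the
actual application.

```python
import sqlite3

# Hypothetical schema: a single table holding one word per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words (word) VALUES (?)",
    [("moon",), ("moonlight",), ("moonshine",), ("shine",), ("sunshine",)],
)


def find_containing(conn, fragment):
    """Return all words containing `fragment` anywhere, sorted by word."""
    # The leading '%' is what makes this a 'contains' search. It also means
    # an ordinary index on the column cannot be used, so every row is scanned.
    pattern = "%" + fragment + "%"
    rows = conn.execute(
        "SELECT word FROM words WHERE word LIKE ? ORDER BY word", (pattern,)
    )
    return [row[0] for row in rows]


matches = find_containing(conn, "shine")
```

Here `matches` holds "moonshine", "shine" and "sunshine". The sketch does not
escape '%' or '_' inside the search fragment.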



The only idea I had so far was to write my own tokenizer and to store each
word with every possible 'sub-word':



When "moonshine" is added to FTS, it is split into multiple words:



moonshine
oonshine
onshine
nshine
shine
. 



(Maybe I would limit this to a minimum length of 2 or 3 characters.)
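
The sub-word splitting can be sketched as follows. This is plain Python meant
only to show which strings would be generated; a real FTS5 tokenizer would
have to be written against the FTS5 C API. The function name suffixes() and
the default minimum length of 3 are assumptions for illustration.

```python
def suffixes(word, min_length=3):
    """Return `word` plus every trailing sub-word of at least `min_length`
    characters, longest first. A word shorter than `min_length` is returned
    unchanged as the only entry."""
    # Index of the last start position that still leaves min_length chars.
    last_start = max(len(word) - min_length, 0)
    return [word[i:] for i in range(last_start + 1)]


for entry in suffixes("moonshine"):
    print(entry)
```

With a minimum length of 3 this yields seven entries for "moonshine", from
"moonshine" down to "ine". A prefix query such as "shine*" against an index
built from these entries would then match the "shine" entry, and through it
the original word.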



This of course produces a lot of extra entries in FTS and may impact
performance and database size.

I hence wonder if this problem has been tackled already and if there is a
"standard" solution.
