On 05/13/2016 09:19 PM, Matt Hamilton wrote:
> Hi all,
> Anyone know if/how you can call the FTS5 tokeniser functions manually?
> e.g. I want to look something up in the fts5vocab table, but can't, because I
> need to split/stem the initial value before querying the table.
>
> To illustrate:
>
> sqlite> CREATE VIRTUAL TABLE ft1 USING fts5(x, tokenize = porter);
> sqlite> INSERT INTO ft1 VALUES('running man');
> sqlite> CREATE VIRTUAL TABLE ft1_v_row USING fts5vocab(ft1, row);
> sqlite> SELECT * FROM ft1_v_row;
> man|1|1
> run|1|1
> sqlite> SELECT count(*) FROM ft1_v_row WHERE term = 'running';
> 0
> sqlite>
>
> How can I somehow map 'running' => 'run' in order to query the fts5vocab
> table to get stats on that term? And how could I tokenise 'running man' =>
> 'run', 'man' in order to look up multiple tokens?
I think the only way to do that at the moment is from C code using the
API in fts5.h:
https://www.sqlite.org/fts5.html#section_7

Use xFindTokenizer() to grab a handle for the desired tokenizer module,
then call the xCreate() method of the fts5_tokenizer it fills in to create
an instance, and its xTokenize() method to tokenize text.
There is example code in the fts5_test_tok.c file:
http://sqlite.org/src/artifact/db08af63673c3a7d
The example code creates a virtual table module that looks useful enough:

    CREATE VIRTUAL TABLE ttt USING fts5tokenize('porter');

then:

    SELECT * FROM ft1_v_row WHERE term IN (SELECT token FROM ttt('running man'));

should probably work. More information in fts5_test_tok.c.
Dan.