On 03/23/2016 10:48 AM, ????? wrote:
> I want to report a bug.
>
>
> I write a tokenizer named thai, which is working according to the rule of thai
> vowel, not by space.
>
>
> I build a table using fts5,like this,
> CREATE VIRTUAL TABLE tbl_tha using fts5( key1, key2,TOKENIZE="thai");
>
>
> and then insert a record:
> insert into tbl_tha values('??????','??????');
>
>
> querying like this:
>
>
> SQL1:select * from tbl_tha where tbl_tha match '????????????';
> SQL2:select * from tbl_tha where tbl_tha match '?????? ??????';
>
>
> SQL2 returns the result, but SQL1 returns nothing;
>
>
> I have confirmed that,the tokenize can split ???????????? correctly to
> ??? ??? ???? ??
>
>
> Is it a bug that a query cannot match across multiple columns?

I'm not 100% sure, but I don't think so. Fts5 parses query expressions
according to the rules described here:

  https://www.sqlite.org/fts5.html#section_3

It can be complicated, but for queries like the above, it comes down to
"split the input on whitespace". Once it is split, each component of the
query is passed to the tokenizer. If the tokenizer returns more than one
token, then these are handled in the same way as a phrase expression by
fts5. So SQL1 is equivalent to:

  ... MATCH "??? + ??? + ???? + ??"

whereas SQL2 is:

  ... MATCH "(??? + ???) AND (???? + ??)"

Maybe there should be an option for languages like Thai to tell FTS5 to
handle this kind of thing as:

  ... MATCH "??? AND ??? AND ???? AND ??"
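
The three query shapes above can be reproduced without a custom tokenizer.
What follows is only a rough sketch under some assumptions: it uses Python's
sqlite3 module (which must be linked against an SQLite build with FTS5
enabled), and it stands in for the Thai tokenizer with the default unicode61
tokenizer, relying on the fact that unicode61 treats '-' as a separator. The
table name t, the sample values and the function name match_counts are made
up for illustration. A quoted string such as "aa-bb-cc-dd" is then one query
component that tokenizes to four tokens, which is the SQL1 situation.

```python
import sqlite3


def match_counts():
    """Return how many rows each of the three query shapes matches."""
    db = sqlite3.connect(":memory:")
    # Two-column FTS5 table using the default unicode61 tokenizer
    # (an assumed stand-in for the custom "thai" tokenizer).
    db.execute("CREATE VIRTUAL TABLE t USING fts5(key1, key2)")
    db.execute("INSERT INTO t VALUES ('aa bb', 'cc dd')")

    queries = {
        # One component, four tokens: treated as the phrase aa + bb + cc + dd.
        "one_phrase": '"aa-bb-cc-dd"',
        # Two components: (aa + bb) AND (cc + dd).
        "two_phrases": '"aa-bb" "cc-dd"',
        # Every token ANDed separately, as in the suggested option.
        "all_and": "aa AND bb AND cc AND dd",
    }

    counts = {}
    for name, expr in queries.items():
        rows = db.execute("SELECT * FROM t WHERE t MATCH ?", (expr,)).fetchall()
        counts[name] = len(rows)
    db.close()
    return counts


if __name__ == "__main__":
    print(match_counts())
```

The single phrase should find nothing, because a phrase has to match
consecutive tokens within one column, while the other two forms should find
the row, since each AND operand is free to match in a different column.
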
Dan.