On 03/23/2016 10:48 AM, ????? wrote:
> I want to report a bug.
>
>
> I write a tokenizer named thai, which is working according to the rule of thai 
> vowel,not by space.
>
>
> I build a table using fts5,like this,
> CREATE VIRTUAL TABLE tbl_tha using fts5( key1, key2,TOKENIZE="thai");
>
>
> and then insert a record:
> insert into tbl_tha  values('??????','??????');
>
>
> querying like this:
>
>
> SQL1:select * from tbl_tha   where tbl_tha  match '????????????';
> SQL2:select * from tbl_tha   where tbl_tha  match '?????? ??????';
>
>
> SQL2 can query the result,but SQL1 return null;
>
>
> I have confirmed that,the tokenize can split ???????????? correctly to
> ??? ??? ???? ??
>
>
> Is that a bug which can not query multi column?

I'm not 100% sure, but I don't think so. FTS5 parses query expressions 
according to the rules described here:

   https://www.sqlite.org/fts5.html#section_3

It can be complicated, but for queries like the above, it comes down to 
"split the input on whitespace". Once it is split, each component of the 
query is passed to the tokenizer. If the tokenizer returns more than one 
token, then these are handled in the same way as a phrase expression by 
FTS5, and a phrase only matches tokens that appear consecutively within a 
single column. So SQL1 is equivalent to:

  ... MATCH "??? + ??? + ???? + ??"


whereas SQL2 is:

  ... MATCH "(??? + ???) AND (???? + ??)"
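
The rule above can be sketched in a few lines of Python. This is only a
rough model of the simplified behaviour described here, not FTS5's actual
parser: toy_tokenizer is a hypothetical stand-in for the custom "thai"
tokenizer (it just cuts the text into 2-character tokens), and
describe_query is an illustrative helper, not part of any SQLite API.

```python
from typing import Callable, List


def toy_tokenizer(text: str) -> List[str]:
    """Hypothetical stand-in for the custom tokenizer: 2-character tokens."""
    return [text[i:i + 2] for i in range(0, len(text), 2)]


def describe_query(query: str, tokenize: Callable[[str], List[str]]) -> str:
    """Model the simplified rule: split the query on whitespace, tokenize
    each component, treat a multi-token component as a phrase (tokens
    joined with '+'), and AND the components together."""
    phrases = []
    for component in query.split():
        tokens = tokenize(component)
        if tokens:
            phrases.append(" + ".join(tokens))
    if len(phrases) == 1:
        return phrases[0]
    return " AND ".join(f"({p})" for p in phrases)


# No whitespace in the query: one component, so one long phrase (like SQL1).
sql1 = describe_query("aabbccdd", toy_tokenizer)
# Whitespace in the query: two components, so two phrases ANDed (like SQL2).
sql2 = describe_query("aabb ccdd", toy_tokenizer)

print(sql1)  # aa + bb + cc + dd
print(sql2)  # (aa + bb) AND (cc + dd)
```

With the row from the report, the first two tokens are in key1 and the
last two in key2, so the single four-token phrase has nothing to match,
while each of the two shorter phrases matches within its own column.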

Maybe there should be an option for languages like Thai to tell FTS5 to 
handle this kind of thing as:

  ... MATCH "??? AND ??? AND ???? AND ??"

Dan.





