On 03/23/2016 10:48 AM, ????? wrote:
> I want to report a bug.
>
> I write a tokenizer named thai, which is working according to the rule of thai
> vowel, not by space.
>
> I build a table using fts5, like this,
> CREATE VIRTUAL TABLE tbl_tha using fts5( key1, key2,TOKENIZE="thai");
>
> and then insert a record:
> insert into tbl_tha values('??????','??????');
>
> querying like this:
>
> SQL1:select * from tbl_tha where tbl_tha match '????????????';
> SQL2:select * from tbl_tha where tbl_tha match '?????? ??????';
>
> SQL2 can query the result, but SQL1 return null;
>
> I have confirmed that, the tokenize can split ???????????? correctly to
> ??? ??? ???? ??
>
> Is that a bug which can not query multi column?

I'm not 100% sure, but I don't think so.

Fts5 parses query expressions according to the rules described here:

  https://www.sqlite.org/fts5.html#section_3

It can be complicated, but for queries like the above, it comes down to
"split the input on whitespace". Once it is split, each component of the
query is passed to the tokenizer. If the tokenizer returns more than one
token, then these are handled in the same way as a phrase expression by
fts5. So SQL1 is equivalent to:

  ... MATCH "??? + ??? + ???? + ??"

whereas SQL2 is:

  ... MATCH "(??? + ???) AND (???? + ??)"

Maybe there should be an option for languages like Thai to tell FTS5 to
handle this kind of thing as:

  ... MATCH "??? AND ??? AND ???? AND ??"

Dan.
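
The parsing rule described above can be sketched in a few lines of Python. This is only a toy model, not FTS5 itself: the vocabulary, `toy_tokenize`, `parse_query` and `matches` are all hypothetical names, ASCII strings stand in for the Thai text (which is not recoverable here), and it assumes that a phrase has to match consecutive tokens within a single column.

```python
from typing import List, Sequence

# Hypothetical vocabulary standing in for the four Thai tokens.
VOCAB = ["aaa", "bbb", "cccc", "dd"]


def toy_tokenize(text: str) -> List[str]:
    """Greedy longest-match segmentation; a stand-in for the custom tokenizer."""
    words = sorted(VOCAB, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for w in words:
            if text.startswith(w, i):
                tokens.append(w)
                i += len(w)
                break
        else:
            i += 1  # skip characters that are not part of any known word
    return tokens


def parse_query(query: str) -> List[List[str]]:
    """Split on whitespace; each component becomes one phrase (a token list)."""
    return [toy_tokenize(part) for part in query.split()]


def phrase_in_column(phrase: Sequence[str], column: Sequence[str]) -> bool:
    """True if the phrase tokens appear consecutively in one column."""
    n = len(phrase)
    return any(list(column[i:i + n]) == list(phrase)
               for i in range(len(column) - n + 1))


def matches(query: str, row: Sequence[str]) -> bool:
    """All phrases are ANDed; each must occur in at least one column."""
    columns = [toy_tokenize(value) for value in row]
    return all(any(phrase_in_column(p, c) for c in columns)
               for p in parse_query(query))


row = ("aaabbb", "ccccdd")            # key1, key2
print(parse_query("aaabbbccccdd"))    # one four-token phrase (like SQL1)
print(parse_query("aaabbb ccccdd"))   # two two-token phrases (like SQL2)
print(matches("aaabbbccccdd", row))
print(matches("aaabbb ccccdd", row))
```

In this model the SQL1-style query becomes a single four-token phrase, which neither two-token column can contain, while the SQL2-style query becomes two phrases that are each found in one column.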