Hello,

Since this is a bug report (plus a dirty fix), it might be useful to both users and devs, which is why I am sending a copy to both mailing lists. I hope this doesn't bother the diligent devs who read both lists; sorry to them, and thanks for sqlite, by the way. ;)

Recently I wanted to use the snippet function in sqlite for my small sqlite dictionary (running on Android, but the bug also occurs on my Linux desktop). It behaved strangely when an entry started with "non-word" characters (anything that is neither alphanumeric nor Unicode (chars > 128); in short, the delimiters of the simple tokenizer): the snippet never prints them if they come before the first word.

Here is an example to demonstrate:

EXAMPLE SETUP SQL:

create table testdata (german);
create virtual table test using fts4(content="testdata",german);
insert into testdata(german) VALUES ("[1] a b c");
insert into test(docid,german) VALUES(1,"[1] a b c ");
insert into testdata(german) VALUES ("[{[_.,:;[1] a b c");
insert into test(docid,german) VALUES(2,"[{[_.,:;[1] a b c ");
insert into testdata(german) VALUES ("1[1] a b c");
insert into test(docid,german) VALUES(3,"1[1] a b c ");
insert into testdata(german) VALUES ("[1] a b c");
insert into test(docid,german) VALUES(4,"1[1] a b c ");
insert into testdata(german) VALUES(char(8203,91,49,93,32,97,32,98,32,99));
insert into test(docid,german) VALUES(5,char(8203,91,49,93,32,97,32,98,32,99));

In an sqlite shell (SQLite version 3.8.8.1 2015-01-20 16:51:25) I then get the following for a select with snippet:

EXAMPLE OUTPUT:

sqlite> select docid,*,snippet(test) from test where german match "a";
1|[1] a b c|1] <b>a</b> b c
2|[{[_.,:;[1] a b c|1] <b>a</b> b c
3|1[1] a b c|1[1] <b>a</b> b c
4|[1] a b c|1] a <b>b</b> c
5|[1] a b c|[1] <b>a</b> b c

- As you can see, for IDs 1 and 2 the <b> is at the right position, but all leading non-alphanumeric characters ([, {, etc.) are simply left out of the snippet.
- ID 3 works, but it has an additional 1 in front that should not be there, so it is no solution.
- ID 4 (extra 1 only in the index, not in the content table) does not help and breaks the offsets, so it is even worse.
- ID 5 works! BUT this is a dirty fix I found.
It adds a Unicode character (ZERO WIDTH SPACE, U+200B) in front, which obviously can't be seen and doesn't "break" the offsets (it just shifts them all by +1). I haven't tested it on Android yet, but I hope it works there too, since Android supports Unicode. Obviously this is not a nice solution, nor one that suits simpler/embedded systems.

(By the way, the same bug also occurs with fts3, and also without the content option.)

Here is a small example for a normal fts4 table with a more customized snippet call:

create virtual table test using fts4(german);
insert into test VALUES("[1] a b c");
sqlite> select *,snippet(test,"#","#","...",0,64) from test where german match 'a';
[1] a b c|1] #a# b c

Regards,
boscowitch

PS: please excuse the "german" ;) and all English spelling errors.
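
For anyone who wants to try the zero-width-space workaround from application code, here is a minimal sketch using Python's standard sqlite3 module. It is only an illustration under some assumptions: the helper names (add_zwsp_prefix, strip_zwsp_prefix, snippets_with_workaround) are made up for this example, the Python build's SQLite must have FTS4 compiled in, and the exact snippet text may differ between SQLite versions. The idea is to prepend U+200B (the char(8203) from the ID 5 row) when indexing and to strip it again before displaying the snippet.

```python
import sqlite3

ZWSP = "\u200b"  # ZERO WIDTH SPACE, the same code point as char(8203) in the post


def add_zwsp_prefix(text):
    """Prepend a zero-width space so the entry starts with a token character."""
    return text if text.startswith(ZWSP) else ZWSP + text


def strip_zwsp_prefix(text):
    """Remove the leading marker again before showing text to the user."""
    return text[len(ZWSP):] if text.startswith(ZWSP) else text


def snippets_with_workaround(entries, query):
    """Index `entries` with the prefix and return cleaned snippet() results.

    Raises sqlite3.OperationalError if this SQLite build lacks FTS4.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create virtual table test using fts4(german)")
        con.executemany(
            "insert into test(german) values (?)",
            [(add_zwsp_prefix(e),) for e in entries],
        )
        rows = con.execute(
            "select snippet(test) from test where german match ?", (query,)
        ).fetchall()
    finally:
        con.close()
    return [strip_zwsp_prefix(row[0]) for row in rows]


if __name__ == "__main__":
    try:
        for s in snippets_with_workaround(["[1] a b c", "[{[_.,:;[1] a b c"], "a"):
            print(s)
    except sqlite3.OperationalError as exc:
        # Hypothetical fallback message; FTS4 is optional at compile time.
        print("FTS4 not available in this build:", exc)
```

Stripping only a leading U+200B keeps any zero-width spaces that legitimately occur inside an entry, at the cost of leaving the marker in place if the snippet does not start at the beginning of the text.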