Hello, since this is a bug report (plus a dirty fix), it might be useful
for both users and devs.
That's why I am sending a copy to both mailing lists!
I hope I don't bother the diligent devs who read all of both lists; sorry
to them, and thanks for sqlite btw. ;)

Recently I wanted to use the snippet function in sqlite for my small
sqlite dictionary (running on Android, but the bug also occurs on my
Linux desktop).

But it behaved strangely when my entry started with "non-word"
character(s) (anything that is not alphanumeric and not a Unicode
character with code point 128 or above; in short, the simple
tokenizer's delimiters).
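
To make "non-word characters" concrete, here is a minimal Python sketch of
that rule. It assumes the simple tokenizer treats ASCII alphanumerics and
every code point of 128 or above as token characters, and everything else
as a delimiter. The helper names are hypothetical, not part of sqlite.

```python
def is_simple_delimiter(ch: str) -> bool:
    """Return True if `ch` is assumed to separate tokens for the "simple" tokenizer.

    Assumption: token characters are ASCII alphanumerics plus every code
    point >= 128; every other character counts as a delimiter.
    """
    if ord(ch) >= 128:
        return False
    return not ch.isalnum()


def leading_delimiters(text: str) -> str:
    """Return the run of delimiter characters at the very start of `text`."""
    end = 0
    while end < len(text) and is_simple_delimiter(text[end]):
        end += 1
    return text[:end]
```

For an entry such as "[1] a b c" this returns "[", which is the part that
goes missing from the snippet.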

The snippet never prints them if they come at the beginning, before the
first word.
Here are some examples to demonstrate:

EXAMPLE SETUP SQL:
create table testdata (german);
create virtual table test using fts4(content="testdata",german);

insert into testdata(german) VALUES ("[1] a b c");
insert into test(docid,german) VALUES(1,"[1] a b c ");

insert into testdata(german) VALUES ("[{[_.,:;[1] a b c");
insert into test(docid,german) VALUES(2,"[{[_.,:;[1] a b c "); 

insert into testdata(german) VALUES ("1[1] a b c");
insert into test(docid,german) VALUES(3,"1[1] a b c "); 

insert into testdata(german) VALUES ("[1] a b c");
insert into test(docid,german) VALUES(4,"1[1] a b c "); 

insert into testdata(german) 
VALUES(char(8203,91,49,93,32,97,32,98,32,99));
insert into test(docid,german)
VALUES(5,char(8203,91,49,93,32,97,32,98,32,99));


Then, in an sqlite shell (SQLite version 3.8.8.1 2015-01-20 16:51:25),
I get the following for a select with snippet:

EXAMPLE OUTPUT:
sqlite> select docid,*,snippet(test) from test where german match "a";
1|[1] a b c|1] <b>a</b> b c
2|[{[_.,:;[1] a b c|1] <b>a</b> b c
3|1[1] a b c|1[1] <b>a</b> b c
4|[1] a b c|1] a <b>b</b> c
5|​[1] a b c|​[1] <b>a</b> b c



- As you can see, for IDs 1 and 2 the <b> is at the right position,
but all the leading non-alphanumeric characters ([, {, etc.) are simply
left out of the snippet.

- ID 3 works but has an additional 1 that should not be there, so it is
no solution...

- ID 4 (the extra 1 only in the FTS index, not in the content table)
does not help and breaks the offsets, so it is even worse...

- ID 5 works! BUT this is a dirty fix I found.
It adds a Unicode character (ZERO WIDTH SPACE, U+200B) in front, which
obviously can't be seen and doesn't "break" the offsets (it just shifts
them all by one character).
I haven't tested it on Android yet, but I hope it works there too, since
Android supports Unicode...
Obviously this is not a nice solution, nor one for simpler/embedded
systems.
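
On the application side, the ZERO WIDTH SPACE workaround for ID 5 could be
wrapped roughly as follows. This is only a sketch: the names `add_guard`
and `strip_guard` are hypothetical, and it assumes the text is stored as
UTF-8. Note that U+200B is one character but three bytes in UTF-8, so any
byte-based offsets would move by three rather than one.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE (U+200B), i.e. char(8203) in the SQL above


def add_guard(text: str) -> str:
    """Prepend U+200B unless the text already starts with it."""
    return text if text.startswith(ZWSP) else ZWSP + text


def strip_guard(text: str) -> str:
    """Remove a single leading U+200B, e.g. before showing text to the user."""
    return text[len(ZWSP):] if text.startswith(ZWSP) else text


original = "[1] a b c"
guarded = add_guard(original)

# Shift introduced by the guard, counted in characters and in UTF-8 bytes.
char_shift = len(guarded) - len(original)
byte_shift = len(guarded.encode("utf-8")) - len(original.encode("utf-8"))
```

The guarded string would be what gets inserted into both the content table
and the FTS table; `strip_guard` undoes it for display or export.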


(Btw., the same bug also occurs with fts3, and also without the special
content option.)
Here is a small example for a normal fts4 table with a more customized
snippet call:

create virtual table test using fts4(german);
insert into test VALUES("[1] a b c");

sqlite> select *,snippet(test,"#","#","...",0,64) from test where german match 'a';
[1] a b c|1] #a# b c
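
For anyone who wants to check their own build, the same plain fts4 case
can be reproduced from Python's standard sqlite3 module. This is a sketch
that assumes the sqlite library linked into Python was compiled with
FTS3/FTS4 support; the function name `fts4_snippet` is hypothetical.

```python
import sqlite3
from typing import Optional


def fts4_snippet(text: str, query: str) -> Optional[str]:
    """Index `text` in a throwaway fts4 table and return snippet() for `query`.

    Returns None when the query does not match the text.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE test USING fts4(german)")
        conn.execute("INSERT INTO test(german) VALUES (?)", (text,))
        row = conn.execute(
            "SELECT snippet(test, '#', '#', '...', 0, 64) "
            "FROM test WHERE german MATCH ?",
            (query,),
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()
```

Calling fts4_snippet("[1] a b c", "a") and comparing the result with the
original text shows whether the leading "[" survives. That depends on the
sqlite version in use; the output above is from 3.8.8.1.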



regards boscowitch

PS: please excuse the "german" ;) and all English spelling errors 

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
