Hi,

I downloaded the amalgamation sources in order to create a build of SQLite
with FTS3 enabled. My problem is that the default "simple" tokenizer
does not behave quite how I want. I'd prefer that it not treat
punctuation as a delimiter, and instead split purely on whitespace.

In the simpleCreate() function there's some code that initializes an array
recording which characters are delimiters:

for(i=1; i<0x80; i++){  
    t->delim[i] = !isalnum(i);
}

I thought that if I made a simple edit to use the isspace() function then
I'd achieve what I was after, i.e.,

for(i=1; i<0x80; i++){  
    t->delim[i] = isspace(i);
}
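
To make clear what behaviour I'm after, here is a minimal standalone sketch.
It is not the FTS3 code: build_delims() and count_tokens() are names I made
up for illustration, and the sketch only mimics the idea of a 128-entry
delimiter table. One assumption is called out in the comments: the ctype
functions only promise zero or nonzero, so the sketch normalises the result
to 0 or 1 before storing it in a char.

```c
#include <ctype.h>

/* Hypothetical helper: fill a 128-entry ASCII delimiter table.
** If whitespace_only is nonzero, only whitespace counts as a delimiter;
** otherwise every non-alphanumeric character does. */
static void build_delims(char delim[128], int whitespace_only){
    int i;
    delim[0] = 0;
    for(i=1; i<0x80; i++){
        /* isspace()/isalnum() return zero or *some* nonzero value, so
        ** normalise to 0 or 1 before storing the result in a char. */
        delim[i] = whitespace_only ? (isspace(i) != 0) : (isalnum(i) == 0);
    }
}

/* Hypothetical helper: count the tokens in s according to the table.
** Bytes >= 0x80 are always treated as token characters. */
static int count_tokens(const char *s, const char delim[128]){
    int n = 0;
    int in_token = 0;
    for(; *s; s++){
        unsigned char c = (unsigned char)*s;
        int is_delim = c < 0x80 && delim[c];
        if( !is_delim && !in_token ) n++;   /* a new token starts here */
        in_token = !is_delim;
    }
    return n;
}
```

For the input "foo-bar baz.txt", the whitespace-only table gives two tokens
(foo-bar and baz.txt), whereas the alphanumeric table gives four (foo, bar,
baz, txt). The two-token result is what I want from FTS3.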

However, when I build this version, create my FTS virtual tables and then
query them, I get zero results. When I revert to !isalnum I get results,
but then words are split where I don't want them to be.

I must admit my C experience isn't great, but I've been trying for far too
many hours now with little gain. I'd really appreciate some pointers!

Thanks in advance,
Andy
-- 
View this message in context: 
http://www.nabble.com/Simple-Tokenizer-in-FTS3-tp22911635p22911635.html
Sent from the SQLite mailing list archive at Nabble.com.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
