Hello!
I've looked through the FTS5 documentation and it doesn't seem to
mention any option to specify a minimum number of characters for a
token to be indexed. Looking at some fts5 tables, an option to skip
tokens shorter than 2 or 3 characters would be a good approximation of
stopwords. Another interesting option would be regex-like black/white
lists of character sequences to be indexed or skipped.
Something like this (min_word_size, word_black_list and word_white_list
are the proposed options, they do not exist today):

  create virtual table if not exists pdfs_fts using fts5(
    pdf_name UNINDEXED,
    data,
    tokenize = 'unicode61 remove_diacritics 1 min_word_size 3 word_black_list [\d\.\d\d\w \a\d\d\d] word_white_list [\(\d+\) \d\d\.\d\d\d\.\d\d\a]'
  );
The idea is to allow or disallow specific domain sequences, so that
they are included in or excluded from the index.
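
To make the intended semantics concrete, here is a rough Python sketch of
the filtering, applied to plain whitespace-separated tokens. It is only an
illustration under assumptions: filter_tokens and the three settings are
hypothetical names, the two patterns are placeholders rather than
translations of the ones above, and letting the white list win over the
black list and the length limit is just one possible rule. A real
implementation would have to live inside the tokenizer.

```python
import re

# Hypothetical settings mirroring the proposed options.
# None of these are real FTS5 tokenizer options.
MIN_WORD_SIZE = 3
WORD_BLACK_LIST = [re.compile(r"\d\.\d\d\w")]  # placeholder pattern
WORD_WHITE_LIST = [re.compile(r"\(\d+\)")]     # placeholder pattern


def filter_tokens(text):
    """Return the whitespace-separated tokens that would be indexed."""
    kept = []
    for token in text.split():
        if any(p.fullmatch(token) for p in WORD_WHITE_LIST):
            kept.append(token)   # white-listed: always indexed
        elif any(p.fullmatch(token) for p in WORD_BLACK_LIST):
            continue             # black-listed: never indexed
        elif len(token) >= MIN_WORD_SIZE:
            kept.append(token)   # otherwise apply the minimum length
    return kept


print(filter_tokens("of (12) 1.23a invoice to"))
```

Here "(12)" is kept despite its content because it is white-listed,
"1.23a" is dropped by the black list, and "of" and "to" are dropped by the
length limit.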
Any ideas on how to achieve that?
Cheers!