I wonder if there is some effort already under way to support custom tokenizers 
in SQLite's full text search?

I know that custom tokenizers are already on the developers' to-do list, but I 
would be interested to know if some progress has already been made.

Custom tokenizers would be able to solve a couple of the current limitations of 
FTS:

* Caseless searching for the full Unicode range of characters (currently limited 
to ASCII only).

* Stop Words - the tokenizer would ignore them and the FTS engine could remain 
unchanged.

* Improved reading of meta text formats (OpenOffice, Word, HTML). I can imagine 
that SQLite would quickly see a bunch of user-contributed tokenizers for these 
formats.
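
To make the first two points more concrete, here is a minimal sketch in Python 
of what such a tokenizer might do. It is not SQLite's tokenizer interface: the 
names tokenize and STOP_WORDS and the stop word list itself are hypothetical 
and only serve as an illustration. It folds case over the full Unicode range 
and drops stop words before the FTS engine would ever see them.

```python
import re

# Hypothetical stop word list, for illustration only.
STOP_WORDS = frozenset({"the", "a", "an", "and", "of"})

# For str patterns, \w matches Unicode letters, digits and the underscore.
_WORD = re.compile(r"\w+")


def tokenize(text, stop_words=STOP_WORDS):
    """Yield (token, start, end) tuples for each indexable word in text.

    Tokens are case-folded with str.casefold(), which covers the full
    Unicode range rather than ASCII only. Words found in stop_words are
    skipped, so the indexing engine itself needs no knowledge of them.
    start and end are character offsets into the original text.
    """
    for match in _WORD.finditer(text):
        token = match.group().casefold()
        if token not in stop_words:
            yield token, match.start(), match.end()
```

Because a query string would pass through the same function, a search for 
"STRASSE" and one for "Straße" would both reduce to the token "strasse".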

Not directly related to tokenizers:

I am very interested to know if it would be possible to use an FTS indexing 
module to store the inverted index only, but not the document's text. This 
would save disk space if the text to index is stored on disk rather than inside 
the database.
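
As a rough illustration of what I mean, here is a sketch in plain Python. It 
has nothing to do with how the FTS module works internally, and the names 
build_index and search are made up for this example. The index maps each term 
to the paths of the files that contain it, while the text itself stays in the 
files on disk.

```python
import re
from collections import defaultdict


def build_index(paths):
    """Build an inverted index: case-folded term -> set of file paths.

    Only the terms and the paths are kept in the index. The text of each
    file is read once for tokenizing and is not stored.
    Assumes the files are UTF-8 encoded text.
    """
    index = defaultdict(set)
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for word in re.findall(r"\w+", handle.read()):
                index[word.casefold()].add(path)
    return index


def search(index, term):
    """Return the sorted paths of all files containing term, ignoring case."""
    return sorted(index.get(term.casefold(), ()))
```

The obvious price is that anything needing the original text, such as 
snippets or re-indexing after a file changes, has to go back to the file.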

Ralf 

