Hello Scott Hess,

>In the interests of not committing something that people won't like, my 
>current proposal would be to add an implicit TOKENIZER column, which will 
>override the table's default tokenizer for that row. 

There are a few things I am worried about with this approach:

1. FTS storage size

Won't the TOKENIZER column add to the overall size of the FTS storage, even 
if the default tokenizer is used? Because FTS must store all of the text, its 
storage requirements are already quite high, and this has put some people off 
using SQLite as their full-text search implementation.

2. Potential incompatibility with the query parser tokenizer

The table's text tokenizer is used to tokenize the query string as well. AFAIK, 
both must be identical. I cannot see how this single query tokenizer could then 
cooperate with a potentially unlimited number of incompatible row tokenizers. 
Reparsing the query for each row is, I guess, out of the question for 
performance reasons.
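A toy model may make problem 2 more concrete. This is not SQLite's FTS code: 
whitespace_tokenizer and bigram_tokenizer are hypothetical stand-ins for an 
English-style and a CJK-style tokenizer, and the index is a plain dictionary. 
Each row records the tokenizer that indexed it, as under the per-row proposal, 
while the query is tokenized only once, with the table's default tokenizer.

```python
# Toy model only: not SQLite's FTS implementation. Both tokenizers are hypothetical.

def whitespace_tokenizer(text):
    """Lowercase and split on whitespace (stand-in for a 'simple' tokenizer)."""
    return text.lower().split()

def bigram_tokenizer(text):
    """Overlapping two-character tokens, spaces ignored (stand-in for a CJK tokenizer)."""
    chars = [c for c in text.lower() if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]

# Same text, but each row is indexed by a different tokenizer (per-row override).
rows = [
    ("full text search", whitespace_tokenizer),
    ("full text search", bigram_tokenizer),
]

# Inverted index: token -> set of row ids.
index = {}
for rowid, (text, tokenize) in enumerate(rows):
    for token in tokenize(text):
        index.setdefault(token, set()).add(rowid)

def match(query, tokenize):
    """Return the row ids containing every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# The query is tokenized once, with the table's default tokenizer,
# so the bigram-indexed row 1 can never be found.
print(match("text", whitespace_tokenizer))  # {0}
print(match("text", bigram_tokenizer))      # {1}
```

Whichever single tokenizer is applied to the query, the rows indexed by the 
other tokenizer are silently missed, even though they contain the same text.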

* Alternative suggestion

Offer a per COLUMN tokenizer option instead of a per ROW one. This would get 
rid of problem 1 because the tokenizer can be stored once, with the column 
definition, rather than in every row.

The COLUMN tokenizer option would also help with problem 2: the engine can then 
parse the query according to each column's tokenizer setting. Not all queries 
might make sense with all columns, but at least the engine would guarantee that 
the query and the column text are processed by the same tokenizer. It would be 
up to the application to search certain columns only for queries in a 
particular language.

I also find the per-column tokenizer override easier to grasp (for 
translations, for example), because one can define columns for different 
languages, each with its own tokenizer: Content_EN, Content_KR, and so on. This 
of course assumes that the number of supported languages is limited. New 
languages can be added with ALTER TABLE, but an application supporting an 
unlimited number of languages would probably opt for one table per language 
instead.
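The per-column idea can be sketched with a similar toy model. SQLite has no 
such per-column tokenizer option; the schema dictionary, the Content_EN and 
Content_KR columns, and both tokenizers below are hypothetical illustrations of 
the proposal, not an existing API. The tokenizer lives in the schema, so rows 
carry no extra field, and the query is tokenized with the tokenizer of the 
column being searched.

```python
# Toy model only: SQLite has no per-column tokenizer option; the schema,
# column names, and tokenizers below are hypothetical.

def whitespace_tokenizer(text):
    """Lowercase and split on whitespace (stand-in for an English tokenizer)."""
    return text.lower().split()

def bigram_tokenizer(text):
    """Overlapping two-character tokens, spaces ignored (stand-in for a Korean tokenizer)."""
    chars = [c for c in text.lower() if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]

# The tokenizer belongs to the column definition, so rows store nothing extra.
schema = {"Content_EN": whitespace_tokenizer, "Content_KR": bigram_tokenizer}

rows = [
    {"Content_EN": "full text search", "Content_KR": "전문 검색"},
]

def build_index(rows, schema):
    """Build one inverted index per column: column -> token -> row ids."""
    index = {column: {} for column in schema}
    for rowid, row in enumerate(rows):
        for column, tokenize in schema.items():
            for token in tokenize(row[column]):
                index[column].setdefault(token, set()).add(rowid)
    return index

def match(index, schema, column, query):
    """Tokenize the query with the searched column's own tokenizer, then AND the postings."""
    tokens = schema[column](query)
    if not tokens:
        return set()
    postings = index[column]
    result = set(postings.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= postings.get(token, set())
    return result

index = build_index(rows, schema)
print(match(index, schema, "Content_EN", "search"))  # {0}
print(match(index, schema, "Content_KR", "검색"))     # {0}
```

Query and column text always pass through the same tokenizer here. Sending a 
Korean query to Content_EN simply finds nothing, which is why choosing the 
right column for a query stays the application's job.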

Ralf 


-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to