Hello Scott Hess,

> In the interests of not committing something that people won't like, my
> current proposal would be to add an implicit TOKENIZER column, which will
> override the table's default tokenizer for that row.

There are a few things I am worried about with this approach:

1. FTS storage size

Will the TOKENIZER column not add to the overall size of the FTS storage, even if the default tokenizer is used? Since FTS has to store all of the text, its storage requirements are already quite high and have put people off SQLite as their full-text search implementation.

2. Potential incompatibility with the query parser's tokenizer

The table's text tokenizer is used to tokenize the query string as well. AFAIK, both must be identical. I cannot see how this single query tokenizer can then cooperate with a potentially unlimited number of incompatible row tokenizers. Reparsing the query for each row is, I guess, out of the question for performance reasons.

* Alternative suggestion

Offer a per-COLUMN tokenizer option instead of a per-ROW one.

This would get rid of problem 1, because the tokenizer can be stored with the column definition instead of with every row.

The COLUMN tokenizer option would also help with problem 2: the engine can then parse the query according to each column's tokenizer setting. Not every query will make sense for every column, but at least the engine would guarantee that the query and the indexed text use the identical tokenizer. It would be up to the application to search only the columns that match the language of a particular query.

I also find the per-column tokenizer override easier to grasp (for translations, for example), because one can have different language columns with different tokenizers: Content_EN, Content_KR, and so on.

This of course assumes that the number of supported languages is limited. New languages can be added with ALTER TABLE, but an application that supports an unlimited number of languages would probably opt for the one-table-per-language option.

Ralf
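
P.S. To illustrate the per-column idea, here is a toy Python sketch. It is not SQLite code and says nothing about how FTS would implement this. All names in it (ColumnIndex, tokenize_words, tokenize_bigrams) are made up for this example, and the bigram tokenizer is only a stand-in for a real Korean tokenizer. The point it tries to show is that match() tokenizes the query with the same tokenizer that indexed the column, so the two can never disagree.

```python
import re
from collections import defaultdict


def tokenize_words(text):
    """Hypothetical tokenizer: lowercase, split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def tokenize_bigrams(text):
    """Hypothetical tokenizer: overlapping character bigrams, spaces ignored."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


class ColumnIndex:
    """Toy inverted index with one tokenizer per column (illustration only)."""

    def __init__(self, tokenizers):
        # The tokenizer is stored once per column, not once per row.
        self.tokenizers = dict(tokenizers)
        self.postings = {col: defaultdict(set) for col in self.tokenizers}

    def insert(self, rowid, columns):
        # Index each column's text with that column's own tokenizer.
        for col, text in columns.items():
            for term in self.tokenizers[col](text):
                self.postings[col][term].add(rowid)

    def match(self, column, query):
        # Tokenize the query with the tokenizer of the column being searched.
        terms = self.tokenizers[column](query)
        if not terms:
            return set()
        result = None
        for term in terms:
            rows = self.postings[column].get(term, set())
            result = set(rows) if result is None else result & rows
        return result


index = ColumnIndex({"Content_EN": tokenize_words,
                     "Content_KR": tokenize_bigrams})
index.insert(1, {"Content_EN": "Full text search", "Content_KR": "전문 검색"})
index.insert(2, {"Content_EN": "Search engines", "Content_KR": "검색 엔진"})
```

With this layout the application picks the column that fits the query language, for example index.match("Content_KR", "검색"), and a query only ever meets text that was tokenized the same way.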