Alexey Pechnikov wrote:
> Hello!
>
> In a message of Saturday 26 July 2008 21:37:19, Stephen Woodbridge wrote:
>> I have thought a lot about these issues and would appreciate any
>> thoughts or ideas on how to implement any of these concepts or others
>> for fuzzy searching and matching.
>
> I know that ispell, myspell, hunspell and trigrams are used in PostgreSQL
> FTS; a lot of languages are supported that way. The soundex function is
> also useful for morphology search if the word is written in the Latin
> alphabet (transliteration: replace each symbol of the national alphabet
> with one or more Latin letters):
>
> sqlite> select soundex('Moskva');
> M210
> sqlite> select soundex('Moscva');
> M210
> sqlite> select soundex('Mouscva');
> M210
> sqlite> select soundex('Mouskva');
> M210
> sqlite> select soundex('moskva');
> M210
>
> Note: compile SQLite with -DSQLITE_SOUNDEX=1
>
> There is stemming in Apache Lucene, Sphinx (which includes soundex-based
> morphology) and Xapian too.
>
> Are these features planned for SQLite FTS?
Well, I will leave the question of plans to Scott Hess, the FTS developer, to answer.

I just read a bunch of the FTS overview documents for PostgreSQL, which I use a lot for other projects, and I like the way they have things broken down and integrated with the database. I haven't tried 8.3 yet, but it is nice to see that FTS is now part of the main distribution.

http://www.sai.msu.su/~megera/postgres/fts/doc/fts-history.html
http://www.sai.msu.su/~megera/postgres/fts/doc/fts-basic.html
http://www.sai.msu.su/~megera/postgres/fts/doc/fts-dict.html

I think you could add dictionaries as stemmers the same way you would add a stemmer to SQLite. Look at the code in the SQLite source tree:

ext/fts3/fts3_porter.c
ext/fts3/fts3_tokenizer.[ch]
ext/fts3/fts3_tokenizer1.c

(A quick example of selecting the built-in porter tokenizer is sketched in the P.S. below.)

As far as other lexemes go, there is nothing stopping you from creating your FTS table with additional lexeme columns that you populate with the appropriate lexemes derived from the full-text column. Of course, you have to generate the lexemes yourself and store them as the text for that column. For example, if you wanted a soundex column, you could preprocess your document through the simple or porter tokenizer, generate the soundex key for each token, concatenate the keys with a separating space, and use that string as the contents of the soundex lexeme column. Then, to do a query, you would tokenize the incoming words, generate their soundex keys, and do an FTS search on that column (a rough sketch is in the P.P.S. below). It would obviously be nicer if this were built into the existing FTS engine, but you could do it today with some additional programming.

As I said before, I will leave questions of FTS planning up to Scott. I have read through his fts3 code and, while I confess I do not fully understand how it all works, for a relatively small amount of code it works impressively well.

All the best,
  -Steve
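P.S. Here is a minimal sketch of picking the built-in porter stemming tokenizer when creating an FTS3 table; the "articles" table and its contents are made up for illustration. A custom tokenizer built along the lines of fts3_tokenizer1.c could be selected with the same tokenize= syntax once it is registered.

  -- use the built-in porter stemming tokenizer for this FTS3 table
  CREATE VIRTUAL TABLE articles USING fts3(body, tokenize=porter);

  INSERT INTO articles(body) VALUES ('searching and matching documents');

  -- porter stems both the indexed text and the query terms,
  -- so 'search' should match 'searching'
  SELECT body FROM articles WHERE body MATCH 'search';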
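P.P.S. And a rough sketch of the soundex-lexeme-column idea; the "docs" table, its contents, and the pre-computed keys are only illustrative, and the per-token soundex encoding would be done by your own code (for example with the soundex() function Alexey showed, given a build with -DSQLITE_SOUNDEX=1).

  -- one column for the original text, one for its soundex lexemes
  CREATE VIRTUAL TABLE docs USING fts3(body, body_sdx);

  -- the application tokenizes the body and stores one soundex key per token
  INSERT INTO docs(body, body_sdx)
    VALUES ('the train to Moskva', 'T000 T650 T000 M210');

  -- at query time the incoming word is soundex-encoded the same way,
  -- e.g. soundex('Moscva') = 'M210', and the search runs on the lexeme column
  SELECT body FROM docs WHERE body_sdx MATCH 'M210';

This should return the stored row even though the query spelling differs from the original text.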