Hi, I guess the speed could be improved significantly if you leave out _car and _ar, the two patterns that begin with a wildcard. The inverted index, which is basically (term, blob_containing_all_document_ids_of_this_term), cannot skip any of the alphabetically ordered terms if the first character is variable. At least that's my understanding.
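The effect of a leading wildcard can be checked with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions, not something from the thread. It uses a plain table `dict(word TEXT)` with a hypothetical B-tree index `idx_word` rather than an FTS inverted index, and the helpers `literal_prefix`, `prefix_bounds` and `plan` are hypothetical names. A pattern such as 'c_r' has the literal prefix 'c', so the search can be limited to the index range ['c', 'd'). A pattern such as '_ar' has no prefix, so every index entry has to be examined. The sketch also assumes the words are stored in lowercase: LIKE is case-insensitive for ASCII by default, while the added range test uses binary comparison and would exclude a word like 'CAR'.

```python
import sqlite3

def literal_prefix(pattern):
    """Characters of a LIKE pattern before its first wildcard ('_' or '%')."""
    for i, ch in enumerate(pattern):
        if ch in "_%":
            return pattern[:i]
    return pattern

def prefix_bounds(pattern):
    """Half-open range [lo, hi) holding every string that starts with the
    pattern's literal prefix, or None when the pattern starts with a
    wildcard (no range exists, so every index entry must be examined)."""
    prefix = literal_prefix(pattern)
    if not prefix:
        return None
    # Under BINARY collation, every string starting with `prefix` sorts
    # >= prefix and < prefix with its last character incremented.
    # Assumes ordinary letters (no edge cases at the top of the code point range).
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)

def plan(conn, sql, params=()):
    """'detail' text of EXPLAIN QUERY PLAN (the last column of each row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dict(word TEXT)")
conn.execute("CREATE INDEX idx_word ON dict(word)")  # hypothetical index name
conn.executemany("INSERT INTO dict VALUES (?)",
                 [(w,) for w in ["bar", "car", "cart", "cat", "scar"]])

lo, hi = prefix_bounds("c_r")  # ('c', 'd')
ranged = "SELECT word FROM dict WHERE word >= ? AND word < ? AND word LIKE ?"
print(conn.execute(ranged, (lo, hi, "c_r")).fetchall())  # [('car',)]
print(plan(conn, ranged, (lo, hi, "c_r")))  # should show a SEARCH step using idx_word
print(prefix_bounds("_ar"))  # None
print(plan(conn, "SELECT word FROM dict WHERE word LIKE '_ar'"))  # should show a SCAN step
```

In an OR query like Alberto's quoted one, SQLite can generally answer the OR with separate index lookups only when every OR term is indexable. A single leading-wildcard term therefore tends to push the whole query into a scan, which is why dropping _car and _ar, or running them as a separate query, may help. SQLite's own LIKE/GLOB optimization can apply the same prefix trick automatically in some configurations, depending on the collation and the case_sensitive_like setting.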
Thank you for your idea; I am also thinking of putting some fuzzy search on top of FTS.

Best
Martin

________________________________
From: Alberto Simões <hashas...@gmail.com>
To: General Discussion of SQLite Database <sqlite-users@sqlite.org>
Sent: Friday, 26 June 2009, 13:25:57
Subject: [sqlite] Near misses

Hello.

I am trying to find words in a dictionary stored in SQLite, using a near-miss approach. For that I tried an algorithm that creates patterns corresponding to a Levenshtein distance of 1 (edit distance of 1), meaning one addition, one removal or one substitution.

My script receives a word (say, 'car') and generates all possible additions, substitutions and removals:

Additions: _car c_ar ca_r car_
Substitutions: _ar c_r ca_
Removals: ar cr ca

Then the script constructs an SQL query:

SELECT DISTINCT word FROM dict
WHERE word = 'ar' OR word = 'ca' OR word LIKE '_car' OR word LIKE 'c_r'
   OR word = 'cr' OR word LIKE '_ar' OR word LIKE 'ca_r' OR word LIKE 'c_ar'
   OR word LIKE 'ca_' OR word LIKE 'car_';

This SQL query works... but not as quickly as I need (especially because the time grows with the word length).

Any hint on how to speed this up?

Thank you
Alberto

--
Alberto Simões
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
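The pattern generation Alberto describes can be sketched in Python; the post does not say which language his script uses. `near_miss_patterns` and `near_miss_query` are hypothetical names, and the sketch assumes the input word contains no '_' or '%', which LIKE would treat as wildcards. It binds the patterns as parameters instead of splicing quoted strings into the SQL. Note that the substitution patterns also match the word itself ('c_r' matches 'car'), so exact hits come back too.

```python
import sqlite3

def near_miss_patterns(word):
    """Edit-distance-1 patterns, following the scheme in the post.

    Additions and substitutions use the LIKE wildcard '_' (exactly one
    character); removals are plain strings compared with '='.
    Assumes `word` itself contains no '_' or '%'.
    """
    n = len(word)
    additions = [word[:i] + "_" + word[i:] for i in range(n + 1)]
    substitutions = [word[:i] + "_" + word[i + 1:] for i in range(n)]
    # Removing either letter of a doubled pair gives the same string;
    # drop duplicates while keeping the original order.
    removals = list(dict.fromkeys(word[:i] + word[i + 1:] for i in range(n)))
    return additions, substitutions, removals

def near_miss_query(word):
    """Build one parameterized SELECT covering every pattern for `word`."""
    additions, substitutions, removals = near_miss_patterns(word)
    likes = additions + substitutions
    terms = ["word LIKE ?"] * len(likes) + ["word = ?"] * len(removals)
    sql = "SELECT DISTINCT word FROM dict WHERE " + " OR ".join(terms)
    return sql, likes + removals

# Usage against a small in-memory dictionary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dict(word TEXT)")
conn.executemany("INSERT INTO dict VALUES (?)",
                 [(w,) for w in ["ar", "bar", "car", "care", "cart", "cat", "dog", "scar"]])
sql, params = near_miss_query("car")
print(sorted(w for (w,) in conn.execute(sql, params)))
# ['ar', 'bar', 'car', 'care', 'cart', 'cat', 'scar']
```

A word of length n yields 3n+1 patterns here (n+1 additions, n substitutions, up to n removals), which matches the observation that the query gets slower as the word gets longer.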