Hi, 

I guess the speed could be improved significantly
if you leave out _car and _ar.
The inverted index, which is basically (term,
blob_containing_all_document_ids_of_this_term),
cannot skip any of the alphabetically ordered terms if the first character is
variable.
At least that's my understanding.
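
To make the point about skipping concrete, here is a minimal Python sketch. It
is not how SQLite or FTS is implemented: it only models an index as a sorted
list of terms, and the names like_match, scan and terms are made up for the
illustration. A pattern with a fixed first character can be narrowed to a
range with a binary search, while a pattern that starts with _ has to look at
every term.

```python
from bisect import bisect_left


def like_match(pattern, term):
    """Match a LIKE pattern that uses only the single-character wildcard '_'."""
    return len(pattern) == len(term) and all(
        p == "_" or p == t for p, t in zip(pattern, term)
    )


def scan(sorted_terms, pattern):
    """Return (matches, number_of_terms_examined) for one pattern."""
    # Literal characters before the first wildcard form a usable prefix.
    prefix = pattern.split("_", 1)[0]
    if not prefix:
        # Leading wildcard: there is no prefix to seek to, so examine everything.
        candidates = sorted_terms
    else:
        # Binary search for the range of terms that start with the prefix.
        # "\uffff" is used as an upper bound; this assumes terms contain no
        # characters at or beyond U+FFFF.
        lo = bisect_left(sorted_terms, prefix)
        hi = bisect_left(sorted_terms, prefix + "\uffff")
        candidates = sorted_terms[lo:hi]
    return [t for t in candidates if like_match(pattern, t)], len(candidates)


# Toy term list, assumed for the example only.
terms = sorted([
    "ar", "bar", "ca", "car", "card", "care", "cart",
    "cor", "cr", "cur", "far", "scar", "tar", "zoo",
])

print(scan(terms, "c_r"))  # only the terms starting with "c" are examined
print(scan(terms, "_ar"))  # every term is examined
```

With this toy list, c_r examines 8 of the 14 terms and _ar examines all 14;
on a real dictionary the gap would be much larger.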

Thank you for your idea, because I am also thinking of putting some fuzzy 
search on top of FTS.

Best, Martin



________________________________
From: Alberto Simões <hashas...@gmail.com>
To: General Discussion of SQLite Database <sqlite-users@sqlite.org>
Sent: Friday, 26 June 2009, 13:25:57
Subject: [sqlite] Near misses

Hello.

I am trying to find words in a dictionary stored in sqlite, and trying
a near miss approach.
For that I tried an algorithm to create patterns corresponding to
Levenshtein distance of 1 (edit distance of 1).
That means one addition, one removal, or one substitution.

For that, my script receives a word (say, 'car') and generates all
possible additions, removals, and substitutions:

Additions: _car c_ar ca_r car_
Substitutions: _ar c_r ca_
Removals: ar cr ca
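
The pattern generation described above could be sketched in Python roughly as
follows. This is an illustration, not the original script (which is not shown
in this message); the function name near_miss_patterns is made up, and the
sketch assumes the input word contains no % or _ characters, which would need
escaping before being used with LIKE.

```python
def near_miss_patterns(word):
    """Return (additions, substitutions, removals) for edit distance 1.

    Additions and substitutions are LIKE patterns in which '_' stands for
    exactly one arbitrary character; removals are plain strings.
    """
    # One extra character at every possible position, including both ends.
    additions = [word[:i] + "_" + word[i:] for i in range(len(word) + 1)]
    # Each character in turn replaced by the wildcard.
    substitutions = [word[:i] + "_" + word[i + 1:] for i in range(len(word))]
    # Each character in turn dropped; dict.fromkeys removes duplicates
    # (e.g. for repeated letters) while keeping the order.
    removals = list(dict.fromkeys(
        word[:i] + word[i + 1:] for i in range(len(word))
    ))
    return additions, substitutions, removals


additions, substitutions, removals = near_miss_patterns("car")
print("Additions:", *additions)
print("Substitutions:", *substitutions)
print("Removals:", *removals)
```

For 'car' this produces the same ten patterns as listed above, so the number
of patterns is 3n + 1 for a word of n distinct-neighbour characters.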

Then, the script constructs an SQL query:

SELECT DISTINCT word FROM dict WHERE word = 'ar' OR word = 'ca' OR
word LIKE '_car' OR word LIKE 'c_r' OR word = 'cr' OR word LIKE '_ar'
OR word LIKE 'ca_r' OR word LIKE 'c_ar' OR word LIKE 'ca_' OR word
LIKE 'car_';
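
As a hedged sketch of how such a query might be assembled from a script, the
following uses Python's standard sqlite3 module with bound parameters instead
of pasting the patterns into the SQL text. The helper find_near_misses, the
in-memory database and the sample words are assumptions made for the example;
only the table name dict and the column word come from the query above.

```python
import sqlite3


def find_near_misses(conn, exact, wildcard):
    """Run one OR query: '=' for exact strings, LIKE for '_' patterns."""
    clauses = ["word = ?"] * len(exact) + ["word LIKE ?"] * len(wildcard)
    sql = "SELECT DISTINCT word FROM dict WHERE " + " OR ".join(clauses)
    # Parameters are bound in the same order as the clauses were built.
    return sorted(row[0] for row in conn.execute(sql, exact + wildcard))


# Toy dictionary, assumed for the example only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dict (word TEXT)")
conn.executemany(
    "INSERT INTO dict (word) VALUES (?)",
    [(w,) for w in ["car", "cart", "cat", "bar", "scar", "care", "ca", "dog", "card"]],
)

# The ten patterns for 'car' from the message above.
exact = ["ar", "cr", "ca"]
wildcard = ["_car", "c_ar", "ca_r", "car_", "_ar", "c_r", "ca_"]

print(find_near_misses(conn, exact, wildcard))
```

Note that 'car' itself is returned too, because the substitution patterns also
match the unchanged word.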

And this SQL query works... but not as quickly as I need (especially
because the running time grows with the word size).

Any hint on how to speed up this thing?

Thank you
Alberto

--
Alberto Simões
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



      