Re: [sqlite] Fuzzy Matching

Stephen Woodbridge Sat, 05 Jul 2008 19:24:43 -0700

Stephen Woodbridge wrote:
> I would be interested in having something like this also.
> 
> What I don't understand in your approach is how you compute the 
> (Levenstein) distance during a search. It seems like you have a fixed 
> set of tokens from your document text and these are indexed. Then you 
> have a query token the you want to compare to the index based on some 
> fuzzy distance. Since every query can be different I think you have to 
> compute the distance for every key in the index? that would require 
> doing a full index scan.
> 
> If there ware a function that you could run a token through that would 
> given you that tokens "location" in some space then you could generate a 
> similar "location" for the query token and then use the rtree and 
> distance. I'm not aware of any such functions, but my expertise is more 
> in GIS the search searching.


Hmmm, that was supposed to say text searching.

> Thoughts?
> 
> Best,
>    -Steve
> 
> Martin Pfeifle wrote:
>> Hi, I think there is nothing available except FTS. Doing a full table
>> scan and computing for each string the (Levenstein) distance to the
>> query object is too time consuming. So what I would like to see is
>> the implementation of a generic metric index which needs as one
>> parameter a metric distance function. Based on such a distance
>> function you could then do similarity search on any objects , e.g.
>> images, strings, etc. One possible index would be the M-tree (which
>> you can also organize relational as it was done with the R*-tree).
>> The idea is that you have a hierarchical index and each node is
>> represented by a database  object o and a covering radius r
>> reflecting the maximal distance of all objects in that subtree to the
>> object o. If you do a range query now, you compute the distance of
>> your query object to the object o. If this distance minus the
>> coverage radius r is bigger than your query range you can prune that
>> subtree. You can either implement such a similarity module as an own
>> extension similar toFTS or the Spatial module, or integrate it into
>> FTS and use it only for strings. Personally, I need the second
>> solution because I'd like to do full and fuzzy text search. Are there
>> any plans to implement something like this, if yes, I would like to
>> take part in such a development. . Best Martin
>>
>>
>>
>>
>> ----- Ursprüngliche Mail ---- Von: Alberto Simões
>> <[EMAIL PROTECTED]> An: General Discussion of SQLite Database
>> <sqlite-users@sqlite.org> Gesendet: Donnerstag, den 3. Juli 2008,
>> 21:52:05 Uhr Betreff: [sqlite] Fuzzy Matching
>>
>> Hello
>>
>> Although I am quite certain that the answer is that SQLite does not 
>> provide any mechanism to help me on this, it doesn't hurt to ask. Who
>>  know if anybody have any suggestion.
>>
>> Basically, I am using SQLite for a dictionary, and I want to let the 
>> user do fuzzy searches. OK, some simple Levenshtein distance of one
>> or two would do the trick, probably.
>>
>> I imagine that SQLite (given the lite), does not provide any kind of 
>> nearmisses search. But probably, somebody here did anything similar
>> in any language?
>>
>> Cheers Alberto
> 
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Fuzzy Matching

Reply via email to