Re: [sqlite] Fuzzy Matching
I can't go into too much detail because of my current job, but for fuzzy matching Levenshtein isn't very good. Try looking into n-gram matching techniques; they are excellent at reducing over- and under-matches.

Woody

--- On Sat, 7/5/08, Stephen Woodbridge <[EMAIL PROTECTED]> wrote:

From: Stephen Woodbridge <[EMAIL PROTECTED]>
Subject: Re: [sqlite] Fuzzy Matching
To: "General Discussion of SQLite Database"
Date: Saturday, July 5, 2008, 11:24 PM

Stephen Woodbridge wrote:
> I'm not aware of any such functions, but my expertise is more
> in GIS than search searching.

Hmmm, that was supposed to say text searching.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
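The n-gram idea mentioned above can be sketched as follows. This is only an illustration under assumptions: the message does not say which n-gram scheme is meant, so the sketch uses padded character trigrams and Jaccard similarity, and the function names (`ngrams`, `ngram_similarity`, `best_matches`) and the 0.3 threshold are made up for the example.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of text, padded with spaces."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity of the two n-gram sets, between 0.0 and 1.0."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty strings are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def best_matches(query, candidates, threshold=0.3, n=3):
    """Return (score, candidate) pairs at or above threshold, best first."""
    scored = [(ngram_similarity(query, c, n), c) for c in candidates]
    return sorted((pair for pair in scored if pair[0] >= threshold),
                  reverse=True)
```

In a database setting the trigrams of each word could be stored in an ordinary indexed table, so that candidate lookup becomes an index probe rather than a full scan; that part is not shown here.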
[sqlite] is there a way to escape a '-' in FTS match syntax?
Is there a way to escape the negation syntax (the minus sign / dash) in FTS MATCH syntax? I found that if I enclose the search term in quotes (i.e. "T-Bone"), FTS does not treat the minus sign as an exclusion from the search. I was just wondering if there is another way that does not require me to parse the search string into terms and quote the ones that have dashes in them.

Jason Boehle
[EMAIL PROTECTED]
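For reference, the parse-and-quote workaround described in the question might look like the sketch below. It is not an FTS escape mechanism (the thread does not establish that one exists); it only rewrites the query string before it is passed to MATCH. The helper name `quote_dashed_terms` is hypothetical, and the sketch assumes that a leading dash is meant as the exclusion operator and should be left alone.

```python
import re

# A token is either an already-quoted phrase or a run of non-space characters.
_TOKEN = re.compile(r'"[^"]*"|\S+')


def quote_dashed_terms(query):
    """Wrap unquoted terms that contain an inner '-' in double quotes."""
    out = []
    for token in _TOKEN.findall(query):
        if token.startswith('"') or token.startswith("-"):
            # Quoted phrases and explicit exclusions are passed through as-is.
            out.append(token)
        elif "-" in token:
            out.append('"' + token + '"')
        else:
            out.append(token)
    return " ".join(out)
```

With this helper, `T-Bone steak` becomes `"T-Bone" steak`, while `"T-Bone" -fish` is returned unchanged.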
Re: [sqlite] Fuzzy Matching
Stephen Woodbridge wrote:
> I would be interested in having something like this also.
>
> What I don't understand in your approach is how you compute the
> (Levenshtein) distance during a search. It seems like you have a fixed
> set of tokens from your document text and these are indexed. Then you
> have a query token that you want to compare to the index based on some
> fuzzy distance. Since every query can be different, I think you have to
> compute the distance for every key in the index? That would require
> doing a full index scan.
>
> If there were a function that you could run a token through that would
> give you that token's "location" in some space, then you could generate a
> similar "location" for the query token and then use the rtree and
> distance. I'm not aware of any such functions, but my expertise is more
> in GIS than search searching.

Hmmm, that was supposed to say text searching.

> Thoughts?
>
> Best,
> -Steve
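The full scan discussed in this message can be made concrete with a small sketch. It assumes Python's standard `sqlite3` module and registers a pure-Python Levenshtein function as an SQL function; the table name `dictionary`, the column `word`, and the helpers `make_demo_db` and `fuzzy_lookup` are hypothetical. Because the distance is recomputed for every row on every query, no index can help, which is exactly the cost being questioned.

```python
import sqlite3


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping two rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def make_demo_db(words):
    """Create an in-memory database with a one-column dictionary table."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("levenshtein", 2, levenshtein)
    conn.execute("CREATE TABLE dictionary (word TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO dictionary VALUES (?)",
                     [(w,) for w in words])
    return conn


def fuzzy_lookup(conn, word, max_distance=2):
    """Full table scan: the distance is computed for every stored word."""
    rows = conn.execute(
        "SELECT word, levenshtein(word, :q) AS d FROM dictionary "
        "WHERE levenshtein(word, :q) <= :k ORDER BY d, word",
        {"q": word, "k": max_distance})
    return rows.fetchall()
```

For a small dictionary this may be fast enough; for a large one, the scan is the bottleneck that the indexing ideas in this thread try to avoid.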
Re: [sqlite] Fuzzy Matching
I would be interested in having something like this also.

What I don't understand in your approach is how you compute the (Levenshtein) distance during a search. It seems like you have a fixed set of tokens from your document text and these are indexed. Then you have a query token that you want to compare to the index based on some fuzzy distance. Since every query can be different, I think you have to compute the distance for every key in the index? That would require doing a full index scan.

If there were a function that you could run a token through that would give you that token's "location" in some space, then you could generate a similar "location" for the query token and then use the rtree and distance. I'm not aware of any such functions, but my expertise is more in GIS than search searching.

Thoughts?

Best,
-Steve

Martin Pfeifle wrote:
> Hi, I think there is nothing available except FTS. Doing a full table
> scan and computing for each string the (Levenshtein) distance to the
> query object is too time consuming. So what I would like to see is
> the implementation of a generic metric index which takes a metric
> distance function as one parameter. Based on such a distance
> function you could then do similarity search on any objects, e.g.
> images, strings, etc. One possible index would be the M-tree (which
> you can also organize relationally, as was done with the R*-tree).
> The idea is that you have a hierarchical index and each node is
> represented by a database object o and a covering radius r
> reflecting the maximal distance of all objects in that subtree to the
> object o. If you do a range query now, you compute the distance of
> your query object to the object o. If this distance minus the
> covering radius r is bigger than your query range, you can prune that
> subtree. You can either implement such a similarity module as a separate
> extension similar to FTS or the Spatial module, or integrate it into
> FTS and use it only for strings. Personally, I need the second
> solution because I'd like to do full and fuzzy text search. Are there
> any plans to implement something like this? If yes, I would like to
> take part in such a development.
>
> Best,
> Martin
>
> ----- Original Message -----
> From: Alberto Simões <[EMAIL PROTECTED]>
> To: General Discussion of SQLite Database
> Sent: Thursday, July 3, 2008, 21:52:05
> Subject: [sqlite] Fuzzy Matching
>
> Hello
>
> Although I am quite certain that the answer is that SQLite does not
> provide any mechanism to help me with this, it doesn't hurt to ask. Who
> knows, somebody may have a suggestion.
>
> Basically, I am using SQLite for a dictionary, and I want to let the
> user do fuzzy searches. OK, some simple Levenshtein distance of one
> or two would do the trick, probably.
>
> I imagine that SQLite (given the "lite") does not provide any kind of
> near-miss search. But perhaps somebody here has done something similar,
> in any language?
>
> Cheers
> Alberto
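The pruning rule from Martin's message (skip a subtree when the distance to its object o, minus the covering radius r, exceeds the query range) can be sketched in a few lines. This is a deliberately flattened, one-level illustration and not an M-tree implementation: the `Node` class, the `range_query` function, and the way words are grouped under a pivot are all assumptions made for the example. The rule is safe because Levenshtein distance is a metric, so the triangle inequality gives d(q, w) >= d(q, o) - d(o, w) >= d(q, o) - r for every word w in the subtree.

```python
def levenshtein(a, b):
    """Edit distance; a metric, so the triangle inequality holds."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


class Node:
    """One subtree: a routing object (pivot) plus its covering radius."""

    def __init__(self, pivot, words):
        self.pivot = pivot
        self.words = list(words)
        # Covering radius: largest distance from the pivot to any member.
        self.radius = max(levenshtein(pivot, w) for w in self.words)


def range_query(nodes, query, query_range):
    """Return (sorted hits, number of subtrees pruned without scanning)."""
    hits, pruned = [], 0
    for node in nodes:
        d = levenshtein(query, node.pivot)
        if d - node.radius > query_range:
            pruned += 1  # no member can be within query_range
            continue
        hits.extend(w for w in node.words
                    if levenshtein(query, w) <= query_range)
    return sorted(hits), pruned
```

A real M-tree would apply the same test recursively at every level and choose pivots so that covering radii stay small; the sketch only shows why one distance computation per node can rule out many words.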
Re: [sqlite] rtree module crashes
Hi Dan,

sorry, but I do not think I have access to that page.

Hartwig

On 04.07.2008 at 17:00, Dan wrote:
>
> On Jul 4, 2008, at 9:24 PM, Hartwig Wiesmann wrote:
>
>> Hi,
>>
>> I posted the mail attached below a while ago but did not receive any
>> answer. If there is a better place to discuss it, please let me
>> know.
>>
>> When I compile SQLite with SQLITE_ENABLE_RTREE set to 1, SQLite
>> crashes when opening a database (Mac OS X). The reason seems to be that
>> rtree.c includes sqlite3ext.h instead of sqlite3.h. This can be
>> prevented by setting SQLITE_CORE to 1, but then the types i64, u8 etc.
>> are undefined.
>>
>> So, my solution:
>>
>> SQLITE_ENABLE_RTREE set to 1
>> SQLITE_CORE set to 1
>> and define i64, u8 etc. in all cases.
>>
>> Did I do anything wrong?
>
> See here:
>
> http://sqlite.org:8080/cgi-bin/mailman/private/sqlite-users/2008-June/004005.html
>
> Dan.
Re: [sqlite] validate SQL Statement
From the manual, doesn't sqlite3_prepare do the following: "To execute an SQL query, it must first be compiled into a byte-code program using one of these routines."

If you are really paranoid, what about taking the input SQL statement x and then verifying it by issuing sqlite3_prepare("EXPLAIN x")? I just tried "EXPLAIN SELECT id1 FROM myTable", where table 'myTable' contains no column 'id1', and it informed me of my error.

On Thu, Jul 3, 2008 at 10:23 AM, Umaa Krishnan <[EMAIL PROTECTED]> wrote:
> Well, I assume SQLPrepare allocates and locks appropriate resources. I
> only need to check the sanity of the statement, and then discard it.
>
> So I was wondering if there was a way to do it without preparing the
> statement.
>
> --- On Thu, 7/3/08, D. Richard Hipp <[EMAIL PROTECTED]> wrote:
> From: D. Richard Hipp <[EMAIL PROTECTED]>
> Subject: Re: [sqlite] validate SQL Statement
> To: "General Discussion of SQLite Database"
> Date: Thursday, July 3, 2008, 2:10 AM
>
> On Jul 2, 2008, at 11:03 PM, Umaa Krishnan wrote:
>
>> I was wondering if there is a way in sqlite wherein I could validate
>> the SQL statement (for correct grammar, resource names - column name,
>> table name etc.), w/o having to do prepare.
>
> You speak as if sqlite3_prepare() were a huge burden - something worth
> avoiding. In practice it is usually very fast. What problem are you
> trying to solve?
>
> No, there is no other way to validate an SQL statement.
>
> D. Richard Hipp
> [EMAIL PROTECTED]
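The EXPLAIN trick from the top of this message can be tried out with a short sketch. It uses Python's standard `sqlite3` module as a stand-in for the C API (where one would call sqlite3_prepare directly and finalize the statement afterwards); the helper name `check_sql` is hypothetical. Prefixing the statement with EXPLAIN makes SQLite compile it and return the byte-code listing, so the statement itself is not run.

```python
import sqlite3


def check_sql(conn, sql):
    """Return None if sql compiles against conn's schema, else the error text."""
    try:
        # EXPLAIN compiles the statement; the statement itself is not executed.
        conn.execute("EXPLAIN " + sql)
    except sqlite3.Error as exc:
        return str(exc)
    return None


def make_demo_db():
    """In-memory database with a table that has column id but not id1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (id INTEGER)")
    return conn
```

Here `check_sql(conn, "SELECT id1 FROM myTable")` returns the error text reported by SQLite, while a valid statement yields `None`. The check only looks at a single statement and validates it against the schema of the connection it is given.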
[sqlite] Trying to free not allocated memory while Insert
Hi,

I am working with the SQLite 3.3.6 source files. I tried to insert one record and found that sqlite3GenericFree(p) is called, but that address was not allocated using sqlite3GenericMalloc(int n). Is this correct? Kindly clarify. Does all memory get allocated and freed inside os_common.h, or at some other place?

I can insert that record without any problem. Then I tried to insert some 1 records, but at that time it hangs inside sqlite3Parser() ---> yy_reduce. What changes do I have to make in my code to fix this?

In parse.c I found some lines of code like:

#ifndef NDEBUG
  /* Silence complaints from purify about yygotominor being uninitialized
  ** in some cases when it is copied into the stack after the following
  ** switch.  yygotominor is uninitialized when a rule reduces that does
  ** not set the value of its left-hand side nonterminal.  Leaving the
  ** value of the nonterminal uninitialized is utterly harmless as long
  ** as the value is never used.  So really the only thing this code
  ** accomplishes is to quieten purify.
  */
  memset(&yygotominor, 0, sizeof(yygotominor));
#endif

But in SQLite ticket #2172 I read that:

"The wireshark project (www.wireshark.org) reports that without this code, their parser segfaults. I'm not sure what their parser is doing to make this happen. This is the second bug report from wireshark this week. Clearly they are stressing Lemon in ways that it has not been previously stressed..."

So they removed that #ifndef NDEBUG.

Please help me solve this issue.

Thanks & Regards,
Mahalakshmi