Re: [sqlite] Fuzzy Matching

2008-07-05 Thread Harold Wood
I can't go into too much detail because of my current job, but for fuzzy 
matching Levenshtein isn't very good. Try looking into n-gram matching 
techniques; they are excellent at reducing over/under matches.
 
Woody
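
A minimal sketch of what n-gram matching can look like. The message does
not say which variant is meant, so the choice of character trigrams, the
space padding, and the Jaccard score are all assumptions, and the names
trigrams() and trigram_similarity() are hypothetical.

```python
def trigrams(word):
    """Return the set of character trigrams of `word`.

    The word is lower-cased and padded with spaces (an assumption) so that
    the first and last characters contribute trigrams of their own.
    """
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard similarity of the two trigram sets, a value in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)
```

Candidates can then be ranked by trigram_similarity(query, word) and cut
off at a threshold that suits the data.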

--- On Sat, 7/5/08, Stephen Woodbridge <[EMAIL PROTECTED]> wrote:

From: Stephen Woodbridge <[EMAIL PROTECTED]>
Subject: Re: [sqlite] Fuzzy Matching
To: "General Discussion of SQLite Database" 
Date: Saturday, July 5, 2008, 11:24 PM

Stephen Woodbridge wrote:
> I would be interested in having something like this also.
> 
> What I don't understand in your approach is how you compute the 
> (Levenshtein) distance during a search. It seems like you have a fixed 
> set of tokens from your document text and these are indexed. Then you 
> have a query token that you want to compare to the index based on some 
> fuzzy distance. Since every query can be different, I think you have to 
> compute the distance for every key in the index? That would require 
> doing a full index scan.
> 
> If there were a function that you could run a token through that would 
> give you that token's "location" in some space, then you could generate a 
> similar "location" for the query token and then use the rtree and 
> distance. I'm not aware of any such functions, but my expertise is more 
> in GIS than search searching.

Hmmm, that was supposed to say text searching.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] is there a way to escape a '-' in FTS match syntax?

2008-07-05 Thread Jason Boehle
Is there a way to escape the negation syntax (the minus sign / dash) in FTS
MATCH syntax?  I found that if I enclose the search term in quotes (i.e.
"T-Bone"), FTS does not treat the minus sign as an exclusion from the
search.  I was just wondering if there is another way that does not require
me to parse the search string into terms and quote the ones that have dashes
in them.
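
For reference, the quoting workaround described above can be sketched in
a few lines. This is only an illustration of the parse-and-quote approach
the question hopes to avoid. The helper name quote_dashed_terms() is
hypothetical, and it assumes terms are separated by whitespace and that
the input contains no multi-word quoted phrases.

```python
def quote_dashed_terms(query):
    """Double-quote each whitespace-separated term that has an interior dash.

    Terms that already start with '-' (assumed to be intended exclusions)
    or with a double quote are passed through unchanged.
    """
    out = []
    for term in query.split():
        if term.startswith("-") or term.startswith('"') or "-" not in term:
            out.append(term)
        else:
            out.append('"' + term + '"')
    return " ".join(out)
```

The result would be passed as the right-hand side of MATCH in place of
the raw search string.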

Jason Boehle
[EMAIL PROTECTED]


Re: [sqlite] Fuzzy Matching

2008-07-05 Thread Stephen Woodbridge
Stephen Woodbridge wrote:
> I would be interested in having something like this also.
> 
> What I don't understand in your approach is how you compute the 
> (Levenshtein) distance during a search. It seems like you have a fixed 
> set of tokens from your document text and these are indexed. Then you 
> have a query token that you want to compare to the index based on some 
> fuzzy distance. Since every query can be different, I think you have to 
> compute the distance for every key in the index? That would require 
> doing a full index scan.
> 
> If there were a function that you could run a token through that would 
> give you that token's "location" in some space, then you could generate a 
> similar "location" for the query token and then use the rtree and 
> distance. I'm not aware of any such functions, but my expertise is more 
> in GIS than search searching.

Hmmm, that was supposed to say text searching.


Re: [sqlite] Fuzzy Matching

2008-07-05 Thread Stephen Woodbridge
I would be interested in having something like this also.

What I don't understand in your approach is how you compute the 
(Levenshtein) distance during a search. It seems like you have a fixed 
set of tokens from your document text and these are indexed. Then you 
have a query token that you want to compare to the index based on some 
fuzzy distance. Since every query can be different, I think you have to 
compute the distance for every key in the index? That would require 
doing a full index scan.
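
To make the full-scan concern concrete, here is a rough sketch using
Python's sqlite3 module. The table name dict, the sample words, and the
helper fuzzy_lookup() are made up for illustration. Because the distance
is an application-defined function of the query string, SQLite cannot use
an index for it and has to call the function once per row.

```python
import sqlite3


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute ca -> cb
        prev = cur
    return prev[-1]


conn = sqlite3.connect(":memory:")
# Register the Python function as a two-argument SQL function.
conn.create_function("levenshtein", 2, levenshtein)
conn.execute("CREATE TABLE dict (word TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO dict VALUES (?)",
                 [("house",), ("mouse",), ("horse",), ("database",)])


def fuzzy_lookup(query, max_dist=1):
    """Return all words within max_dist edits of query (scans every row)."""
    rows = conn.execute(
        "SELECT word FROM dict WHERE levenshtein(word, ?) <= ? ORDER BY word",
        (query, max_dist))
    return [r[0] for r in rows]
```

This is fine for a small dictionary, but the cost grows linearly with the
number of rows, which is the problem the rest of the thread is about.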

If there were a function that you could run a token through that would 
give you that token's "location" in some space, then you could generate a 
similar "location" for the query token and then use the rtree and 
distance. I'm not aware of any such functions, but my expertise is more 
in GIS than search searching.

Thoughts?

Best,
   -Steve

Martin Pfeifle wrote:
> Hi, I think there is nothing available except FTS. Doing a full table
> scan and computing for each string the (Levenshtein) distance to the
> query object is too time-consuming. So what I would like to see is
> the implementation of a generic metric index which takes a metric
> distance function as one parameter. Based on such a distance
> function you could then do similarity search on any objects, e.g.
> images, strings, etc. One possible index would be the M-tree (which
> you can also organize relationally, as was done with the R*-tree).
> The idea is that you have a hierarchical index and each node is
> represented by a database object o and a covering radius r
> reflecting the maximal distance of all objects in that subtree to the
> object o. If you do a range query now, you compute the distance of
> your query object to the object o. If this distance minus the
> covering radius r is bigger than your query range, you can prune that
> subtree. You can either implement such a similarity module as its own
> extension, similar to FTS or the Spatial module, or integrate it into
> FTS and use it only for strings. Personally, I need the second
> solution because I'd like to do full and fuzzy text search. Are there
> any plans to implement something like this? If so, I would like to
> take part in such a development. Best, Martin
> 
> 
> 
> 
> - Original Message - From: Alberto Simões
> <[EMAIL PROTECTED]> To: General Discussion of SQLite Database
> Sent: Thursday, July 3, 2008,
> 21:52:05 Subject: [sqlite] Fuzzy Matching
> 
> Hello
> 
> Although I am quite certain that the answer is that SQLite does not 
> provide any mechanism to help me with this, it doesn't hurt to ask. Who
> knows, maybe somebody has a suggestion.
> 
> Basically, I am using SQLite for a dictionary, and I want to let the 
> user do fuzzy searches. OK, some simple Levenshtein distance of one
> or two would do the trick, probably.
> 
> I imagine that SQLite (given the "lite") does not provide any kind of 
> near-miss search. But perhaps somebody here has done something similar
> in any language?
> 
> Cheers Alberto
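
The pruning rule Martin describes for the M-tree can be sketched as
follows. This is only an illustration of the range-query test, not an
M-tree implementation: the Node class and range_search() are hypothetical
names, the tree is built by hand, and plain numbers with absolute
difference stand in for strings with an edit distance.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    center: float      # routing object o
    radius: float      # covering radius r: max distance from o to anything below
    objects: List[float] = field(default_factory=list)   # entries stored here
    children: List["Node"] = field(default_factory=list)


def range_search(node: Node, query: float, eps: float,
                 dist: Callable[[float, float], float]) -> List[float]:
    """Return all stored objects within distance eps of query."""
    # By the triangle inequality, every object x below this node satisfies
    # dist(query, x) >= dist(query, o) - r, so the subtree can be skipped
    # when that lower bound already exceeds the query range.
    if dist(query, node.center) - node.radius > eps:
        return []
    hits = [x for x in node.objects if dist(query, x) <= eps]
    for child in node.children:
        hits.extend(range_search(child, query, eps, dist))
    return hits


tree = Node(50, 50, children=[
    Node(10, 5, objects=[6, 10, 14]),
    Node(90, 5, objects=[86, 90, 95]),
])
near_12 = range_search(tree, 12, 3, lambda a, b: abs(a - b))
```

In the example the second child is discarded after a single distance
computation, since 78 - 5 is larger than the query range of 3.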



Re: [sqlite] rtree module crashes

2008-07-05 Thread Hartwig Wiesmann
Hi Dan,

sorry, but I do not have any access to the page, I think.

Hartwig

On 04.07.2008 at 17:00, Dan wrote:

>
> On Jul 4, 2008, at 9:24 PM, Hartwig Wiesmann wrote:
>
>> Hi,
>>
>> I posted a while ago the mail attached below but did not receive any
>> answer. If there is any better place to discuss it please let me  
>> know.
>>
>> When I compile SQLite using SQLITE_ENABLE_RTREE set to 1 SQLite will
>> crash when opening a database (Mac OSX). The reason seems to be that
>> in rtree.c sqlite3ext.h is included instead of sqlite3.h. This can be
>> prevented by setting SQLITE_CORE to 1 but then the types i64, u8 etc.
>> are undefined.
>>
>> So, my solution:
>>
>> SQLITE_ENABLE_RTREE set to 1
>> SQLITE_CORE set to 1
>> and define i64, u8 etc. in all cases.
>>
>> Did I do anything wrong?
>
> See here:
>
>   http://sqlite.org:8080/cgi-bin/mailman/private/sqlite-users/2008-June/004005.html
>
> Dan.
>


Re: [sqlite] validate SQL Statement

2008-07-05 Thread Rich Rattanni
From the manual, doesn't sqlite3_prepare do the following: "To execute
an SQL query, it must first be compiled into a byte-code program using
one of these routines."  If you are really paranoid, what about taking
the input SQL statement x and then verifying it by issuing
sqlite3_prepare("EXPLAIN x")?  I just tried "EXPLAIN SELECT
id1 FROM myTable" where table 'myTable' contains no column 'id1', and
it informed me of my error.
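
A small sketch of that check using Python's sqlite3 module rather than
the C API. The helper name check_sql() and the one-column schema are
assumptions for illustration, and it assumes a single statement with no
bound parameters. The statement is compiled with EXPLAIN in front, so the
statement itself is never run.

```python
import sqlite3


def check_sql(conn, sql):
    """Return None if sql compiles against conn's schema, else the error text."""
    try:
        # EXPLAIN makes SQLite compile the statement and return its
        # byte-code listing instead of executing it.
        conn.execute("EXPLAIN " + sql)
    except sqlite3.Error as exc:
        return str(exc)
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER)")

ok = check_sql(conn, "SELECT id FROM myTable")
bad = check_sql(conn, "SELECT id1 FROM myTable")
```

Here ok is None, while bad holds the error message reported for the
unknown column.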

On Thu, Jul 3, 2008 at 10:23 AM, Umaa Krishnan <[EMAIL PROTECTED]> wrote:
> Well, I assume SQLPrepare allocates and locks appropriate resources. I need 
> to only check the sanity of the statement, and then discard.
>
> So I was wondering if there was a way to do it, instead of prepare statement
>
> --- On Thu, 7/3/08, D. Richard Hipp <[EMAIL PROTECTED]> wrote:
> From: D. Richard Hipp <[EMAIL PROTECTED]>
> Subject: Re: [sqlite] validate SQL Statement
> To: "General Discussion of SQLite Database" 
> Date: Thursday, July 3, 2008, 2:10 AM
>
> On Jul 2, 2008, at 11:03 PM, Umaa Krishnan wrote:
>
>> I was wondering if there is a way in sqlite, wherein I could validate
>> the SQL statement (for correct grammar, resource name - column name,
>> table name etc), w/o having to do prepare.
>
>
> You speak as if sqlite3_prepare() were a huge burden - something worth
> avoiding. In practice it is usually very fast. What problem are you
> trying to solve?
>
> No, there is no other way to validate an SQL statement.
>
>
> D. Richard Hipp
> [EMAIL PROTECTED]
>
>
>


[sqlite] Trying to free not allocated memory while Insert

2008-07-05 Thread Mahalakshmi.m

Hi,

I am working with the SQLite 3.3.6 source files.
I tried to insert one record and found that sqlite3GenericFree(p) is
called, but that address was not allocated using sqlite3GenericMalloc(int
n). Is this correct? Kindly clarify.
Is all memory allocated and freed inside os_common.h, or at some other
place?

I can insert that record without any problem. But when I then tried to
insert some 1 records, it hangs inside sqlite3Parser() -> yy_reduce.
What changes do I have to make in my code to fix this?


In parse.c I found some lines of code like:

#ifndef NDEBUG
  /* Silence complaints from purify about yygotominor being uninitialized
   in some cases when it is copied into the stack after the following
   switch.  yygotominor is uninitialized when a rule reduces that does
   not set the value of its left-hand side nonterminal.  Leaving the
   value of the nonterminal uninitialized is utterly harmless as long
   as the value is never used.  So really the only thing this code
   accomplishes is to quieten purify.
  */
  memset(&yygotominor, 0, sizeof(yygotominor));
#endif

But in (SQLite ticket #2172) I read that:
The wireshark project (www.wireshark.org) reports that
without this code, their parser segfaults.  I'm not sure what there
parser is doing to make this happen.  This is the second bug report
from wireshark this week.  Clearly they are stressing Lemon in ways
that it has not been previously stressed...  
So they removed that #ifndef NDEBUG.

Please help me solve this issue.

Thanks & Regards,
Mahalakshmi

