Re: [sqlite] FTS3 finds too much: Slash special meaning? Something else?
Am 03.09.2010 13:27, schrieb Dan Kennedy: > > On Sep 2, 2010, at 6:37 PM, Lukas Haase wrote: > >> Hi, >> >> I use FTS3 (SQLITE_ENABLE_FTS3) with enhanced query syntax >> (SQLITE_ENABLE_FTS3_PARENTHESIS). >> >> Now if I search for a string like '2002/91/AH' there are lots of items >> which do NOT contain this string. This is a query: >> >> SELECT rowid, content FROM fulltext WHERE content MATCH '2002/91/AH'; >> >> In my case, there are only 10 items which actually contain the string >> '2002/91/AH' but the query above gives me 162 (!) matches! >> >> I can not find any reason for this. Some of the topics contain >> "similar" >> strings like 2002/96/AH or even 94/31/EG. But in fact, these strings >> must not be matched :-( >> >> Does the slash have a special meaning in the query syntax? Does a >> query >> like 2002/91/AH have a special meaning? > > The '/' characters are serving as token separators. So > you are searching for (2002 OR 91 OR ah). If you enclose > the date in double quotes: > > ... MATCH '"2002/91/AH"' > > you will be searching for the phrase "2002 91 ah", which > is as close as you can get to what you want without writing > a custom tokenizer: > > http://www.sqlite.org/fts3.html#section_5_1 Oh great! Thank you for your hint! With double quotes it works as I would expect it. In general writing a custom tokenizer would not be a problem BUT in my case it is complicated because the database (including the fulltext table) is created on a different machine with SQLites packages from a distributor (Debian stable). The data is queried using a custom C++ application, so on client side I may alter the code. Until now I use the simple tokenizer. At first I thought I can make it work if I just remove the slash '/' from the separator list. Therefore I added the following line to simpleCreate: t->delim['/'] = 0; Now, nothing was found with content MATCH '2002/91/AH'; and first I did not understand why. After your explanation it is clear why: 2002, 91 and AH are separate tokens in my database and MATCH '"2002/91/AH'" would normally search for the tokens 2002, 91, AH next to each other. But after the patch above, 2002/91/AH is a single token which is not found. So my questions is: Is there any other way to achieve my goal? Regards, Luke ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS3 finds too much: Slash special meaning? Something else?
On Sep 2, 2010, at 6:37 PM, Lukas Haase wrote: > Hi, > > I use FTS3 (SQLITE_ENABLE_FTS3) with enhanced query syntax > (SQLITE_ENABLE_FTS3_PARENTHESIS). > > Now if I search for a string like '2002/91/AH' there are lots of items > which do NOT contain this string. This is a query: > > SELECT rowid, content FROM fulltext WHERE content MATCH '2002/91/AH'; > > In my case, there are only 10 items which actually contain the string > '2002/91/AH' but the query above gives me 162 (!) matches! > > I can not find any reason for this. Some of the topics contain > "similar" > strings like 2002/96/AH or even 94/31/EG. But in fact, these strings > must not be matched :-( > > Does the slash have a special meaning in the query syntax? Does a > query > like 2002/91/AH have a special meaning? The '/' characters are serving as token separators. So you are searching for (2002 OR 91 OR ah). If you enclose the date in double quotes: ... MATCH '"2002/91/AH"' you will be searching for the phrase "2002 91 ah", which is as close as you can get to what you want without writing a custom tokenizer: http://www.sqlite.org/fts3.html#section_5_1 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS3 finds too much: Slash special meaning? Something else?
Hi, I use FTS3 (SQLITE_ENABLE_FTS3) with enhanced query syntax (SQLITE_ENABLE_FTS3_PARENTHESIS). Now if I search for a string like '2002/91/AH' there are lots of items which do NOT contain this string. This is a query: SELECT rowid, content FROM fulltext WHERE content MATCH '2002/91/AH'; In my case, there are only 10 items which actually contain the string '2002/91/AH' but the query above gives me 162 (!) matches! I can not find any reason for this. Some of the topics contain "similar" strings like 2002/96/AH or even 94/31/EG. But in fact, these strings must not be matched :-( Does the slash have a special meaning in the query syntax? Does a query like 2002/91/AH have a special meaning? What else could be the reason and is there a way to prevent FTS to find this wrong entries? Thank you very much in advance! Regards, Luke ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users