Re: [sqlite] FTS and postfix search
"Hugh Sasse"schrieb im Newsbeitrag news:alpine.lfd.2.00.0908061712390.30...@trueman.cs.cse.dmu.ac.uk... > I wonder if the idea of suffix arrays would belp. > http://www.cs.dartmouth.edu/~doug/sarray/ I was about suggesting basically the same thing... In case of Lukas' topics-table, one would have to create an additional (search)table 'topics_suffixes', where each word from within 'topics' is "expanded", resulting in multiple record-entries. In case the table 'topics' contains the word 'Motor' (with an ID of e.g. 12345) - the 'topics_suffixes'- table should get the following insertions: topic_id | word_suffixes 12345 | motor 12345 | otor 12345 | tor 12345 | or In the above sequence the listing is stopped at a "maximum- two-chars"-suffix (sparing us the storage or the last, singlechar), to safe a bit of space, since single-char Like-queries, formulated in "contains-format" (as e.g. ... word_suffixes Like '%t%' ...) are probably not that interesting regarding their usual larger recordcount-output (from the users point of view in such "live-search-scenarios"). But all these "contains-queries", searching for wordparts, larger than one single char can now be performed with larger speed against the (agreed much larger) 'topics_suffixes' table using: Select Distinct topic_id Where word_suffixes Like 'somepart%' (with a proper index on word_suffixes) - maybe combined in a Join to table topics, depending on the Apps implementation. The size of the topics_suffixes-table (and its index on word_suffixes) dependent on the average-wordlenght in topics of course - it's the usual tradeoff between "used space" and speed. Olaf ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS and postfix search
You can lazy-init the table of substrings and substring matches when you save words, so you only use space for substrings that appear in words that have been saved in your database. That avoids entering substrings that never occur in English, and substrings that occur in English but not in any words you are searching. - Original Message - From: "Jim Showalter" <j...@jimandlisa.com> To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org> Sent: Thursday, August 06, 2009 8:59 AM Subject: Re: [sqlite] FTS and postfix search >I hadn't thought about it until now, but there's no reason you can't >treat the table of substrings as substrings appearing anywhere in a >word, and join each substring of a word to the substrings table. > > For 26 letters, you have > 17k three-letter substrings and almost > 500k four-letter substrings, so this technique probably is limited > to three-letter substrings. But that's not bad--it means that if > character sequences are evenly distributed in words (they're not, > but assume they are for a moment), then the set of all possible > words is cut into 1/>17,000 on the first search. There are only > about 600,000 words in the English language, so with even > distribution your first search cuts the results to ~34 words. But > uneven character-sequence distributions will skew that--some > sequences will result in more words than others. Sill, even if you > get back 500 words to search for the full match, that's better than > searching all of your words. (You also have to store all one-letter > and two-letter substrings, but those don't take up much additional > room.) > > You would have to have a separate table that has the multiple joins > (one per substring) in your words, because now each word can > potentially have M substrings in it, so M foreign keys. > > For your example, Motor/motor would match mot, oto, and tor. You > would use a rule to arbitrarily match the first N letters in a the > search term, so if someone searched for "%motor%", that would be a > search in the substrings table for "mot". This would return the > primary key for "mot". Then you search the substring-matches table > for all words that have that substring, and this gives you back a > list of words containing "mot". Then you search that result set for > "%motor%". Similarly, if someone searchs for "%oto%", you get the > substring key for "oto" and find words containing that substring. > > For all I know, full-text-search engines may do something like this. > > - Original Message - > From: "Lukas Haase" <lukasha...@gmx.at> > To: <sqlite-users@sqlite.org> > Sent: Thursday, August 06, 2009 6:54 AM > Subject: Re: [sqlite] FTS and postfix search > > >> Hi Jim, >> >> and thank you for the great idea. But I thought it would be >> possible to >> search '*word*' - but this is not possible with this method either. >> >> Is there any chance for searching '*word*' quickly? >> >> Regards, >> Luke >> >> Jim Showalter schrieb: >>> You could store the words reversed (in addition to storing them in >>> forward order). Then like 'xxx%' would be fast. >>> >>> This would double your disk footprint, but could give you the >>> search >>> performance you're looking for. >>> >>> If that's too goofy, you could create a table of all one, two, and >>> three-character word endings, and join to it from all of your >>> words >>> (stored in forward order). Then search first for the primary key >>> of >>> the word ending you want to search for, then search your words for >>> that key. >>> >>> Index the join. >>> >>> - Original Message - >>> From: "Lukas Haase" <lukasha...@gmx.at> >>> To: <sqlite-users@sqlite.org> >>> Sent: Wednesday, August 05, 2009 6:16 PM >>> Subject: Re: [sqlite] FTS and postfix search >>> >>> >>>> Wes Freeman schrieb: >>>>> I clearly am not in the right mindset to be answering list >>>>> emails. >>>>> Please ignore my response (it's too late now)--back to my >>>>> stressful >>>>> deadline. >>>> :-) >>>> >>>>> Strange that it's implemented for prefix and not postfix? >>>> Well, an explanation is easy: Same as with LIKE, LIKE 'xxx' or >>>> LIKE >>>> 'xxx%' can be performed easy because only the beginning of words >>>&g
Re: [sqlite] FTS and postfix search
I hadn't thought about it until now, but there's no reason you can't treat the table of substrings as substrings appearing anywhere in a word, and join each substring of a word to the substrings table. For 26 letters, you have > 17k three-letter substrings and almost 500k four-letter substrings, so this technique probably is limited to three-letter substrings. But that's not bad--it means that if character sequences are evenly distributed in words (they're not, but assume they are for a moment), then the set of all possible words is cut into 1/>17,000 on the first search. There are only about 600,000 words in the English language, so with even distribution your first search cuts the results to ~34 words. But uneven character-sequence distributions will skew that--some sequences will result in more words than others. Sill, even if you get back 500 words to search for the full match, that's better than searching all of your words. (You also have to store all one-letter and two-letter substrings, but those don't take up much additional room.) You would have to have a separate table that has the multiple joins (one per substring) in your words, because now each word can potentially have M substrings in it, so M foreign keys. For your example, Motor/motor would match mot, oto, and tor. You would use a rule to arbitrarily match the first N letters in a the search term, so if someone searched for "%motor%", that would be a search in the substrings table for "mot". This would return the primary key for "mot". Then you search the substring-matches table for all words that have that substring, and this gives you back a list of words containing "mot". Then you search that result set for "%motor%". Similarly, if someone searchs for "%oto%", you get the substring key for "oto" and find words containing that substring. For all I know, full-text-search engines may do something like this. - Original Message - From: "Lukas Haase" <lukasha...@gmx.at> To: <sqlite-users@sqlite.org> Sent: Thursday, August 06, 2009 6:54 AM Subject: Re: [sqlite] FTS and postfix search > Hi Jim, > > and thank you for the great idea. But I thought it would be possible > to > search '*word*' - but this is not possible with this method either. > > Is there any chance for searching '*word*' quickly? > > Regards, > Luke > > Jim Showalter schrieb: >> You could store the words reversed (in addition to storing them in >> forward order). Then like 'xxx%' would be fast. >> >> This would double your disk footprint, but could give you the >> search >> performance you're looking for. >> >> If that's too goofy, you could create a table of all one, two, and >> three-character word endings, and join to it from all of your words >> (stored in forward order). Then search first for the primary key of >> the word ending you want to search for, then search your words for >> that key. >> >> Index the join. >> >> - Original Message - >> From: "Lukas Haase" <lukasha...@gmx.at> >> To: <sqlite-users@sqlite.org> >> Sent: Wednesday, August 05, 2009 6:16 PM >> Subject: Re: [sqlite] FTS and postfix search >> >> >>> Wes Freeman schrieb: >>>> I clearly am not in the right mindset to be answering list >>>> emails. >>>> Please ignore my response (it's too late now)--back to my >>>> stressful >>>> deadline. >>> :-) >>> >>>> Strange that it's implemented for prefix and not postfix? >>> Well, an explanation is easy: Same as with LIKE, LIKE 'xxx' or >>> LIKE >>> 'xxx%' can be performed easy because only the beginning of words >>> need to >>> be compared. >>> >>> However, there /is/ a way to also do postfix searches. I have the >>> *same* >>> database in *.hlp format and with WinHelp it's possible to search >>> '*otor' (and others) with almost zero CPU and time consumption. >>> I'd >>> be >>> curious how they did this. >>> >>> For a solution for SQLite I would accept a small performance >>> penalty >>> in >>> that case (but very few secs max); additionally I would also >>> accept >>> the >>> index being bigger. >>> >>> Regards, >>> Luke >>> >>>> Wes >>>> >>>> On Wed, Aug 5, 2009 at 8:58 PM, Lukas Haase<lukasha...@gmx.at> >>>> wrote: >>>>> Wes Freeman schrieb: >>>>>> Why not LIKE '%otor'? >>>>> SELECT topic_title FROM topics &g
Re: [sqlite] FTS and postfix search
Hi Jim, and thank you for the great idea. But I thought it would be possible to search '*word*' - but this is not possible with this method either. Is there any chance for searching '*word*' quickly? Regards, Luke Jim Showalter schrieb: > You could store the words reversed (in addition to storing them in > forward order). Then like 'xxx%' would be fast. > > This would double your disk footprint, but could give you the search > performance you're looking for. > > If that's too goofy, you could create a table of all one, two, and > three-character word endings, and join to it from all of your words > (stored in forward order). Then search first for the primary key of > the word ending you want to search for, then search your words for > that key. > > Index the join. > > - Original Message - > From: "Lukas Haase" <lukasha...@gmx.at> > To: <sqlite-users@sqlite.org> > Sent: Wednesday, August 05, 2009 6:16 PM > Subject: Re: [sqlite] FTS and postfix search > > >> Wes Freeman schrieb: >>> I clearly am not in the right mindset to be answering list emails. >>> Please ignore my response (it's too late now)--back to my stressful >>> deadline. >> :-) >> >>> Strange that it's implemented for prefix and not postfix? >> Well, an explanation is easy: Same as with LIKE, LIKE 'xxx' or LIKE >> 'xxx%' can be performed easy because only the beginning of words >> need to >> be compared. >> >> However, there /is/ a way to also do postfix searches. I have the >> *same* >> database in *.hlp format and with WinHelp it's possible to search >> '*otor' (and others) with almost zero CPU and time consumption. I'd >> be >> curious how they did this. >> >> For a solution for SQLite I would accept a small performance penalty >> in >> that case (but very few secs max); additionally I would also accept >> the >> index being bigger. >> >> Regards, >> Luke >> >>> Wes >>> >>> On Wed, Aug 5, 2009 at 8:58 PM, Lukas Haase<lukasha...@gmx.at> >>> wrote: >>>> Wes Freeman schrieb: >>>>> Why not LIKE '%otor'? >>>> SELECT topic_title FROM topics >>>> WHERE topic LIKE '%otor%' >>>> ORDER BY topic_title ASC; >>>> >>>> This is very, very slow, especially on my > 100 MB database. >>>> "Realtime" >>>> search in the GUI is a requirement. This is exactly the reason why >>>> I >>>> want to use FTS instead of LIKE... >>>> >>>> Regards, >>>> Luke >>>> >>>>> Wes >>>>> >>>>> On Wed, Aug 5, 2009 at 7:47 PM, Lukas Haase<lukasha...@gmx.at> >>>>> wrote: >>>>>> Hi, >>>>>> >>>>>> It's me again, sorry. The next big problem concerning FTS. I >>>>>> have the >>>>>> requirement to do postfix searches, like: >>>>>> >>>>>> SELECT topic_title FROM topics >>>>>> WHERE topic MATCH '*otor' >>>>>> ORDER BY topic_title ASC; >>>>>> >>>>>> should find Motor, motor, Monotor etc. But this does not seem to >>>>>> work. >>>>>> Is there any chance to get this working? >>>>>> >>>>>> Best regards, >>>>>> Luke >>>>>> >>>>>> ___ >>>>>> sqlite-users mailing list >>>>>> sqlite-users@sqlite.org >>>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>>>> >>>>> ___ >>>>> sqlite-users mailing list >>>>> sqlite-users@sqlite.org >>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>>> >>>> ___ >>>> sqlite-users mailing list >>>> sqlite-users@sqlite.org >>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>> >>> ___ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS and postfix search
Roger Binns schrieb: > [...] > > Lukas Haase wrote: >> additionally I would also accept the index being bigger. > > You could have a second FTS table where you store the keywords in reverse > order :-) Hi, DAMN!! Damn, damn. Thank you for the great idea. But unfortunately I thought I could combine postfix and prefix. But with this method this is not possible either. For example, consider the german word "Telefonanschlusskabel" (consisting of "Telefon", "Anschluss", "Kabel"). In this case '*anschluss*' should also find "Telefonanschlusskabel". Or, to be more specific, any '*partial*' query should work. Actually I need to solve this problem somehow. One possibility would be to write a complete FTS system on my own but this is really not what I want... Regards, Luke ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS and postfix search
On 6/08/2009 12:07 PM, Jim Showalter wrote: > Sorry--I read my emails arrival order, not reverse chronological--so I > didn't see that John had already solved it. Not me ... this is ancient lore e.g. Knuth vol 3 of TAOCP 1973 edition page 391 "If we make two copies of the file, one in which the keys are in normal alphabetic order and another in which they are ordered from right to left (as if the words were spelled backwards), a misspelled word will probably agree up to half or more of its length with an entry in one of those two files." ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS and postfix search
Sorry--I read my emails arrival order, not reverse chronological--so I didn't see that John had already solved it. - Original Message - From: "John Machin" <sjmac...@lexicon.net> To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org> Sent: Wednesday, August 05, 2009 6:40 PM Subject: Re: [sqlite] FTS and postfix search > On 6/08/2009 11:16 AM, Lukas Haase wrote: >> Wes Freeman schrieb: > >> >>> Strange that it's implemented for prefix and not postfix? >> >> Well, an explanation is easy: Same as with LIKE, LIKE 'xxx' or LIKE >> 'xxx%' can be performed easy because only the beginning of words >> need to >> be compared. >> >> However, there /is/ a way to also do postfix searches. I have the >> *same* >> database in *.hlp format and with WinHelp it's possible to search >> '*otor' (and others) with almost zero CPU and time consumption. I'd >> be >> curious how they did this. > > In memory: maybe a suffix tree. > > In a database: have a column with the words stored backwards. SELECT > ... > WHERE back_word LIKE "roto%" > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS and postfix search
Forgot to conclude by saying the search gives you a list of words that you then need to further reduce by the actual number of characters you want to search by. That's why storing them in reverse order might be preferable. Also, just thought of something--if you store them in reverse order, you don't need to also store them in forward order. Just reverse the strings before displaying them. - Original Message - From: "Jim Showalter" <j...@jimandlisa.com> To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org> Sent: Wednesday, August 05, 2009 7:04 PM Subject: Re: [sqlite] FTS and postfix search > You could store the words reversed (in addition to storing them in > forward order). Then like 'xxx%' would be fast. > > This would double your disk footprint, but could give you the search > performance you're looking for. > > If that's too goofy, you could create a table of all one, two, and > three-character word endings, and join to it from all of your words > (stored in forward order). Then search first for the primary key of > the word ending you want to search for, then search your words for > that key. > > Index the join. > > - Original Message - > From: "Lukas Haase" <lukasha...@gmx.at> > To: <sqlite-users@sqlite.org> > Sent: Wednesday, August 05, 2009 6:16 PM > Subject: Re: [sqlite] FTS and postfix search > > >> Wes Freeman schrieb: >>> I clearly am not in the right mindset to be answering list emails. >>> Please ignore my response (it's too late now)--back to my >>> stressful >>> deadline. >> >> :-) >> >>> Strange that it's implemented for prefix and not postfix? >> >> Well, an explanation is easy: Same as with LIKE, LIKE 'xxx' or LIKE >> 'xxx%' can be performed easy because only the beginning of words >> need to >> be compared. >> >> However, there /is/ a way to also do postfix searches. I have the >> *same* >> database in *.hlp format and with WinHelp it's possible to search >> '*otor' (and others) with almost zero CPU and time consumption. I'd >> be >> curious how they did this. >> >> For a solution for SQLite I would accept a small performance >> penalty in >> that case (but very few secs max); additionally I would also accept >> the >> index being bigger. >> >> Regards, >> Luke >> >>> Wes >>> >>> On Wed, Aug 5, 2009 at 8:58 PM, Lukas Haase<lukasha...@gmx.at> >>> wrote: >>>> Wes Freeman schrieb: >>>>> Why not LIKE '%otor'? >>>> SELECT topic_title FROM topics >>>> WHERE topic LIKE '%otor%' >>>> ORDER BY topic_title ASC; >>>> >>>> This is very, very slow, especially on my > 100 MB database. >>>> "Realtime" >>>> search in the GUI is a requirement. This is exactly the reason >>>> why I >>>> want to use FTS instead of LIKE... >>>> >>>> Regards, >>>> Luke >>>> >>>>> Wes >>>>> >>>>> On Wed, Aug 5, 2009 at 7:47 PM, Lukas Haase<lukasha...@gmx.at> >>>>> wrote: >>>>>> Hi, >>>>>> >>>>>> It's me again, sorry. The next big problem concerning FTS. I >>>>>> have the >>>>>> requirement to do postfix searches, like: >>>>>> >>>>>> SELECT topic_title FROM topics >>>>>> WHERE topic MATCH '*otor' >>>>>> ORDER BY topic_title ASC; >>>>>> >>>>>> should find Motor, motor, Monotor etc. But this does not seem >>>>>> to work. >>>>>> Is there any chance to get this working? >>>>>> >>>>>> Best regards, >>>>>> Luke >>>>>> >>>>>> ___ >>>>>> sqlite-users mailing list >>>>>> sqlite-users@sqlite.org >>>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>>>> >>>>> ___ >>>>> sqlite-users mailing list >>>>> sqlite-users@sqlite.org >>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>>> >>>> ___ >>>> sqlite-users mailing list >>>> sqlite-users@sqlite.org >>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>> >>> ___ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> >> >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS and postfix search
You could store the words reversed (in addition to storing them in forward order). Then like 'xxx%' would be fast. This would double your disk footprint, but could give you the search performance you're looking for. If that's too goofy, you could create a table of all one, two, and three-character word endings, and join to it from all of your words (stored in forward order). Then search first for the primary key of the word ending you want to search for, then search your words for that key. Index the join. - Original Message - From: "Lukas Haase" <lukasha...@gmx.at> To: <sqlite-users@sqlite.org> Sent: Wednesday, August 05, 2009 6:16 PM Subject: Re: [sqlite] FTS and postfix search > Wes Freeman schrieb: >> I clearly am not in the right mindset to be answering list emails. >> Please ignore my response (it's too late now)--back to my stressful >> deadline. > > :-) > >> Strange that it's implemented for prefix and not postfix? > > Well, an explanation is easy: Same as with LIKE, LIKE 'xxx' or LIKE > 'xxx%' can be performed easy because only the beginning of words > need to > be compared. > > However, there /is/ a way to also do postfix searches. I have the > *same* > database in *.hlp format and with WinHelp it's possible to search > '*otor' (and others) with almost zero CPU and time consumption. I'd > be > curious how they did this. > > For a solution for SQLite I would accept a small performance penalty > in > that case (but very few secs max); additionally I would also accept > the > index being bigger. > > Regards, > Luke > >> Wes >> >> On Wed, Aug 5, 2009 at 8:58 PM, Lukas Haase<lukasha...@gmx.at> >> wrote: >>> Wes Freeman schrieb: >>>> Why not LIKE '%otor'? >>> SELECT topic_title FROM topics >>> WHERE topic LIKE '%otor%' >>> ORDER BY topic_title ASC; >>> >>> This is very, very slow, especially on my > 100 MB database. >>> "Realtime" >>> search in the GUI is a requirement. This is exactly the reason why >>> I >>> want to use FTS instead of LIKE... >>> >>> Regards, >>> Luke >>> >>>> Wes >>>> >>>> On Wed, Aug 5, 2009 at 7:47 PM, Lukas Haase<lukasha...@gmx.at> >>>> wrote: >>>>> Hi, >>>>> >>>>> It's me again, sorry. The next big problem concerning FTS. I >>>>> have the >>>>> requirement to do postfix searches, like: >>>>> >>>>> SELECT topic_title FROM topics >>>>> WHERE topic MATCH '*otor' >>>>> ORDER BY topic_title ASC; >>>>> >>>>> should find Motor, motor, Monotor etc. But this does not seem to >>>>> work. >>>>> Is there any chance to get this working? >>>>> >>>>> Best regards, >>>>> Luke >>>>> >>>>> ___ >>>>> sqlite-users mailing list >>>>> sqlite-users@sqlite.org >>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>>> >>>> ___ >>>> sqlite-users mailing list >>>> sqlite-users@sqlite.org >>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>> >>> ___ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS and postfix search
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Lukas Haase wrote: > additionally I would also accept the index being bigger. You could have a second FTS table where you store the keywords in reverse order :-) Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkp6ONUACgkQmOOfHg372QTg8wCgvUYTChJ48xnUBcNVeRUHoSQY 5X4AoMwt7vsNzc5yMHDo1x9gXImuiJKw =q1HB -END PGP SIGNATURE- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS and postfix search
On 6/08/2009 11:16 AM, Lukas Haase wrote: > Wes Freeman schrieb: > >> Strange that it's implemented for prefix and not postfix? > > Well, an explanation is easy: Same as with LIKE, LIKE 'xxx' or LIKE > 'xxx%' can be performed easy because only the beginning of words need to > be compared. > > However, there /is/ a way to also do postfix searches. I have the *same* > database in *.hlp format and with WinHelp it's possible to search > '*otor' (and others) with almost zero CPU and time consumption. I'd be > curious how they did this. In memory: maybe a suffix tree. In a database: have a column with the words stored backwards. SELECT ... WHERE back_word LIKE "roto%" ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS and postfix search
Wes Freeman schrieb: > I clearly am not in the right mindset to be answering list emails. > Please ignore my response (it's too late now)--back to my stressful > deadline. :-) > Strange that it's implemented for prefix and not postfix? Well, an explanation is easy: Same as with LIKE, LIKE 'xxx' or LIKE 'xxx%' can be performed easy because only the beginning of words need to be compared. However, there /is/ a way to also do postfix searches. I have the *same* database in *.hlp format and with WinHelp it's possible to search '*otor' (and others) with almost zero CPU and time consumption. I'd be curious how they did this. For a solution for SQLite I would accept a small performance penalty in that case (but very few secs max); additionally I would also accept the index being bigger. Regards, Luke > Wes > > On Wed, Aug 5, 2009 at 8:58 PM, Lukas Haasewrote: >> Wes Freeman schrieb: >>> Why not LIKE '%otor'? >> SELECT topic_title FROM topics >> WHERE topic LIKE '%otor%' >> ORDER BY topic_title ASC; >> >> This is very, very slow, especially on my > 100 MB database. "Realtime" >> search in the GUI is a requirement. This is exactly the reason why I >> want to use FTS instead of LIKE... >> >> Regards, >> Luke >> >>> Wes >>> >>> On Wed, Aug 5, 2009 at 7:47 PM, Lukas Haase wrote: Hi, It's me again, sorry. The next big problem concerning FTS. I have the requirement to do postfix searches, like: SELECT topic_title FROM topics WHERE topic MATCH '*otor' ORDER BY topic_title ASC; should find Motor, motor, Monotor etc. But this does not seem to work. Is there any chance to get this working? Best regards, Luke ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> ___ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS and postfix search
I clearly am not in the right mindset to be answering list emails. Please ignore my response (it's too late now)--back to my stressful deadline. Strange that it's implemented for prefix and not postfix? Wes On Wed, Aug 5, 2009 at 8:58 PM, Lukas Haasewrote: > Wes Freeman schrieb: >> Why not LIKE '%otor'? > > SELECT topic_title FROM topics > WHERE topic LIKE '%otor%' > ORDER BY topic_title ASC; > > This is very, very slow, especially on my > 100 MB database. "Realtime" > search in the GUI is a requirement. This is exactly the reason why I > want to use FTS instead of LIKE... > > Regards, > Luke > >> >> Wes >> >> On Wed, Aug 5, 2009 at 7:47 PM, Lukas Haase wrote: >>> Hi, >>> >>> It's me again, sorry. The next big problem concerning FTS. I have the >>> requirement to do postfix searches, like: >>> >>> SELECT topic_title FROM topics >>> WHERE topic MATCH '*otor' >>> ORDER BY topic_title ASC; >>> >>> should find Motor, motor, Monotor etc. But this does not seem to work. >>> Is there any chance to get this working? >>> >>> Best regards, >>> Luke >>> >>> ___ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS and postfix search
Wes Freeman schrieb: > Why not LIKE '%otor'? SELECT topic_title FROM topics WHERE topic LIKE '%otor%' ORDER BY topic_title ASC; This is very, very slow, especially on my > 100 MB database. "Realtime" search in the GUI is a requirement. This is exactly the reason why I want to use FTS instead of LIKE... Regards, Luke > > Wes > > On Wed, Aug 5, 2009 at 7:47 PM, Lukas Haasewrote: >> Hi, >> >> It's me again, sorry. The next big problem concerning FTS. I have the >> requirement to do postfix searches, like: >> >> SELECT topic_title FROM topics >> WHERE topic MATCH '*otor' >> ORDER BY topic_title ASC; >> >> should find Motor, motor, Monotor etc. But this does not seem to work. >> Is there any chance to get this working? >> >> Best regards, >> Luke >> >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS and postfix search
Why not LIKE '%otor'? Wes On Wed, Aug 5, 2009 at 7:47 PM, Lukas Haasewrote: > Hi, > > It's me again, sorry. The next big problem concerning FTS. I have the > requirement to do postfix searches, like: > > SELECT topic_title FROM topics > WHERE topic MATCH '*otor' > ORDER BY topic_title ASC; > > should find Motor, motor, Monotor etc. But this does not seem to work. > Is there any chance to get this working? > > Best regards, > Luke > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS and postfix search
Hi, It's me again, sorry. The next big problem concerning FTS. I have the requirement to do postfix searches, like: SELECT topic_title FROM topics WHERE topic MATCH '*otor' ORDER BY topic_title ASC; should find Motor, motor, Monotor etc. But this does not seem to work. Is there any chance to get this working? Best regards, Luke ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users