Hi,

responding to myself...

Ulrik Petersen wrote:
onemind wrote:
Thanks,

The thing is, I am going to need to use different letters each time to
search through over 200,000 words in a database, and it needs to be fast.

What technology would be best suited for this task? I just assumed that a database would be ideal. Why do you say SQL isn't suited for this, and what is?

Thanks again.

[Snip]

Having said all this, the fastest way would probably be to use an in-memory data structure, and simply query that in memory. One possible -- and very simple -- solution would be to have a hash-map for every character you wished to be able to search, then store pointers to the strings of the words in each hash-map. That would make your lookup times O(m), where m is the number of letters to search for, rather than O(n), where n is the number of words.

What I said above is complete and utter BS.  Sorry.

You might want to look into "bitsets", or "finger-prints", as Information Retrieval specialists like to call them.

The basic idea is that you make a bitset out of each word, with one bit for each of the "features" you want to be on or off for that word. For your purposes, you probably want each bit in the bitset to represent the presence or absence of one letter. If you only target the 26 letters of the English alphabet, a 32-bit integer will suffice.
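To make the idea concrete, here is a minimal sketch (in Python; the function name `fingerprint` is my own choice) of folding a word into such a 26-bit integer:

```python
def fingerprint(word):
    """Return a 26-bit set: bit i is set iff the i-th letter
    of the alphabet ('a' = bit 0) occurs in the word."""
    fp = 0
    for ch in word.lower():
        if 'a' <= ch <= 'z':
            fp |= 1 << (ord(ch) - ord('a'))
    return fp

# fingerprint("cab") == 0b111 == 7  (bits for a, b, c)
```

Note that the fingerprint only records presence or absence, not letter counts or order, so "cab" and "cabbage" share the letters a, b, c.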

You can store such a bitset in a column in SQLite, say, "fingerprint". Compute this as you insert the word.

Then use the "&" operator (bitwise-and) of SQLite's language to filter out those that you don't want.

Say you are interested in those words which contain "a" and "b" and "c". Say that "a" is bit 0 (value 1), "b" is bit 1 (value 2), and "c" is bit 2 (value 4). Then you OR these together (1|2|4 = 7) to get a mask, and require that all masked bits are set:

SELECT * FROM words WHERE (fingerprint & 7) = 7;

(The comparison with 7 matters: "fingerprint & 7" alone is true whenever *any* of the three letters is present, not all of them.)
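Putting the two pieces together, here is a small end-to-end sketch using Python's sqlite3 module (table and column names are illustrative). The key point is comparing the masked fingerprint against the full mask, which requires *all* of the letters to be present:

```python
import sqlite3

def fingerprint(word):
    # Bit i is set iff the i-th letter ('a' = bit 0) occurs in the word.
    fp = 0
    for ch in word.lower():
        if 'a' <= ch <= 'z':
            fp |= 1 << (ord(ch) - ord('a'))
    return fp

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, fingerprint INTEGER)")
for w in ["cab", "bad", "dog", "cabbage"]:
    conn.execute("INSERT INTO words VALUES (?, ?)", (w, fingerprint(w)))

mask = fingerprint("abc")  # words that contain a, b, AND c
rows = conn.execute(
    "SELECT word FROM words WHERE (fingerprint & ?) = ?", (mask, mask)
).fetchall()
# rows -> [('cab',), ('cabbage',)]  ("bad" lacks c, "dog" lacks all three)
```

This still scans all rows, but each check is a single integer AND and compare, which is very cheap even over 200,000 words.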


HTH


Ulrik Petersen
