Hi,
responding to myself...
Ulrik Petersen wrote:
onemind wrote:
Thanks,
The thing is, i am going to need to use different letters each time to
search through over 200,000 words in a database and it needs to be fast.
What technology would be best suited for this task? I just assumed
that a
databse would be ideal, why do you say sql isn't suited for this and
what
is?
Thanks again.
[Snip]
Having said all this, the fastest way would probably be to use an
in-memory datastructure, and simply query that in-memory. One
possible -- and very simple -- solution would be to have a hash-map
for every character you wished to be able to search, then store
pointers to the strings of the words in each hash-map. That would
make your lookup-times be O(m), where m is the number of letters to
search for, rather than O(n), where n is the number of words.
What I said above is complete and utter BS. Sorry.
You might want to look into "bitsets", or "finger-prints", as
Information Retrieval specialists like to call them.
The basic idea is that you make a bitset out of each word, with one bit
for each of the "features" you want to be on or off for that word. For
your purposes, probably you want each bit in the bitset to represent the
presence or absence of one letter. If you only target the 26 letters of
the English alphabet, a 32 bit integer will suffice.
You can store such a bitset in a column in SQLite, say, "fingerprint".
Compute this as you insert the word.
Then use the "&" operator (bitwise-and) of SQLite's language to filter
out those that you don't want.
Say you are interested in those words which do contain "a" and "b", and
"c". Say that "a" is bit 1, and "b" is bit 2, "c" is bit 3. Then you
OR these together (1|2|4=7), giving the value 7 to the &-operator:
SELECT * FROM words WHERE fingerprint & 7;
HTH
Ulrik Petersen