Alex Tweedly wrote:


Since you're asking for "fastest" algorithm, I assume at least one of
these numbers is large - and it's unlikely the same algorithm would
excel at both ends of the spectrum .....

I have 2 or 3 approaches in mind ... any clues you have on the
characteristics of the data would help decide which ones to pursue .....
e.g.   build an assoc array of the words which occur in myReferences,
where the content of the array element is the word number within the
references, and lookup each word in the sentence with that ... should be
faster than "itemOffset" but probably not enough to justify it if the
number of words in myReferences is very small.



The "array" approach crossed my mind, but I'm afraid it's not feasable :
the set of reference words is unpredictable. It is actually derived from a
sentence entered by the end user, and after some processing / selection
a set words is extracted from that sentence and then compared with other
sentences in a data base...

Still feasible ....( typing straight into email - typos possible )

put empty into myArray
put 0 into count
repeat for each item W in myReference
   add 1 to count
   put W && count & cr after myArray
end repeat
split myArray by CR and space

I'll try some simple experiments based on the figures you gave.... more later.


I tried :
20 words in the reference list
20 - 30 words in each sentence
1000 sentences  (actually, two sentences, repeated 500 times :-)
500 matches (each match is of 6 words from the ref list)

your original algorithm took 305 millisecs (once I remembered to set wholeMatches to true !!) using the array split method as above took 28 millisecs (including the time to set up and split the array)

Hope that's fast enough - but if not, let me know and I'll try another approach ....

--
Alex Tweedly       http://www.tweedly.net



--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.323 / Virus Database: 267.6.2 - Release Date: 04/06/2005

_______________________________________________
use-revolution mailing list
[email protected]
http://lists.runrev.com/mailman/listinfo/use-revolution

Reply via email to