jbv wrote:
yes, you're right. That was a typo.
Actually my script is slightly more complex, and I had to strip
off a couple of things to make the problem easier to explain...
hence the typo...
So here's a clean version of my code :
put "" into newList
repeat for each line j in mySentences
  put j into a
  replace " " with itemdelimiter in a
  put "" into b
  repeat for each item i in a
    get itemoffset(i,myReference)
    if it>0 then
      put it & itemdelimiter after b
    end if
  end repeat
  delete last char in b
  if b contains itemdelimiter then
    put b & cr after newList
  end if
end repeat
I don't see anything here which "keeps only those sentences containing
more than 1 word" ?
it's the test :
if b contains itemdelimiter then
which runs about 25% to 30% faster than
if number of items of b > 1 then
cute !! very nice - I'll remember that one ....
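A rough way to check that timing claim on your own machine is sketched below. It assumes the default itemDelimiter (comma) and a made-up two-item value for the list; the handler and variable names are only illustrative. The "contains" test can presumably stop at the first delimiter it finds, whereas counting items has to scan the whole string, so the gap should widen as b gets longer.

```livecode
on mouseUp
   local tList, tStart, tFlag, tContainsTime, tCountTime
   -- hypothetical sample: a result holding two matched positions
   put "3,7" into tList

   -- time the "contains" form of the test
   put the milliseconds into tStart
   repeat 100000 times
      put (tList contains comma) into tFlag
   end repeat
   put the milliseconds - tStart into tContainsTime

   -- time the "number of items" form of the test
   put the milliseconds into tStart
   repeat 100000 times
      put ((the number of items of tList) > 1) into tFlag
   end repeat
   put the milliseconds - tStart into tCountTime

   -- report both timings in the message box
   put "contains:" && tContainsTime && "ms" & cr & \
         "number of items:" && tCountTime && "ms"
end mouseUp
```

The absolute numbers will vary with the engine version and hardware, so only the ratio between the two is meaningful.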
Since you're asking for "fastest" algorithm, I assume at least one of
these numbers is large - and it's unlikely the same algorithm would
excel at both ends of the spectrum .....
I have 2 or 3 approaches in mind ... any clues you have on the
characteristics of the data would help decide which ones to pursue .....
e.g. build an associative array of the words which occur in myReference,
where the content of each array element is the word's position within the
reference list, and look up each word in the sentence with that ... should be
faster than "itemOffset" but probably not enough to justify it if the
number of words in myReference is very small.
The "array" approach crossed my mind, but I'm afraid it's not feasible :
the set of reference words is unpredictable. It is actually derived from a
sentence entered by the end user, and after some processing / selection
a set of words is extracted from that sentence and then compared with other
sentences in a database...
Still feasible ....( typing straight into email - typos possible )
put empty into myArray
put 0 into count
repeat for each item W in myReference
  add 1 to count
  put W && count & cr after myArray
end repeat
split myArray by CR and space
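To show how that array could replace the itemOffset call in the original loop, here is a minimal sketch. It is untested in the same spirit as the snippet above; the function name matchSentences and all variable names are hypothetical, and it assumes the default itemDelimiter (comma). One behavioural difference to be aware of: an array lookup only matches whole words, while itemOffset with wholeMatches left at false also matches a word that is merely contained in a reference item.

```livecode
-- Hypothetical helper: returns one line per sentence that matches
-- two or more reference words, listing their positions in pReference.
function matchSentences pSentences, pReference
   local tIndex, tCount, tWord, tLine, tItems, tHits, tResult

   -- build the lookup: tIndex[word] = item number within pReference
   put 0 into tCount
   repeat for each item tWord in pReference
      add 1 to tCount
      if tWord is empty then next repeat
      -- keep the first position if a word occurs more than once
      if tIndex[tWord] is empty then put tCount into tIndex[tWord]
   end repeat

   repeat for each line tLine in pSentences
      -- work on a copy, since the loop variable should not be modified
      put tLine into tItems
      replace space with comma in tItems
      put empty into tHits
      repeat for each item tWord in tItems
         if tWord is empty then next repeat
         if tIndex[tWord] is not empty then
            put tIndex[tWord] & comma after tHits
         end if
      end repeat
      delete last char of tHits
      -- same test as the original: a delimiter means at least two matches
      if tHits contains comma then put tHits & cr after tResult
   end repeat
   return tResult
end function
```

It would be called as: put matchSentences(mySentences, myReference) into newList. Because the array is rebuilt from whatever the user typed, it does not matter that the reference words are unpredictable; the cost is one pass over myReference per query.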
I'll try some simple experiments based on the figures you gave.... more
later.
--
Alex Tweedly http://www.tweedly.net