timr wrote:
> Here is a third proposal--What about breaking the text into chunks--
> one-word fragments, two-word fragments, three-word fragments and then
> doing a subtraction of one array of fragments from the other
> (fragments generated from 2nd sentence)? A perfect match would leave
> an empty array, while less perfect matches would leave more fragments.

If I may put in at least a reference to an idea...

Just counting words and small word groups seems error-prone.  Entries
that use jargon will repeat the same words with profoundly different
meanings.  Also, quotations shared between entries will skew the results.
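
To make the concern concrete, here is a minimal sketch of the fragment
subtraction as I understand the proposal. The method names (fragments,
leftover) and the three-word limit are my own assumptions, not anything
from the original post.

```ruby
# Hypothetical helper: every 1- to max_n-word fragment of a sentence.
def fragments(text, max_n = 3)
  words = text.downcase.scan(/\w+/)
  (1..max_n).flat_map do |n|
    words.each_cons(n).map { |group| group.join(" ") }
  end
end

# Fragments of the first text with no counterpart in the second.
# An empty result means every fragment of `a` also occurs in `b`.
def leftover(a, b)
  fragments(a) - fragments(b)
end

p leftover("the cat sat", "the cat sat")
p leftover("the cat sat", "the cat ran")
```

Note that the subtraction only sees surface words. Two entries that
quote the same passage, or use the same jargon term in different senses,
share fragments and so look closer than they really are.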

I am thinking that you actually want to do something like what file
comparison engines do.  I use Araxis for industrial-strength
comparisons, but you might want to check http://winmerge.org/ as that
seems to have a good engine and it is open source.
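
As a rough illustration of the diff-style idea, here is a sketch based
on the longest common subsequence (LCS) of words, which is the classic
technique behind diff tools. I do not know what Araxis or WinMerge do
internally, so treat this as an assumption about the general approach;
the method names and the 0.0 to 1.0 score are mine.

```ruby
# Length of the longest common subsequence of two arrays,
# computed with standard dynamic programming and a rolling row.
def lcs_length(a, b)
  prev = Array.new(b.size + 1, 0)
  a.each do |x|
    curr = [0]
    b.each_with_index do |y, j|
      curr << (x == y ? prev[j] + 1 : [prev[j + 1], curr[j]].max)
    end
    prev = curr
  end
  prev.last
end

# Hypothetical score: 1.0 for identical word sequences,
# 0.0 when no words are shared.
def similarity(s1, s2)
  a = s1.downcase.scan(/\w+/)
  b = s2.downcase.scan(/\w+/)
  return 1.0 if a.empty? && b.empty?
  2.0 * lcs_length(a, b) / (a.size + b.size)
end

p similarity("the cat sat", "the cat ran")
p similarity("dog bites man", "man bites dog")
```

Unlike the fragment count, this respects word order over the whole
sentence, so reordered text scores low even when the vocabulary is
identical. It still cannot tell apart two meanings of the same word.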

I hope that helps.
-- 
Posted via http://www.ruby-forum.com/.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby 
on Rails: Talk" group.
-~----------~----~----~----~------~----~------~--~---