timr wrote:
> Here is a third proposal--What about breaking the text into chunks--
> one-word fragments, two-word fragments, three-word fragments and then
> doing a subtraction of one array of fragments from the other
> (fragments generated from 2nd sentence)? A perfect match would leave
> an empty array, while less perfect matches would leave more fragments.
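For reference, here is roughly how I read the fragment-subtraction proposal. This is only a minimal sketch: the method names `fragments` and `leftover_fragments` are made up for illustration, and I am assuming case-insensitive matching and that a "word" is any run of `\w` characters.

```ruby
# Hypothetical helpers, not from the original proposal.

# Build all 1-word, 2-word, ... max_words-word fragments of a text.
# Assumption: words are runs of \w characters, compared case-insensitively.
def fragments(text, max_words = 3)
  words = text.downcase.scan(/\w+/)
  (1..max_words).flat_map do |n|
    words.each_cons(n).map { |group| group.join(" ") }
  end
end

# Subtract the second text's fragments from the first text's fragments.
# An empty result means every fragment of the first text also occurs
# in the second; more leftovers mean a poorer match.
def leftover_fragments(first, second, max_words = 3)
  fragments(first, max_words) - fragments(second, max_words)
end
```

Note that `Array#-` drops every occurrence of a fragment that appears anywhere in the other array, so repeated words are not counted, and the result depends on which text you subtract from which.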
If I may at least offer a pointer to another idea: just counting words and small word groups seems error-prone. Entries that use jargon will repeat the same words with profoundly different meanings, and quotations will skew the results.

I think what you actually want is something like what file comparison engines do. I use Araxis for industrial-strength comparisons, but you might want to check http://winmerge.org/, as it seems to have a good engine and it is open source.

I hope that helps.
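To give a flavour of the diff-style approach, here is a small sketch that scores two texts by their longest common subsequence of words, which is the general idea that diff-like tools build on. I am not claiming this is how Araxis or WinMerge work internally; `lcs_length` and `order_aware_similarity` are names I made up, and the tokenizing (lowercased runs of `\w`) is an assumption.

```ruby
# Hypothetical helpers for illustration only.

# Length of the longest common subsequence of two arrays,
# using the classic dynamic-programming table, one row at a time.
def lcs_length(a, b)
  prev = Array.new(b.size + 1, 0)
  a.each do |x|
    curr = [0]
    b.each_with_index do |y, j|
      curr << (x == y ? prev[j] + 1 : [prev[j + 1], curr[j]].max)
    end
    prev = curr
  end
  prev.last
end

# Similarity between 0.0 and 1.0 that respects word order:
# twice the shared in-order words divided by the total word count.
def order_aware_similarity(first, second)
  a = first.downcase.scan(/\w+/)
  b = second.downcase.scan(/\w+/)
  return 1.0 if a.empty? && b.empty?
  2.0 * lcs_length(a, b) / (a.size + b.size)
end
```

Unlike a plain word count, this is sensitive to order: "a b c" against "c b a" shares all three words but scores only about 0.33, because just one word can be matched in sequence.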

