Hi Dale,
It is a good Ruby question (and Rails is a Ruby framework, so I think
it is fair game). I don't know that the Levenshtein suggestion will be
that helpful. (You can try it with require 'text', from the text gem.)
The algorithm it uses is based on the number of single-character edits
(deletions, substitutions, and insertions) needed to turn one string
into another. It is nice for comparing words for possible misspellings
and the like. But in your case, if you want to compare content, you
need an approach that focuses on word frequency and context. Here
Levenshtein doesn't seem like the right tool (at least to me).
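To show what I mean, here is a minimal pure-Ruby sketch of the edit distance, using the usual dynamic-programming table. The method name levenshtein is just my own; if I remember right, the text gem exposes the same calculation as Text::Levenshtein.distance. Because it counts character edits, two sentences with the same words in a different order still come out far apart:

```ruby
# Minimal Levenshtein (edit) distance sketch: the number of single-character
# insertions, deletions, and substitutions needed to turn a into b.
# Classic dynamic-programming table, keeping only two rows at a time.
def levenshtein(a, b)
  prev = (0..b.length).to_a            # distance from "" to each prefix of b
  a.each_char.with_index(1) do |ca, i|
    curr = [i]                         # distance from the first i chars of a to ""
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [
        prev[j] + 1,                   # deletion
        curr[j - 1] + 1,               # insertion
        prev[j - 1] + cost             # substitution (or free match)
      ].min
    end
    prev = curr
  end
  prev.last
end

puts levenshtein("kitten", "sitting")  # => 3
puts levenshtein("color", "colour")    # => 1
# Same words, different order: still a large distance.
puts levenshtein("the cat sat on the mat", "on the mat the cat sat")
```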

EngTagger looks like an interesting module; it could be useful.

Here is a third proposal: what about breaking the text into chunks
(one-word, two-word, and three-word fragments) and then subtracting
the array of fragments generated from the second sentence from the
array generated from the first? A perfect match would leave an empty
array, while less perfect matches would leave more fragments behind.
It might take some tinkering to get the algorithm tuned right, but how
much tinkering depends on how much information you need to get from
the poorer matches. The single-word fragments compare content; the
two- and three-word fragments compare context.
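A rough sketch of the fragment idea in plain Ruby is below. The method names (fragments, leftover_ratio, similarity), the crude tokenizer, and the 0-100 scaling are my own assumptions, not anything from a library. Since array subtraction only works one way, the score averages the leftovers in both directions:

```ruby
# Sketch of the fragment-subtraction idea. All names are hypothetical;
# nothing here comes from a gem.

# Split text into 1-, 2- and 3-word fragments (word n-grams).
def fragments(text, max_n = 3)
  words = text.downcase.scan(/[a-z0-9']+/)  # crude tokenizer (assumption)
  (1..max_n).flat_map do |n|
    words.each_cons(n).map { |gram| gram.join(" ") }
  end
end

# Fraction (0.0 to 1.0) of frags_a left over after subtracting frags_b.
# Array#- drops every element of frags_a that appears anywhere in frags_b.
def leftover_ratio(frags_a, frags_b)
  return 0.0 if frags_a.empty?
  (frags_a - frags_b).size.to_f / frags_a.size
end

# Rough 0-100 score (assumed scaling): average the leftovers in both
# directions, since subtraction is one-sided, then invert.
def similarity(a, b)
  fa = fragments(a)
  fb = fragments(b)
  return 0 if fa.empty? || fb.empty?
  leftover = (leftover_ratio(fa, fb) + leftover_ratio(fb, fa)) / 2
  (100 * (1 - leftover)).round
end

a = "the cat sat on the mat"
b = "on the mat the cat sat"
p fragments(a) - fragments(b)           # => ["sat on", "cat sat on", "sat on the"]
puts similarity(a, a)                   # => 100
puts similarity(a, b)                   # => 80
puts similarity(a, "dogs bark loudly")  # => 0
```

With this scoring, identical sentences get 100, sentences with no words in common get 0, and the reordered pair lands in between because only some of its two- and three-word fragments are left over. Weighting those longer leftovers more heavily would be one obvious knob to tune.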
Good luck,
Tim


On Jun 19, 5:25 pm, PeteSalty wrote:
> This isn't really a Rails post, but this group has given such great
> responses to a range of questions over the years that I thought I'd
> ask anyway.
>
> I've been tasked with writing a Rails app that takes a block of text,
> anywhere from about 50 up to 300 characters (a sentence or two),
> compares it to other similar-sized blocks of text, and measures how
> similar they are, content-wise and contextually. It doesn't have to
> be perfect, but it has to be reasonably close. I was thinking it would
> be good to get a numerical score depending on how close they were
> (90 is really close, 20 is not very close at all), but I'm certainly
> open to ideas.
>
> Anyway, the problem is I have no idea how to do this or even where to
> look to get started. I really doubt that there is already a Ruby
> library to do this (although that would rock), or a Rails plug-in
> (although that would rock really hard), so I'm more looking for ideas
> on what I should be reading to get a sense of how to start on this.
> Anything would help: theoretical ideas, technical papers, Wikipedia
> articles, anything.
>
> Anyway, any suggestions are greatly appreciated.
> Dale
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
