SequenceMatcher is for comparing one file to another file, and it would be god-awful if you tried to use it to index an entire database of content. I only glanced at SequenceMatcher, however, and it does not seem to suit my needs.
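To make the cost concrete, here is a rough sketch of what searching with difflib.SequenceMatcher would look like. The names best_matches, records and min_ratio are made up for illustration and are not from the demo. Nothing is indexed ahead of time, so every query has to be compared against every record, one pair at a time.

```python
from difflib import SequenceMatcher

def best_matches(query, records, min_ratio=0.6):
    """Score the query against every record with one pairwise comparison each."""
    scored = []
    for record in records:
        # A new pairwise comparison per record: nothing is precomputed or indexed.
        ratio = SequenceMatcher(None, query.lower(), record.lower()).ratio()
        if ratio >= min_ratio:
            scored.append((ratio, record))
    # Highest ratio first.
    return sorted(scored, reverse=True)

# Hypothetical sample data.
records = ["Pink Floyd", "Pink Martini", "Led Zeppelin"]
print(best_matches("pink floid", records))
```

That is fine for a handful of strings, but the work grows with the number of records on every single search.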
The biggest limitation, I would say, is finding the right "min-sim". Depending on the length of the content and the characters searched for, the similarity gets thrown off when you search for a single word. If you are indexing paragraphs (or, say, blog posts), the search performs much better when more keywords are typed in. With the demo, try changing the min-sim while keeping the same query, and see how drastically the result set can change.

I originally implemented this in an mp3 playlist creator, where I stored artist names, albums, etc. as their own separate ngram lists. This is where ngrams really shine: comparing single word to single word.

-Thadeus

On Sat, Dec 12, 2009 at 6:08 PM, Richard <[email protected]> wrote:
> difflib.SequenceMatcher
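A minimal sketch of the min-sim sensitivity described above. This is not the code from the demo: it assumes character trigrams with space padding and a Jaccard overlap as the similarity measure, and the names ngrams, similarity, search and the sample data are all hypothetical.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard overlap (an assumed measure): shared n-grams / all distinct n-grams."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

def search(query, items, min_sim=0.3):
    """Return (score, item) pairs scoring at least min_sim, best first."""
    hits = [(similarity(query, item), item) for item in items]
    return sorted((h for h in hits if h[0] >= min_sim), reverse=True)

# Hypothetical sample data: short single-word fields versus one long text.
artists = ["Radiohead", "Portishead", "Motorhead"]
post = "Radiohead played a long set last night and the crowd loved every minute of it"

# Single word against single word: a misspelled query still scores well.
print(search("radiohed", artists, min_sim=0.5))
# Same query and data, much lower threshold: the result set grows.
print(search("radiohed", artists, min_sim=0.04))
# Single word against a paragraph: the extra n-grams drag the score down.
print(similarity("radiohead", post))
```

With this measure, a one-word query shares only a small fraction of a paragraph's n-grams, so a threshold tuned for short fields like artist names will filter out long texts that do contain the word.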

