SequenceMatcher is for comparing one file to another file... and it would
be god-awful if you were to try to index an entire database of
content with it. I just glanced at SequenceMatcher, however, and it does
not seem to suit my needs.
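To make the "no index" problem concrete, here is a minimal sketch, assuming a plain list of strings as the "database". With difflib there is nothing to look up: every query has to run SequenceMatcher against every stored record, and each of those comparisons is itself quadratic in the expected case according to the difflib docs. The `scan` helper, its `min_ratio` parameter and the sample records are hypothetical; `SequenceMatcher(None, a, b).ratio()` is the real difflib API.

```python
from difflib import SequenceMatcher


def scan(query, records, min_ratio=0.6):
    """Hypothetical helper: score EVERY record against the query.

    There is no index to narrow the candidates, so the cost grows
    linearly with the number of records, per query.
    """
    results = []
    for record in records:
        ratio = SequenceMatcher(None, query.lower(), record.lower()).ratio()
        if ratio >= min_ratio:
            results.append((ratio, record))
    return sorted(results, reverse=True)  # best match first


records = ["python", "java", "web2py"]  # made-up sample data
print(scan("pyton", records))  # typo still matches "python"
```

This is fine for comparing two files, but for a whole table of content every search touches every row.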

The biggest limitation, I would say, is finding the right "min-sim".
The similarity score depends on the length of the content and of the
search string, so it gets thrown off when you search for a single
word. If you are indexing paragraphs (or, say, blog posts), the search
performs much better when more keywords are typed in.
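The min-sim effect described above can be reproduced with a small sketch. The actual ngram code isn't shown here, so this assumes the similarity is the Jaccard overlap (shared n-grams divided by all distinct n-grams) of space-padded character trigrams; `ngrams`, `similarity`, `search`, `min_sim` and the sample post are hypothetical names and data.

```python
def ngrams(text, n=3):
    """Set of character n-grams, padded so word edges get their own grams."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard similarity of the two n-gram sets, between 0.0 and 1.0."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def search(query, records, min_sim=0.3):
    """Return (score, record) pairs scoring at least min_sim, best first."""
    scored = ((similarity(query, r), r) for r in records)
    return sorted((p for p in scored if p[0] >= min_sim), reverse=True)


post = "web2py is a python web framework"  # made-up "blog post"
print(search("python", [post]))
print(search("python web framework", [post]))
```

With this measure the single word "python" scores 0.2 against the post (6 shared trigrams out of 30), while "python web framework" scores about 0.67 (20 out of 30). A min_sim of 0.3 therefore drops the one-word query entirely, even though the word appears verbatim in the post: the long record dilutes the score of a short query.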

With the demo, try changing the min-sim while keeping the same query,
and see how drastically the result set can change.

I originally implemented this in an mp3 playlist creator, where I
stored artist names, albums, etc. as their own separate ngram lists.
This is where ngrams really shine: comparing a single word to a
single word.
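Keeping one ngram list per field could look roughly like the sketch below. It assumes an inverted index (n-gram to the ids of the values containing it) per field, so a query only scores values that share at least one n-gram with it instead of every stored value, plus the same padded-trigram Jaccard score as a stand-in for the real similarity measure. The `FieldIndex` class and the track data are hypothetical, not the original playlist code.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Set of character n-grams, padded so word edges get their own grams."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


class FieldIndex:
    """Hypothetical per-field index: one instance for artists, one for albums."""

    def __init__(self, n=3):
        self.n = n
        self.grams = {}                   # id -> n-gram set of the stored value
        self.postings = defaultdict(set)  # n-gram -> ids of values containing it

    def add(self, key, value):
        g = ngrams(value, self.n)
        self.grams[key] = g
        for gram in g:
            self.postings[gram].add(key)

    def search(self, query, min_sim=0.3):
        """Return (score, key) pairs at or above min_sim, best first."""
        q = ngrams(query, self.n)
        # Only values sharing at least one n-gram with the query are scored.
        candidates = set().union(*(self.postings.get(g, set()) for g in q))
        hits = []
        for key in candidates:
            g = self.grams[key]
            score = len(q & g) / len(q | g)  # Jaccard similarity
            if score >= min_sim:
                hits.append((score, key))
        return sorted(hits, reverse=True)


# Made-up playlist data: track id -> (artist, album).
tracks = {1: ("Radiohead", "OK Computer"), 2: ("Radio Dept.", "Pet Grief")}
artists, albums = FieldIndex(), FieldIndex()
for key, (artist, album) in tracks.items():
    artists.add(key, artist)
    albums.add(key, album)

print(artists.search("radiohed"))  # misspelled artist name
print(albums.search("computer"))
```

Because each field holds short values, a one-word query is compared against a one- or two-word value, so the scores stay high: "radiohed" scores about 0.55 against "Radiohead", while "Radio Dept." scores about 0.27 and falls below a min_sim of 0.3.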

-Thadeus

On Sat, Dec 12, 2009 at 6:08 PM, Richard wrote:
> difflib.SequenceMatcher

--

You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
For more options, visit this group at 
http://groups.google.com/group/web2py?hl=en.