The Trac[1] project has resurrected work on a SpamBayes plugin for
filtering Wiki and ticket edits after finding the current Akismet system
to be unreliable.  Tony Meyer added some comments[2] to the Wiki
suggesting that we write a custom tokenizer instead of using the
built-in email-centric tokenizer.

Has anyone here written a custom tokenizer that could serve as an
example?  Failing that, do you have any hints on what to take into
account when writing an effective tokenizer for Wiki text?
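
To make the question concrete, here is a rough sketch of the kind of
tokenizer this might mean.  It is only an illustration under stated
assumptions: the function name tokenize_wiki, the "url-..." and "skip:"
token prefixes, and the regular expressions are all hypothetical and are
not part of SpamBayes or Trac.  It assumes only that the classifier can
be fed a stream of string tokens.

```python
import re
from urllib.parse import urlsplit

# Hypothetical patterns chosen for illustration; real wiki markup needs more care.
URL_RE = re.compile(r"https?://[^\s\]\)>\"']+", re.IGNORECASE)
WORD_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9'\-]*")


def tokenize_wiki(text, max_word_len=20):
    """Yield string tokens for a wiki or ticket edit (illustrative sketch)."""
    urls = URL_RE.findall(text)

    # Link-heavy edits are a common spam signal, so emit a capped link count.
    yield "url-count:%d" % min(len(urls), 5)

    for url in urls:
        try:
            host = (urlsplit(url).hostname or "").lower()
        except ValueError:
            # Malformed URL (for example a broken IPv6 literal): note it and move on.
            yield "url-malformed"
            continue
        if host:
            yield "url-host:" + host
            # Emit each domain label so related hosts share some tokens.
            for label in host.split("."):
                if label:
                    yield "url-label:" + label

    # Blank out URLs so their pieces are not counted a second time as words.
    body = URL_RE.sub(" ", text)
    for word in WORD_RE.findall(body):
        word = word.lower()
        if len(word) < 3:
            continue  # very short words carry little signal
        if len(word) > max_word_len:
            # Summarize long strings by first character and rough length.
            yield "skip:%s %d" % (word[0], len(word) // 10 * 10)
        else:
            yield word
```

The idea is to keep link information separate from ordinary words,
since spam edits tend to differ from legitimate ones mostly in the links
they carry.  Whether these particular features help would have to be
checked against real Trac spam and ham.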

-- Matt Good

[1] http://trac.edgewall.org
[2] http://trac.edgewall.org/wiki/SpamFilter#Bayes

_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev
