The Trac[1] project has resurrected work on a SpamBayes plugin for filtering Wiki and ticket edits after finding the current Akismet system to be unreliable. Tony Meyer added some comments[2] to the Wiki suggesting that we write a custom tokenizer instead of using the built-in email-centric tokenizer.
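To make the question concrete, here is a minimal sketch of the kind of tokenizer I have in mind, assuming the classifier only needs an iterable of string tokens. The function name `tokenize_wiki`, the `url:` and `skip:` token prefixes, and the length cutoffs are all placeholders chosen for illustration; none of them come from SpamBayes or Trac.

```python
import re
from urllib.parse import urlsplit

# Hypothetical patterns: brackets and quotes end a URL so that wiki link
# syntax such as [http://example.com label] is split cleanly.
URL_RE = re.compile(r"https?://[^\s\[\]()<>\"']+", re.IGNORECASE)
WORD_RE = re.compile(r"[^\W_]+(?:'[^\W_]+)?")


def tokenize_wiki(text):
    """Yield string tokens for a wiki or ticket edit (illustrative only)."""
    # Emit URL components as prefixed tokens so that a host name seen in a
    # link is kept distinct from the same word appearing in running text.
    for url in URL_RE.findall(text):
        try:
            parts = urlsplit(url)
        except ValueError:
            # Malformed URL: skip it rather than abort the whole edit.
            continue
        yield "url:scheme:" + parts.scheme.lower()
        for label in (parts.hostname or "").split("."):
            if label:
                yield "url:host:" + label
        for piece in WORD_RE.findall(parts.path):
            yield "url:path:" + piece.lower()

    # Tokenize the remaining text with the URLs blanked out.
    remaining = URL_RE.sub(" ", text)
    for word in WORD_RE.findall(remaining):
        word = word.lower()
        if 3 <= len(word) <= 12:
            yield word
        elif len(word) > 12:
            # Summarize very long words by first character and rough length.
            yield "skip:%s %d" % (word[0], len(word) // 10 * 10)
        # Words shorter than 3 characters are dropped (arbitrary cutoff).
```

Whether wiki markup itself (headings, macros, link brackets) should also produce tokens is exactly the sort of thing I am unsure about.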
Are there examples from other people who have written custom tokenizers that might be helpful, or do you have any hints on what to take into account when writing an effective tokenizer for Wiki text?

-- 
Matt Good

[1] http://trac.edgewall.org
[2] http://trac.edgewall.org/wiki/SpamFilter#Bayes

_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev