On Wed, May 24, 2006 23:44, Tony Meyer said:
> (As an aside: SpamBayes was created, for the most part, by English
> speakers. The process should still work in other white-space
> delimited languages, but there may be a few issues. For example,
> SpamBayes ignores any tokens that are fewer than 3 characters long -
> which includes 'worthless' English words like "a", "be", "to", "my",
> and so on. However, many of these words are longer in German, so
> perhaps performance would be better with a lower limit of 4 (or maybe
> too much useful information would be lost then). It would need
> experimentation to know for sure).

This sounds interesting. Word lengths in Dutch fall somewhere between
those of English and German. Is this minimum token length configurable?

-- 
Amedee
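
P.S. To make the question concrete, here is a minimal sketch of what a
configurable minimum token length could look like. This is not SpamBayes'
actual tokenizer: the function name `tokenize` and the `min_len` parameter
are hypothetical, and plain whitespace splitting is an assumption made only
for illustration.

```python
def tokenize(text, min_len=3):
    """Split text on whitespace and drop tokens shorter than min_len.

    Hypothetical illustration only; not the SpamBayes tokenizer.
    """
    return [word for word in text.lower().split() if len(word) >= min_len]


sample = "It is a pleasure to be on my way"

# Default limit: tokens with fewer than 3 characters are ignored.
print(tokenize(sample))             # ['pleasure', 'way']

# Stricter limit of 4, as suggested for languages with longer short words.
print(tokenize(sample, min_len=4))  # ['pleasure']
```

With the limit exposed as a parameter like this, comparing 3 against 4 on a
German or Dutch corpus would only require changing one value per run.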