Re: [Spambayes] (no subject)

Amedee Van Gasse Thu, 25 May 2006 07:28:14 -0700

On Wed, May 24, 2006 23:44, Tony Meyer said:
> (As an aside: SpamBayes was created, for the most part, by English
> speakers.  The process should still work in other white-space
> delimited languages, but there may be a few issues.  For example,
> SpamBayes ignores any tokens that are fewer than 3 characters long -
> which includes 'worthless' English words like "a", "be", "to", "my",
> and so on.  However, many of these words are longer in German, so
> perhaps performance would be better with a lower limit of 4 (or maybe
> too much useful information would be lost then).  It would need
> experimentation to know for sure).


This sounds interesting.
Word lengths in Dutch fall somewhere between those of English and German.
Is this minimum token length a configurable option?
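
To make the question concrete, here is a minimal sketch of the kind of
length filter described above. It is not SpamBayes's actual tokenizer:
the function name `tokenize` and the `min_len` parameter are hypothetical,
and splitting on whitespace is a simplifying assumption.

```python
def tokenize(text, min_len=3):
    """Split on whitespace and drop tokens shorter than min_len.

    Hypothetical illustration only; min_len is an assumed parameter,
    not a real SpamBayes option.
    """
    return [tok for tok in text.lower().split() if len(tok) >= min_len]


sample = "Dit is een test van de filter"

# With a limit of 3, short Dutch words such as "een" and "van" survive.
limit_3 = tokenize(sample, min_len=3)

# With a limit of 4, only the longer words remain.
limit_4 = tokenize(sample, min_len=4)
```

Comparing the two results on real Dutch mail would show whether the
higher limit removes mostly noise or also useful tokens.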

-- 
Amedee

_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
