On Tue, 2008-10-21 at 11:56 +0200, Heinrich Christian Peters wrote:
> Hello,
>
> I am using a system-wide spamassassin setup (MailScanner). Nearly all my
> spam-mails are detected correctly (~0,1% is not), no FP. But, especially
> German spam-mails, are "wrongly" classified by the bayes-system. Should

According to your stats snippets: BAYES_50 is not "wrongly" classified,
but not classified at all. That is the very meaning of a Bayesian score
of 0.5 -- undecided, with neither really spammy nor really hammy tokens.

> I train thees mails manually as spam, if they are not autolearned? Or
> should I train *all* my spam-mails regular?

Personally, I prefer not to learn *all* spam, but to omit the lion's
share of the really high scoring stuff. The reason is an attempt to keep
the number of tokens somewhat sane, and to not bias my Bayes. If
everything Bayes gets to see is spam, everything will appear spammy.

If you get a certain class of sneaky spam, you definitely should feed
that to Bayes. However, if it also loosely resembles your ham [1], you
had better make sure to train that ham as well.

Since you merely mentioned "German spam", the details might make a
difference, though. What are you talking about exactly?

Given your timing, my guess is you're talking about the recent flood of
German porn spam, advertising cam sites. Even though they use pretty
explicit phrases, these appear to be hard to catch.

If that's the kind of spam you're talking about, check the archives.
This has been brought up very recently. Not many solutions though, IIRC.
They are hard to catch, and a few people are working on rules as we
speak.

HTH

  guenther


[1] In this context this means: German spam tends to be sneaky, and
    German is your users' first language.

-- 
char *t="[EMAIL PROTECTED]";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
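
As a rough illustration of the BAYES_50 remark in the reply above, the sketch below shows why a Bayesian result near 0.5 means "undecided" rather than "misclassified". It is an assumption-laden toy: the function name `combine` and the clamping bounds are made up for this example, and it uses a simple naive-Bayes product to combine per-token probabilities, which is not SpamAssassin's actual combining code.

```python
from math import prod


def combine(token_probs):
    """Combine per-token spam probabilities into one message probability.

    Toy naive-Bayes combination (hypothetical helper, not SpamAssassin's
    implementation). Each input is the estimated probability that a
    message containing that token is spam.
    """
    if not token_probs:
        # No known tokens at all: nothing to decide on.
        return 0.5
    # Clamp so that a single 0.0 or 1.0 cannot zero out both products.
    clamped = [min(max(p, 0.01), 0.99) for p in token_probs]
    spam = prod(clamped)
    ham = prod(1.0 - p for p in clamped)
    return spam / (spam + ham)


# Tokens that are neither spammy nor hammy leave the result at 0.5.
neutral = combine([0.5, 0.5, 0.5])

# Equally strong spammy and hammy tokens cancel out to roughly 0.5 too.
mixed = combine([0.9, 0.1])

# Only clearly spammy tokens push the result towards 1.0.
spammy = combine([0.9, 0.9, 0.8])

print(neutral, mixed, spammy)
```

In this toy model, a message lands in the middle either because its tokens are unknown or neutral, or because spammy and hammy evidence balance out. Both cases fit the advice above: train the sneaky spam, and train the ham it resembles as well.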