On Tue, 2008-10-21 at 11:56 +0200, Heinrich Christian Peters wrote:
> Hello,
> 
> I am using a system-wide SpamAssassin setup (MailScanner). Nearly all my
> spam mails are detected correctly (~0.1% are not), no FP. But German spam
> mails especially are "wrongly" classified by the Bayes system. Should

According to your stats snippets, BAYES_50 is not "wrongly" classified,
but not classified at all. That is the very meaning of a Bayesian score
of 0.5: undecided, with neither really spammy nor really hammy tokens.
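To illustrate why 0.5 means "undecided", here is a minimal Python sketch of chi-squared (Fisher/Robinson) combining of per-token spam probabilities, in the style popularized by SpamBayes. As far as I know, SpamAssassin's Bayes code also uses a chi-squared combiner by default, but this is not its code: the function names and the clamping bounds are assumptions for illustration only. A message whose tokens are all neutral (0.5) combines to exactly 0.5.

```python
import math


def chi2q(x2: float, v: int) -> float:
    """Survival function (upper tail) of the chi-squared distribution, even v only."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_combine(probs):
    """Combine per-token spam probabilities into one message score in [0, 1]."""
    # Clamp to avoid log(0); these bounds are an arbitrary choice for the sketch.
    probs = [min(max(p, 0.01), 0.99) for p in probs]
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence at all: undecided
    ln_prod_p = sum(math.log(p) for p in probs)          # ln of prod(p)
    ln_prod_1mp = sum(math.log(1.0 - p) for p in probs)  # ln of prod(1 - p)
    h = 1.0 - chi2q(-2.0 * ln_prod_p, 2 * n)    # evidence for ham
    s = 1.0 - chi2q(-2.0 * ln_prod_1mp, 2 * n)  # evidence for spam
    return (s - h + 1.0) / 2.0


print(chi2_combine([0.5] * 20))   # neutral tokens only
print(chi2_combine([0.99] * 10))  # strongly spammy tokens
```

With only neutral tokens, the spam and ham evidence cancel and the result is exactly 0.5, which is the BAYES_50 situation described above.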

> I train these mails manually as spam, if they are not autolearned? Or
> should I train *all* my spam mails regularly?

Personally, I prefer not to learn *all* spam, but to omit the lion's
share of really high-scoring stuff. The reason is an attempt to keep the
number of tokens somewhat sane, and not to bias my Bayes. If everything
Bayes gets to see is spam, everything will appear spammy.
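The policy of skipping most of the really high scorers can be sketched as a small selection step before training. This is a hypothetical helper, not part of SpamAssassin or MailScanner: the cutoff of 15.0 and the 10% sample kept above it are illustrative assumptions, and the input is assumed to be mail you have already confirmed as spam, paired with its SpamAssassin score.

```python
import random


def select_spam_for_training(messages, high_score_cutoff=15.0,
                             keep_fraction=0.1, seed=0):
    """Pick which confirmed-spam messages to feed to Bayes (hypothetical helper).

    messages: list of (message_id, sa_score) for mail already confirmed as spam.
    Spam scoring below the cutoff is always kept; of the really high scorers,
    only a small random sample is kept, to limit token growth and bias.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    selected = []
    for msg_id, score in messages:
        if score < high_score_cutoff or rng.random() < keep_fraction:
            selected.append(msg_id)
    return selected


print(select_spam_for_training([("a", 5.0), ("b", 30.0)], keep_fraction=0.0))
```

The selected messages could then be passed to `sa-learn --spam`.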

If you get a certain class of sneaky spam, you definitely should feed
that to Bayes. However, if it also loosely resembles your ham [1], you'd
better make sure to train that kind of ham as well.
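Why spam-only training makes everything look spammy, and why similar ham needs training too, can be shown with Robinson's smoothed per-token probability f(w) = (s*x + n*p(w)) / (s + n). This is a minimal sketch: the strength s=1.0 and prior x=0.5 are illustrative values, not SpamAssassin's defaults, and the message counts are made up for the example.

```python
def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham, s=1.0, x=0.5):
    """Robinson-style smoothed spam probability for a single token.

    spam_hits/ham_hits: number of trained spam/ham messages containing the token.
    n_spam/n_ham: total number of trained spam/ham messages.
    s (strength) and x (prior) are illustrative, not SpamAssassin's values.
    """
    spam_ratio = spam_hits / n_spam if n_spam else 0.0
    ham_ratio = ham_hits / n_ham if n_ham else 0.0
    if spam_ratio + ham_ratio == 0.0:
        return x  # token never seen: fall back to the prior
    p = spam_ratio / (spam_ratio + ham_ratio)
    n = spam_hits + ham_hits
    # Shrink towards the prior x when the token was seen in few messages.
    return (s * x + n * p) / (s + n)


# A common German word that appears in 90 of 100 trained spams.
only_spam = token_spam_prob(90, 0, 100, 0)      # no ham trained at all
balanced = token_spam_prob(90, 900, 100, 1000)  # German ham trained too
print(f"spam only: {only_spam:.3f}, with ham: {balanced:.3f}")
# prints: spam only: 0.995, with ham: 0.500
```

Without any ham, an everyday word ends up looking strongly spammy; once ham in the same language is trained, the same word drops back to neutral.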


Since you merely mentioned "German spam", the details might make a
difference, though. What are you talking about exactly?

Given your timing, my guess is you're talking about the recent flood of
German porn spam advertising cam sites. Even though these messages use
pretty explicit phrases, they appear to be hard to catch.

If that's the kind of spam you're talking about, check the archives.
This has been brought up very recently. Not many solutions, though, IIRC.
They are hard to catch, and a few people are working on rules as we
speak. HTH

  guenther


[1] In this context, this means that German spam tends to be sneaky, and
    German is your users' first language.

-- 
char *t="[EMAIL PROTECTED]";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
