Hello Karsten/guenther, (?)

Thanks for your reply.

Karsten Bräckelmann schrieb:
> On Tue, 2008-10-21 at 11:56 +0200, Heinrich Christian Peters wrote:
>> I am using a system-wide spamassassin setup (MailScanner). Nearly all my
>> spam-mails are detected correctly (~0.1% is not), no FP. But German
>> spam-mails, especially, are "wrongly" classified by the bayes-system. Should
> 
> According to your stats snippets:  BAYES_50 is not "wrongly" classified,
> but not-classified-at-all. The difference is the very meaning of a
> Bayesian score of 0.5 -- undecided, neither really spammy nor hammy
> tokens.

I see, but what about the 1.6% of spam (around 57 mails) that the
bayes-system classified as ham (BAYES_00)? And another thing: as you can
see, a mail classified as BAYES_50 is spam in nearly every case, so I
think these mails are wrongly classified; they should be BAYES_60 or
higher...
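
To make the two readings concrete, here is a minimal Python sketch of how a
Bayes probability could map onto the BAYES_xx rule names. The bucket
boundaries and the helper name bayes_rule are assumptions for illustration
only (recalled from the stock rule set, not checked against this
installation), so please compare them with your own 23_bayes.cf:

```python
# Assumed (upper bound, rule name) pairs; verify against your own rule files.
ASSUMED_BUCKETS = [
    (0.01, "BAYES_00"),
    (0.05, "BAYES_05"),
    (0.20, "BAYES_20"),
    (0.40, "BAYES_40"),
    (0.60, "BAYES_50"),
    (0.80, "BAYES_60"),
    (0.95, "BAYES_80"),
    (0.99, "BAYES_95"),
]


def bayes_rule(probability: float) -> str:
    """Return the (assumed) BAYES_xx rule name for a Bayes probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper_bound, rule_name in ASSUMED_BUCKETS:
        if probability < upper_bound:
            return rule_name
    # Anything at or above the last upper bound lands in the top bucket.
    return "BAYES_99"


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.7, 1.0):
        print(p, bayes_rule(p))
```

Under these assumed boundaries, a probability of exactly 0.5 sits in the
middle of the BAYES_50 bucket, which fits the "undecided" reading, while a
mail would need a probability of 0.6 or more to reach BAYES_60.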


>> I train these mails manually as spam, if they are not autolearned? Or
>> should I train *all* my spam-mails regularly?
> 
> Personally, I prefer to not learn *all* spam, but to omit the lion's
> share of really high scoring stuff. The reason is an attempt to keep the
> number of tokens somewhat sane, and to not bias my Bayes. If everything
> Bayes gets to see is spam, everything will appear spammy.
> 
> If you get a certain class of sneaky spam, you definitely should feed
> that to Bayes. However, if it also loosely resembles your ham [1], you
> better make sure to train them as well.

Up to now, I have trained only the wrongly marked mails manually;
autolearning is working, and the same goes for ham. But, as I said
before, I have no FPs.


> Since you merely mentioned "German spam", the details might make a
> difference, though. What are you talking about exactly?

German is my first language, and nearly all (ham-)mails I get are
German.  The few English (ham-)mails I get are correctly classified as
BAYES_10 or below.
The (spam-)mails I am talking about are, e.g.:
 - phishing-mails (today: DABbank AG)
 - casino (Fiesta Club Casino, Euro Club Casino)
 - pharmacy, mostly caught by ZMIde_Pharmacy
 - "job offers", finance-sector


> Given your timing, my guess is you're talking about the recent flood of
> German porn spam, advertising cam sites. Even though they are using
> pretty explicit phrases, these appear to be hard to catch.

These mails are not the problem; I didn't get them...


> If that's the kind of spam you're talking about, check the archives.
> This has been brought up very recently. Not many solutions though, IIRC.
> They are hard to catch, and a few people are working on rules as we
> speak.  HTH
> 
>   guenther
> 
> 
> [1] In this context this means, German spam tends to be sneaky, and
>     German is your users' first language.

Thanks, Yours,
Heiner
