Karsten Bräckelmann wrote:
> On Tue, 2008-10-21 at 14:32 +0200, Heinrich Christian Peters wrote:
>> see, if the mail was classified as "BAYES_50" it is in nearly every case
>> spam, so I think, the mails are wrongly classified, they should be
>> BAYES_60 or higher...
> 
> Again, BAYES_50 is neither classified as ham nor spam. According to Bayes
> there's just no indication to classify it. Thus, IMHO it is not wrongly
> classified. Think about it this way -- the absence of a given URL from
> both black and white lists does not constitute a false hit for either
> list.

Mmh, OK, I think I get it...
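
To make the point about BAYES_50 concrete, here is a minimal Python sketch of how a Bayes probability maps onto the BAYES_NN rule names. The bucket boundaries are an assumption, written down as I recall them from the stock SpamAssassin Bayes rules, and `bayes_rule` is a hypothetical helper, not part of SpamAssassin. How the exact edges are handled is also a guess.

```python
# Assumed upper bounds of each bucket (recalled from the stock rules,
# not taken from this thread). Verify against your own rule files.
BAYES_BUCKETS = [
    (0.01, "BAYES_00"),
    (0.05, "BAYES_05"),
    (0.20, "BAYES_20"),
    (0.40, "BAYES_40"),
    (0.60, "BAYES_50"),  # the neutral middle: no indication either way
    (0.80, "BAYES_60"),
    (0.95, "BAYES_80"),
    (0.99, "BAYES_95"),
    (1.00, "BAYES_99"),
]


def bayes_rule(probability: float) -> str:
    """Return the BAYES_NN name for a spam probability in [0, 1]."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper, name in BAYES_BUCKETS:
        if probability < upper:
            return name
    return "BAYES_99"  # probability == 1.0


if __name__ == "__main__":
    for p in (0.02, 0.15, 0.50, 0.70, 0.995):
        print(p, bayes_rule(p))
```

Under these assumed boundaries, a probability of 0.5 lands in BAYES_50, which is the "no indication" bucket rather than a ham or spam verdict.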


>>> Since you merely mentioned "German spam", the details might make a
>>> difference, though. What are you talking about exactly?
>>
>> German is my first language and nearly all (ham-)mails I get are
>> German.  The few English (ham-)mails I get are correctly classified as
>> BAYES_10 or below.
> 
>> The (spam-)mails I am talking about are, e.g.:
>>  - phishing-mails (today: DABbank AG)
>>  - casino (Fiesta Club Casino, Euro Club Casino)
> 
> These are not exactly spam IMHO. They are phishing mail and trojan URL
> carrying mail respectively. ClamAV and the SaneSecurity phish sigs weed
> those out before SA even processes the mail in my setup.

MailScanner starts with the spam detection and follows up with the
content analysis.
I think phishing mails and mails carrying trojan URLs are spam, too,
though maybe a special type of spam.


> With a notable exception of the very recent DAB Bank phishes, which
> started today. Massively. Apparently there's no AV sig yet for those.
> However, even though Bayes didn't catch them for me either, they
> typically score around *20* here, with hits in XBL, PBL and URIBL_BLACK.
> If you really have a problem with these, I guess Bayes isn't your main
> issue. ;)

They score very similarly here, 20 +/- 5.


>>  - pharmacy, mostly caught by ZMIde_Pharmacy
> 
> German pharmacy spam. Similar to the above for me. Hits blacklists
> galore, Bayes of 80 or higher. The bulk of these I get features rather
> static text anyway -- do you really have a problem training them in
> Bayes?
> 
> Since you are using site-wide Bayes, are you sure that your manual
> training uses the *same* Bayes DB? A common oops, and you'd effectively
> end up with auto-learning only, no manual training on low scorers.

Since I am using "70_zmi_german.cf.zmi.sa-update.dostech.net", these
mails score very high (70+).
My MailScanner (with exim4) is running under Debian as user Debian-exim.
SpamAssassin is called as this user, too, and I train Bayes as
Debian-exim only.

>>  - "job offers", finance-sector
> 
> Not as easy to catch indeed.

Now my setup catches them, but with "BAYES_20"...:

> X-heinrich-peters.zz-MailScanner-SpamCheck: spam,
>       SpamAssassin (nicht zwischen gespeichert, Wertung=16.329,
>       benoetigt 5, autolearn=spam, BAYES_20 -0.74, CTYME_IXHASH 2.50,
>       DATE_IN_FUTURE_96_XX 1.44, DCC_CHECK 2.17, DIGEST_MULTIPLE 0.00,
>       GENERIC_IXHASH 4.50, NIXSPAM_IXHASH 2.50, RAZOR2_CHECK 0.50,
>       RCVD_IN_BL_SPAMCOP_NET 1.96, RCVD_IN_BRBL 1.50, SPF_HELO_PASS -0.00)

I have no problem catching spam. But I am not happy with a Bayes score
below 50 on spam mails. Then again, this is a /cosmetic/ problem...
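
To show why this is only cosmetic, here is a small Python sketch that pulls the rule scores out of the header quoted above and adds them up. The regular expression and the `rule_scores` helper are my own assumptions about the header layout, not a MailScanner or SpamAssassin API. The header text is copied from the mail above; "benoetigt 5" is the required score of 5.

```python
import re

# Header value as quoted in the mail above (joined onto one logical line).
HEADER = (
    "spam, SpamAssassin (nicht zwischen gespeichert, Wertung=16.329, "
    "benoetigt 5, autolearn=spam, BAYES_20 -0.74, CTYME_IXHASH 2.50, "
    "DATE_IN_FUTURE_96_XX 1.44, DCC_CHECK 2.17, DIGEST_MULTIPLE 0.00, "
    "GENERIC_IXHASH 4.50, NIXSPAM_IXHASH 2.50, RAZOR2_CHECK 0.50, "
    "RCVD_IN_BL_SPAMCOP_NET 1.96, RCVD_IN_BRBL 1.50, SPF_HELO_PASS -0.00)"
)

# Assumed layout: an upper-case rule name, one space, a decimal score.
RULE_RE = re.compile(r"\b([A-Z][A-Z0-9_]*) (-?\d+\.\d+)")

REQUIRED = 5.0  # "benoetigt 5" in the header


def rule_scores(header: str) -> dict:
    """Map each rule name found in the header to its score."""
    return {name: float(score) for name, score in RULE_RE.findall(header)}


if __name__ == "__main__":
    scores = rule_scores(HEADER)
    total = sum(scores.values())
    without_bayes = total - scores["BAYES_20"]
    print("total:", round(total, 2))
    print("without BAYES_20:", round(without_bayes, 2))
    print("still spam:", total >= REQUIRED)
```

The per-rule scores add up to roughly the reported 16.329 (the small difference presumably comes from rounding in the displayed values). BAYES_20 only subtracts 0.74, so the mail stays far above the required 5 either way.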

Heiner
