Karsten Bräckelmann wrote:
> On Tue, 2008-10-21 at 14:32 +0200, Heinrich Christian Peters wrote:
>> See, if a mail is classified as "BAYES_50", it is spam in nearly every
>> case, so I think the mails are wrongly classified; they should be
>> BAYES_60 or higher...
>
> Again, BAYES_50 is classified as neither ham nor spam. According to Bayes
> there's just no indication either way. Thus, IMHO it is not wrongly
> classified. Think about it this way -- the absence of a given URL from
> both the black and the white list does not constitute a false hit for
> either list.
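To make Karsten's point concrete: the BAYES_xx rules fire on bands of the classifier's spam probability, and BAYES_50 is the band around 0.5 where the classifier has no opinion. The Python sketch below assumes the bands shipped with SpamAssassin 3.2 (BAYES_00 through BAYES_99); other versions or local configs may define different bands (the BAYES_10 mentioned further down suggests a different set), so the table and the hypothetical helpers `bayes_rule` and `verdict` are illustrative only.

```python
# Hypothetical table, assumed to mirror the SpamAssassin 3.2 Bayes rules:
# (rule name, lower bound inclusive, upper bound exclusive) on the Bayes
# spam probability. The half-open boundaries are a simplification.
BAYES_BUCKETS = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
]


def bayes_rule(probability: float) -> str:
    """Return the bucket rule that would fire for a given spam probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    for name, low, high in BAYES_BUCKETS:
        if low <= probability < high:
            return name
    # probability == 1.0 falls outside every half-open range
    return BAYES_BUCKETS[-1][0]


def verdict(probability: float) -> str:
    """Coarse reading of the bucket: BAYES_50 carries no ham/spam signal."""
    if bayes_rule(probability) == "BAYES_50":
        return "neutral"
    return "spam-leaning" if probability >= 0.60 else "ham-leaning"
```

For example, `bayes_rule(0.5)` yields `"BAYES_50"` and `verdict(0.5)` yields `"neutral"`: a mail landing there is simply one Bayes has not learned enough about, which is not the same as a misclassification.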
Mmh, OK, I think I get it...

>>> Since you merely mentioned "German spam", the details might make a
>>> difference, though. What are you talking about exactly?
>>
>> German is my first language and nearly all (ham) mails I get are
>> German. The few English (ham) mails I get are correctly classified as
>> BAYES_10 or below.
>
>> The (spam) mails I am talking about are e.g.:
>> - phishing mails (today: DABbank AG)
>> - casino (Fiesta Club Casino, Euro Club Casino)
>
> These are not exactly spam IMHO. They are phishing mail and trojan URL
> carrying mail respectively. ClamAV and the SaneSecurity phish sigs weed
> those out before SA even processes the mail in my setup.

My MailScanner runs the spam detection first and follows up with the
content analysis. I think phishing mails and mails carrying trojan URLs
are spam, too, though maybe a special type of spam.

> With the notable exception of the very recent DAB Bank phishes, which
> started today. Massively. Apparently there's no AV sig yet for those.
> However, even though Bayes didn't catch them for me either, they
> typically score around *20* here, with hits in XBL, PBL and URIBL_BLACK.
> If you really have a problem with these, I guess Bayes isn't your main
> issue. ;)

They score very similarly here, 20 +/- 5.

>> - pharmacy, mostly caught by ZMIde_Pharmacy
>
> German pharmacy spam. Similar to the above for me. Hits blacklists
> galore, Bayes of 80 or higher. The bulk of these I get features rather
> static text anyway -- do you really have a problem training them in
> Bayes?
>
> Since you are using site-wide Bayes, are you sure that your manual
> training uses the *same* Bayes DB? A common oops, and you'd effectively
> end up with auto-learning only, no manual training on low scorers.

Since I am using "70_zmi_german.cf.zmi.sa-update.dostech.net", these mails
score very high (70+). My MailScanner (with exim4) runs under Debian as
user Debian-exim. SpamAssassin is called as this user, too, and I train
Bayes as Debian-exim only.
>> - "job offers", finance sector
>
> Not as easy to catch indeed.

My setup catches these now, but only with "BAYES_20" (the German parts of
the header mean "not cached, score=16.329, required 5"):

> X-heinrich-peters.zz-MailScanner-SpamCheck: spam,
> SpamAssassin (nicht zwischen gespeichert, Wertung=16.329,
> benoetigt 5, autolearn=spam, BAYES_20 -0.74, CTYME_IXHASH 2.50,
> DATE_IN_FUTURE_96_XX 1.44, DCC_CHECK 2.17, DIGEST_MULTIPLE 0.00,
> GENERIC_IXHASH 4.50, NIXSPAM_IXHASH 2.50, RAZOR2_CHECK 0.50,
> RCVD_IN_BL_SPAMCOP_NET 1.96, RCVD_IN_BRBL 1.50, SPF_HELO_PASS -0.00)

I have no problem catching spam, but I am not happy that spam mails end up
with a BAYES score below 50. Admittedly, this is a /cosmetic/ problem...

Heiner
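Why a low Bayes bucket is only cosmetic here: SpamAssassin's total is the sum of the scores of every rule that hit, so the single negative BAYES_20 contribution is easily outweighed. The Python sketch below re-adds the rule scores from the MailScanner header above; the variable names and the regex are illustrative, and because the header shows each score rounded to two decimals, the sum (16.33) only matches the reported 16.329 up to rounding.

```python
import re

# Rule list copied from the MailScanner SpamCheck header above.
HEADER_RULES = (
    "BAYES_20 -0.74, CTYME_IXHASH 2.50, DATE_IN_FUTURE_96_XX 1.44, "
    "DCC_CHECK 2.17, DIGEST_MULTIPLE 0.00, GENERIC_IXHASH 4.50, "
    "NIXSPAM_IXHASH 2.50, RAZOR2_CHECK 0.50, RCVD_IN_BL_SPAMCOP_NET 1.96, "
    "RCVD_IN_BRBL 1.50, SPF_HELO_PASS -0.00"
)
REQUIRED = 5.0  # "benoetigt 5" in the header


def parse_rules(text: str) -> dict[str, float]:
    """Turn 'RULE score, RULE score, ...' into {rule: score}."""
    pairs = re.findall(r"([A-Z0-9_]+)\s+(-?\d+\.\d+)", text)
    return {name: float(score) for name, score in pairs}


rules = parse_rules(HEADER_RULES)
total = sum(rules.values())
# What the mail would have scored if the Bayes rule had not fired at all
without_bayes = total - rules["BAYES_20"]

print(f"total={total:.2f} required={REQUIRED} spam={total >= REQUIRED}")
print(f"without BAYES_20: {without_bayes:.2f}")
```

Dropping BAYES_20 would only raise the score from 16.33 to 17.07; the hash, DCC/Razor and blocklist hits alone put the mail more than three times over the threshold of 5.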