Hello Karsten/guenther (?),

Thanks for your reply.
Karsten Bräckelmann wrote:
> On Tue, 2008-10-21 at 11:56 +0200, Heinrich Christian Peters wrote:
>> I am using a system-wide spamassassin setup (MailScanner). Nearly all my
>> spam-mails are detected correctly (~0,1% is not), no FP. But, especially
>> German spam-mails, are "wrongly" classified by the bayes-system. Should
>
> According to your stats snippets: BAYES_50 is not "wrongly" classified,
> but not-classified-at-all. The difference is the very meaning of a
> Bayesian score of 0.5 -- undecided, neither really spammy nor hammy
> tokens.

I see, but what about the 1.6% of spam (around 57 mails) that the Bayes
system classified as ham (BAYES_00)?

Another thing: as you can see, a mail classified as BAYES_50 is spam in
nearly every case, so I think these mails are wrongly classified; they
should be BAYES_60 or higher...

>> I train thees mails manually as spam, if they are not autolearned? Or
>> should I train *all* my spam-mails regular?
>
> Personally, I prefer to not learn *all* spam, but to omit the lions
> share of really high scoring stuff. The reason is an attempt to keep the
> number of tokens somewhat sane, and to not bias my Bayes. If everything
> Bayes gets to see is spam, everything will appear spammy.
>
> If you get a certain class of sneaky spam, you definitely should feed
> that to Bayes. However, if it also loosely resembles your ham [1], you
> better make sure to train them as well.

Up to now I have trained only the wrongly marked mails manually;
autolearning is working, and the same goes for ham. But, as I said
before, I have no FPs.

> Since you merely mentioned "German spam", the details might make a
> difference, though. What are you talking about exactly?

German is my first language, and nearly all (ham) mails I get are in
German. The few English (ham) mails I get are correctly classified as
BAYES_10 or below.
The (spam) mails I am talking about are, e.g.:
- phishing mails (today: DABbank AG)
- casino (Fiesta Club Casino, Euro Club Casino)
- pharmacy, mostly caught by ZMIde_Pharmacy
- "job offers", finance sector

> Given your timing, my guess is you're talking about the recent flood of
> German porn spam, advertising cam sites. Even though they are using
> pretty explicit phrases, these appear to be hard to catch.

These mails are not the problem; I didn't get them...

> If that's the kind of spam you're talking about, check the archives.
> This has been brought up very recently. Not much solutions though, IIRC.
> They are hard to catch, and a few people are working on rules as we
> speak. HTH
>
> guenther
>
>
> [1] In this context this means, German spam tends to be sneaky, and
> German is your users first language.

Thanks,

Yours,
Heiner