> Sorry for my late reply - my evening is your morning.
> There is 1000 spam a week that leaks through and perhaps another 500-600 that
> get filtered by spamassassin.
> If my Bayes is poorly trained what options do I have.
> Here is a typical letter that gets through.
>
> ===================================================================================
> Return-Path: <[EMAIL PROTECTED]>
> Received: from fw.doverie.bg (doh-gw.customer.0rbitel.net [195.24.44.114])
>     by mail1.mr-bricolage.bg (8.13.3/8.13.3/Debian-6) with SMTP id
>     j4V11DGj014435
>     for <[EMAIL PROTECTED]>; Tue, 31 May 2005 04:01:15 +0300
> Received: (qmail 13680 invoked by uid 507); 31 May 2005 00:58:54 -0000
> Delivered-To: [EMAIL PROTECTED]
> Received: (qmail 13672 invoked by uid 503); 31 May 2005 00:58:48 -0000
> Received: from [EMAIL PROTECTED] by fw.doverie.bg by uid 500 with
>     qmail-scanner-1.15
>     (f-prot: 3.12. Clear:.
>     Processed in 12.821956 secs); 31 May 2005 00:58:48 -0000
> Received: from cow100.orbitel.bg (HELO ns.orbitel.bg) (195.24.32.18)
>     by 0 with SMTP; 31 May 2005 00:58:20 -0000
> Received: (qmail 607 invoked from network); 31 May 2005 01:01:36 -0000
> Received: from unknown (HELO street67.net) (219.134.152.97)
>     by ns.orbitel.bg with SMTP; 31 May 2005 01:01:36 -0000
> Message-ID: <[EMAIL PROTECTED]>
> Date: Mon, 30 May 2005 16:15:11 +1100
> From: "michael torrey" <[EMAIL PROTECTED]>
> User-Agent: QUALCOMM Windows Eudora Version 6.0.0.22
> X-Accept-Language: en-us
> MIME-Version: 1.0
> To: "Elden Irving" <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>,
>     <[EMAIL PROTECTED]>
> Subject: It is all about quality tableets sold at the finest prices.
> Content-Type: text/plain;
>     charset="us-ascii"
> Content-Transfer-Encoding: 7bit
> X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on
>     mail1.mr-bricolage.bg
> X-Spam-Level:
> X-Spam-Status: No, score=0.1 required=2.0 tests=FORGED_RCVD_HELO
>     autolearn=ham version=3.0.2
> Status: R
> X-Status: N
> X-KMail-EncryptionState:
> X-KMail-SignatureState:
> X-KMail-MDN-Sent:
>
> At our rxdrug-site, you can choose top-selling rxmeds at a reduced prices.
> Legitimate way to e-shoppe for tableets. We provide customers flexible and
> reliable distribution services.
> ======================================================================

It is a holiday in the US, so you probably won't receive more replies
for some hours.

The spam you show is difficult to handle. One important thing is that
there is no URL or other link in the message body to a drug site where
people could get the spammed product. I am assuming the original spam
must have had one, since a spam without a link is fairly useless. If
you are getting spams without links similar to this, then other methods,
such as writing some custom rules, would be required to eliminate the
problem.

Bayes did not trigger on this message, either for or against. I'm
somewhat surprised that Bayes didn't even show a BAYES_50 score though.
So Bayes is neither helping nor hindering. It should be helping. But
that gets us to the next point:

> autolearn=ham

Bayes autolearn is enabled, as it is by default. Since this got a low
score, it has been learned as ham rather than spam. Sooner or later
Bayes will start helping messages like this get through by giving them
scores of BAYES_00.

You could back this particular message out of Bayes by learning it
manually as spam. However, if you are having 1000 messages a week leak
through with low scores, your Bayes database probably believes that all
spams are hams at this point. So there is no point in learning
individual messages correctly just yet; your Bayes database is probably
junk.

Start by setting bayes_auto_learn to 0 in local.cf to disable auto
learning - it is doing much more harm than good at this point. Later
you will probably be able to turn it back on, once you have a Bayes
database that knows spam from ham. But not yet.

Also add a score line for BAYES_99 to fix the poor scoring in 3.0.2 for
this rule:

    score BAYES_99 4

should do the trick.

Next remove your existing Bayes database and start over. You will need
to manually train it on at least 200 each ham and spam.
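
For reference, the two local.cf changes described above might look like
the fragment below. Treat it as a sketch only: the two settings and
their values are the ones suggested above, while the file location
(often /etc/mail/spamassassin/local.cf, but it varies by distribution)
and the comments are my assumptions.

```
# local.cf fragment (path assumed; check where your install keeps it)

# Stop Bayes from learning on its own while the database is rebuilt.
bayes_auto_learn 0

# Give BAYES_99 more weight than the 3.0.2 default score.
score BAYES_99 4
```

Running "spamassassin --lint" afterwards should flag any syntax
problems, and if you run spamd it will need a restart before it picks
up the change.
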
If you make a couple of mbox files, one with manually sorted spam, the
other with manually sorted ham, and feed these to sa-learn correctly,
you should be able to get Bayes working for you in no more than a day
or two, probably only a few hours, depending on your mail rate.

Keep training Bayes manually every now and then. You should get a good
base of at least a few thousand hams and spams each, representative of
the sort of mail you get. If you start seeing new spams that are scoring
below BAYES_70 or so, learn a few of them. Every so often learn a few
new hams to keep things balanced. You typically will only have to spend
a few minutes a week dealing with this.

If you get Bayes trained well, you could turn on auto-learning again.
But I'm personally nervous doing this, and it isn't that hard to toss a
few messages to Bayes every now and then.

That should get Bayes on your side pretty quickly.

The next thing that could help you is to enable net tests, specifically
the SURBL checks. These will catch a lot of your spams. You might need
to be careful with any other net checks. You have a really screwy
sequence of Received headers, with all of those qmail headers between
all the real headers. I don't know if SA will be able to deal with that
and figure out where your main mail gateway is so that it can determine
the trusted hosts correctly.

        Loren