Re: Is Bayes Dead? Have the spammers won?
On 22-mrt-2007, at 20:02, Theo Van Dinter wrote:

On Thu, Mar 22, 2007 at 09:55:07AM -0700, Marc Perkel wrote:

Where bayes used to be the centerpiece of spam filtering ...

FWIW, I don't think Bayes has really ever been the centerpiece of spam filtering. Definitely not within SA anyway. It's a good tool, but it's just another tool in the belt.

/me continues to wait for the spammers to tire of greylisting

Yes, exactly! Greylisting is still working amazingly well here. Also, most spams that get past the greylisting border are still hitting BAYES_90 or higher, even on instances where the bayes system is only being trained by autolearning.

I do feel that greylisting is slowly becoming less effective though. The number of spams that get through may have risen by as much as 50%, although this is extremely relative: in my case it means that six spams make it through each day instead of four, whereas I used to get 80 spams per day without greylisting.

I noticed that almost all of the spams that get through are GIF image stock spam. Apparently, I should GET IN ON THE YOUTUBE OF CHINA NOW!, because that is all I'm reading about these days ;-)

Leander
Re: Greylisting
On 20-nov-2006, at 23:33, Vahric MUHTARYAN wrote:

Hello Everybody, I have been using SA for a long time without any problems. Nowadays spammers are using a lot of graphical objects and they are trying to change them day by day. I'm trying to use fuzzyocr, but it takes too much CPU. I am thinking of trying greylisting. Is anybody here using greylisting? Can somebody give me feedback?

I started using selective greylisting a while ago and the results are simply amazing. For instance, my private mailbox has gone from receiving 75-100 spams/day to 2-4 spams/day.

Selective greylisting is a variant of pure greylisting where you don't greylist everything, but only suspicious smtp clients. I'm using maRBL (written by Ian Campbell) for this, which acts as a policy service for Postfix. It greylists clients based on DNSBL lookups. maRBL used to be available from http://www.orangegroove.net/code/marbl/, but the site seems to have disappeared.

I'm actually using a modified version of maRBL, using a patch by Mark Martinec (of amavisd fame) that integrates p0f support to selectively greylist Windows smtp clients: http://archives.neohapsis.com/archives/postfix/2006-11/0577.html, which is both brilliant and hilarious :-)

I have also added (primitive) support for greylisting based on missing PTR records and SPF checks myself (it actually rejects if SPF fails hard).

I have made the three versions of maRBL available for download on my server: http://leander.koornneef.net/marbl/

Perhaps it can be of use to anyone. And thanks to Ian and Mark!

Leander
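As a rough illustration of the selective greylisting described above, here is a minimal Python sketch of the decision a policy service might make per connecting client. It is not maRBL's code: the `ClientInfo` fields, the `policy_action` name and the returned strings are all hypothetical, and the checks (DNSBL hit, Windows fingerprint, missing PTR, hard SPF fail) are simply the ones mentioned in the message.

```python
from dataclasses import dataclass


@dataclass
class ClientInfo:
    """Facts gathered about a connecting SMTP client (hypothetical fields)."""
    dnsbl_listed: bool = False   # client IP is listed on a DNSBL
    os_is_windows: bool = False  # passive OS fingerprint (e.g. p0f) says Windows
    has_ptr: bool = True         # client IP has a reverse DNS (PTR) record
    spf_result: str = "none"     # "pass", "fail" (hard fail), "softfail", "none"


def policy_action(client: ClientInfo) -> str:
    """Return "reject", "greylist" or "accept" for one client."""
    # A hard SPF failure is rejected outright, as described in the message.
    if client.spf_result == "fail":
        return "reject"
    # Any single suspicious trait is enough to greylist the client.
    suspicious = (
        client.dnsbl_listed
        or client.os_is_windows
        or not client.has_ptr
    )
    # Everything else is accepted without delay; that is the "selective" part.
    return "greylist" if suspicious else "accept"
```

The point of the design is that well-behaved clients never see a delay, while anything that looks like a compromised desktop machine has to prove it can retry.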
Re: amavisd
On 17-nov-2006, at 9:26, Maccie Roux wrote: Hi there. I'm getting the following in my maillog, can someone please help me: postfix/qmgr[25394]: warning: connect to transport smtp-amavis: Connection refused Well, that is about as clear as a warning can get. What don't you understand about it? Leander
Re: amavisd
On 17-nov-2006, at 12:59, Maccie Roux wrote:

Hi all. My spam is being blocked by amavis, but it is not sent to my junk mail box. Here is my amavisd.conf file:

# $timestamp_fmt_mysql = 1; # if using MySQL *and* msgs.time_iso is TIMESTAMP;
#   defaults to 0, which is good for non-MySQL or if msgs.time_iso is CHAR(16)

$virus_admin = [EMAIL PROTECTED]; # notifications recip.

$mailfrom_notify_admin = [EMAIL PROTECTED]; # notifications sender
$mailfrom_notify_recip = [EMAIL PROTECTED]; # notifications sender
$mailfrom_notify_spamadmin = [EMAIL PROTECTED]; # notifications sender
#$mailfrom_to_quarantine = ''; # null return path; uses original sender if undef

@addr_extension_virus_maps = ('virus');
@addr_extension_banned_maps = ('banned');
@addr_extension_spam_maps = ('spam');
@addr_extension_bad_header_maps = ('badh');
# $recipient_delimiter = '+'; # undef disables address extensions altogether
# when enabling addr extensions do also Postfix/main.cf: recipient_delimiter=+

$path = '/usr/local/sbin:/usr/local/bin:/usr/sbin:/sbin:/usr/bin:/bin';
# $dspam = 'dspam';

I think you only need this part.

Maccie, why are you sending amavisd questions to the SpamAssassin list, when it seems to me that you would be better served by asking the amavisd list about these things?

Leander
Re: Flooded by pointless spam
On 13-nov-2006, at 9:03, Ramprasad wrote:

I am not getting what the spammer intends to say here: http://ecm.netcore.co.in/tmp/spam1.txt

There is no meaningful message, no sales pitch, no stock recommendation, nothing at all. Any ideas?

Hmm, preemptive Bayes and/or AWL poisoning perhaps? By the way, it scores above required_score here:

===
Content analysis details: (6.3 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------
 2.8 RCVD_FORGED_WROTE      Forged 'Received' header found ('wrote:' spam)
 3.5 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 0.]
===

The RCVD_FORGED_WROTE score appears to come from a rule that was received from sa-update. You should consider upgrading your SA (or running sa-update if you are using a recent SA already).

Leander
Re: mail bounce warning for the list
On 9-nov-2006, at 16:17, Randal, Phil wrote:

As someone has probably already pointed out... admins use these lists because they trust their accuracy. If they receive too many complaints (as we did with a particular DNSBL) you stop blocking on that list and move to only scoring.

No, you move on to greylisting based on the less accurate DNSBLs. milter-greylist 3.0rc6 supports DNSBL-based greylisting, and it works a treat here. Because it is greylisting and not blacklisting, no legitimate mail gets blocked. If you use short greylisting periods legitimate emails should get through on the second attempt.

I agree with Phil. DNSBL based blacklisting has its pitfalls. So does greylisting. Combining the two of them seems like a smart thing to do.

I am absolutely loving Ian Campbell's maRBL right now: http://www.orangegroove.net/code/marbl/

This is used to implement selective greylisting in Postfix, based on DNSBL hits. If you combine this with Mark Martinec's p0f-patch (http://archives.neohapsis.com/archives/postfix/2006-11/0577.html), which extends maRBL with the ability to greylist Windows clients, you get a pretty powerful tool. It is also quite easy to set up.

Also, it is great fun to see spam being blocked (greylisted), just because it is sent from a Windows box :-)

Nov 10 18:00:57 leander marbl: p0f collect: max_wait=0.050, 24.206.74.214 364389373 Windows XP ... = Windows XP Pro SP1, 2000 SP3, (distance 16, link: ethernet/modem)
Nov 10 18:00:57 leander marbl: Action for 24.206.74.214 ([EMAIL PROTECTED]): greylisting

Leander
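The claim that greylisting delays rather than blocks legitimate mail follows from how the mechanism works: the first delivery attempt for an unknown (client, sender, recipient) triplet gets a temporary failure, and a retry after the greylisting period is accepted. A minimal Python sketch of that idea follows; the `Greylist` class, its in-memory dictionary and the 60-second default are assumptions for illustration, not how milter-greylist or maRBL store their state.

```python
import time


class Greylist:
    """In-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, delay_seconds=60.0, clock=time.time):
        self.delay = delay_seconds  # the "greylisting period"
        self.clock = clock          # injectable so the logic can be tested
        self.first_seen = {}        # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient):
        """Return "defer" (temporary failure) or "accept" for one attempt."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        # Remember when this triplet was first seen; keep the old time on retries.
        first = self.first_seen.setdefault(key, now)
        # A retry after the delay has passed is let through.
        if now - first >= self.delay:
            return "accept"
        return "defer"
```

A real mail server retries deferred messages on its own, so with a short delay the message typically arrives on the second attempt; much spam software never retries at all.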
Re: Spamassassin Score
On 6-nov-2006, at 19:59, Claus Westerkamp wrote:

Hello list, I'd like to modify the score output of spamassassin. I want a permanent 3-digit display (e.g. ***(Score002.3)*** or ***(Score102.3)***). Is this possible? I want to be able to sort the spam messages by score.

Of course this is possible, but you will probably have to hack some code to get the result you want. As far as I know, there is no configuration option for this.

If you are using amavis for instance, you could change the part where $full_spam_status is put together from:

sprintf("%3.1f",$spam_level)

to something like:

sprintf("%05.1f",$spam_level)

In spamd this would be from:

my $msg_score = sprintf( "%.1f", $status->get_score );

to:

my $msg_score = sprintf( "%05.1f", $status->get_score );

Also beware that this will be overwritten when you update/upgrade your software...

Leander
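To see what the `%05.1f` format does, here is a small Python sketch; Python's `%` operator follows the same printf conventions as Perl's sprintf for this format. The `format_score` helper is a hypothetical name used only for the illustration.

```python
def format_score(score: float) -> str:
    """Zero-pad a score to a total width of 5 characters with one decimal."""
    # Width 5 counts the decimal point and the fractional digit as well,
    # so three characters are left for the integer part.
    return "%05.1f" % score


# Fixed-width, zero-padded strings sort the same way as the numbers do.
scores = [102.3, 2.3, 10.0]
as_text = sorted(format_score(s) for s in scores)
```

Note that this only helps sorting for non-negative scores: a negative score gets a leading minus sign (for example `-01.5`), which no longer sorts numerically as text.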
Re: Spamassassin Score
On 6-nov-2006, at 21:30, Rob Anderson wrote: Leander Koornneef [EMAIL PROTECTED] 11/06/06 02:26PM As far as I know, there is no configuration option for this. SNIP NONSENSE Try this from the docs under Template Tags: _SCORE(PAD)_ message score, if PAD is included and is either spaces or zeroes, then pad scores with that many spaces or zeroes (default, none) ie: _SCORE(0)_ makes 2.4 become 02.4, _SCORE(00)_ is 002.4. 12.3 would be 12.3 and 012.3 respectively. I stand corrected :-) Leander
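The padding behaviour quoted from the docs can be mimicked with a few lines of Python. This is only a sketch that reproduces the examples given in the message (2.4 and 12.3 with pads of one and two zeroes); the `pad_score` name is hypothetical and this is not claimed to be SpamAssassin's own implementation.

```python
def pad_score(score: float, pad: str = "") -> str:
    """Format a score with one decimal, padded as in the quoted examples."""
    text = f"{score:.1f}"
    # Only pads made entirely of zeroes or entirely of spaces are honoured.
    if pad and (set(pad) == {"0"} or set(pad) == {" "}):
        # An unpadded single-digit score such as "2.4" is 3 characters long,
        # so a pad of length n aims for a total width of n + 3.
        missing = len(pad) + 3 - len(text)
        if missing > 0:
            text = pad[:missing] + text
    return text
```

With this, `pad_score(2.4, "00")` gives the same fixed-width result the original poster wanted from patching sprintf, without touching any code.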
Re: better solution?
On 30-okt-2006, at 10:03, Matthias Haegele wrote:

[EMAIL PROTECTED] wrote:

Hi list, I'm new to spamassassin. I have the whole system configured (I think), but I have a question: when a spam message arrives, spamassassin marks it as **spam** and then the message goes to my mailbox. I want some of these spams to go to the user's SPAM folder instead of their INBOX folder. What is the best solution to achieve this, and what's the name of the program?

procmail, (alternative: maildrop (if you use courier), or sieve iirc (cyrus))

I have a debian sarge, postfix, spamassassin 3.0.3

btw: I would suggest upgrading to a newer SA (backports or testing, requires new perl too ...).

Correction: the 3.1.4 version of SA in Debian volatile (http://www.debian.org/devel/debian-volatile/) does not require a new version of perl:

leander:~# aptitude show spamassassin
Package: spamassassin
State: installed
Automatically installed: no
Version: 3.1.4-0volatile1
Priority: optional
Section: mail
Maintainer: Duncan Findlay [EMAIL PROTECTED]
Uncompressed Size: 3068k
Depends: perl (>= 5.6.0-16), libhtml-parser-perl (>= 3.31), libdigest-sha1-perl, libsocket6-perl, libarchive-tar-perl, libwww-perl

So the default perl 5.8 in Sarge will do fine...

Leander
Re: rules_du_jour
Those kinds of spam are hitting all kinds of rules here, including rulesets from SARE:

X-Spam-Status: Yes, hits=14.1 tagged_above=-999.0 required=3.0 tests=BAYES_99, EXTRA_MPART_TYPE, HTML_10_20, HTML_MESSAGE, MY_CID_AND_ARIAL2, MY_CID_AND_CLOSING, MY_CID_AND_STYLE, MY_CID_ARIAL2_CLOSING, MY_CID_ARIAL_STYLE, SARE_GIF_ATTACH, TVD_FW_GRAPHIC_ID1

I suspect you haven't done much tweaking on your SA setup?

Leander

On 30-okt-2006, at 21:45, User for SpamAssassin Mail List wrote:

Has anyone come up with a rule that will combat the spam that I have been seeing lately? That is a spam that rambles about much of nothing then has an image or a link at the bottom. I see more and more of these and it seems like the spammers have figured out a way to get this past SA. I include one such message at the end of this post.

Thanks, Ken

Example of this spam:

[IMAGE] Jeg er udvalgt som blogger, dvs. There is little doubt that asynchronous solutions require us to think in new ways as we have to deal with concurrency, out-of-sequence issues, correlation and other. Ingen interesse mere. But it makes me feel better that Ted Neward seems to beat me in that category, though. In my eyes this is really the best indicator of success for a pattern language. We don't have to go further than the local coffee shop. But it makes me feel better that Ted Neward seems to beat me in that category, though. While the conference logistics can be quirky at times the content is top notch. Even if you choose the right specification, it still is likely to evolve over time. Jeg er udvalgt som blogger, dvs. However, when building distributed applications, that asymmetry really has no place. After loosely coupled, stateless must be a close runner-up as the ultimate nirvana in buzzword-compliant architectures. While Java is not necessarily the greatest language to host a DSL we can go a lot further than developers generally believe or care for.
Ideally, the debate would involve alcoholic beverages and the other person would pick up the check. This time, though, Ken Arnold stole a little bit of my show by publishing an excellent article in ACM Queue magazine called Programmers are People, too. During the proverbial hallway discussions we started talking about boxes and lines, but in a profound way. Read on to learn more about the implementation and our experiences with intra-JVM EDA. Hearing this tag line for the third or fourth time got me wondering, what really is the difference between coding and configuring? For one thing, a fair number of my intellectual drinking buddies tend to congregate around the large software company in the Pacific Northwest. First, because I was going to meet the exalted one in person.
Re: Any caveats upgrading from SA 3.04 to 3.17
I suggest you start here: http://svn.apache.org/repos/asf/spamassassin/branches/3.1/UPGRADE

Anyhoo, the upgrade is nothing to be scared of; certainly not if you know what you're doing. Seeing that you're using sendmail, I assume that you've probably got some (gray) hair on your chest already ;-)

Leander

On 30-okt-2006, at 22:16, Patrick wrote:

Any caveats upgrading from SA 3.04 to 3.17? (SA,Amavis-new,Clamav,sendmail)

TIA Pat...
Re: rules_du_jour
Hi Ken, please keep the discussion on the list, instead of mailing me directly, so maybe someone else can learn something from this in the future. Anyway:

The EXTRA_MPART_TYPE rule is a native SA rule (in SA 3.1 at least; I don't know if this is true for pre-3.1 versions).

The MY_CID_* rules are part of 70_sare_stocks.cf

You should check out this recent thread from the SA list: http://www.nabble.com/rules_du_jour-question-tf2533374.html#a7062324

I've posted some comments on my setup there.

Here's another suggestion/tip/request: please don't start new threads on mailing lists by replying to other threads. It will b0rk email clients with thread support, as well as web-based mailing list archives, as you can see on the link above...

Leander

On 31-okt-2006, at 0:00, User for SpamAssassin Mail List wrote:

Leander, I recognize most of them, but I do not know which rulesets EXTRA_MPART_TYPE and MY_CID_... are part of. Could you please pass that along? Below is a list of the rulesets I'm running; maybe you could pass along a little info on something good I should be running.

Thanks, Ken

70_sare_adult.cf
70_sare_bayes_poison_nxm.cf
70_sare_evilnum0.cf
70_sare_genlsubj0.cf
70_sare_header0.cf
70_sare_html0.cf
70_sare_obfu0.cf
70_sare_oem.cf
70_sare_random.cf
70_sare_specific.cf
70_sare_spoof.cf
70_sare_stocks.cf
70_sare_unsub.cf
70_sare_uri0.cf
70_sare_whitelist.cf
72_sare_bml_post25x.cf
72_sare_redirect_post3.0.0.cf
99_sare_fraud_post25x.cf
chickenpox.cf
tripwire.cf

On Mon, 30 Oct 2006, Leander Koornneef wrote:

Those kinds of spam are hitting all kinds of rules here, including rulesets from SARE:

X-Spam-Status: Yes, hits=14.1 tagged_above=-999.0 required=3.0 tests=BAYES_99, EXTRA_MPART_TYPE, HTML_10_20, HTML_MESSAGE, MY_CID_AND_ARIAL2, MY_CID_AND_CLOSING, MY_CID_AND_STYLE, MY_CID_ARIAL2_CLOSING, MY_CID_ARIAL_STYLE, SARE_GIF_ATTACH, TVD_FW_GRAPHIC_ID1

I suspect you haven't done much tweaking on your SA setup?
Leander On 30-okt-2006, at 21:45, User for SpamAssassin Mail List wrote: Has anyone come up with a rule that will combat the spam that I have been seeing lately? That is a spam that rambles about much of nothing then has an image or a link at the bottom. I see more and more of these and it seems like the spammers have figured out a way to get this past SA. I include one such message at the end of this post. Thanks, Ken Example of this spam: [IMAGE] Jeg er udvalgt som blogger, dvs. There is little doubt that asynchronous solutions require us to think in new ways as we have to deal with concurrency, out-of-sequence issues, correlation and other. Ingen interesse mere. But it makes me feel better that Ted Neward seems to beat me in that category, though. In my eyes this is really the best indicator of success for a pattern language. We don't have to go further than the local coffee shop. But it makes me feel better that Ted Neward seems to beat me in that category, though. While the conference logistics can be quirky at times the content is top notch. Even if you choose the right specification, it still is likely to evolve over time. Jeg er udvalgt som blogger, dvs. However, when building distributed applications, that asymmetry really has no place. After loosely coupled, stateless must be a close runner-up as the ultimate nirvana in buzzword-compliant architectures. While Java is not necessarily the greatest language to host a DSL we can go a lot further than developers generally believe or care for. Ideally, the debate would involve alcoholic beverages and the other person would pick up the check. This time, though, Ken Arnold stole a little bit of my show by publishing an excellent article in ACM Queue magazine called Programmers are People, too. During the proverbial hallway discussions we started talking about boxes and lines, but in a profound way. Read on to learn more about the implementation and our experiences with intra-JVM EDA. 
Hearing this tag line for the third or fourth time got me wondering, what really is the difference between coding and configuring? For one thing, a fair number of my intellectual drinking buddies tend to congregate around the large software company in the Pacific Northwest. First, because I was going to meet the exalted one in person.

--
Leander Koornneef
ICS B.V.
Stadhouderslaan 57
3583 JD Utrecht
T: +31 30 63 55 730
F: +31 30 63 55 731
E: [EMAIL PROTECTED]
I: http://www.ic-s.nl

ICS offers Service Support, Development and Consultancy on a wide range of internet-related platforms, with a preference for Open Source.

Please note: my email address has changed to: [EMAIL PROTECTED]
Re: rules_du_jour question
On 29-okt-2006, at 7:38, Shaun T. Erickson wrote: I've just downloaded this and set it up. I see there are MANY rulesets I can choose from, but I have no idea if they are all 'safe' (not even sure what I mean by that). Is there a subset of all these rulesets, that everybody uses, or does everyone use all of them? How do you decide which to use and which not to use? If you are using spamassassin 3.1, you can use sa-update to get the SARE rulesets from the channel provided by http://saupdates.openprotect.com/. This negates the necessity to run rulesdujour alongside sa-update. This channel consists only of safe rules. Leander
Re: rules_du_jour question
On 29-okt-2006, at 16:33, Shaun T. Erickson wrote:

On 10/29/06, Leander Koornneef [EMAIL PROTECTED] wrote:

If you are using spamassassin 3.1, you can use sa-update to get the SARE rulesets from the channel provided by http://saupdates.openprotect.com/. This negates the necessity to run rulesdujour alongside sa-update. This channel consists only of safe rules.

Ok. I've set that up and run it, and now I have the standard set of rules and the safe SARE rules under /var/lib/spamassassin/3.001007. Two questions:

Do many people use the non-SARE rulesets that I see are available via rules_du_jour (i.e., TRIPWIRE ANTIDRUG RANDOMVAL BOGUSVIRUS ZMI_GERMAN)? Are those something I'd still likely want to get via rules_du_jour?

In my experience, using the default sa-update channel, the openprotect channel, auto-whitelisting, proper bayes training(!), pyzor, razor, dcc, SPF and DNS blacklists will get you a spam detection rate of 99%. Also, I generally use X-Spam-Level = 3 as the cutoff value in my email client to filter spam out of my Inbox. I rarely have any false positives.

rules_du_jour restarts amavisd-new after it runs, but sa-update doesn't. Do most people run it out of cron and simply append (without the quotes, of course) /etc/init.d/amavis reload to the command line? Or is there another, more preferred method?

sa-update indeed does not reload amavisd, because not everyone using sa-update also runs amavis, so you should arrange this yourself. Also, if you are using amavis and spamassassin 3.1.5, you should read the last section on this page: http://wiki.apache.org/spamassassin/RuleUpdates

I use the script from that wiki page to run sa-update and reload amavisd and it works fine.

Leander
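For readers who want to see the "update, then reload only if something changed" idea spelled out, here is a minimal Python sketch. It is not the script from the wiki page. It assumes that sa-update exits with status 0 when new rules were installed and 1 when there was nothing new, which is how I understand the sa-update documentation; the `update_and_reload` name and its return strings are hypothetical. The reload command is the one mentioned in the question.

```python
import subprocess


def update_and_reload(run=subprocess.call,
                      update_cmd=("sa-update",),
                      reload_cmd=("/etc/init.d/amavis", "reload")):
    """Run sa-update and reload amavis only when new rules were installed."""
    status = run(list(update_cmd))
    if status == 0:
        # Assumed meaning: an update was downloaded and installed.
        run(list(reload_cmd))
        return "reloaded"
    if status == 1:
        # Assumed meaning: no fresh updates, so the running daemon is current.
        return "no-updates"
    # Anything else is treated as an error; do not reload on a failed update.
    return "error"
```

Called from cron, this avoids restarting the scanner every night when the rules have not changed. The `run` parameter exists only so the logic can be exercised without touching the real commands.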
Re: rules_du_jour question
On 29-okt-2006, at 17:55, Shaun T. Erickson wrote:

On 10/29/06, Leander Koornneef [EMAIL PROTECTED] wrote:

In my experience, using the default sa-update channel, the openprotect channel, auto-whitelisting, proper bayes training(!), pyzor, razor, dcc, SPF and DNS blacklists will get you a spam detection rate of 99%.

I'm doing all that, now, I think. The auto-whitelisting seems to be happening on its own (it does say 'auto' after all, lol), as I see the auto-whitelist file in amavis' .spamassassin directory growing. Likewise, I see the bayes_* files growing, as well. At some point, when it has seen enough stuff, it will just kick in on its own, yes? I have a feeling that that will not be for quite some time though, as virtually all the spam never makes it onto my system, thanks to the postfix rules I have in place. Amavis/Clamav/Spamassassin have an easy job here. ;)

I will have to train it on spam that it misses though. I think I saw a way to have the amavis account pull down and train on the contents of my 'missed_spam' imap folder, via fetchmail ...

You should not only train SA with false positives and false negatives, but also with regular streams of ham and spam. The default autolearn threshold will for instance only train bayes with spam that scores above 12, so feeding mails to sa-learn as spam with required_score < score < bayes_auto_learn_threshold_spam will also increase the overall quality of your bayesian scoring (somebody please correct me if I'm wrong).

For me this is easy, as my mbox files are on the same server as SA, so I can just point sa-learn to my ham and spam boxes. Otherwise, you may indeed need to use something like fetchmail to pull the mailboxes from your pop/imap server.

Leander
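The score band described above can be made concrete with a short Python sketch. The function name and return strings are hypothetical; the defaults of 5.0 for required_score and 12 for the spam autolearn threshold are the values mentioned in this thread and in the score report earlier. The real autolearner applies further conditions that are ignored here.

```python
def training_route(score: float,
                   required_score: float = 5.0,
                   autolearn_spam_threshold: float = 12.0) -> str:
    """Say how a message would reach the Bayes database, by score alone."""
    if score > autolearn_spam_threshold:
        # High scorers are picked up by autolearning without any help.
        return "autolearned-as-spam"
    if score > required_score:
        # Tagged as spam, but below the autolearn threshold: these are the
        # messages worth feeding to sa-learn --spam by hand.
        return "feed-to-sa-learn"
    # Below required_score the message is not tagged as spam at all; if it
    # is spam anyway, it is a false negative and needs manual training too.
    return "not-tagged"
```

For example, the 6.3-point message from the "Flooded by pointless spam" thread would land in the middle band: tagged as spam, but never learned unless someone feeds it to sa-learn.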
Re: DCC worth it?
In my experience (which is not statistically confirmed), Razor catches more spam than DCC. Usually if DCC hits, then Razor will probably also hit. This is not true the other way around: if Razor hits, DCC regularly doesn't hit.

Giampaolo's comments are also valid: if they both hit, you get higher scores, which may just be enough to push a spam above your required_score.

Leander

On 19-okt-2006, at 10:15, Jo Rhett wrote:

John Andersen wrote:

Contemplating adding DCC to my SA config. I already do the SURBL tests and Razor2. Will I likely gain any thing via this? Does DCC catch what other tests miss?

DCC and Razor are very similar in approach. DCC has recently lost a lot of community support due to policy decisions made by the guy who runs it, which is pretty much why Razor sprang into existence.

We have them in parallel on one of our work systems, and I can't say that DCC is better than Razor. It catches some that Razor misses, but Razor seems to catch more than DCC misses. 95% of the time they are identical in result.

--
Jo Rhett
Network/Software Engineer
Net Consonance
Re: DCC worth it?
This seems too extreme to be true. I think you need to fix your DCC setup :-)

On 19-okt-2006, at 15:19, Coffey, Neal wrote:

John Andersen wrote:

Contemplating adding DCC to my SA config. I already do the SURBL tests and Razor2. Will I likely gain any thing via this? Does DCC catch what other tests miss?

For what it's worth, this is from seven days of logging on my company's mail server:

$ zgrep RAZOR2_ spamc.log.?.gz |wc -l
49054
$ zgrep DCC_ spamc.log.?.gz |wc -l
0

And yes, I have DCC enabled.

$ pwd
/etc/mail/spamassassin
$ grep ^loadplugin.*DCC *
v310.pre:loadplugin Mail::SpamAssassin::Plugin::DCC

Now, granted, there might be a problem loading or running the DCC plugin. I haven't looked to see, yet. I'm a little surprised that nothing's triggered it in the last week, but Razor2 has *always* been significantly more effective than DCC at my site, so I'm not at all worried by it.

Incidentally, the breakdown looks like this:

Type                          Total     %
-----------------------------------------
All Messages                 119528   100
Spam                          98168    82
Spam w/Razor2                 49054    41
Percent of Spam w/Razor2               50
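The percentages in the breakdown can be checked from the totals with a few lines of Python. The variable names and the `pct` helper are just for this illustration; the three counts are the ones from the message.

```python
all_messages = 119528
spam = 98168
spam_with_razor2 = 49054


def pct(part: int, whole: int) -> int:
    """Percentage of part in whole, rounded to the nearest integer."""
    return round(100 * part / whole)


spam_share = pct(spam, all_messages)                # share of all mail that is spam
razor2_share = pct(spam_with_razor2, all_messages)  # share of all mail hit by Razor2
razor2_of_spam = pct(spam_with_razor2, spam)        # share of spam hit by Razor2
```

The three results agree with the 82, 41 and 50 in the table, and the Razor2 count matches the zgrep output, so the numbers are at least internally consistent; it is the DCC count of 0 that stands out.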
Re: sa-learn killed, bayes not available
It looks like the process is getting killed by an external signal. Maybe this is the Linux OOM killer in action? What is the memory/swap status of this machine? Have you tried running sa-learn with the -D option?

Leander

On 29-jul-2006, at 0:31, Steven Scotten wrote:

The bayesian filter seems super-delicate. If I run sa-learn on a mailbox with more than about 200 messages in it, it gets killed, I'm not sure why:

$ sa-learn --spam --dir Maildir/.spam/cur/
Killed
$

If sa-learn gets killed in the middle, it leaves a database that it thinks is empty.

Before a killed process:

debug: bayes: found bayes db version 3
debug: bayes corpus size: nspam = 592, nham = 562

After a killed process:

debug: bayes: found bayes db version 3
debug: bayes: Not available for scanning, only 0 spam(s) in Bayes DB < 200

Rescanning doesn't do any good, because sa-learn still knows about the messages it's already looked at. I have to start training all over by deleting bayes_seen and bayes_toks. Furthermore, this kills my bayesian filter and Spamassassin lets through about 75% of my incoming spam without it.

I've got thousands of spams and hams ready to feed to sa-learn, but having to feed them 100 at a time is cumbersome and starting over again a dozen times in the last few days

Other than backing up my .spamassassin directory before I run sa-learn each time, are there any suggestions? I'm running 3.0.3, but it's a hosted box so upgrading isn't my call.

Thanks, Steve

--
Steven M. Scotten [EMAIL PROTECTED]
The future will blow your mind
Re: sa-learn killed, bayes not available
Or perhaps there is some other form of resource control in place. What's the output of ulimit -a?

Leander

On 29-jul-2006, at 14:22, Leander Koornneef wrote:

It looks like the process is getting killed by an external signal. Maybe this is the Linux OOM killer in action? What is the memory/swap status of this machine? Have you tried running sa-learn with the -D option?

Leander

On 29-jul-2006, at 0:31, Steven Scotten wrote:

The bayesian filter seems super-delicate. If I run sa-learn on a mailbox with more than about 200 messages in it, it gets killed, I'm not sure why:

$ sa-learn --spam --dir Maildir/.spam/cur/
Killed
$

If sa-learn gets killed in the middle, it leaves a database that it thinks is empty.

Before a killed process:

debug: bayes: found bayes db version 3
debug: bayes corpus size: nspam = 592, nham = 562

After a killed process:

debug: bayes: found bayes db version 3
debug: bayes: Not available for scanning, only 0 spam(s) in Bayes DB < 200

Rescanning doesn't do any good, because sa-learn still knows about the messages it's already looked at. I have to start training all over by deleting bayes_seen and bayes_toks. Furthermore, this kills my bayesian filter and Spamassassin lets through about 75% of my incoming spam without it.

I've got thousands of spams and hams ready to feed to sa-learn, but having to feed them 100 at a time is cumbersome and starting over again a dozen times in the last few days

Other than backing up my .spamassassin directory before I run sa-learn each time, are there any suggestions? I'm running 3.0.3, but it's a hosted box so upgrading isn't my call.

Thanks, Steve

--
Steven M. Scotten [EMAIL PROTECTED]
The future will blow your mind
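On a hosted box where the shell is restricted, the limits that `ulimit -a` reports can also be read from inside a process. The Python sketch below does that with the standard `resource` module (Unix only). The selection of three limits and the `current_limits` name are choices made for this illustration; note that the shell usually prints memory limits in kilobytes, while `getrlimit` returns bytes.

```python
import resource

# A few of the limits that could plausibly get a memory-hungry process killed.
LIMITS = {
    "address space (ulimit -v)": resource.RLIMIT_AS,
    "data segment (ulimit -d)": resource.RLIMIT_DATA,
    "cpu time (ulimit -t)": resource.RLIMIT_CPU,
}


def describe(value: int) -> str:
    """Render one limit value, spelling out the 'no limit' marker."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> dict:
    """Return {description: (soft, hard)} for the limits listed above."""
    result = {}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        result[name] = (describe(soft), describe(hard))
    return result


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

If any of these turn out to be small rather than unlimited, that would support the resource-control theory over the OOM killer.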
Re: debian woody upgrade to sarge broke bayesian database
Hi, I think I also ran into this recently. The following fixed it:

[EMAIL PROTECTED]:~# aptitude install db4.2-util
[EMAIL PROTECTED]:~# db4.2_upgrade /path/to/bayes_db

Or something along those lines. You should probably make a backup of the bayes db before you blindly copy/paste these commands :-)

Leander

On 21-jun-2006, at 11:21, Johan Loubser wrote:

The mail server with debian woody has been upgraded to sarge. Everything seemed to work as it should, but after checking a bit deeper I found the following error:

Cannot open bayes databases /home/spamd/.spamassassin/bayes_* R/O: tie failed:

The spamassassin version is 3.0.3-2; the previous version was 3.0.2.

--
Johan Loubser
(021) 8084036
Informasie Tegnologie
University of Stellenbosch