Re: Is Bayes Dead? Have the spammers won?

2007-03-22 Thread Leander Koornneef


On 22-mrt-2007, at 20:02, Theo Van Dinter wrote:


On Thu, Mar 22, 2007 at 09:55:07AM -0700, Marc Perkel wrote:

Where bayes used to be the centerpiece of spam filtering ...


FWIW, I don't think Bayes has really ever been the centerpiece of
spam filtering.  Definitely not within SA anyway.  It's a good tool,
but it's just another tool in the belt.

/me continues to wait for the spammers to tire of greylisting


Yes, exactly! Greylisting is still working amazingly well here.
Also, most spams that get past the greylisting border are still
hitting BAYES_90 or higher, even on instances where the
bayes system is only being trained by autolearning.

I do feel that greylisting is slowly becoming less effective though.
The number of spams that get through may have risen by as much
as 50%, although this is extremely relative: in my case it means
that six spams make it through each day instead of four, whereas
I used to get 80 spams per day without greylisting.
I noticed that almost all of the spams that get through are GIF image
stock spam. Apparently, I should GET IN ON THE YOUTUBE OF
CHINA NOW!, because that is all I'm reading about these days ;-)

Leander


Re: Greylisting

2006-11-21 Thread Leander Koornneef


On 20-nov-2006, at 23:33, Vahric MUHTARYAN wrote:


Hello Everybody,

I have been using SA for a long time without any problems. Nowadays
spammers are using a lot of graphical objects and they keep changing
them day by day. I'm trying to use FuzzyOcr, but it takes too much CPU,
so I am thinking of trying greylisting. Is anybody here using
greylisting? Can somebody give me feedback?


I started using selective greylisting a while ago and the results
are simply amazing. For instance, my private mailbox has gone
from receiving 75-100 spams/day to 2-4 spams/day. Selective
greylisting is a variant of pure greylisting where you don't greylist
everything, but only suspicious smtp clients.
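To make the idea of selective greylisting a bit more concrete, here is a minimal sketch in Python. It is not how maRBL is implemented; the class name, the 300 second delay and the `suspicious` flag (which would come from a DNSBL, p0f or PTR check) are assumptions for illustration only. The returned strings are ordinary Postfix access actions.

```python
import time


class SelectiveGreylist:
    """Greylist only clients flagged as suspicious; let everyone else pass."""

    def __init__(self, delay_seconds=300):
        # Hypothetical default: retries are accepted after five minutes.
        self.delay_seconds = delay_seconds
        # (client_ip, sender, recipient) -> time of the first attempt
        self.first_seen = {}

    def check(self, client_ip, sender, recipient, suspicious, now=None):
        now = time.time() if now is None else now
        if not suspicious:
            # Pure greylisting would delay this client too; selective does not.
            return "DUNNO"
        triplet = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.delay_seconds:
            return "DUNNO"
        return "DEFER_IF_PERMIT Greylisted, please try again later"
```

A real policy service would also have to expire old triplets and persist them across restarts, which this sketch leaves out.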

I'm using maRBL (written by Ian Campbell) for this, which acts
as a policy service for Postfix. It greylists clients based on DNSBL
lookups. maRBL used to be available from
http://www.orangegroove.net/code/marbl/, but the site seems
to have disappeared.

I'm actually using a modified version of maRBL, using a patch by
Mark Martinec (of amavisd fame) that integrates p0f support to
selectively greylist Windows smtp clients:
http://archives.neohapsis.com/archives/postfix/2006-11/0577.html,
which is both brilliant and hilarious :-)

I have also added (primitive) support for greylisting based on missing
PTR records and SPF checks myself (it actually rejects if SPF fails  
hard).

I have made the three versions of maRBL available for download on my
server: http://leander.koornneef.net/marbl/
Perhaps it can be of use to anyone. And thanks to Ian and Mark!

Leander




Re: amavisd

2006-11-17 Thread Leander Koornneef


On 17-nov-2006, at 9:26, Maccie Roux wrote:

Hi there.  I'm getting the following in my maillog, can someone please
help me:

postfix/qmgr[25394]: warning: connect to transport smtp-amavis: Connection refused


Well, that is about as clear as a warning can get. What don't you  
understand about it?


Leander


Re: amavisd

2006-11-17 Thread Leander Koornneef


On 17-nov-2006, at 12:59, Maccie Roux wrote:


Hi all.

My spam is being blocked by amavis but it is not sent to my
junk mail box.  Here is my amavisd.conf file:
# $timestamp_fmt_mysql = 1; # if using MySQL *and* msgs.time_iso is TIMESTAMP;
#   defaults to 0, which is good for non-MySQL or if msgs.time_iso is CHAR(16)

$virus_admin   = [EMAIL PROTECTED];  # notifications recip.

$mailfrom_notify_admin = [EMAIL PROTECTED];  # notifications sender
$mailfrom_notify_recip = [EMAIL PROTECTED];  # notifications sender
$mailfrom_notify_spamadmin = [EMAIL PROTECTED]; # notifications sender
#$mailfrom_to_quarantine = ''; # null return path; uses original sender if undef

@addr_extension_virus_maps  = ('virus');
@addr_extension_banned_maps = ('banned');
@addr_extension_spam_maps   = ('spam');
@addr_extension_bad_header_maps = ('badh');
# $recipient_delimiter = '+';  # undef disables address extensions altogether
# when enabling addr extensions do also Postfix/main.cf: recipient_delimiter=+

$path = '/usr/local/sbin:/usr/local/bin:/usr/sbin:/sbin:/usr/bin:/bin';

# $dspam = 'dspam';

I think you only need this part.


Maccie,

why are you sending amavisd questions to the SpamAssassin list? It
seems to me that you would be better served by asking the amavisd list
about these things.

Leander


Re: Flooded by pointless spam

2006-11-13 Thread Leander Koornneef


On 13-nov-2006, at 9:03, Ramprasad wrote:


I am not getting what the spammer intends to say here:
http://ecm.netcore.co.in/tmp/spam1.txt

There is no meaningful message, no sales pitch, no stock
recommendation, nothing at all.

Any ideas?


Hmm, preemptive Bayes and/or AWL poisoning perhaps?

By the way, it scores > required_score here:

===
Content analysis details:   (6.3 points, 5.0 required)

pts rule name          description
--- ------------------ --------------------------------------------------
2.8 RCVD_FORGED_WROTE  Forged 'Received' header found ('wrote:' spam)
3.5 BAYES_99           BODY: Bayesian spam probability is 99 to 100%
                       [score: 0.]
===

The RCVD_FORGED_WROTE score appears to come from
a rule that was received from sa-update. You should consider
upgrading your SA (or run sa-update if using a recent SA already).

Leander


Re: mail bounce warning for the list

2006-11-10 Thread Leander Koornneef


On 9-nov-2006, at 16:17, Randal, Phil wrote:


As someone has probably already pointed out... admins use these
lists because they trust their accuracy.  If they receive too
many complaints (as we did with a particular DNSBL) you stop
blocking on that list and move to only scoring.


No, you move on to greylisting based on the less accurate DNSBLs.
milter-greylist 3.0rc6 supports DNSBL-based greylisting, and it works a
treat here.  Because it is greylisting and not blacklisting, no
legitimate mail gets blocked.

If you use short greylisting periods legitimate emails should get
through on the second attempt.


I agree with Phil. DNSBL based blacklisting has its pitfalls. So does
greylisting. Combining the two of them seems like a smart thing to do.

I am absolutely loving Ian Campbell's maRBL right now:
http://www.orangegroove.net/code/marbl/
This is used to implement selective greylisting in Postfix, based on
DNSBL hits. If you combine this with Mark Martinec's p0f-patch
(http://archives.neohapsis.com/archives/postfix/2006-11/0577.html),
which extends maRBL with the ability to greylist Windows clients,
you get a pretty powerful tool. It is also quite easy to set up.
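For anyone wondering what a "DNSBL hit" boils down to: the client IP is reversed octet by octet, the list's zone is appended, and the resulting name is looked up in DNS. Below is a rough Python sketch of that lookup. It is not taken from maRBL; the zone `dnsbl.example.org` and the function names are placeholders, and only IPv4 is handled.

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip, zone):
    """Build the DNS name to look up for an IPv4 client in a DNSBL zone."""
    ip = ipaddress.IPv4Address(client_ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(client_ip, zone, resolve=socket.gethostbyname):
    """Return True if the DNSBL answers for this client, False otherwise."""
    try:
        resolve(dnsbl_query_name(client_ip, zone))
    except OSError:
        # No such name: the client is not on this list (or the lookup failed).
        return False
    return True


# Example with the address from the log lines in this thread:
# dnsbl_query_name("24.206.74.214", "dnsbl.example.org")
```

A selective greylisting setup would then defer the client instead of rejecting it when `is_listed` returns True.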

Also, it is great fun to see spam being blocked (greylisted), just
because it is sent from a Windows box :-)

Nov 10 18:00:57 leander marbl: p0f collect: max_wait=0.050, 24.206.74.214 364389373 Windows XP ... = Windows XP Pro SP1, 2000 SP3, (distance 16, link: ethernet/modem)
Nov 10 18:00:57 leander marbl: Action for 24.206.74.214 ([EMAIL PROTECTED]): greylisting


Leander








Re: Spamassassin Score

2006-11-06 Thread Leander Koornneef


On 6-nov-2006, at 19:59, Claus Westerkamp wrote:


Hello list,

I'd like to modify the score output of SpamAssassin. I want a 3-digit
display permanently (e.g. ***(Score002.3)*** or ***(Score102.3)***).

Is this possible? I want to be able to sort the spam messages by
score.


Of course this is possible, but you will probably have to
hack some code to get the result you want. As far as I
know, there is no configuration option for this.

If you are using amavis for instance, you could change
the part where $full_spam_status is put together from:

sprintf("%3.1f",$spam_level)

to something like:

sprintf("%05.1f",$spam_level)

In spamd this would be from:

my $msg_score = sprintf( "%.1f", $status->get_score );

to:

my $msg_score = sprintf( "%05.1f", $status->get_score );

Also beware that this will be overwritten when you update/upgrade
your software...
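
To illustrate why the zero-padded format helps with sorting, here is a small Python sketch using the same printf-style format string. The sample scores are made up, and the function name is only for illustration.

```python
def padded_score(score):
    """Format a score with one decimal, zero-padded to a width of five."""
    return "%05.1f" % score


# Made-up sample scores, including the two from the question.
scores = [12.3, 2.3, 102.3, 7.0]
labels = [padded_score(s) for s in scores]

# With padding, plain string sorting gives the same order as numeric sorting.
padded_order = sorted(labels)
numeric_order = [padded_score(s) for s in sorted(scores)]
```

Note that this only holds for non-negative scores below 1000; a negative score gets a leading minus sign and would sort differently.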

Leander




Re: Spamassassin Score

2006-11-06 Thread Leander Koornneef


On 6-nov-2006, at 21:30, Rob Anderson wrote:


Leander Koornneef [EMAIL PROTECTED] 11/06/06 02:26PM 



As far as I
know, there is no configuration option for this.


SNIP NONSENSE



Try this from the docs under Template Tags:

 _SCORE(PAD)_  message score, if PAD is included and is either spaces or
               zeroes, then pad scores with that many spaces or zeroes
               (default, none)  ie: _SCORE(0)_ makes 2.4 become 02.4,
               _SCORE(00)_ is 002.4.  12.3 would be 12.3 and 012.3
               respectively.


I stand corrected :-)
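
For reference, the padding behaviour described in that doc excerpt can be mimicked in a few lines of Python. This is only an emulation derived from the four examples quoted above, not SpamAssassin's own code; the function name is made up, and negative scores are not considered.

```python
def score_with_pad(score, pad=""):
    """Mimic _SCORE(PAD)_ for non-negative scores, based on the doc examples."""
    text = "%.1f" % score
    if not pad:
        return text
    # One integer digit, the dot and one decimal, plus one position per PAD char.
    width = len(pad) + 3
    return text.rjust(width, pad[0])
```

So `score_with_pad(2.4, "00")` yields the three-digit form the original poster asked for, without patching any code.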

Leander



Re: better solution?

2006-10-30 Thread Leander Koornneef


On 30-okt-2006, at 10:03, Matthias Haegele wrote:


[EMAIL PROTECTED] wrote:
Hi list, I'm new to SpamAssassin. I have the whole system configured (I
think), but I have a question: when a spam message arrives, SpamAssassin
marks it as **spam*, and then the message goes to my mailbox.
My question is:
I want some of these spams to go to the user's SPAM folder instead of
their INBOX folder.
What is the best solution to achieve this,
and what is the name of the program?


procmail (alternatives: maildrop if you use courier, or sieve,
iirc, with cyrus)



I have a debian sarge, postfix, spamassassin 3.0.3


btw: I would suggest upgrading to a newer SA (backports or
testing, requires new perl too ...).


Correction: the 3.1.4 version of SA in Debian volatile
(http://www.debian.org/devel/debian-volatile/) does not require a new
version of perl:

=
leander:~# aptitude show spamassassin
Package: spamassassin
State: installed
Automatically installed: no
Version: 3.1.4-0volatile1
Priority: optional
Section: mail
Maintainer: Duncan Findlay [EMAIL PROTECTED]
Uncompressed Size: 3068k
Depends: perl (>= 5.6.0-16), libhtml-parser-perl (>= 3.31), libdigest-sha1-perl, libsocket6-perl, libarchive-tar-perl, libwww-perl

=

So the default perl 5.8 in Sarge will do fine...

Leander


Re: rules_du_jour

2006-10-30 Thread Leander Koornneef
Those kinds of spam are hitting all kinds of rules here, including
rulesets from SARE:


X-Spam-Status: Yes, hits=14.1 tagged_above=-999.0 required=3.0  
tests=BAYES_99, EXTRA_MPART_TYPE, HTML_10_20, HTML_MESSAGE,  
MY_CID_AND_ARIAL2, MY_CID_AND_CLOSING, MY_CID_AND_STYLE,  
MY_CID_ARIAL2_CLOSING, MY_CID_ARIAL_STYLE, SARE_GIF_ATTACH,  
TVD_FW_GRAPHIC_ID1


I suspect you haven't done much tweaking on your SA setup?

Leander

On 30-okt-2006, at 21:45, User for SpamAssassin Mail List wrote:



Has anyone come up with a rule that will combat the spam that I have
been seeing lately?

That is a spam that rambles about much of nothing then has an image or a
link at the bottom.

I see more and more of these and it seems like the spammers have figured
out a way to get this past SA.

I include one such message at the end of this post.

Thanks,

Ken



Example of this spam:

[IMAGE]
Jeg er udvalgt som blogger, dvs. There is little doubt that asynchronous
solutions require us to think in new ways as we have to deal with
concurrency, out-of-sequence issues, correlation and other. Ingen
interesse mere. But it makes me feel better that Ted Neward seems to
beat me in that category, though. In my eyes this is really the best
indicator of success for a pattern language. We don't have to go further
than the local coffee shop. But it makes me feel better that Ted Neward
seems to beat me in that category, though. While the conference
logistics can be quirky at times the content is top notch. Even if you
choose the right specification, it still is likely to evolve over time.
Jeg er udvalgt som blogger, dvs. However, when building distributed
applications, that asymmetry really has no place. After loosely coupled,
stateless must be a close runner-up as the ultimate nirvana in
buzzword-compliant architectures. While Java is not necessarily the
greatest language to host a DSL we can go a lot further than developers
generally believe or care for. Ideally, the debate would involve
alcoholic beverages and the other person would pick up the check. This
time, though, Ken Arnold stole a little bit of my show by publishing an
excellent article in ACM Queue magazine called Programmers are People,
too. During the proverbial hallway discussions we started talking about
boxes and lines, but in a profound way. Read on to learn more about the
implementation and our experiences with intra-JVM EDA. Hearing this tag
line for the third or fourth time got me wondering, what really is the
difference between coding and configuring? For one thing, a fair number
of my intellectual drinking buddies tend to congregate around the large
software company in the Pacific Northwest. First, because I was going to
meet the exalted one in person.








Re: Any caveats upgrading from SA 3.04 to 3.17

2006-10-30 Thread Leander Koornneef
I suggest you start here:
http://svn.apache.org/repos/asf/spamassassin/branches/3.1/UPGRADE

Anyhoo, the upgrade is nothing to be scared of; certainly not if you
know what you're doing.
Seeing that you're using sendmail, I assume that you've probably got
some (gray) hair on your chest already ;-)


Leander

On 30-okt-2006, at 22:16, Patrick wrote:


Any caveats upgrading from SA 3.04 to 3.17?

(SA,Amavis-new,Clamav,sendmail)

TIA

Pat...




Re: rules_du_jour

2006-10-30 Thread Leander Koornneef

Hi Ken,

please keep the discussion on the list, instead of mailing me directly,
so maybe someone else can learn something from this in the future.

Anyway:

The EXTRA_MPART_TYPE rule is a native SA rule (in SA 3.1 at least;
I don't know if this is true for pre-3.1 versions).

The MY_CID_* rules are part of 70_sare_stocks.cf

You should check out this recent thread from the SA list:
http://www.nabble.com/rules_du_jour-question-tf2533374.html#a7062324

I've posted some comments on my setup there.

Here's another suggestion/tip/request: please don't start new threads
on mailing lists by replying to other threads. It will b0rk email
clients with thread support, as well as web-based mailing list
archives, as you can see on the link above...


Leander


On 31-okt-2006, at 0:00, User for SpamAssassin Mail List wrote:



Leander,

I recognize most of them, but I do not know what EXTRA_MPART_TYPE and
MY_CID_... are part of. Could you please pass that along?

Below is a list of the rules I'm running; maybe you could pass along a
little info on something good I should be running.


Thanks,

Ken

70_sare_adult.cf
70_sare_bayes_poison_nxm.cf
70_sare_evilnum0.cf
70_sare_genlsubj0.cf
70_sare_header0.cf
70_sare_html0.cf
70_sare_obfu0.cf
70_sare_oem.cf
70_sare_random.cf
70_sare_specific.cf
70_sare_spoof.cf
70_sare_stocks.cf
70_sare_unsub.cf
70_sare_uri0.cf
70_sare_whitelist.cf
72_sare_bml_post25x.cf
72_sare_redirect_post3.0.0.cf
99_sare_fraud_post25x.cf
chickenpox.cf
tripwire.cf



--
Leander Koornneef

ICS B.V.
Stadhouderslaan 57
3583 JD Utrecht

T: +31 30 63 55 730
F: +31 30 63 55 731
E: [EMAIL PROTECTED]
I: http://www.ic-s.nl

ICS offers Service & Support, Development and Consultancy on a wide
range of internet-related platforms, with a fondness for Open Source.

Please note: my email address has changed to: [EMAIL PROTECTED]



Re: rules_du_jour question

2006-10-29 Thread Leander Koornneef


On 29-okt-2006, at 7:38, Shaun T. Erickson wrote:


I've just downloaded this and set it up. I see there are MANY rulesets
I can choose from, but I have no idea if they are all 'safe' (not even
sure what I mean by that). Is there a subset of all these rulesets,
that everybody uses, or does everyone use all of them? How do you
decide which to use and which not to use?


If you are using spamassassin 3.1, you can use sa-update to get the SARE
rulesets from the channel provided by http://saupdates.openprotect.com/.
This negates the necessity to run rulesdujour alongside sa-update. This
channel consists only of safe rules.

Leander


Re: rules_du_jour question

2006-10-29 Thread Leander Koornneef


On 29-okt-2006, at 16:33, Shaun T. Erickson wrote:


On 10/29/06, Leander Koornneef [EMAIL PROTECTED] wrote:


If you are using spamassassin 3.1, you can use sa-update to get the SARE
rulesets from the channel provided by http://saupdates.openprotect.com/.
This negates the necessity to run rulesdujour alongside sa-update. This
channel consists only of safe rules.


Ok. I've set that up and run it and now I have the standard set of
rules and the safe SARE rules under /var/lib/spamassassin/3.001007.

Two questions:

Do many people use the non-sare rulesets that I see are available via
rules_du_jour (i.e., TRIPWIRE ANTIDRUG RANDOMVAL BOGUSVIRUS
ZMI_GERMAN)? Are those something I'd still likely want to get via
rules_du_jour?


In my experience, using the default sa-update channel, the openprotect
channel, auto-whitelisting, proper bayes training(!), pyzor, razor, dcc,
SPF and DNS blacklists will get you a spam detection rate > 99%.
Also, I generally use X-Spam-Level >= 3 as the cutoff value in my email
client to filter spam out of my Inbox. I rarely have any false positives.



rules_du_jour restarts amavisd-new after it runs, but sa-update
doesn't. Do most people run it out of cron and simply append an
(without the quotes, of course)   /etc/init.d/amavis reload to the
command line? Or is there another, more preferred method?


sa-update indeed does not reload amavisd, because not everyone using
sa-update also runs amavis, so you should arrange this yourself. Also,
if you are using amavis and spamassassin  3.1.5, you should read the
last section on this page:
http://wiki.apache.org/spamassassin/RuleUpdates

I use the script from that wiki page to run sa-update and reload amavisd
and it works fine.
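
For those who prefer to see the logic spelled out, the idea is roughly as follows. This is a Python sketch and not the script from the wiki page; the function name is made up, and it assumes that sa-update exits with 0 when new rules were installed and with 1 when there was nothing new. The reload command is the one mentioned earlier in this thread.

```python
import subprocess


def update_rules_and_reload(run=subprocess.call):
    """Run sa-update and reload amavisd only if new rules were installed."""
    status = run(["sa-update"])
    if status == 0:
        # Assumption: exit status 0 means an update was downloaded and installed.
        run(["/etc/init.d/amavis", "reload"])
        return "reloaded"
    if status == 1:
        # Assumption: exit status 1 means there were no fresh updates.
        return "no-updates"
    return "error"


if __name__ == "__main__":
    print(update_rules_and_reload())
```

Run from cron, this avoids reloading amavisd on every run when nothing has changed.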

Leander



Re: rules_du_jour question

2006-10-29 Thread Leander Koornneef


On 29-okt-2006, at 17:55, Shaun T. Erickson wrote:


On 10/29/06, Leander Koornneef [EMAIL PROTECTED] wrote:


In my experience, using the default sa-update channel, the openprotect
channel, auto-whitelisting, proper bayes training(!), pyzor, razor,
dcc, SPF and DNS blacklists will get you a spam detection rate > 99%.


I'm doing all that, now, I think. The auto-whitelisting seems to be
happening on its own (it does say 'auto' after all, lol), as I see
the auto-whitelist file in amavis' .spamassassin directory growing.
Likewise, I see the bayes_* files growing, as well. At some point,
when it has seen enough stuff, it will just kick in on its own, yes?
I have a feeling that that will not be for quite some time though, as
virtually all the spam never makes it onto my system, thanks to the
postfix rules I have in place. Amavis/Clamav/Spamassassin have an easy
job here. ;) I will have to train it on spam that it misses though. I
think I saw a way to have the amavis account pull down and train on
the contents of my 'missed_spam' imap folder, via fetchmail ...


You should not only train SA with false positives and false negatives,
but also with regular streams of ham and spam. The default autolearn
threshold will for instance only train bayes with spam that scores
above 12, so feeding mails to sa-learn as spam with
required_score < score < bayes_auto_learn_threshold_spam
will also increase the overall quality of your bayesian scoring
(somebody please correct me if I'm wrong). For me this is easy, as my
mbox files are on the same server as SA, so I can just point sa-learn
to my ham and spam boxes. Otherwise, you may indeed need to use
something like fetchmail to pull the mailboxes from your pop/imap server.
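
As a rough illustration of that score window, here is a Python sketch that picks out messages which were tagged as spam but scored too low to be autolearned. The function names are made up; the defaults of 5.0 and 12.0 are assumed to be the stock values for required_score and bayes_auto_learn_threshold_spam, so adjust them to your own configuration.

```python
import re

# Matches both "hits=14.1" (amavis style) and "score=14.1" (SA style).
SCORE_RE = re.compile(r"\b(?:hits|score)=(-?\d+(?:\.\d+)?)")


def spam_score(status_header):
    """Extract the numeric score from an X-Spam-Status value, or None."""
    match = SCORE_RE.search(status_header)
    return float(match.group(1)) if match else None


def worth_feeding_to_sa_learn(score, required_score=5.0, autolearn_spam=12.0):
    """True for spam that was tagged but scored below the autolearn threshold."""
    return required_score < score < autolearn_spam
```

Messages for which this returns True are the ones that manual `sa-learn --spam` runs would add on top of what autolearning already picks up.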

Leander




Re: DCC worth it?

2006-10-19 Thread Leander Koornneef
In my experience (which is not statistically confirmed), Razor catches
more spam than DCC. Usually if DCC hits, then Razor will probably also
hit. This is not true the other way around: if Razor hits, DCC regularly
doesn't hit. Giampaolo's comments are also valid: if they both hit, you
get higher scores, which may just be enough to push a spam above your
required_score.


Leander


On 19-okt-2006, at 10:15, Jo Rhett wrote:


John Andersen wrote:
Contemplating adding DCC to my SA config.  I already do the SURBL tests
and Razor2.

Will I likely gain any thing via this?  Does DCC catch what other
tests miss?


DCC and Razor are very similar in approach.  DCC has recently lost a lot
of community support due to policy decisions made by the guy who runs
it, which is pretty much why Razor sprang into existence.

We have them in parallel on one of our work systems, and I can't say
that DCC is better than Razor.  It catches some that Razor misses, but
Razor seems to catch more than DCC misses. 95% of the time they are
identical in result.


--
Jo Rhett
Network/Software Engineer
Net Consonance






Re: DCC worth it?

2006-10-19 Thread Leander Koornneef
This seems too extreme to be true. I think you need to fix your DCC
setup :-)




On 19-okt-2006, at 15:19, Coffey, Neal wrote:


John Andersen wrote:

Contemplating adding DCC to my SA config.

I already do the SURBL tests and Razor2.
Will I likely gain any thing via this?  Does DCC catch what other
tests miss?


For what it's worth, this is from seven days of logging on my company's
mail server:

$ zgrep RAZOR2_ spamc.log.?.gz |wc -l
   49054
$ zgrep DCC_ spamc.log.?.gz |wc -l
   0

And yes, I have DCC enabled.

$ pwd
/etc/mail/spamassassin
$ grep ^loadplugin.*DCC *
v310.pre:loadplugin Mail::SpamAssassin::Plugin::DCC

Now, granted, there might be a problem loading or running the DCC
plugin.  I haven't looked to see, yet.  I'm a little surprised that
nothing's triggered it in the last week, but Razor2 has *always* been
significantly more effective than DCC at my site, so I'm not at all
worried by it.

Incidentally, the breakdown looks like this:

Type                      Total     %
-------------------------------------
All Messages             119528   100
Spam                      98168    82
Spam w/Razor2             49054    41

Percent of Spam w/Razor2           50
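
The percentages in that breakdown can be re-derived from the three totals. A quick Python check, using only the numbers given above (the helper name is made up):

```python
all_messages = 119528
spam = 98168
spam_with_razor2 = 49054


def percent(part, whole):
    """Whole-number percentage of part relative to whole."""
    return round(100 * part / whole)


spam_share = percent(spam, all_messages)               # spam as share of all mail
razor_share = percent(spam_with_razor2, all_messages)  # Razor2 hits as share of all mail
razor_of_spam = percent(spam_with_razor2, spam)        # Razor2 hits as share of spam
```

So Razor2 fired on about half of everything that was classified as spam in that week.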






Re: sa-learn killed, bayes not available

2006-07-29 Thread Leander Koornneef

It looks like the process is getting killed by an external signal.
Maybe this is the Linux OOM killer in action? What is the memory/swap
status of this machine? Have you tried running sa-learn with the -D
option?
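
One way to confirm that a signal is involved is to start the command from a small wrapper and look at how it ended. The Python sketch below is just an illustration (the function name is made up) and relies on the fact that, on POSIX systems, the subprocess module reports a child that died from a signal as a negative return code.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        # Negative means the child was terminated by that signal number.
        return "killed by signal %s" % signal.Signals(-returncode).name
    return "exited with status %d" % returncode


# Usage with the command from the original report:
# result = subprocess.run(["sa-learn", "--spam", "--dir", "Maildir/.spam/cur/"])
# print(describe_exit(result.returncode))
```

A SIGKILL here would be consistent with the OOM killer or a resource limit, although it cannot tell you which of the two sent it.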


Leander


On 29-jul-2006, at 0:31, Steven Scotten wrote:


The bayesian filter seems super-delicate. If I run sa-learn on a
mailbox with more than about 200 messages in it, it gets killed, I'm
not sure why:

$ sa-learn --spam --dir Maildir/.spam/cur/
Killed
$

If sa-learn gets killed in the middle, it leaves a database that it
thinks is empty.

Before a killed process:

debug: bayes: found bayes db version 3
debug: bayes corpus size: nspam = 592, nham = 562

After a killed process:

debug: bayes: found bayes db version 3
debug: bayes: Not available for scanning, only 0 spam(s) in Bayes DB < 200


rescanning doesn't do any good, because sa-learn still knows about the
messages it's already looked at. I have to start training all over by
deleting bayes_seen and bayes_toks. Furthermore, this kills my
bayesian filter and Spamassassin lets through about 75% of my incoming
spam without it.

I've got thousands of spams and hams ready to feed to sa-learn, but
having to feed them 100 at a time is cumbersome and starting over
again a dozen times in the last few days

Other than backing up my .spamassassin directory before I run sa-learn
each time, are there any suggestions? I'm running 3.0.3, but it's a
hosted box so upgrading isn't my call.

Thanks,


Steve
--
Steven M. Scotten
[EMAIL PROTECTED]
The future will blow your mind





Re: sa-learn killed, bayes not available

2006-07-29 Thread Leander Koornneef

Or perhaps there is some other form of resource control in place.
What's the output of ulimit -a?
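
If the limits need to be checked from inside a script (for example one started by cron, where the limits may differ from your login shell), Python's resource module exposes the same information. This is only a sketch with a made-up selection of limits; note that getrlimit reports bytes and seconds, whereas ulimit prints some values in kilobytes.

```python
import resource

# A few limits that commonly get a memory-hungry process killed or refused.
LIMITS = {
    "address space (ulimit -v)": resource.RLIMIT_AS,
    "data segment (ulimit -d)": resource.RLIMIT_DATA,
    "cpu time (ulimit -t)": resource.RLIMIT_CPU,
}


def current_limits():
    """Return the soft limit for each entry in LIMITS, or 'unlimited'."""
    report = {}
    for label, which in LIMITS.items():
        soft, _hard = resource.getrlimit(which)
        report[label] = "unlimited" if soft == resource.RLIM_INFINITY else soft
    return report


if __name__ == "__main__":
    for label, value in current_limits().items():
        print("%-28s %s" % (label, value))
```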

Leander


Re: debian woody upgrade to sarge broke bayesian database

2006-06-21 Thread Leander Koornneef

Hi,

I think I also ran into this recently. The following fixed it:

[EMAIL PROTECTED]:~# aptitude install db4.2-util
[EMAIL PROTECTED]:~# db4.2_upgrade /path/to/bayes_db

Or something along those lines.
You should probably make a backup of the bayes db before you blindly
copy/paste these commands :-)
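
A backup can be as simple as copying the bayes_* files somewhere safe first. A minimal Python sketch, assuming the database files live in one directory and are named bayes_*; the function name and the example paths in the comment are placeholders:

```python
import glob
import os
import shutil


def backup_bayes_files(bayes_dir, backup_dir):
    """Copy every bayes_* file from bayes_dir into backup_dir."""
    os.makedirs(backup_dir, exist_ok=True)
    copied = []
    for path in sorted(glob.glob(os.path.join(bayes_dir, "bayes_*"))):
        shutil.copy2(path, backup_dir)  # copy2 also preserves timestamps
        copied.append(os.path.basename(path))
    return copied


# Example (paths are placeholders, adjust to your setup):
# backup_bayes_files("/home/spamd/.spamassassin", "/root/bayes-backup")
```

It is best to do this while nothing is writing to the database, e.g. with spamd stopped.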

Leander

On 21-jun-2006, at 11:21, Johan Loubser wrote:


The mail server with debian woody has been upgraded to sarge.
Everything seemed to work as it should, but after checking a bit deeper
I found the following error:

Cannot open bayes databases /home/spamd/.spamassassin/bayes_* R/O: tie
failed:

The spamassassin version is 3.0.3-2; the previous version was 3.0.2

--
Johan Loubser
(021) 8084036
Informasie Tegnologie
University of Stellenbosch