> -----Original Message-----
> From: Scot L. Harris [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, July 03, 2004 3:36 PM
> To: Spamassasin Users List
> Subject: Re: Bayes less effective over time?
> 
> On Sat, 2004-07-03 at 17:58, Jack L. Stone wrote:
> > The point about less effective is the question of "weight". As you know,
> > the spammer changes the form of attack constantly in an effort to get
> > around our defenses. Thus, if your "basket" is full of apples and now there
> > are more "oranges" being used than apples, the defenses are weaker.
> 
> But I thought that was the point of the bayes system.  You keep teaching
> a sampling of the current spam and ham you get and it expires older
> entries as they exceed a certain time period.  Which in itself tells me
> that the spammer could then go back to their "old" tricks and get around
> the defenses until the system re-learns those tricks.  
> -- 
> Scot L. Harris
> [EMAIL PROTECTED]
> 

For me, Bayes just keeps getting better and better. Over the last month, I've
seen it go from the 3rd most frequently triggered rule to the 1st.

We use IMAP and have procmail file messages marked as spam into a Spam folder
for each user. Each user also has a Ham folder: if they get a false positive
(FP), they move it to the Ham folder. A script runs every night and trains Bayes
on each person's Spam and Ham folders.
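The nightly training step could look something like the sketch below. It is not the script described above; it assumes (hypothetically) that each user's IMAP folders are mbox files at <mail_root>/<user>/mail/Spam and <mail_root>/<user>/mail/Ham, and that a single site-wide Bayes database is trained by running sa-learn with its --spam/--ham and --mbox options.

```python
#!/usr/bin/env python3
# Nightly Bayes training sketch. Assumptions (hypothetical, not from the post):
# each user's IMAP folders are mbox files at <mail_root>/<user>/mail/Spam and
# <mail_root>/<user>/mail/Ham, and one site-wide Bayes database is trained.
import subprocess
import sys
from pathlib import Path

MAIL_ROOT = Path("/home")                      # hypothetical layout
FOLDERS = {"Spam": "--spam", "Ham": "--ham"}   # folder name -> sa-learn flag


def build_commands(users, mail_root=MAIL_ROOT):
    """Return one sa-learn argument list per Spam/Ham folder that exists."""
    commands = []
    for user in users:
        for folder, flag in FOLDERS.items():
            mbox = Path(mail_root) / user / "mail" / folder
            if mbox.is_file():  # skip users who have no such folder yet
                commands.append(["sa-learn", flag, "--mbox", str(mbox)])
    return commands


def train(users, mail_root=MAIL_ROOT):
    """Run sa-learn on every folder; stop on the first failure."""
    for argv in build_commands(users, mail_root):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Usage (e.g. from cron): train_bayes.py alice bob carol
    train(sys.argv[1:])
```

sa-learn remembers which messages it has already learned and skips them, so rescanning the same folders every night mostly costs scan time rather than double-counting.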

Our "magic" (the output of "sa-learn --dump magic") looks like this:

0.000          0          2          0  non-token data: bayes db version
0.000          0       8672          0  non-token data: nspam
0.000          0      63552          0  non-token data: nham
0.000          0     175517          0  non-token data: ntokens
0.000          0 1087428649          0  non-token data: oldest atime
0.000          0 1088895973          0  non-token data: newest atime
0.000          0 1088882101          0  non-token data: last journal sync atime
0.000          0 1088825438          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0       8023          0  non-token data: last expire reduction count
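The counters in a magic dump are easier to read once the atimes (seconds since the Unix epoch) are turned into dates. This is a minimal sketch that parses lines in the format shown above, taking the value from the third column, and summarizes a few of them. The function names are hypothetical.

```python
from datetime import datetime, timezone

# A few lines copied from the dump above.
SAMPLE = """\
0.000          0       8672          0  non-token data: nspam
0.000          0      63552          0  non-token data: nham
0.000          0 1087428649          0  non-token data: oldest atime
0.000          0 1088895973          0  non-token data: newest atime
0.000          0    1382400          0  non-token data: last expire atime delta
"""


def parse_magic(text):
    """Map each 'non-token data' name to the integer in the third column."""
    magic = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        columns, name = line.split("non-token data:", 1)
        magic[name.strip()] = int(columns.split()[2])
    return magic


def summarize(magic):
    """Turn raw counters and epoch-second atimes into readable figures."""
    def utc(ts):
        return datetime.fromtimestamp(ts, tz=timezone.utc)

    return {
        "ham_per_spam": magic["nham"] / magic["nspam"],
        "oldest": utc(magic["oldest atime"]),
        "newest": utc(magic["newest atime"]),
        "window_days": (magic["newest atime"] - magic["oldest atime"]) / 86400,
        "expire_delta_days": magic["last expire atime delta"] / 86400,
    }


summary = summarize(parse_magic(SAMPLE))
```

For these numbers the summary works out to about 7.3 ham messages learned per spam message, token atimes running from 2004-06-16 to 2004-07-03 (UTC), and a last expire atime delta of exactly 16 days.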

We are a small shop. We average about 1200 delivered messages a day.
We reject mail from blocklisted sites at the sendmail level; our rejection rate
is a little over 34%. Of the mail that is delivered, about 6.5% is marked
as spam, with very few FPs.

Of the messages marked as spam, BAYES_99 fired on 82%, and that
percentage keeps rising.
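Turning those percentages into daily counts is simple arithmetic. The sketch below assumes the 34% rejection rate is measured against all messages offered to sendmail, so the 1200 delivered messages are the remaining 66%. That interpretation is an assumption; the post does not say what the rate is measured against.

```python
def daily_flow(delivered=1200, reject_rate=0.34, spam_rate=0.065, bayes99_share=0.82):
    """Back out daily message counts from the percentages in the post.

    Assumes reject_rate is the share of all offered messages refused by
    sendmail, so `delivered` is the remaining (1 - reject_rate).
    """
    offered = delivered / (1 - reject_rate)
    tagged_spam = delivered * spam_rate
    return {
        "offered": offered,
        "rejected": offered - delivered,
        "tagged_spam": tagged_spam,
        "bayes99_hits": tagged_spam * bayes99_share,
    }


flow = daily_flow()
```

Under that assumption, roughly 1,818 messages are offered per day, about 618 are rejected, about 78 delivered messages are tagged as spam, and BAYES_99 fires on about 64 of them. Because the rejection rate is "a little over" 34%, the offered and rejected figures are lower bounds.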

I get maybe one spam that slips through every 2-3 days.

Regarding spammers going back to their old tricks: up-to-date rules catch
those tricks when they first reappear, and with continued training, Bayes
relearns any tokens that may have been expired in the meantime.
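The expire-then-relearn cycle can be illustrated with a toy model in which each token maps only to the last time it was seen. This is a simplification, not SpamAssassin's expiry code. The real database also keeps per-token spam and ham counts, and SpamAssassin picks its own atime cutoff based on the configured maximum database size.

```python
# Simplified model: each token maps to the last time (atime) it was seen.
# SpamAssassin's real database also keeps per-token spam/ham counts and
# chooses its own atime cutoff; this only illustrates expire-then-relearn.


def expire(tokens, newest_atime, delta):
    """Keep only tokens seen within `delta` seconds of the newest atime."""
    cutoff = newest_atime - delta
    return {tok: atime for tok, atime in tokens.items() if atime >= cutoff}


def learn(tokens, message_tokens, now):
    """Training a message (re)adds its tokens with a fresh atime."""
    updated = dict(tokens)
    for tok in message_tokens:
        updated[tok] = now
    return updated


db = {"old-trick": 100, "new-trick": 1000}
db = expire(db, newest_atime=1000, delta=500)   # "old-trick" is dropped
db = learn(db, ["old-trick"], now=2000)         # retraining restores it
```

After expiry only "new-trick" is left. When a spam using the old trick is trained again, "old-trick" comes back with a fresh atime, which is the recovery described above.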

SpamAssassin works best when it does not depend on just rules or just Bayes
to solve spam problems. It's the combination of the two that gives the
best overall performance, especially when used with MTA-level rejection.
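The benefit of combining rules and Bayes comes from how SpamAssassin scores mail. It adds up the scores of every rule that matches and flags the message when the total reaches the threshold (5.0 by default). In the sketch below the per-rule scores and the HYPOTHETICAL_* rule names are made up for illustration. Only the additive structure and the default threshold reflect SpamAssassin.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default spam threshold

# Hypothetical per-rule scores, for illustration only.
SCORES = {
    "BAYES_99": 3.5,
    "HYPOTHETICAL_HEADER_RULE": 2.0,
    "HYPOTHETICAL_BODY_RULE": 1.0,
}


def classify(hits, scores=SCORES, threshold=REQUIRED_SCORE):
    """Sum the scores of every rule that fired and compare to the threshold."""
    total = sum(scores[rule] for rule in hits)
    return total >= threshold, total


bayes_only = classify(["BAYES_99"])
combined = classify(["BAYES_99", "HYPOTHETICAL_HEADER_RULE"])
```

With these made-up scores, a Bayes hit alone totals 3.5 and stays below the threshold, while Bayes plus one header rule totals 5.5 and is flagged. Neither signal alone is enough, but together they are.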

Mike
