> -----Original Message-----
> From: Ted Mittelstaedt [mailto:t...@ipinc.net]
> Sent: 2009-10-10 02:40
> To: Marc Perkel
> Cc: users@spamassassin.apache.org
> Subject: Re: SA needs a new paradigm for rule structure
> 
> 
> Marc Perkel wrote:
> > I've brought this idea up over the years but I'll try to explain it in a
> > different way. Maybe we can do this with a lot of meta rules.
> >
> > What we need are rules that combine a lot of simple rules into concepts
> > and then combine those rules into rules that score - and score big. As
> > an example, let's take a standard Nigerian scam email.
> > 
> >  From <> reply to:
> > 
> > [I don't know you] Dear stranger, I am mr, ms. mrs. my name is
> > 
> > [I am connected] I am a soldier in Iraq, I am the daughter of an
> > African president, I work at a bank in Hong Kong
> > 
> > [I have money] I have the sum of 56 million dollars USD
> > 
> > [the money is hot] no beneficiaries, sneak it out of the country, 
> > oppressive regime
> > 
> > [transfer to your account] splitting the funds, wire to your account
> > 
> > [I need your information] name, address, account number
> >
> > [I want you to contact me] by email, phone
> > 
> > [keep this a secret] confidential discretion
> > 
> > So - we create a lot of simple rules with no points with key words and
> > phrases and then combine these rules using meta rules to get these
> > concepts. That way we have a meta rule like, "they don't know me" "that
> > are talking about transferring millions" "they want my information"
> > "they are talking about hot money". Then you combine those concepts into
> > rules that can definitively determine it is spam.
> > 
> > And - I am still looking for someone who might do Bayesian or some other
> > automatic system that looks for rule combinations and increases scores
> > based on that.
> > 
> 
> I know that it seems like building up "meta" rules from
> a lot of small rules will give you a more accurate hit rate, but
> this is one of those non-intuitive things where statistical
> mathematics shows that the concept won't work.  Or
> rather, it won't work any better than the existing paradigm.
> 
> In other words, the current system of assigning little points to
> a lot of little rules will yield the same result for any given
> set of spam messages as organizing all these small rules into
> groups that have bigger point values.
> 
> The only thing the organization does is help humans understand
> what is going on.  This is because how humans think about
> math like statistics is a lot different from how a computer
> works with mathematics like statistics.
> 
> Ted

If I remember correctly, a few years back Bayesian chains showed a 10%
increase in capture rate over straight Bayes rules. I would think that this
is similar.

The problem with meta rules is that they can be fooled by a single change.
Hit 4 out of 5 and you don't get the 7.0 score because the spammer changed
one single thing. But with single rules at least those 4 things would have
scored. 
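
To make the 4-out-of-5 case concrete, here is a rough Python sketch. It is
not SA code: the sub-rule names and the 1.4 points per rule are made up for
illustration, and the meta rule is assumed to be a strict "all five must hit"
combination worth 7.0.

```python
# Hypothetical sub-rules and scores, invented for this illustration only.
SUB_RULES = {
    "STRANGER_GREETING": 1.4,
    "HAS_MILLIONS": 1.4,
    "HOT_MONEY": 1.4,
    "WANTS_ACCOUNT_INFO": 1.4,
    "KEEP_SECRET": 1.4,
}
META_SCORE = 7.0  # assumed score of the all-or-nothing meta rule


def additive_score(hits):
    """Sum the individual scores of the sub-rules that hit."""
    return sum(score for name, score in SUB_RULES.items() if name in hits)


def strict_meta_score(hits):
    """Award the meta score only when every sub-rule hit."""
    return META_SCORE if all(name in hits for name in SUB_RULES) else 0.0


all_five = set(SUB_RULES)
# The spammer rewords one phrase, so one sub-rule no longer matches.
four_of_five = all_five - {"KEEP_SECRET"}

for label, hits in (("5 of 5", all_five), ("4 of 5", four_of_five)):
    print(label,
          "additive:", round(additive_score(hits), 1),
          "strict meta:", strict_meta_score(hits))
```

With these made-up numbers, dropping one hit takes the strict meta rule from
7.0 to nothing, while the individually scored rules still add up to
4 x 1.4 = 5.6.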

You would need to constantly tweak the meta rules. 

I like the idea, and have thought about it before. I understand Ted's point on
the statistics. I think it can be made better, but not with current SA code.
And I know the old quote from JM, "All code samples are always welcome." :-)
So I hope to one day get something written to try.

--Chris 
