On Sun, 25 Sep 2011 09:28:32 -0700
Marc Perkel wrote:

> Here's what I'd like to be able to do. I'd like a program of some
> sort where I could take word tokens - like the names of rules that were
> triggered - and look for rule combinations that indicate spam or ham.
> For example, a message triggers 4 rules A B C and D. These rules are
> combined as follows:
> 
> A
> ...
> ABCD
> 
> Each rule combo is then looked up for how often it occurs in spam and 
> how often it occurs in ham. Then the results are combined into some
> sort of likelihood of being spam or ham.
> 
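
To make the proposal concrete, here is a minimal Python sketch of it.
The quoted text does not say how the per-combination results would be
combined, so the naive-Bayes-style log-odds sum with add-one smoothing
below is my assumption, as are the function names and the count tables
(dicts keyed by sorted tuples of rule names).

```python
from itertools import combinations
from math import exp, log


def rule_combos(rules):
    """Return every non-empty combination of the triggered rules as sorted tuples."""
    rules = sorted(set(rules))
    return [combo
            for size in range(1, len(rules) + 1)
            for combo in combinations(rules, size)]


def spam_probability(rules, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-combination frequencies into a single spam probability.

    Assumption: log-odds are simply summed over all combinations, with
    add-one smoothing. Overlapping combinations are not independent, so
    this overcounts evidence; it is an illustration, not a tuned method.
    """
    log_odds = log(n_spam / n_ham)  # prior odds taken from the corpus sizes
    for combo in rule_combos(rules):
        p_spam = (spam_counts.get(combo, 0) + 1) / (n_spam + 2)
        p_ham = (ham_counts.get(combo, 0) + 1) / (n_ham + 2)
        log_odds += log(p_spam / p_ham)
    return 1 / (1 + exp(-log_odds))
```

Four triggered rules give 15 combinations, and n rules give 2^n - 1, so
the lookup tables grow quickly with the number of hits per message.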

There are a couple of problems with this. The first is that most SA
rules are either neutral or strong spam indicators, which makes them
unsuitable for the sort of techniques used in Bayes.

The second is that most of the scope for meaningful combinations is in
high-scoring spam. Low-scoring spams are low-scoring because SA couldn't
find much evidence - in these you're going to end up with
meaningless strong+neutral combinations like BAYES_99+SPF_PASS.

That's not to say that it can't be done in a more general sense; the
scoring system is a way of converting rule combinations into a
classification.
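
As a rough illustration of that last point, the sketch below sums
per-rule scores and compares the total with a threshold. The score
values and the HYPOTHETICAL_* rule names are made up for the example;
only the threshold of 5.0 is SA's default required_score.

```python
# Illustrative scores only; real values come from the installed SA rule set.
SCORES = {
    "BAYES_99": 3.5,
    "SPF_PASS": -0.001,
    "HYPOTHETICAL_RULE_A": 1.2,
    "HYPOTHETICAL_RULE_B": 2.0,
}
REQUIRED_SCORE = 5.0  # SA's default required_score


def classify(hits, scores=SCORES, required=REQUIRED_SCORE):
    """Return (total score, is_spam) for a list of triggered rule names."""
    total = sum(scores.get(rule, 0.0) for rule in hits)
    return total, total >= required
```

With these made-up numbers, BAYES_99+SPF_PASS stays under the
threshold, while adding the two hypothetical rules pushes the total
over it.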

Similar questions have been asked before; IIRC someone came up with
an alternative way of getting a classification from the rule hits
based on learning, and made a basic plugin that tweaked the score
accordingly.
