> First, we should create a mathematical criterion of rule quality and
> effectiveness.
> 
> (I suppose this criterion would be used to reject/commit new rules and remove old ones.)
> 
> The first and main criterion is the ham/spam ratio for whitelist rules
> (score<0) and the spam/ham ratio for blacklist rules (score>0).

We already have criteria.

We use the S/O ratio (spam/overall = spam/(ham+spam), computed with a
50/50 weighting of ham to spam so the result does not depend on the
relative sizes of the ham and spam corpora).  High is good for spam
rules.  Low is good for ham rules.
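For illustration, a minimal Python sketch of the S/O calculation, assuming each rule's hit counts and the sizes of the spam and ham corpora are known (the function names are hypothetical). Working from hit rates rather than raw counts is what gives the 50/50 weighting: a rule that hits 20% of spam and 1% of ham gets the same S/O whether the ham corpus has 2,000 messages or 200,000.

```python
def hit_rates(spam_hits, n_spam, ham_hits, n_ham):
    """Return (SPAM%, HAM%): the share of each corpus that the rule hits."""
    spam_pct = 100.0 * spam_hits / n_spam
    ham_pct = 100.0 * ham_hits / n_ham
    return spam_pct, ham_pct


def s_o_ratio(spam_hits, n_spam, ham_hits, n_ham):
    """S/O = SPAM% / (SPAM% + HAM%), i.e. spam/(ham+spam) with the two
    corpora weighted 50/50 regardless of their actual sizes."""
    spam_pct, ham_pct = hit_rates(spam_hits, n_spam, ham_hits, n_ham)
    if spam_pct + ham_pct == 0:
        return None  # the rule never fired, so S/O is undefined
    return spam_pct / (spam_pct + ham_pct)


# Hypothetical rule: 2000 of 10000 spam hit, 20 of 2000 ham hit.
print(hit_rates(2000, 10000, 20, 2000))            # (20.0, 1.0)
print(round(s_o_ratio(2000, 10000, 20, 2000), 3))  # 0.952
# Raw counts would give 2000 / (2000 + 20) ~= 0.990, inflated by the
# larger spam corpus.
```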

We also use a RANK number which is a relative ranking system of each
rule compared to every other rule.
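The exact RANK formula isn't spelled out here, so the following is only a simplified stand-in: a Python sketch, assuming rules are ranked on S/O alone, that orders each rule against every other rule and maps its position onto a scale from 0.0 (worst) to 1.0 (best). A real ranking may also fold in other measures such as hit rate; the function and rule names are hypothetical.

```python
def relative_ranks(so_by_rule, spam_rules=True):
    """Map each rule to a relative rank in [0.0, 1.0], best = 1.0.

    Simplified stand-in: ranks on S/O only. For spam rules a higher S/O
    is better; for ham rules a lower S/O is better. Ties are broken by
    input order.
    """
    # Sort worst-first so that list position grows with rule quality.
    ordered = sorted(so_by_rule, key=so_by_rule.get, reverse=not spam_rules)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 1.0}
    return {rule: i / (n - 1) for i, rule in enumerate(ordered)}


# Hypothetical spam rules and their S/O values.
print(relative_ranks({"RULE_A": 0.99, "RULE_B": 0.80, "RULE_C": 0.95}))
# {'RULE_B': 0.0, 'RULE_C': 0.5, 'RULE_A': 1.0}
```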

We also use the hit rate: SPAM% (the percentage of the spam corpus a
rule hits) for spam rules and HAM% for ham rules.

We also use the overlap (or correlation) between rules to eliminate
rules that overlap too much with other rules.
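One simple way to quantify overlap, sketched in Python under the assumption that each rule's hits are recorded as a set of message IDs: measure what fraction of rule A's hits rule B also catches. A value near 1.0 means A adds little beyond B. The 0.95 threshold and the function and rule names are illustrative, not a documented setting.

```python
from itertools import permutations


def overlap(hits_a, hits_b):
    """Fraction of rule A's hits that rule B also hits (0.0 to 1.0)."""
    if not hits_a:
        return 0.0
    return len(hits_a & hits_b) / len(hits_a)


def redundant_pairs(hits_by_rule, threshold=0.95):
    """Return (a, b) pairs where rule a is (almost) entirely covered by b.

    The threshold is a hypothetical cut-off, not an established value.
    """
    return [
        (a, b)
        for a, b in permutations(hits_by_rule, 2)
        if overlap(hits_by_rule[a], hits_by_rule[b]) >= threshold
    ]


# Hypothetical rules, with hits given as sets of message IDs.
hits = {"RULE_A": {1, 2, 3, 4}, "RULE_B": {1, 2, 3, 4, 5, 6}, "RULE_C": {7, 8}}
print(redundant_pairs(hits))  # [('RULE_A', 'RULE_B')]
```

The measure is deliberately asymmetric: RULE_A covers only 4 of RULE_B's 6 hits, so RULE_B still contributes hits of its own, while RULE_A is the candidate for removal.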

At the end of the day, however, the only thing that matters is the score
generated by the perceptron.  It sets scores better than simpler
measures can, because the interactions between rules are too
complicated to represent with simple formulas.
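To make the idea concrete, here is a minimal textbook perceptron in Python, assuming each message is represented as the set of rule names it hits plus a spam/ham label. Because all scores are fitted jointly against whole messages, rules that tend to fire together share the credit instead of each being scored in isolation. This is the classic perceptron update, not necessarily the variant used to generate the real scores, which may differ in activation function, learning schedule and score limits; the corpus, rule names and parameters are hypothetical.

```python
def train_perceptron(messages, rules, epochs=20, lr=0.1):
    """Classic perceptron training on binary rule-hit vectors.

    messages: list of (set_of_rule_names_hit, is_spam) pairs.
    Returns (scores, bias); a message is classed as spam when
    bias + sum of its rules' scores > 0.
    """
    scores = {rule: 0.0 for rule in rules}
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for hits, is_spam in messages:
            total = bias + sum(scores[rule] for rule in hits)
            if (total > 0) != is_spam:
                # Misclassified: nudge every rule that fired (and the
                # bias) toward the correct label.
                step = lr if is_spam else -lr
                for rule in hits:
                    scores[rule] += step
                bias += step
                mistakes += 1
        if mistakes == 0:
            break  # every training message is classified correctly
    return scores, bias


# Hypothetical corpus: RULE_A marks spam, RULE_C marks ham.
corpus = [
    ({"RULE_A"}, True),
    ({"RULE_A", "RULE_B"}, True),
    ({"RULE_C"}, False),
    ({"RULE_B", "RULE_C"}, False),
]
scores, bias = train_perceptron(corpus, ["RULE_A", "RULE_B", "RULE_C"])
print(scores, bias)  # {'RULE_A': 0.1, 'RULE_B': 0.0, 'RULE_C': -0.1} 0.0
```

RULE_B fires on one spam and one ham message and ends up with a score of 0.0; the other two rules carry the decision.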

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/