On Thu, Apr 15, 2004 at 04:26:30PM -0700, Daniel Quinlan wrote:
> Sidney Markowitz <[EMAIL PROTECTED]> writes:
> 
> > Does this make sense to people, or should we just dedicate ourselves
> > to making sure that Bayes processing is so efficient that there will
> > be no need to treat it as a special case?
> 
> There are other slow rules.  Language guessing, for example.
> 
> I'd rather devote time to:
> 
>  - making code generally more efficient
>  - ways to make message checks more efficient in general (early exit is
>    one option if it actually speeds things up)
> 
> I just had an interesting idea of how to make checks much faster.  What
> if we used a decision tree, but only to determine whether or not all
> rules need to be evaluated?
> 
>   [DECISION TREE] -> definitely spam OR maybe spam
> 
>     (there is no "maybe ham" or "ham" output from the tree, so no free
>     pass if a spammer figures out a safe path through the tree)
> 
>   if maybe spam, then
> 
>     [PERCEPTRON] -> spam or ham
> 
>   if definitely spam, then
> 
>     no more work to do
> 
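A minimal Python sketch of the two-stage idea above. It assumes rules are plain boolean functions of the message text and that the second stage is a linear weighted sum of rule hits compared against a threshold (a perceptron-style scorer). The rule names, weights, tree shape, and the 5.0 threshold are hypothetical toy values, not real SpamAssassin rules. Tree leaves can only say "definitely spam" or "maybe spam", so no path through the tree lets a message skip the full check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Rule = Callable[[str], bool]


@dataclass
class Node:
    rule: Optional[str] = None          # None marks a leaf
    on_hit: Optional["Node"] = None
    on_miss: Optional["Node"] = None
    definitely_spam: bool = False       # leaf output; False means "maybe spam"


def tree_verdict(node: Node, msg: str, rules: Dict[str, Rule]) -> bool:
    """Walk the tree, running only the rules on the chosen path."""
    while node.rule is not None:
        node = node.on_hit if rules[node.rule](msg) else node.on_miss
    return node.definitely_spam


def perceptron(msg: str, rules: Dict[str, Rule],
               weights: Dict[str, float], threshold: float) -> bool:
    """Run every rule and compare the weighted sum of hits to the threshold."""
    score = sum(weights[name] for name, rule in rules.items() if rule(msg))
    return score >= threshold


def classify(msg: str, tree: Node, rules: Dict[str, Rule],
             weights: Dict[str, float], threshold: float) -> str:
    if tree_verdict(tree, msg, rules):
        return "spam"                   # definitely spam: no more work to do
    return "spam" if perceptron(msg, rules, weights, threshold) else "ham"


# Toy rule set for illustration only; CALLS counts runs of the "slow" rule.
CALLS = {"slow": 0}


def slow_check(msg: str) -> bool:
    CALLS["slow"] += 1                  # stand-in for an expensive check
    return "free" in msg.lower()


RULES: Dict[str, Rule] = {
    "SUBJ_VIAGRA": lambda m: "viagra" in m.lower(),
    "HAS_URL": lambda m: "http://" in m,
    "SLOW_CHECK": slow_check,
}
WEIGHTS = {"SUBJ_VIAGRA": 3.0, "HAS_URL": 1.5, "SLOW_CHECK": 4.0}
TREE = Node("SUBJ_VIAGRA", on_hit=Node(definitely_spam=True), on_miss=Node())
```

With this toy tree, "Buy VIAGRA now" is flagged by the tree alone and SLOW_CHECK never runs; any other message falls through to the full weighted sum.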

What do you think of associating a cost and a benefit score with
each rule, and then iterating over the rules in order of greatest
benefit for least cost until the total hits the spam threshold?
This may be a bit extreme, since tagging all the rules would take
quite a bit of work, but it should provide a nice optimization.
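A minimal Python sketch of the cost/benefit ordering, assuming "benefit" is simply the rule's score and "cost" is a positive, hand-assigned CPU estimate (both hypothetical). One wrinkle: rules with negative scores could pull a message back under the threshold, so this version only exits early once the total is high enough that it would stay at or above the threshold even if every remaining negative rule hit.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str
    score: float                 # points added on a hit; may be negative
    cost: float                  # hypothetical relative cost estimate, must be > 0
    test: Callable[[str], bool]


def order_rules(rules: List[Rule]) -> List[Rule]:
    # Greatest benefit per unit cost first (benefit = score, an assumption).
    return sorted(rules, key=lambda r: r.score / r.cost, reverse=True)


def check(msg: str, rules: List[Rule], threshold: float) -> Tuple[bool, List[str]]:
    """Return (is_spam, names of the rules that were actually run)."""
    ordered = order_rules(rules)
    # worst_rest[i]: sum of the negative scores among ordered[i:], i.e. the
    # most the rules not yet run could still subtract from the total.
    worst_rest = [0.0] * (len(ordered) + 1)
    for i in range(len(ordered) - 1, -1, -1):
        worst_rest[i] = worst_rest[i + 1] + min(ordered[i].score, 0.0)

    total = 0.0
    ran: List[str] = []
    for i, rule in enumerate(ordered):
        if total + worst_rest[i] >= threshold:
            return True, ran     # spam whatever the remaining rules do
        ran.append(rule.name)
        if rule.test(msg):
            total += rule.score
    return total >= threshold, ran


# Toy rule set for illustration only.
RULES = [
    Rule("SLOW_CHECK", 3.5, 10.0, lambda m: "free" in m),
    Rule("HAS_URL", 1.5, 1.0, lambda m: "http://" in m),
    Rule("HAM_HINT", -1.0, 1.0, lambda m: m.startswith("Re:")),
    Rule("SPAMMY_WORD", 5.0, 1.0, lambda m: "viagra" in m),
]
```

For "viagra at http://x free" with a threshold of 5.0, the check stops after SPAMMY_WORD and HAS_URL: the total of 6.5, minus the worst case of 1.0 from HAM_HINT, still clears 5.0, so the expensive SLOW_CHECK never runs.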

--eric



> Daniel
> 
> -- 
> Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
> http://www.pathname.com/~quinlan/    and open source consulting
