On Thu, Apr 15, 2004 at 04:26:30PM -0700, Daniel Quinlan wrote:
> Sidney Markowitz <[EMAIL PROTECTED]> writes:
>
> > Does this make sense to people, or should we just dedicate ourselves
> > to making sure that Bayes processing is so efficient that there will
> > be no need to treat it as a special case?
>
> There are other slow rules.  Language guessing, for example.
>
> I'd rather devote time to:
>
>  - making code generally more efficient
>  - ways to make message checks more efficient in general (early exit is
>    one option if it actually speeds things up)
>
> I just had an interesting idea of how to make checks much faster.  What
> if we did decision tree, but only to determine whether or not all rules
> would be evaluated?
>
>   [DECISION TREE] -> definitely spam OR maybe spam
>
>   (there is no "maybe ham" or "ham" output from the tree, so no free
>   pass if a spammer figures out a safe path through the tree)
>
>   if maybe spam, then
>
>     [PERCEPTRON] -> spam or ham
>
>   if definitely spam, then
>
>     no more work to do
>

What do you think of associating a cost and a benefit score with each
rule? Then you would just iterate over the rules in order of greatest
benefit for least cost until you hit the spam threshold. This may be a
bit extreme, since you would have to do quite a bit of work tagging all
the rules, but it should provide a nice optimization.

--eric

> Daniel
>
> --
> Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
> http://www.pathname.com/~quinlan/    and open source consulting
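
A rough sketch of the cost/benefit ordering suggested above, in Python
rather than SpamAssassin's Perl. Everything in it is an assumption made
for illustration: the Rule class, the scan() function, the rule names,
and the score and cost numbers are hypothetical and are not
SpamAssassin's API. It also assumes every rule has a positive score;
with negative-scoring (ham) rules in the set, stopping at the threshold
could skip a rule that would have pulled the total back down.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str
    score: float  # benefit: points added when the rule hits (made-up values)
    cost: float   # relative cost of running the rule (made-up units)
    check: Callable[[str], bool]


def scan(message: str, rules: List[Rule],
         threshold: float = 5.0) -> Tuple[float, List[str]]:
    """Run rules in order of benefit per unit cost, stopping at the threshold.

    Returns the accumulated score and the names of the rules that were run.
    """
    ordered = sorted(rules, key=lambda r: r.score / r.cost, reverse=True)
    total = 0.0
    evaluated = []
    for rule in ordered:
        evaluated.append(rule.name)
        if rule.check(message):
            total += rule.score
        if total >= threshold:
            break  # early exit: the remaining (costlier) rules are skipped
    return total, evaluated


# Hypothetical rules; SLOW_BAYES stands in for any expensive check.
RULES = [
    Rule("SLOW_BAYES", 4.0, 50.0, lambda m: False),
    Rule("CHEAP_KEYWORD", 3.0, 1.0, lambda m: "viagra" in m.lower()),
    Rule("ALL_CAPS_SUBJECT", 2.5, 1.0,
         lambda m: m.split("\n", 1)[0].isupper()),
]

if __name__ == "__main__":
    print(scan("BUY NOW\ncheap viagra here", RULES))
```

With these made-up numbers the two cheap rules reach the threshold on
their own, so the expensive rule is never run for that message. A
message that hits nothing still pays for every rule, which is why the
ordering only helps on the spam side.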
