Re: A different approach to scoring spamassassin hits

Loren Wilton Sat, 30 Jun 2007 05:07:35 -0700

You have a bit of a chicken-and-egg problem at the start, until
some learning takes place in the system.

Two possibilities. The rules exist and have scores. Assume they are maintained, for whatever reason.

1. Until Bayes has enough info to kick in, classification is done by the scores. Then when Bayes kicks in, the scores turn off (at least as far as adding to the message score goes; they might still show up as tokens in the message that Bayes will process).

2. Divide all the scores by 10 or 20, then leave them on. Pretty soon Bayes will override almost any reasonable score combination.
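To make the two options concrete, here is a rough Python sketch. It is not SpamAssassin code: the rule names, the scores, and the `bayes_ready` flag are all made up for illustration, and the Bayes contribution is simply passed in as a number.

```python
# Hypothetical rule scores for illustration; not SpamAssassin's real defaults.
RULE_SCORES = {"RULE_A": 2.5, "RULE_B": 1.5, "RULE_C": 3.0}


def static_score(hits, divisor=1.0):
    """Sum the scores of the rules that hit, optionally divided down."""
    return sum(RULE_SCORES[name] for name in hits) / divisor


def option_one(hits, bayes_score, bayes_ready):
    """Option 1: static scores until Bayes is trained, then Bayes alone."""
    return bayes_score if bayes_ready else static_score(hits)


def option_two(hits, bayes_score, divisor=10.0):
    """Option 2: keep the static scores, divided by 10 or 20, and add Bayes."""
    return static_score(hits, divisor) + bayes_score
```

With option 2 the static rules still contribute a little, so an untrained system is not completely blind, but a trained Bayes score quickly dominates the total.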

BTW, while ham rules are possible, SA has almost no ham rules; perhaps two or so. Spammers long ago found they could write their spams to match ham rules and thus bypass SA. Thus, no ham rules, no spammer workarounds. Of course personal or site-specific ham rules will generally still work, since they will not be public knowledge and spammers won't be able to target them.

I suspect you can find all rule names in PerMsgStatus. However, the latest SA versions have implemented a 'check' plugin that actually runs the rules and accumulates the score. The rule running was moved to a plugin so that people could, at least in theory, change the order or the way that rules are run. It sounds like that is what you want to do, so a modified Check plugin may well be the way to go.

I don't understand, though, why you are interested in the names of all rules run; I don't see what it buys you. Currently ALL rules are run, unless short-circuiting is in effect, and by default it mostly isn't. In any case, if a rule doesn't hit on a message, the name of the rule is probably irrelevant. It might have missed because the message is ham, but it is even more likely that it missed because it simply targets a different kind of spam. So the assumption that "rules not hit" === "good tokens" is unlikely to hold.

You should be able to get Bayes to scan the rule names hit pretty easily. Bayes is just about the last rule; I think AWL comes after it. You might want to change that order, which I suspect you can do in the Check plugin. You could then modify the Check code to push the rule names into a special header line before calling Bayes. This could probably be done in Check, and could certainly be done by a one-off plugin that you wrote. It would be called by a special rule just before Bayes is called, and again, it would add the current rule names to a special header Bayes could see.
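The header idea can be sketched in Python, purely as an illustration of the data flow and not as SpamAssassin plugin code (a real plugin would be written in Perl against the plugin API). The header name `X-Rules-Hit` and the `rule:` token prefix are invented for this example.

```python
from email.message import EmailMessage

HEADER = "X-Rules-Hit"  # hypothetical header name, not one SA defines


def add_rules_header(msg, rule_names):
    """Record the names of the rules that hit in a single header line."""
    del msg[HEADER]  # deleting a header that is absent is a no-op
    msg[HEADER] = " ".join(sorted(rule_names))
    return msg


def header_tokens(msg):
    """Turn the header back into tokens for a Bayes-style tokenizer.

    The prefix keeps rule names from colliding with ordinary body words.
    """
    return ["rule:" + name for name in msg.get(HEADER, "").split()]
```

Only rules that hit are recorded, which matches the point above that rules that did not hit make poor tokens.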

Of course you have to modify Check to drop out the scores for the non-Bayes rules. Either that or rescore all of the rules.
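The "rescore" alternative could look something like the following Python sketch. It assumes a plain name-to-score table and that the Bayes rules are the ones whose names start with `BAYES_`; the scores in the usage note are made up.

```python
def rescore_non_bayes(scores, prefix="BAYES_"):
    """Return a new score table where only rules named prefix* keep a score.

    Every other rule is set to 0.0, so it can still run and hit (and so
    still show up as a token) without adding to the message score.
    """
    return {
        name: (value if name.startswith(prefix) else 0.0)
        for name, value in scores.items()
    }
```

For example, `rescore_non_bayes({"RULE_A": 2.5, "BAYES_99": 3.5})` leaves `BAYES_99` at 3.5 and sets `RULE_A` to 0.0, so the code that accumulates the score needs no change.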


       Loren
