Charles Gregory wrote:

Hiyo!

I realize that this may run afoul of some other objectives, particularly
those where people want to know all the tests that matched, and perform
supplemental checks based on that info, but I have to wonder, could we
improve the efficiency of SpamAssassin by having it make note of the
'HITS-REQUIRED' score and have it STOP TESTING after it surpasses that
score?


I.e., is there really any reason to keep testing a piece of mail once we
know it is spam? When our threshold is somewhere between 3 and 10, and I
see mail scoring 20 or 30, I realize that this mail probably passed that
threshold less than half-way into the tests.


This could, in theory, lower our processor 'cost' for spam considerably.
Thoughts? Good idea? Stupid idea?

- Charles

It's not a stupid idea, but there are a few obstacles there:

1) Are we sure that no later-running tests would pull the score back below the threshold? (whitelist, Bayes, local negative scoring rules)

2) Some of us categorize our spam by score. Really high scoring email (like 20+) gets tossed without a second glance.

3) How much time and processor savings are there really? We've already slurped the message into memory, we've already fired off the RBLs (costly in a wall-clock sense), we've presumably already run through Bayes, etc.
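
To make obstacle 1 concrete, here is a minimal Python sketch. It is not SpamAssassin code: the rule names, scores, and run order are made up for illustration, and it assumes every listed rule matches the message. It compares a naive "stop once we cross the threshold" scan with a more cautious one that stops only when the remaining negative-scoring rules could not pull the total back under the threshold.

```python
# Hypothetical rules in the order they run: (name, score).
# These are illustrative only, not real SpamAssassin rules or scores.
RULES = [
    ("HYPOTHETICAL_SUBJECT_CAPS", 2.5),
    ("HYPOTHETICAL_RBL_HIT", 3.0),
    ("HYPOTHETICAL_BAYES_HAM", -2.0),
    ("HYPOTHETICAL_WHITELIST", -100.0),
]


def full_score(hits):
    """Score the message by running every rule."""
    return sum(score for _, score in hits)


def naive_short_circuit(hits, threshold):
    """Stop as soon as the running total reaches the threshold.

    Returns (score_so_far, number_of_rules_run).
    """
    total = 0.0
    for ran, (_, score) in enumerate(hits, start=1):
        total += score
        if total >= threshold:
            return total, ran
    return total, len(hits)


def safe_short_circuit(hits, threshold):
    """Stop only when the rules still to run cannot undo the verdict.

    Worst case assumed: every remaining negative-scoring rule matches.
    Returns (score_so_far, number_of_rules_run).
    """
    # remaining_negative[i] = sum of the negative scores in hits[i:]
    remaining_negative = [0.0] * (len(hits) + 1)
    for i in range(len(hits) - 1, -1, -1):
        remaining_negative[i] = remaining_negative[i + 1] + min(hits[i][1], 0.0)

    total = 0.0
    for i, (_, score) in enumerate(hits):
        total += score
        if total + remaining_negative[i + 1] >= threshold:
            return total, i + 1
    return total, len(hits)


if __name__ == "__main__":
    threshold = 5.0
    print("full:", full_score(RULES))
    print("naive:", naive_short_circuit(RULES, threshold))
    print("safe:", safe_short_circuit(RULES, threshold))
```

With these made-up numbers and a threshold of 5.0, the naive scan stops after two rules at 5.5 and calls the mail spam, while the full run ends at -96.5 because of the whitelist entry. The cautious version gets the right answer but ends up running all four rules, so any real savings would depend on running the negative-scoring rules early.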

Personally, I think any programming effort would be better spent on making spamc more robust with multiple spamd hosts, and on pushing prefs and Bayes into SQL (work is being done on that for the next version, I believe). Toss a bunch of old web servers out there to do the scanning, and you can scale pretty high.

--Rich



