Re: YO DEVELOPERS! Efficiency idea

Rich Puhek 17 Mar 2004 22:05:37 -0000

Charles Gregory wrote:

Hiyo!
I realize that this may run afoul of some other objectives, particularly those where people want to know all the tests that matched and perform supplemental checks based on that info. Still, I have to wonder: could we improve the efficiency of SpamAssassin by having it make note of the 'HITS-REQUIRED' score and STOP TESTING once a message surpasses that score?

I.e., is there really any reason to keep testing a piece of mail once we know it is spam? When our threshold is somewhere between 3 and 10, and I see mail scoring 20 or 30, I realize that this mail probably passed that threshold less than halfway through the tests.
This could, in theory, lower our processor 'cost' for spam considerably.
Thoughts? Good idea? Stupid idea?
- Charles


It's not a stupid idea, but there are a few obstacles there:

1) Are we sure that no later-running test would pull the score back below the threshold? (whitelists, Bayes, local negative-scoring rules)

2) Some of us categorize our spam by score. Really high-scoring email (like 20+) gets tossed without a second glance. If testing stops at the threshold, we lose that distinction.

3) How much time and processor savings are there really? By that point we've already slurped the message into memory, we've already fired off the RBLs (costly in a wall-clock sense), we've presumably already run through Bayes, etc.
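
To make obstacle 1 concrete, here is a minimal sketch in Python. It is not SpamAssassin code: the rule names, the scores, the evaluation order, and the "score >= threshold means spam" comparison are all assumptions made up for illustration. It only shows how stopping at the threshold can disagree with a full scan when a negative-scoring rule runs late.

```python
from typing import List, Tuple

# (rule name, score added if the rule hits); all values are hypothetical.
Rule = Tuple[str, float]


def full_score(hits: List[Rule]) -> float:
    """Sum every rule that hit, the way a complete scan would."""
    return sum(score for _, score in hits)


def short_circuit_score(hits: List[Rule], threshold: float) -> Tuple[float, int]:
    """Stop as soon as the running total reaches the threshold.

    Returns the running total at the stopping point and the number of
    rules that were evaluated before stopping.
    """
    total = 0.0
    for evaluated, (_, score) in enumerate(hits, start=1):
        total += score
        if total >= threshold:
            return total, evaluated
    return total, len(hits)


# Hypothetical hits, in the order the tests happen to run.
hits = [
    ("HYPOTHETICAL_SUBJECT_CAPS", 2.5),
    ("HYPOTHETICAL_HTML_ONLY", 3.0),
    ("HYPOTHETICAL_WHITELIST", -100.0),  # negative rule that runs last
]
THRESHOLD = 5.0

early, evaluated = short_circuit_score(hits, THRESHOLD)
final = full_score(hits)

print(f"short-circuit: stopped after {evaluated} rules, spam={early >= THRESHOLD}")
print(f"full scan:     evaluated {len(hits)} rules, spam={final >= THRESHOLD}")
```

In this made-up case the early exit calls the message spam after two rules, while the full scan lets the whitelist entry pull it well below the threshold. An early exit would only be safe if every negative-scoring test were guaranteed to run before the cutoff is checked.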

Personally, I think any programming effort would be better spent on making spamc more robust when working with multiple spamd hosts, and on pushing prefs and Bayes into SQL (work on that is being done for the next version, I believe). Toss a bunch of old web servers out there to do the scanning, and you can scale pretty high.

--Rich
