Robert - elists wrote:
You can't run the rules in score-order without driving SA's performance
into the ground.

The key here is that SA doesn't run tests sequentially; it runs them in
parallel as it works its way through the body. This allows for good,
efficient use of the memory cache.

By running rules in score-order, you break this, forcing SA to run
through the body multiple times, degrading performance.


Mr K

SA is an awesome, incredible product and tool.

Wonderful Job!

I am not an expert on the programming theory, design, and implementation
behind SA.

So... are you saying SA takes a single email and breaks it apart into
several pieces and scans those pieces via multiple processing threads and
comes back with a single additive end result for that single email's
multiple scan processing threads?
No, I'm saying it breaks the email into pieces, then for the first piece, it runs all the rules. Then it runs all the rules on the second piece, and the third, and the fourth, etc.

Forcing score order causes it to run the whole message on one rule, then the whole message on the next rule, etc.

Looping through the entire message body multiple times is slow because you defeat the benefits of processor memory caches. It's much better to loop through pieces that fit in the cache.

Think of it like an assembly line. Even with one worker, assembly line methods are considerably faster overall. Think of a worker building 100 "things". That worker can get the tool he needs for the first task, do it 100 times, then get the next tool, do the next task 100 times, etc. Overall he'll finish much faster than he would by making one thing, switching tools many times along the way, then making the next one, again switching tools many times, etc.
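To make the two loop orders concrete, here is a minimal Python sketch. It is not SA's actual code: the rule names, patterns and scores are made up for illustration, and Python itself won't show the cache effect. It only shows that both orders find the same hits, while the rule-major version walks the whole message once per rule.

```python
import re

# Hypothetical rules (name -> (pattern, score)); these are NOT real SpamAssassin rules.
RULES = {
    "HAS_VIAGRA": (re.compile(r"viagra", re.I), 2.5),
    "HAS_FREE": (re.compile(r"\bfree\b", re.I), 1.0),
    "HAS_CLICK": (re.compile(r"click here", re.I), 1.5),
}

def scan_piece_major(pieces):
    """Outer loop over pieces: each piece is visited once, with every rule run on it."""
    hits = set()
    for piece in pieces:
        for name, (pattern, _score) in RULES.items():
            if pattern.search(piece):
                hits.add(name)
    return hits

def scan_rule_major(pieces):
    """Outer loop over rules: the message is walked again from the start for each rule."""
    hits = set()
    for name, (pattern, _score) in RULES.items():
        for piece in pieces:
            if pattern.search(piece):
                hits.add(name)
                break  # this rule already hit; move on to the next rule
    return hits

def total_score(hits):
    """Add up the scores of the rules that hit."""
    return sum(RULES[name][1] for name in hits)

# A toy message already split into pieces (assumed to be small enough to stay in cache).
pieces = ["Get your FREE sample", "click here now", "regards, Bob"]
```

Both functions return the same set of hits for the same message; the difference is only in how many times the message data has to be pulled through the processor.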

I do admit that I am respectfully optimistic about your team's ability to
design code that would run just as fast if not faster with a "score order"
end result.

Maybe you could let us make that decision with a local.cf knob?
Well, you can't do the score-checks with a local.cf knob. Believe me, it's been tested, *YEARS* ago. It *killed* SA's performance.

However, if you wanted to see the effects, you could use the priority setting on each rule in local.cf. You can cause them all to run in score order and see what happens...
I mean, most processors are so fast nowadays......
Wait, if they're so fast, why are you trying to optimize?

It doesn't help you to try and make things faster by hoping to bail out halfway through processing, when the cost is making SA run at less than half its normal speed.
I am thinking we would brute force it under some circumstances 'till you
folks come forth with even more brilliant design and implementation
breakthroughs.

What think?
This is a VERY, Very, Very old idea.

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=304

If it didn't work in 2002, it's not going to work any better now. And yes, the "run in score order" idea was advocated there too. See comment #10. Well, when we finally got the ability to run rules in arbitrary order, we tried strict score order with a threshold check after every rule, and the performance sucked.


The better way is the current way: priority and short-circuiting. If you configure this, you can at least control the number of passes SA makes over the message, and bail out on certain trusted rules rather than on total scores.

See the docs for shortcircuit:

http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_Shortcircuit.html
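
As a rough illustration of the priority plus short-circuit idea, here is a small Python sketch. It is a toy model under my own assumptions, not the plugin's implementation: the Rule class, the scan function, the rule names and the "stop at the first short-circuit hit" behaviour are all hypothetical. Rules with the same priority share one pass, and lower priority values run first.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, List, Tuple

@dataclass
class Rule:
    name: str
    priority: int                 # lower values run in earlier passes
    score: float
    test: Callable[[str], bool]
    shortcircuit: bool = False    # if True, a hit ends the scan (assumed behaviour)

def scan(message: str, rules: List[Rule]) -> Tuple[float, List[str], int]:
    """Run rules pass by pass in priority order; return (score, hits, passes made)."""
    ordered = sorted(rules, key=lambda r: r.priority)
    total, hits, passes = 0.0, [], 0
    for _priority, group in groupby(ordered, key=lambda r: r.priority):
        passes += 1  # one pass over the message per distinct priority value
        for rule in group:
            if rule.test(message):
                total += rule.score
                hits.append(rule.name)
                if rule.shortcircuit:
                    return total, hits, passes  # bail out on a trusted rule
    return total, hits, passes

# Hypothetical rules: one trusted rule in an early pass, the rest at priority 0.
RULES = [
    Rule("HAS_FREE", 0, 1.0, lambda m: "free" in m.lower()),
    Rule("HAS_CLICK", 0, 1.5, lambda m: "click here" in m.lower()),
    Rule("TRUSTED_SENDER", -100, -100.0,
         lambda m: "boss@example.com" in m, shortcircuit=True),
]
```

With these toy rules, a message from the trusted sender is settled after one cheap pass, and everything else gets two passes in total, no matter how many rules share priority 0. That is the point: the bail-out hangs on a few trusted rules, not on checking the running score after every rule.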

Is there somewhere you recommend that we can view discussions on making
processing faster?
The archives of sa-dev, and the Bugzilla.
:-)

 - rh


