Re: more efficient big scoring

Matt Kettler Sat, 19 Jan 2008 09:04:59 -0800

Robert - elists wrote:

>> You can't run the rules in score-order without driving SA's performance
>> into the ground.
>>
>> The key here is SA doesn't run tests sequentially, it runs them in
>> parallel as it works its way through the body. This allows for good,
>> efficient use of memory cache.
>>
>> By running rules in score-order, you break this, forcing SA to run
>> through the body multiple times, degrading performance.


> Mr K
>
> SA is an awesome, incredible product and tool.
>
> Wonderful job!
>
> I am not an expert on the programming theory, design, and implementation
> behind SA.
>
> So... are you saying SA takes a single email, breaks it apart into
> several pieces, scans those pieces via multiple processing threads, and
> comes back with a single, additive end result from all of those threads?

No, I'm saying it breaks the email into pieces, then for the first piece, it runs all the rules. Then it runs all the rules on the second piece, and the third, and the fourth, etc.

Forcing score order causes it to run the whole message on one rule, then the whole message on the next rule, etc.

Looping through the entire message body multiple times is slow because you defeat the benefits of processor memory caches. It's much better to loop through pieces that fit in the cache.

Think of it like an assembly line. Even with one worker, assembly-line methods are overall considerably faster. Think of a worker building 100 "things". That worker can get the tool he needs for the first task, do it 100 times, then get the next tool, do the next task 100 times, etc. Overall he'll finish much faster than making one thing, switching tools many times, then the next one, again switching tools many times, etc.
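The difference between the two loop orders can be sketched in Python. This is a toy model, not SpamAssassin's code: the rule names are hypothetical, and splitting the body on blank lines is an assumption made for illustration. Both orders find the same hits; what changes is how many times each piece of the body has to be fetched back into cache.

```python
import re

# Hypothetical body rules: (name, compiled pattern). Real SpamAssassin rules
# are Perl regexes defined in .cf files; these names are made up.
RULES = [
    ("HYPO_FREE_MONEY", re.compile(r"free money", re.IGNORECASE)),
    ("HYPO_ACT_NOW", re.compile(r"act now", re.IGNORECASE)),
    ("HYPO_UNSUBSCRIBE", re.compile(r"unsubscribe", re.IGNORECASE)),
]


def split_into_paragraphs(body: str) -> list[str]:
    """Split the body on blank lines. This splitting rule is an assumption
    made for the sketch, not a description of SA's exact internals."""
    return re.split(r"\n\s*\n", body)


def chunk_major(chunks: list[str], rules) -> set[str]:
    """Assembly-line order: take one piece, run every rule on it while it is
    still in cache, then move on to the next piece."""
    hits = set()
    for chunk in chunks:
        for name, pattern in rules:
            if pattern.search(chunk):
                hits.add(name)
    return hits


def rule_major(chunks: list[str], rules) -> set[str]:
    """Rule-at-a-time order: walk the whole body once per rule, which is the
    access pattern a strict score order forces."""
    hits = set()
    for name, pattern in rules:
        for chunk in chunks:
            if pattern.search(chunk):
                hits.add(name)
    return hits


if __name__ == "__main__":
    body = "Act now!\n\nGet free money today.\n\nTo unsubscribe, reply STOP."
    chunks = split_into_paragraphs(body)
    # Same hits either way; only the order of memory accesses differs.
    print(chunk_major(chunks, RULES) == rule_major(chunks, RULES))
```

Python timings won't reliably show the cache effect, so treat this as an illustration of loop order rather than a benchmark.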

> I do admit that I am respectfully optimistic about your team's ability to
> design code that would run just as fast, if not faster, with a "score order"
> end result.
>
> Maybe you could let us make that decision with a local.cf knob?

Well, you can't do the score-checks with a local.cf knob. Believe me, it's been tested, *YEARS* ago... it *killed* SA's performance.

However, if you wanted to see the effects, you could use the priority setting on each rule in local.cf. You can cause them all to run in score order and see what happens...
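A minimal sketch of that experiment, assuming the `priority RULE_NAME N` syntax in local.cf where lower values run first: the helper below (hypothetical, as are the rule names and scores) emits one priority line per rule so that the highest-scoring rule runs first. Because every rule gets its own distinct priority value, every rule also gets its own pass over the message, which is exactly the cost described above.

```python
# Hypothetical rule names and scores, for illustration only.
SCORES = {
    "HYPO_RULE_A": 3.5,
    "HYPO_RULE_B": 0.8,
    "HYPO_RULE_C": 2.1,
}


def score_order_priorities(scores: dict[str, float]) -> list[str]:
    """Emit local.cf-style 'priority NAME N' lines so that rules run in
    descending score order. Lower priority values run first, so the
    highest-scoring rule gets the smallest number."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    # Every rule gets its own distinct priority value, i.e. its own pass.
    return [f"priority {name} {rank}" for rank, name in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for line in score_order_priorities(SCORES):
        print(line)
```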

> I mean, most processors are so fast nowadays...

Wait, if they're so fast, why are you trying to optimize?

It doesn't help you to try and make things faster by hoping to bail out halfway through processing, when the cost is making SA run at less than half its normal speed.
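The arithmetic behind that point, under a deliberately simplified model (an assumption for illustration, not an SA benchmark): let T be the normal scan time, let a strict score order make scanning k times slower, and suppose bailing out early means only a fraction f of the work is done.

```latex
T_{\text{score}} = k\,f\,T, \qquad
T_{\text{score}} < T \iff k\,f < 1
```

Bailing out halfway (f = 1/2) while running at less than half speed (k > 2) gives kf > 1, so the early exit still loses overall.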

> I am thinking we would brute force it under some circumstances until you
> folks come forth with even more brilliant design and implementation
> breakthroughs.
>
> What do you think?

This is a VERY, Very, Very old idea.

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=304

If it didn't work in 2002, it's not going to work any better now. And yes, there was the "run in score order" idea advocated there too. See comment #10. When we finally got the ability to run rules in arbitrary order and tried a strict score order with a threshold check after every rule, the performance sucked.
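For comparison, here is a sketch of the "strict score order plus threshold check on every rule" approach. It is a toy model with hypothetical rules; the 5.0 threshold matches SA's default `required_score`. Note that every rule restarts its scan from the top of the body, so the number of passes grows with the number of rules run.

```python
import re

REQUIRED_SCORE = 5.0  # SA's default spam threshold (required_score)

# Hypothetical rules: (name, score, pattern). Names and scores are made up.
RULES = [
    ("HYPO_SMALL", 0.5, re.compile(r"unsubscribe", re.IGNORECASE)),
    ("HYPO_BIG", 4.0, re.compile(r"free money", re.IGNORECASE)),
    ("HYPO_MID", 2.0, re.compile(r"act now", re.IGNORECASE)),
]


def strict_score_order(paragraphs, rules, threshold=REQUIRED_SCORE):
    """Run rules highest score first and check the running total after every
    rule, bailing out as soon as it reaches the threshold."""
    total = 0.0
    hits = []
    passes = 0
    for name, score, pattern in sorted(rules, key=lambda r: r[1], reverse=True):
        passes += 1  # each rule starts a fresh walk over the body
        if any(pattern.search(p) for p in paragraphs):
            total += score
            hits.append(name)
            if total >= threshold:
                # Negative-score rules sort last, so an early exit never
                # gets to apply them.
                break
    return total, hits, passes


if __name__ == "__main__":
    paragraphs = ["Act now!", "Get free money today.", "To unsubscribe, reply STOP."]
    print(strict_score_order(paragraphs, RULES))
```

Here the sketch exits after two rules, but each of those rules re-read the body from the start; with hundreds of real rules, those passes add up.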

The better way is the current way: priority and short-circuiting. If you configure this, you can at least control the number of passes SA makes at the message, and bail out on certain trusted rules rather than on total scores.


See the docs for shortcircuit:

http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_Shortcircuit.html
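A rough Python model of priority plus short-circuiting, under stated assumptions: rules are grouped by priority (lowest value first), each group gets one piece-by-piece pass over the body, and the short-circuit check happens once after each group. The rule names and patterns are hypothetical, and checking only between groups is an assumption of this sketch; see the plugin docs for the real behaviour.

```python
import re
from itertools import groupby

# Hypothetical rules: (name, priority, shortcircuit, pattern). In local.cf
# these would roughly correspond to 'priority NAME N' and
# 'shortcircuit NAME on' lines; names and patterns here are made up.
RULES = [
    ("HYPO_TRUSTED", -1000, True, re.compile(r"trusted-token-1234")),
    ("HYPO_FREE_MONEY", 0, False, re.compile(r"free money", re.IGNORECASE)),
    ("HYPO_ACT_NOW", 0, False, re.compile(r"act now", re.IGNORECASE)),
]


def run_with_priorities(paragraphs, rules):
    """Run rules one priority group at a time (lowest value first). Within a
    group, use the piece-by-piece loop; after the group, stop if any rule
    that hit is marked shortcircuit."""
    hits = set()
    groups_run = 0
    ordered = sorted(rules, key=lambda r: r[1])
    for _priority, group_iter in groupby(ordered, key=lambda r: r[1]):
        group = list(group_iter)
        groups_run += 1  # one pass over the body per priority group
        for para in paragraphs:
            for name, _prio, _sc, pattern in group:
                if pattern.search(para):
                    hits.add(name)
        if any(sc and name in hits for name, _prio, sc, _pat in group):
            break  # a trusted rule fired: skip all remaining groups
    return sorted(hits), groups_run


if __name__ == "__main__":
    print(run_with_priorities(["trusted-token-1234", "Get free money today."], RULES))
    print(run_with_priorities(["Act now!", "Get free money today."], RULES))
```

With only two distinct priority values the body is walked at most twice, and a trusted hit stops after the first pass; giving every rule its own priority, as a strict score order would, means one pass per rule instead.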

> Is there somewhere you recommend that we can view discussions on making
> processing faster?

The archives of sa-dev, and the Bugzilla.

:-)

 - rh
