Matt Kettler wrote:
No, I'm saying it breaks the emails into pieces, then for the first piece, it runs all the rules. Then it runs all the rules on the second piece, and the third, and the fourth, etc.

Forcing score order causes it to run the whole message on one rule, then the whole message on the next rule, etc.

Looping through the entire message body multiple times is slow because you defeat the benefits of processor memory caches. It's much better to loop through pieces that fit in the cache.
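
The two loop orders described above can be sketched roughly as follows. This is only an illustration of the iteration order, not SpamAssassin's actual code: the rule names, patterns and the split into "pieces" are made up for the example, and Python itself will not show the cache effect. Both orders find the same hits; only the memory access pattern differs.

```python
import re

# Hypothetical rules and message pieces, purely for illustration.
rules = [
    ("HYPOTHETICAL_FREE", re.compile(r"free", re.I)),
    ("HYPOTHETICAL_CLICK", re.compile(r"click here", re.I)),
]
pieces = ["Get it FREE today", "just click here", "plain text line"]

def piece_major(pieces, rules):
    """Outer loop over pieces: each piece is visited once and all rules run on it."""
    hits = set()
    for piece in pieces:
        for name, pattern in rules:
            if pattern.search(piece):
                hits.add(name)
    return hits

def rule_major(pieces, rules):
    """Outer loop over rules: the whole message is re-read once per rule."""
    hits = set()
    for name, pattern in rules:
        for piece in pieces:
            if pattern.search(piece):
                hits.add(name)
    return hits

print(sorted(piece_major(pieces, rules)))
print(sorted(rule_major(pieces, rules)))
```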

Think of it like an assembly line. Even with one worker, assembly line methods are overall considerably faster. Think of a worker building 100 "things". That worker can get the tool he needs for the first task, do it 100 times, then
OK, I've just proven myself wrong. However, this does make me concerned that there are other problems with how SA processes messages that are causing it to not matter.

What I did:

As a quick, crude demonstration of how using priorities affects SA, I hacked the 50_scores.cf of a vanilla SA 3.2.3 into a priority.cf.

grep -P "^score [A-Z]" 50_scores.cf | sed s/score/priority/ |cut -d ' ' -f -3 | sed "s/\.//" >priority.cf

Note that 6 rules end up with no priority value, because they have lots of spaces before their scores and cut treats every single space as a field separator, e.g.:
score RDNS_NONE             0.1

You can lint the file to find those lines and fix them.

This creates a score-based priority for every rule. It's not strictly score order, as rules scored 1.0 run at priority 10, and rules scored 1.000 run at 1000, but it's close. It's certainly close enough to create lots of different priorities for the rules to run at.
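
For anyone who would rather not fix the stray lines by hand, here is a rough Python sketch of the same conversion that splits on any run of whitespace. It is an assumption-laden rewrite of the one-liner, not what I actually ran: the function name is made up, it accepts any amount of whitespace after "score" (the grep required exactly one space), and it keeps a leading zero such as "01" exactly as dropping the dot produces it.

```python
import re

# Lines of interest: "score" followed by whitespace and an upper-case rule name.
SCORE_LINE = re.compile(r"score\s+[A-Z]")

def score_to_priority(line):
    """Turn one 'score RULE value ...' line into 'priority RULE digits'.

    Returns None for lines that are not score lines. Only the first
    score value is used, and its first '.' is dropped, as in the sed step.
    """
    if not SCORE_LINE.match(line):
        return None
    fields = line.split()          # splits on any run of whitespace
    if len(fields) < 3:
        return None
    rule, score = fields[1], fields[2]
    return f"priority {rule} {score.replace('.', '', 1)}"

# EXAMPLE_RULE is a hypothetical rule name; RDNS_NONE is the line quoted above.
print(score_to_priority("score RDNS_NONE             0.1"))
print(score_to_priority("score EXAMPLE_RULE 1.000 2.0"))
```

To build a file from it, you would apply the function to each line of 50_scores.cf and write out the non-None results.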

In my testing, I grabbed a corpus file:

http://spamassassin.apache.org/publiccorpus/20021010_easy_ham.tar.bz2


And ran:

time /home/mkettler/spamassassin-3.2/masses/mass-check --file easy_ham/* >test.out

Due to disk caching, I ran it 3 times on a vanilla 3.2.3 install. The first run was noticeably slower than the other 2, which gained from disk cache. I've discarded the first run.

The times I came up with were:
real    2m25.432s  user    2m12.880s  sys     0m11.521s
real    2m25.571s  user    2m12.818s  sys     0m11.672s

I then installed my priority.cf into /etc/mail/spamassassin, and re-ran the same command twice:

real    2m25.212s  user    2m12.507s  sys     0m11.694s
real    2m25.435s  user    2m12.852s  sys     0m11.596s

No significant difference..
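
To put a number on that, the "real" times above can be averaged and compared. The snippet below is just arithmetic on the four wall-clock figures already listed; the helper name is made up, and two runs per configuration is far too few to say anything statistical.

```python
import re

def to_seconds(stamp):
    """Convert a `time`-style stamp such as '2m25.432s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    return int(m.group(1)) * 60 + float(m.group(2))

# "real" times copied from the runs above.
vanilla = [to_seconds(s) for s in ("2m25.432s", "2m25.571s")]
with_priorities = [to_seconds(s) for s in ("2m25.212s", "2m25.435s")]

base = sum(vanilla) / len(vanilla)
test = sum(with_priorities) / len(with_priorities)
delta = test - base
pct = 100.0 * delta / base

print(f"vanilla mean:    {base:.3f}s")
print(f"priority mean:   {test:.3f}s")
print(f"difference:      {delta:+.3f}s ({pct:+.2f}%)")
```

The difference comes out at a fraction of a second on a run of nearly two and a half minutes, which is in the same range as the spread between two runs of the same configuration.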

Well, it looks like I need to spend some time reading the code to study exactly how SA runs rules, and see if it's doing something that pollutes the memory cache, which would cause the over-sorting to not matter..




Note: by default, mass-check runs without network tests.

