Matt Kettler wrote:
No, I'm saying it breaks the email into pieces, then runs all the rules
on the first piece. Then it runs all the rules on the second piece, and
the third, and the fourth, etc.
Forcing score order causes it to run one rule over the whole message,
then the next rule over the whole message, etc.
Looping through the entire message body multiple times is slow because
you defeat the benefits of processor memory caches. It's much better
to loop through pieces that fit in the cache.
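A rough Python sketch of the two loop orders described above. The rule names and patterns are made up, and Python won't show the cache effect itself. It only makes the difference in traversal order visible: chunk_major works piece by piece, rule_major works rule by rule.

```python
import re

# Hypothetical body rules (made-up names and patterns, not real SA rules).
RULES = {
    "HYPO_PILLS": re.compile(r"viagra", re.IGNORECASE),
    "HYPO_FREE": re.compile(r"\bfree\b", re.IGNORECASE),
}


def chunk_major(chunks, rules):
    """All rules on piece 1, then all rules on piece 2, and so on."""
    order, hits = [], set()
    for i, chunk in enumerate(chunks):          # outer loop: message pieces
        for name, pattern in rules.items():     # inner loop: rules
            order.append((name, i))             # record which (rule, piece) ran
            if pattern.search(chunk):
                hits.add(name)
    return order, hits


def rule_major(chunks, rules):
    """One rule over the whole message, then the next rule, and so on."""
    order, hits = [], set()
    for name, pattern in rules.items():         # outer loop: rules
        for i, chunk in enumerate(chunks):      # inner loop: message pieces
            order.append((name, i))
            if pattern.search(chunk):
                hits.add(name)
    return order, hits
```

Both functions return the same set of hits for a given message; only the order in which (rule, piece) pairs are visited differs.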
Think of it like an assembly line. Even with one worker, assembly line
methods are overall considerably faster. Think of a worker building
100 "things". That worker can get the tool he needs for the first
task, do it 100 times, then
OK, I've just proven myself wrong. However, this does make me concerned
that there are other problems with how SA processes messages that are
causing it not to matter.
What I did:
As a quick, crude demonstration of how using priorities affects SA, I
hacked the 50_scores.cf of a vanilla SA 3.2.3 into a priority.cf:
grep -P "^score [A-Z]" 50_scores.cf | sed s/score/priority/ | cut -d ' ' -f -3 | sed "s/\.//" > priority.cf
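Note: 6 rules end up with no priority value, because these rules have
several spaces before their scores, e.g.:
score RDNS_NONE     0.1
Since cut -d ' ' treats every single space as a field separator, the
third field on those lines is empty. You can lint the file and fix them
by hand.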
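To see why those 6 lines come out empty, here is a rough Python re-creation of the pipeline (cut_style) next to a variant that splits on runs of whitespace instead (whitespace_style). Both function names are made up for illustration, and the whitespace variant is a little looser than the grep pattern, since it also tolerates extra spaces after "score".

```python
import re
from typing import Optional


def cut_style(line: str) -> Optional[str]:
    """Mimic grep | sed | cut -d ' ' -f -3 | sed on a single line."""
    line = line.rstrip("\n")
    if not re.match(r"score [A-Z]", line):        # grep -P "^score [A-Z]"
        return None
    line = line.replace("score", "priority", 1)   # sed s/score/priority/
    fields = line.split(" ")[:3]                  # cut -d ' ' -f -3: each single space splits
    return " ".join(fields).replace(".", "", 1)   # sed "s/\.//": drop the first dot


def whitespace_style(line: str) -> Optional[str]:
    """Same idea, but split on runs of whitespace so extra spaces don't matter."""
    fields = line.split()                         # collapses repeated spaces/tabs
    if len(fields) < 3 or fields[0] != "score" or not fields[1][:1].isupper():
        return None
    # Keep only the first score column, as cut -f -3 does, and drop its first dot.
    return f"priority {fields[1]} {fields[2].replace('.', '', 1)}"
```

With these, cut_style("score RDNS_NONE     0.1") gives "priority RDNS_NONE " with no value, while whitespace_style gives "priority RDNS_NONE 01".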
This creates a score-based priority for every rule. It's not strictly
score order, as rules scored 1.0 run at priority 10, and rules scored
1.000 run at 1000, but it's close. It's certainly close enough to create
lots of different priorities for the rules to run at.
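Dropping the dot ties the priority to how many decimal places a score happens to be written with, not just to its value. A small illustration with made-up rule names, sorting ascending by the resulting integer (hypothetical_priority is a name invented here):

```python
def hypothetical_priority(score_text: str) -> int:
    """Dot-stripping trick: '1.0' -> 10, '1.000' -> 1000 (illustration only)."""
    return int(score_text.replace(".", "", 1))


# Made-up rules, with the score text as it might be written in a .cf file.
scores = {"RULE_A": "0.5", "RULE_B": "1.0", "RULE_C": "2.5", "RULE_D": "1.000"}

# True score order (stable sort keeps RULE_B before RULE_D on the 1.0 tie).
by_score = sorted(scores, key=lambda name: float(scores[name]))
# Order implied by the derived integer priorities.
by_priority = sorted(scores, key=lambda name: hypothetical_priority(scores[name]))
```

RULE_B and RULE_D both score 1.0, but RULE_D's "1.000" becomes 1000 and sorts after RULE_C's 25, which is why the result is only close to score order.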
In my testing, I grabbed a corpus file:
http://spamassassin.apache.org/publiccorpus/20021010_easy_ham.tar.bz2
And ran:
time /home/mkettler/spamassassin-3.2/masses/mass-check --file easy_ham/* > test.out
Because of disk caching, I ran it 3 times on a vanilla 3.2.3 install.
The first run was noticeably slower than the other 2, which benefited
from the disk cache, so I've discarded the first run.
The times I came up with were:
real 2m25.432s user 2m12.880s sys 0m11.521s
real 2m25.571s user 2m12.818s sys 0m11.672s
I then installed my priority.cf into /etc/mail/spamassassin and re-ran:
real 2m25.212s user 2m12.507s sys 0m11.694s
real 2m25.435s user 2m12.852s sys 0m11.596s
No significant difference.
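A quick back-of-the-envelope check on the "real" times above, using only the two warm runs on each side. Two samples per side is far too few for proper statistics, so this just compares the gap between the means with the run-to-run spread of the vanilla baseline.

```python
import re
from statistics import mean

# "real" timings copied from the runs above (cold-cache first run discarded).
VANILLA = ["2m25.432s", "2m25.571s"]
PRIORITY = ["2m25.212s", "2m25.435s"]


def to_seconds(t: str) -> float:
    """Convert time(1)-style '2m25.432s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unrecognised time: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


vanilla = [to_seconds(t) for t in VANILLA]
priority = [to_seconds(t) for t in PRIORITY]

diff = mean(vanilla) - mean(priority)     # how much faster the priority.cf runs were
spread = max(vanilla) - min(vanilla)      # run-to-run noise in the baseline
print(f"mean diff {diff:.3f}s ({100 * diff / mean(vanilla):.2f}%), "
      f"baseline spread {spread:.3f}s")
```

With these numbers the priority runs come out about 0.18 s faster (roughly 0.12%), which is the same order as the 0.14 s gap between the two vanilla runs themselves.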
Well, it looks like I need to spend some time reading the code to study
exactly how SA runs rules, and see whether it's doing something that
pollutes the memory cache, which would make the over-sorting irrelevant.
Note: by default, mass-check runs without network tests.