Hello FH, Wednesday, February 16, 2005, 7:31:57 AM, you wrote:
>> What were the rule hit changes? Depending on the time between the
>> first scan and the second, some of that might have been due to network
>> tests having been taught the spam. The more time that passed, the more
>> likely such a score increase would be. Bayes also could have been
>> involved, since other emails could have increased the Bayes score.

F> It was less than 1/2 hour because I was experimenting w/ the
F> commands and the new email came in so I decided to use that one ;)

A half hour is more than enough time for these changes:

F> Initial email:
F> X-Spam-Status: No, score=1.9 required=4.0 tests=BAYES_99 autolearn=no
F> version=3.0.2

Only Bayes hit, which points to a well-trained Bayes database. If
you're confident about your Bayes training, you might want to raise
that score closer to your spam threshold (I set BAYES_99 equal to my
spam threshold).

F> After running it through the spamassassin -D command:
F> X-Spam-Status: Yes, score=12.5 required=4.0 tests=BAYES_99,
F> RCVD_IN_BL_SPAMCOP_NET,URIBL_AB_SURBL,URIBL_OB_SURBL,URIBL_SC_SURBL,
F> URIBL_WS_SURBL autolearn=no version=3.0.2
F> BTW isn't the default autolearn spam threshold supposed to be 12?

Yes, but the autolearn test ignores the BAYES_* score of 1.9, and since
your non-Bayes score was 10.6, it didn't qualify for auto-learn.

The additional tests that hit were all network tests, and network tests
that have a history of reacting fairly quickly to new spam. During your
1/2 hour, those remote systems learned about the spam, so when you
repeated your test, they recognized it as spam.

F> Email after bounce:
F> X-Spam-Status: No, score=0.4 required=4.0 tests=BAYES_60 autolearn=ham
F> version=3.0.2
F> spamassassin -D after bounce:
F> X-Spam-Status: Yes, score=7.9 required=4.0 tests=ALL_TRUSTED,BAYES_99,
F> URIBL_AB_SURBL,URIBL_OB_SURBL,URIBL_SC_SURBL,URIBL_WS_SURBL
F> autolearn=no version=3.0.2

ALL_TRUSTED fired because your bounce came from within your trusted
network. That lowered the score. I don't understand why that "Email
after bounce" run didn't include the network tests that appeared in
both of your other "not the first" test runs.

F> Did I miss a switch somewhere since there seems to be more tests
F> running/reported when I run it manually instead of when it runs through the
F> system?

Nope. Instead I'd say your installation is AOK, and the network tests
are doing exactly what they're supposed to do.

So, the question now becomes, how many of these false negatives are
sneaking through your system? When my system was running at its best, I
had a 0.2% false negative rate (i.e., 99.8% of all emails were scored
and tagged correctly). With no custom rules, if you're correctly
flagging close to 95% of your emails, I'd say you're doing AOK. If
that's the case, then you're ready to determine which custom rules
files you want to add, to start squeezing out those last few percentage
points.

>> Auto-learning it as ham is IMO a problem. I think that auto-learning
>> anything with a positive score as ham is asking for trouble. I have my
>> ham auto-learn thresholds set at -2. (I have several negative scoring
>> rules specific to my domains.)

F> That's the bayes_auto_learn_threshold_nonspam right? I don't have it set in
F> local.cf so I thought it would have been the default (.1 right?). I'm not
F> sure why the .4 above was autolearned as ham?!? I just ran spamassassin
F> --lint -D but didn't see a report of the threshold, should it have been in
F> there?

Correct. And again, since Bayes scores are ignored during the
auto-learn determination, your 0.4 was probably a non-Bayes score < 0.
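To make that concrete, here's a rough sketch of what those two tweaks
could look like in /etc/mail/spamassassin/local.cf. The numbers are
only illustrative: they assume the required score of 4.0 shown in your
headers, plus my own preference for a negative ham auto-learn
threshold, so adjust them to your own traffic:

  # raise BAYES_99 to the spam threshold (assumes required_score 4.0)
  score BAYES_99 4.0

  # only auto-learn ham when the non-Bayes score is comfortably
  # negative (default is 0.1); the spam auto-learn threshold
  # defaults to 12.0
  bayes_auto_learn_threshold_nonspam -2.0
  bayes_auto_learn_threshold_spam 12.0

Running spamassassin --lint afterwards will catch any typos in the
option names.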
>> By ~root/.spamassassin, do you mean each individual's root or home
>> directory, then a .spamassassin directory under that? And in your
>> config files, do you specify a Bayes database path?

F> Nope, just the root user has that dir/files. I didn't turn on the individual
F> user preferences (I read a couple of places this wasn't recommended). From
F> the /etc/mail/spamassassin/local.cf file: "bayes_path
F> /var/spool/spamassassin/sa".

Good setting.

Actually, individual user preferences /are/ IMO recommended (they allow
users to tailor their scores, move their spam thresholds up and down,
and even blacklist/whitelist specific senders). What's NOT recommended
is allowing users to create their own rules, for security and
performance reasons.

F> Is it time to throw the reset switch and start from scratch?

Nope. I'd say your setup looks good from here. So now it's time to stop
worrying about the occasional spam sneaking through (I don't know
anyone who has managed a 100% accuracy score), and instead see whether
your accuracy rate is high enough to be satisfactory, and what you can
do to improve it without spending too much in time and resources.

Bob Menschel