-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Gary,

Tuesday, August 5, 2003, 8:00:20 AM, you wrote:

GF> I've been running SA with Bayes enabled only the past few days. Bayes
GF> has been auto-learned on two rather large corpuses, which yielded
GF> about 1100 auto-learn messages (per the Bayes journal file). I've
GF> noticed the number of false negatives (ie, spam mis-classified as
GF> ham) have dropped to almost zero, but I'm seeing maybe half a dozen
GF> false positives (ham mis-classified as spam) per day. I'm having to
GF> white list friends and newsletters that previously went through just
GF> fine.

Terminology confusion (possibly mine). Auto-learn is what happens with
emails one by one as they come through SA. Learning from corpus would
tend to be via direct (manual, not auto) sa-learn. Of those 1100
messages, how many were spam, and how many were ham?

I don't think I've seen more than a half dozen FPs in any *month*,
much less a day.

GF> Generally, I'm using SA in local mode, and backing out to network
GF> mode only when local says no ham was found.

So you're running SA against your rule set and Bayes without DNSBL
checks, and then if these do not scream SPAM (high score) or HAM
(negative score), you then check DNSBL to see if they will give a spam
score?

GF> Given my ham to spam ratio (roughly 1 to 5) that's been okay, but it
GF> probably leads to a surprising result where spam is over-aggressively
GF> mis-classified. I'm using 2.60 cvs (6/30) at the moment, but I think
GF> the same problem would come up on version 2.55.

Very possibly not -- 2.60 doesn't yet have statistically determined
rules; the rule set is more advanced than 2.55, and to my knowledge
hasn't yet been run against the giant SA corpus available to the
developers. After that process the rule score defaults are adjusted to
minimize FPs. Again to my knowledge, that FP minimization step hasn't
yet taken place for 2.60.

GF> The problem is that I'm seeing these misclassified spams as having
GF> only, or nearly only, BAYES_99 asserted. ...

I don't remember ever seeing BAYES_99 on anything that wasn't spam,
and I've only seen BAYES_90 on non-spam once in three months. That
leads me to question the accuracy of your original corpus. How was it
built and classified? What are the chances that persons A and B
classified emails as spam, and Bayes learned it as spam, while persons
C and D claim these are not spam?

GF> Using BAYES_99 as an example, it will be scored 5.2 with Bayes
GF> enabled, while running in local (non-network) mode, and only 3.008
GF> when networking is enabled. Trouble is, that 5.2 exceeds the default
GF> cut off of 5. ...

GF> What I'm working up to here: For those of you using Bayes, did you
GF> also move your threshold value up (to say, 7 or above), or do you
GF> simply tolerate more false positives? (I'd have to say that the
GF> four/five false positives I'm now seeing per day, and didn't see
GF> before is too high a number for my tastes).

I rely heavily on Bayes. I run with a required hits of 9.0, and I run
with BAYES_99 set at 9.0, and with BAYES_90 set at 7.5 (83% of
threshold). I think I got one FP in all of July, and it had a low
Bayes score.

So in summary, no, I don't think your Bayes *scores* are the problem.

I think the main problem is that Bayes learned ham as spam. I would
suggest checking through your spam corpus and relearning any
misclassified emails as ham.

A second and less critical problem may be your use of 2.60 and its not
yet statistically validated scores. This will remain less important as
long as you have ham with Bayes scores 90% and over.

Good luck.

Bob Menschel

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0

iQA/AwUBPzB155ebK8E4qh1HEQLN2gCgpg1vEiUcvTJ+4HwVeuLn/XFGDz4An06f
q0sQMBbXgnA0Cr+5DLVHNnyS
=Wz4N
-----END PGP SIGNATURE-----

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
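
The score arithmetic discussed in this message can be checked with a
few lines of Python. This is only a rough sketch, not SpamAssassin
code: the helper names (flagged, share_of_threshold) are made up for
illustration, and it assumes a message is tagged as spam when the sum
of its rule scores is greater than or equal to the threshold. The
numbers themselves are the ones quoted in the thread.

```python
# Figures quoted in the thread; everything else here is illustrative.
DEFAULT_THRESHOLD = 5.0    # default cut-off mentioned by Gary
BAYES_99_LOCAL = 5.2       # BAYES_99 score in local (non-network) mode
BAYES_99_NETWORK = 3.008   # BAYES_99 score with network tests enabled

BOB_THRESHOLD = 9.0        # Bob's required hits
BOB_BAYES_99 = 9.0         # Bob's BAYES_99 score
BOB_BAYES_90 = 7.5         # Bob's BAYES_90 score


def flagged(rule_scores, threshold):
    """Hypothetical helper: assume spam when the summed score >= threshold."""
    return sum(rule_scores) >= threshold


def share_of_threshold(score, threshold):
    """Hypothetical helper: one rule's score as a rounded percent of the threshold."""
    return round(100 * score / threshold)


if __name__ == "__main__":
    cases = [
        ("default, local mode, BAYES_99 only", [BAYES_99_LOCAL], DEFAULT_THRESHOLD),
        ("default, network mode, BAYES_99 only", [BAYES_99_NETWORK], DEFAULT_THRESHOLD),
        ("Bob's setup, BAYES_99 only", [BOB_BAYES_99], BOB_THRESHOLD),
        ("Bob's setup, BAYES_90 only", [BOB_BAYES_90], BOB_THRESHOLD),
    ]
    for label, scores, threshold in cases:
        verdict = "spam" if flagged(scores, threshold) else "not spam"
        print(f"{label}: {sum(scores)} vs {threshold} -> {verdict}")
```

Under that assumption, a message hitting only BAYES_99 crosses the
default cut-off in local mode (5.2 against 5) but not in network mode
(3.008 against 5). So a ham message that Bayes has wrongly learned as
spam needs no other rule to become a false positive in local mode.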