Hello Scot,

Saturday, July 3, 2004, 12:13:30 PM, you wrote:

SLH> I have seen a couple of messages recently indicating that the bayes
SLH> database becomes less effective over time. Is this really true?

I don't think so.

SLH> I thought the idea was that by feeding spam and ham sets to it on a
SLH> regular basis that it should continue to become more effective.

That's my experience.

SLH> So how often should the entire database be purged?

There are times when the Bayes database begins to misbehave, scoring
significant ham with BAYES_99 or significant spam with BAYES_00.
Whenever that happens, for whatever reason, wipe the database and
retrain (a good reason to keep 2-3k spam and 2-3k ham around, for a
quick retrain). Otherwise I would not purge/wipe the Bayes database.

SLH> Are there issues with the ratio of ham to spam in the database being
SLH> too far out of equilibrium?

Probably, but "too far" is rather extreme. I've been feeding three
Bayes databases identical training for months now, with a spam ratio
of about 90%. I had to wipe/retrain one of those three for reasons I
don't understand, but the other two continue to work marvelously well.

Saturday, July 3, 2004, 4:33:20 PM, Nick Leverton <[EMAIL PROTECTED]> wrote:

NL> For me at least, the number of different tokens in mail is much lower than
NL> the number of spam signs spammers can put into their mail! I don't have
NL> numbers for this, but I have found it very hard to train a spam when the
NL> system had also trained positively on numerous other undetected spams
NL> with the same characteristics (especially but not exclusively DSNs).
NL> And even with an autolearn_spam_threshold of 0.1, still much spam from
NL> new sources with new tokens will be learnt as ham!

I auto-learn ham at -2.0 ... I therefore auto-learn lots of spam, and
much less ham, and it's almost impossible for spam to get auto-learned
as ham here.

Bob Menschel
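
P.S. For anyone wanting to try the ham threshold mentioned above, here is a minimal sketch of the local.cf lines. It assumes the option names used by recent SpamAssassin releases; older releases may spell them differently, so check the Mail::SpamAssassin::Conf documentation for your version. Only the -2.0 value comes from this message; the rest is illustrative.

```
# Sketch only: option names assumed from recent SpamAssassin releases.
use_bayes 1
bayes_auto_learn 1

# Auto-learn a message as ham only if it scores at or below -2.0,
# so very little mail (and almost no spam) is ever learned as ham.
bayes_auto_learn_threshold_nonspam -2.0
```

The spam-side threshold is left at its default here. Running "spamassassin --lint" after editing should report any option name your version does not recognize.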
