Hello Scot,

Saturday, July 3, 2004, 12:13:30 PM, you wrote:

SLH> I have seen a couple of messages recently indicating that the bayes
SLH> database becomes less effective over time.  Is this really true?

I don't think so.

SLH> I thought the idea was that by feeding spam and ham sets to it on a
SLH> regular basis it should continue to become more effective.

That's my experience.

SLH> So how often should the entire database be purged?

There are times when the Bayes database begins to misbehave, scoring
a significant amount of ham as BAYES_99 or a significant amount of
spam as BAYES_00. Whenever that happens, for whatever reason, wipe the
database and retrain (a good reason to keep 2-3k spam and 2-3k ham
around, for a quick retrain).

Otherwise I would not purge/wipe the Bayes database.

SLH> Are there issues with the ratio of ham to spam in the database
SLH> being too far out of equilibrium?

Probably, but the ratio has to be rather extreme before it is "too
far".  I've been feeding three Bayes databases identical training for
months now, with about 90% of it spam. I had to wipe and retrain one
of those three for reasons I don't understand, but the other two
continue to work marvelously well.

Saturday, July 3, 2004, 4:33:20 PM, Nick Leverton <[EMAIL PROTECTED]> wrote:

NL> For me at least, the number of different tokens in mail is much lower than
NL> the number of spam signs spammers can put into their mail !  I don't have
NL> numbers for this, but I have found it very hard to train a spam when the
NL> system had also trained positively on numerous other undetected spams
NL> with the same characteristics (especially but not exclusively DSNs).
NL> And even with an autolearn_spam_threshold of 0.1, still much spam from
NL> new sources with new tokens will be learnt as ham !

I auto-learn ham only at scores below -2.0. I therefore auto-learn
lots of spam and much less ham, and it's almost impossible for spam to
get auto-learned as ham here.
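
To illustrate why the lower ham threshold helps, here is a minimal
Python sketch of a score-based auto-learn decision. It is not
SpamAssassin's actual code: the function name is made up for this
example, the 12.0 spam threshold is an assumption (the documented
default of bayes_auto_learn_threshold_spam, not a value from my
setup), and the real auto-learner applies further conditions that are
left out here. The -2.0 value corresponds to
bayes_auto_learn_threshold_nonspam.

```python
# Sketch only: a simplified score-based auto-learn decision.
# Hypothetical helper, not part of SpamAssassin.

HAM_THRESHOLD = -2.0   # ham threshold mentioned in this message
SPAM_THRESHOLD = 12.0  # assumption: SpamAssassin's documented default


def autolearn_decision(score, ham_threshold=HAM_THRESHOLD,
                       spam_threshold=SPAM_THRESHOLD):
    """Return 'ham', 'spam', or None (do not learn) for a message score."""
    if score < ham_threshold:
        return "ham"
    if score > spam_threshold:
        return "spam"
    # Scores between the two thresholds are not auto-learned at all.
    return None


if __name__ == "__main__":
    for score in (-3.1, 0.0, 4.9, 15.2):
        print(score, autolearn_decision(score))
```

With a ham threshold of 0.1, an undetected spam scoring 0.0 would be
learned as ham; with -2.0 the same message is simply not learned.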

Bob Menschel
