Hello Tom,

Friday, June 13, 2003, 9:38:10 AM, you wrote:

TM> I'm kind of confused here.  The way I see it (which could very well
TM> be a misunderstanding, mind you) is that the reason it autolearns
TM> spam over 15 points by default is to make darned sure that it doesn't
TM> learn a false positive.


TM> Then one would augment its learning by feeding missed spams through
TM> sa-learn. 

That's exactly what I do.
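
For illustration only, a minimal sketch of that hand-feeding step in
Python.  It assumes the missed messages have been saved to an mbox file;
the helper name and the file name are hypothetical, and only the
sa-learn options --spam, --ham and --mbox are real.

```python
def build_sa_learn_cmd(mbox_path, kind="spam"):
    """Build the sa-learn command line for one mbox of confirmed mail.

    kind is "spam" for missed spams or "ham" for wrongly flagged hams.
    The helper itself is hypothetical; it only assembles the arguments.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", mbox_path]


# Example: the command for a folder of spams that slipped through.
cmd = build_sa_learn_cmd("missed-spam.mbox", "spam")
print(" ".join(cmd))
```

On a machine with SpamAssassin installed, the resulting list could be
handed to subprocess.run(cmd, check=True).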

TM>  The only reason I can think of to NOT feed low-scoring spams through
TM> sa-learn is that I've decided that a spam that scores 5.x points has
TM> no interesting tokens.  Quite the opposite is true; that's why we
TM> feed it with a corpus of known spam in the first place, rather than
TM> feeding it a corpus of known spam that has been run through
TM> spamassassin manually and the under-15 spams weeded out.

Agreed.  Once Bayes has enough experience with our email, a spam that
was not autolearned but already gets a Bayes probability of 80% or more
probably doesn't need to be learned manually, unless its total score
falls at or under the required threshold and we need to raise it.
However, I believe that feeding Bayes the spam it scores at 70% or less,
confirming to it that yes, this is spam, helps Bayes do its job better.
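
Spelled out as a rough Python sketch (my rule of thumb only, not anything
SpamAssassin does internally): the function name is hypothetical, the
80% and 70% cutoffs are the figures above, the default required score of
9 is my own setting, and treating the 70-80% range as "optional" is an
assumption.

```python
def spam_feed_advice(bayes_prob, score, required=9.0, autolearned=False):
    """Advise whether to hand-feed a confirmed spam to sa-learn.

    bayes_prob: Bayes spam probability for the message, 0.0 to 1.0.
    score:      total SpamAssassin score of the message.
    required:   required spam threshold (9.0 is one site's choice).
    Returns "feed", "optional" or "skip".
    """
    if autolearned:
        return "skip"        # Bayes has already learned this one
    if score <= required:
        return "feed"        # missed spam: its score needs raising
    if bayes_prob <= 0.70:
        return "feed"        # Bayes is unsure; confirm that it is spam
    if bayes_prob >= 0.80:
        return "skip"        # Bayes already recognises it well
    return "optional"        # 70-80%: judgment call (assumed here)


print(spam_feed_advice(0.95, 12.0))   # well above threshold, Bayes sure
print(spam_feed_advice(0.60, 12.0))   # caught, but Bayes is unsure
```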

TM> Same goes with hand-feeding hams that score 4.x points, in the theory
TM> that there's a fixed probability that a ham from that source will at
TM> some point trigger another test and trip it over the threshold.

With the recognition that the point at which we choose to hand-feed hams
can vary.  With my required level at 9, I'm not overly concerned with
hams that score 4.x; they're not close to the spam threshold yet.  I pay
more attention to their Bayes score (if Bayes can't judge whether an
email is spam or ham, I will likely teach it that the email is ham) and
to the uniqueness or source of the email (if it's a type of ham I suspect
Bayes hasn't seen, I'll feed it; if it's from a source whose mail should
always be seen as ham, I'll feed it).
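
The same considerations as a rough Python sketch.  All names are
hypothetical, and the numbers are assumptions except the required score
of 9: the 0.4-0.6 band standing in for "Bayes can't judge" and the
2-point margin standing in for "near the threshold" are placeholders to
tune per site.

```python
def should_feed_ham(score, bayes_prob, required=9.0,
                    novel_type=False, trusted_source=False,
                    near_margin=2.0, undecided=(0.4, 0.6)):
    """Decide whether a confirmed ham is worth hand-feeding to sa-learn.

    score:          total SpamAssassin score of the ham.
    bayes_prob:     Bayes spam probability for the message, 0.0 to 1.0.
    novel_type:     a kind of ham Bayes has probably not seen yet.
    trusted_source: a sender whose mail should always count as ham.
    """
    low, high = undecided
    if low <= bayes_prob <= high:
        return True          # Bayes cannot tell; teach it this is ham
    if novel_type or trusted_source:
        return True          # widen what Bayes knows about our ham
    if score >= required - near_margin:
        return True          # uncomfortably close to the threshold
    return False             # e.g. a 4.x ham with required at 9


print(should_feed_ham(4.5, 0.10))   # far from the threshold, Bayes sure
print(should_feed_ham(4.5, 0.50))   # Bayes undecided
```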

TM> Perhaps I misunderstand.  If so, I'd appreciate alternate viewpoints
TM> and discussion.

Mine is not a developer's knowledge -- just speaking from personal
experience of only a month or so.

Bob Menschel

