[Kenny]
> Another reason is that you have to be very diligent about
> checking for false positives. If one good message is
> incorrectly classified as spam and automatically trained
> as such, it can negatively affect SpamBayes's ability to
> properly identify other good messages later.

BTW, Alex Popiel ran some tests of this scheme with the incremental testing
setup (it's called 'corrected' in regimes.py).  His initial findings were:

"""
4. Training immediately based on the classifier output and
   making corrections to perfect at the end of the day is
   only marginally worse than immediately perfect training.
"""

<http://cashew.wolfskeep.com/~popiel/spambayes/incremental/>

I thought I recalled someone doing testing without the correction as well,
but I can't find anything right now.

IAC, since mistake-based training, non-edge training, and train-to-exhaustion
all (typically, for what that's worth) give better results than 'perfect'
(train on everything) training, and 'perfect' training in turn does better
than this scheme, it's probably not the one to go for :)
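For anyone unfamiliar with the jargon, here's a rough sketch of the difference
between 'perfect' (train on everything) and mistake-based training.  The
ToyClassifier below is a made-up word-counting stand-in, *not* the real
SpamBayes classifier, and the function names are mine; it's just to show the
shape of the two regimes:

```python
class ToyClassifier:
    """Hypothetical stand-in for a spam classifier (not SpamBayes)."""

    def __init__(self):
        self.spam_words = {}
        self.ham_words = {}

    def train(self, words, is_spam):
        # Count word occurrences per class.
        counts = self.spam_words if is_spam else self.ham_words
        for w in words:
            counts[w] = counts.get(w, 0) + 1

    def classify(self, words):
        # True means "spam".  A tie (e.g. an untrained classifier)
        # falls through to "ham".
        spam_score = sum(self.spam_words.get(w, 0) for w in words)
        ham_score = sum(self.ham_words.get(w, 0) for w in words)
        return spam_score > ham_score


def train_on_everything(clf, stream):
    """'Perfect' regime: train on every message with its true label."""
    for words, is_spam in stream:
        clf.train(words, is_spam)


def train_on_mistakes(clf, stream):
    """Mistake-based regime: train only when the classifier errs."""
    for words, is_spam in stream:
        if clf.classify(words) != is_spam:
            clf.train(words, is_spam)
```

The 'corrected' regime tested above sits between these two: it trains
immediately on the classifier's own output, then fixes any mislabelled
messages at the end of the day.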

=Tony.Meyer

-- 
Please always include the list ([EMAIL PROTECTED]) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.

_______________________________________________
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html