On Mon, 8 Sep 2014, Alex Regan wrote:
> > Did you understand that the number of previously not seen tokens has
> > absolutely nothing to do with auto-learning?
> Yes, that was a mistake.
> > Did you understand that all
> > tokens are learned, regardless whether they have been seen before?
> That doesn't really matter from a user perspective, though, right? I mean,
> if tokens that have already been learned are learned again, the net
> result is zero.
Very much not zero. Each token has several values associated with it:
# ham
# spam
time-stamp
So each time a token is learned, its respective ham/spam counter is
incremented (which indicates how spammy or hammy that token is) and its
time-stamp is updated (which indicates how "fresh" it is). The bayes expiry
process removes "stale" tokens when it runs, pruning the database down to size.
Thus learning a token multiple times increases its weight and keeps it
"fresh" so it is kept as an active/relevant piece of info.
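The mechanics above can be sketched in a few lines of Python. This is a
hypothetical in-memory store for illustration only, not SpamAssassin's
actual Bayes database (which is implemented in Perl with its own storage
backends); the `learn` and `expire` functions and the `max_age` parameter
are names invented here:

```python
import time

# token -> {"ham": count, "spam": count, "atime": last-learned timestamp}
tokens = {}

def learn(token, as_spam, now=None):
    """Learning a token always increments its ham or spam counter and
    refreshes its time-stamp, even if the token was seen before."""
    now = time.time() if now is None else now
    entry = tokens.setdefault(token, {"ham": 0, "spam": 0, "atime": now})
    entry["spam" if as_spam else "ham"] += 1
    entry["atime"] = now  # keeps the token "fresh"

def expire(max_age, now=None):
    """Expiry prunes tokens whose last-seen time is older than max_age."""
    now = time.time() if now is None else now
    stale = [t for t, e in tokens.items() if now - e["atime"] > max_age]
    for t in stale:
        del tokens[t]

# Learning the same token twice is not a no-op: its weight grows
# and its time-stamp advances.
learn("viagra", as_spam=True, now=100.0)
learn("viagra", as_spam=True, now=200.0)
# tokens["viagra"] -> {"ham": 0, "spam": 2, "atime": 200.0}

# A token not re-learned recently eventually gets expired.
expire(max_age=50, now=300.0)
# "viagra" is now pruned (300 - 200 > 50)
```

So repeated learning both strengthens the spam/ham signal and defers
expiry, which is why re-learning already-seen tokens is far from a
zero-sum operation.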
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{