[Kenny]
>> The "train-on-mistakes-and-unsures" strategy implemented in 
>> the Outlook addin is believed to be the most effective strategy
>> for most general users.

[Mathew]
> Is that how the automated training is implemented in the 
> latest CVS versions?

What automated training do you mean?  We don't have any automated training
(other than via command-line scripts), do we?

> Or are you talking about manual 
> training, starting with an empty database and correcting any 
> mistakes as new messages arrive?

I believe this is what Kenny was referring to.
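
For anyone unfamiliar with the idea, here is a rough sketch of
"train on mistakes and unsures".  Everything in it is an assumption made
for illustration: ToyClassifier is a hypothetical stand-in that uses
simple smoothed token counts (not the SpamBayes classifier or its API),
and the 0.2/0.9 cutoffs are just example values.

```python
import math
from collections import Counter


class ToyClassifier:
    """Hypothetical stand-in for a token-count classifier (not SpamBayes)."""

    def __init__(self):
        self.spam, self.ham = Counter(), Counter()
        self.nspam = self.nham = 0

    def train(self, tokens, is_spam):
        # Count each token once per message.
        (self.spam if is_spam else self.ham).update(set(tokens))
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def score(self, tokens):
        """Spam probability in [0, 1]; 0.5 when nothing is known."""
        log_odds = 0.0
        for tok in set(tokens):
            p_spam = (self.spam[tok] + 1) / (self.nspam + 2)
            p_ham = (self.ham[tok] + 1) / (self.nham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


def train_on_mistakes_and_unsures(clf, stream, ham_cutoff=0.2, spam_cutoff=0.9):
    """Train only on messages that are not already confidently correct."""
    trained = 0
    for tokens, is_spam in stream:
        s = clf.score(tokens)
        confident = s >= spam_cutoff if is_spam else s <= ham_cutoff
        if not confident:
            # Either a mistake or an unsure: this is the only time we train.
            clf.train(tokens, is_spam)
            trained += 1
    return trained


ham = ["meeting", "agenda", "monday", "notes"]
spam = ["cheap", "pills", "offer", "click"]
stream = [(ham, False), (spam, True)] * 20

clf = ToyClassifier()
trained = train_on_mistakes_and_unsures(clf, stream)
print(f"trained on {trained} of {len(stream)} messages")
```

The point is that the database starts empty and only grows when a
message is misclassified or lands in the unsure range, so most of the
stream is never trained on.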

> I was thinking that the "train on mistakes" approach could be 
> taken a step further, down to the individual token level: all 
> encountered tokens are stored in the database, but only 
> "activated" for filtering when found to be required to filter 
> correctly; that is, when a mistake is found, tokens are 
> activated in order of decreasing significance until 
> classification is correct. Has anyone tried anything like this?

This sounds reasonably similar to "train to exhaustion", which is one of the
best training methods.  SpamBayes has pretty limited support for this at the
moment, but that is changing.  However, it still works on a
message-by-message basis (i.e., train one message, see if that helps, train
one more, see if that helps, and so on).  Doing it per token would take a
*long* time, so it would have to be of great benefit to be worthwhile.
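
To make the message-by-message loop concrete, here is a minimal sketch
of train-to-exhaustion.  It is not the SpamBayes implementation:
ToyClassifier, the cutoffs, and max_rounds are all hypothetical, and the
scoring is plain smoothed token counts.

```python
import math
from collections import Counter


class ToyClassifier:
    """Hypothetical token-count classifier, for illustration only."""

    def __init__(self):
        self.spam, self.ham = Counter(), Counter()
        self.nspam = self.nham = 0

    def train(self, tokens, is_spam):
        (self.spam if is_spam else self.ham).update(set(tokens))
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def score(self, tokens):
        log_odds = 0.0
        for tok in set(tokens):
            p_spam = (self.spam[tok] + 1) / (self.nspam + 2)
            p_ham = (self.ham[tok] + 1) / (self.nham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


def train_to_exhaustion(clf, corpus, ham_cutoff=0.2, spam_cutoff=0.9,
                        max_rounds=10):
    """Repeat passes over the corpus until one pass needs no training.

    Returns (rounds_used, converged).  A message may be trained more
    than once if it is still wrong or unsure on a later pass.
    """
    for rounds in range(1, max_rounds + 1):
        trained = 0
        for tokens, is_spam in corpus:
            s = clf.score(tokens)
            confident = s >= spam_cutoff if is_spam else s <= ham_cutoff
            if not confident:
                clf.train(tokens, is_spam)
                trained += 1
        if trained == 0:
            return rounds, True
    return max_rounds, False


spam = ["cheap", "pills", "offer", "click"]
ham = ["meeting", "agenda", "monday", "notes"]
corpus = [(spam, True), (ham, False)]

clf = ToyClassifier()
rounds, converged = train_to_exhaustion(clf, corpus)
print(f"rounds={rounds} converged={converged}")
```

Each pass has to rescore every message, which is why a per-token
version of the same loop would be so much slower: there are far more
tokens than messages.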

This is also difficult with SpamBayes specifically, because it assumes
that tokens arrive in a per-message 'bag'.  This makes it easy to remove a
message from the database, as long as the whole message is removed.
Changing token counts outside of message 'bags' would cause problems
(negative counts, etc.) if the two schemes were mixed.
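
A small sketch of the problem, under assumed names (BagCounts and
adjust_token are hypothetical, not SpamBayes code): whole-message
training and untraining cancel out exactly, but a per-token change in
between leaves a negative count behind.

```python
from collections import Counter


class BagCounts:
    """Hypothetical token store that trains and untrains whole messages."""

    def __init__(self):
        self.token_counts = Counter()
        self.messages = 0

    def train(self, tokens):
        # Add the whole 'bag': every distinct token in the message, once.
        self.token_counts.update(set(tokens))
        self.messages += 1

    def untrain(self, tokens):
        # Remove the whole 'bag'; assumes it was added by train().
        self.token_counts.subtract(set(tokens))
        self.messages -= 1

    def adjust_token(self, token, delta):
        # Hypothetical per-token change made outside any message 'bag'.
        self.token_counts[token] += delta


msg = ["free", "pills", "offer"]

# Whole-message scheme only: untrain exactly undoes train.
clean = BagCounts()
clean.train(msg)
clean.untrain(msg)
print(dict(clean.token_counts))

# Mixed schemes: a per-token change, then a whole-message untrain.
mixed = BagCounts()
mixed.train(msg)
mixed.adjust_token("free", -1)
mixed.untrain(msg)
print(dict(mixed.token_counts))
```

In the mixed case the count for "free" ends up below zero, because
untrain() has no way of knowing that token was already adjusted.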

=Tony.Meyer

-- 
Please always include the list ([EMAIL PROTECTED]) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.
