On Mon, 20 Dec 2004 15:19:46 +1300, "Tony Meyer" <[EMAIL PROTECTED]> wrote:
>[Kenny]
>>> The "train-on-mistakes-and-unsures" strategy implemented in
>>> the Outlook addin is believed to be the most effective strategy
>>> for most general users.
>
>[Mathew]
>> Is that how the automated training is implemented in the
>> latest CVS versions?
>
>What automated training do you mean?  We don't have any automated training
>(other than via command-line scripts), do we?

I mean the "Start Training" button in the training tab. I always assumed
that that trained on everything in the folders selected.

>> I was thinking that the "train on mistakes" approach could be
>> taken a step further, down to the individual token level: all
>> encountered tokens are stored in the database, but only
>> "activated" for filtering when found to be required to filter
>> correctly; that is, when a mistake is found, tokens are
>> activated in order of decreasing significance until
>> classification is correct.  Has anyone tried anything like this?
>
>This sounds reasonably similar to "train to exhaustion", which is one of the
>best training methods.  SpamBayes has pretty limited support for this at the
>moment, but that is changing.  However, it's still on a message-by-message
>basis (i.e. train one message, see if that helps, train one more, see if
>that helps, etc).  Doing it per token would take a *long* time - it would
>have to be of great benefit.

I wasn't necessarily thinking of train to exhaustion - it could still be
single-pass, the same as "train on mistakes". It would just be more
selective when correcting for mistakes.

>This is also difficult with SpamBayes specifically, because there is an
>assumption that tokens come in a message 'bag'.  This means it's easy to
>remove messages from the database, as long as it's the whole message.
>Changing token counts outside of message 'bags' would cause problems
>(negative counts, etc) if the two schemes were mixed.

Ah, ok.

--
Mat.
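The message 'bag' problem described in the quoted reply can be sketched with a toy counter. This is only an illustration under stated assumptions: `ToyTokenDB`, `train_message`, `untrain_message` and `adjust_token` are hypothetical names invented here, not the SpamBayes classifier API, and the real database tracks more than a single spam count per token.

```python
from collections import Counter


class ToyTokenDB:
    """Hypothetical toy token store; NOT the SpamBayes API."""

    def __init__(self):
        self.spam_counts = Counter()

    def train_message(self, tokens):
        # Whole-message training: each distinct token in the bag counts once.
        for tok in set(tokens):
            self.spam_counts[tok] += 1

    def untrain_message(self, tokens):
        # Whole-message removal mirrors train_message exactly.
        for tok in set(tokens):
            self.spam_counts[tok] -= 1

    def adjust_token(self, tok, delta):
        # Per-token change made outside of any message bag.
        self.spam_counts[tok] += delta


db = ToyTokenDB()
msg = ["cheap", "free", "offer"]

db.train_message(msg)          # message-level scheme adds the whole bag
db.adjust_token("free", -1)    # token-level scheme changes one token alone
db.untrain_message(msg)        # message-level scheme removes the whole bag

# Any token whose count dropped below zero shows the two schemes clashing.
negative = {tok: n for tok, n in db.spam_counts.items() if n < 0}
print(negative)
```

Removing the whole message assumes every token in it was counted exactly once, so the earlier per-token change leaves "free" with a count of -1.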
_______________________________________________
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
