On Mon, 18 Jun 2018 10:13:04 -0600
@lbutlr wrote:

> On 18 Jun 2018, at 08:47, RW <rwmailli...@googlemail.com> wrote:
> > On Mon, 18 Jun 2018 06:13:06 -0600
> > @lbutlr wrote:
> >
> >> I have a script that runs when a mail is moved out of the Junk
> >> folder to pass the mail through sa-learn --ham,
> >
> > Whether this is the Dovecot plugin or something local it's a poor
> > way of training Bayes. You're training on SA errors not Bayes
> > errors. Most imperfect Bayes results don't translate into
> > misclassifications.
>
> I’m not sure what you’re trying too say here/ Certainly SA does
> misclassify mail as spam at times, ...
> Training the messages as ham is useful.

The problem is that, unless there is something badly wrong, a typical single-user account won't generate enough FPs and FNs for a properly trained database. I found that Bayes's identification of ham improved until I'd trained about 1500 ham, but I wouldn't expect to get anything like 1500 SpamAssassin FPs in a lifetime.

It's not even proper train-on-error, because it's training on SpamAssassin misclassifications and not correcting Bayes's own errors. A Bayes error is allowed to go uncorrected until it actually results in an FP or FN.

You can work around the plugin's deficiencies by using autotraining or doing some additional training, but then the plugin is of limited relevance. IMO the plugin is best left to statistical filters like DSPAM.
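
To make the distinction between a SpamAssassin error and a Bayes error concrete, here is a rough sketch in Python. It assumes the stock X-Spam-Status header layout and the standard BAYES_xx rule names; the function names, the sample headers and the 5/95 cut-offs are all made up for illustration and are not anything the plugin or SpamAssassin provides.

```python
import re

# BAYES_00 ... BAYES_99 (and BAYES_999) are the stock Bayes rule names; the
# number is the lower bound of the probability bucket the message fell into.
BAYES_RULE = re.compile(r"\bBAYES_(\d+)\b")


def bayes_level(spam_status):
    """Highest Bayes bucket named in an X-Spam-Status value, or None."""
    hits = BAYES_RULE.findall(spam_status)
    if not hits:
        return None
    # BAYES_999 means >= 99.9%, so it must not be read as "999 percent".
    return max(99.9 if h == "999" else float(h) for h in hits)


def sa_said_spam(spam_status):
    """The header value starts with 'Yes' or 'No' (SpamAssassin's verdict)."""
    return spam_status.lstrip().lower().startswith("yes")


def sa_was_wrong(spam_status, really_spam):
    """True for an FP or FN of SpamAssassin as a whole."""
    return sa_said_spam(spam_status) != really_spam


def bayes_needs_training(spam_status, really_spam, ham_max=5.0, spam_min=95.0):
    """True if Bayes itself was unsure or wrong about the message.

    ham_max and spam_min are illustrative thresholds (an assumption).
    """
    level = bayes_level(spam_status)
    if level is None:
        return True  # Bayes had no opinion at all
    return level < spam_min if really_spam else level > ham_max


# Hypothetical headers for two messages that are both really ham.
ham_delivered = "No, score=1.2 required=5.0 tests=BAYES_50,HTML_MESSAGE autolearn=no"
ham_in_junk = "Yes, score=6.1 required=5.0 tests=BAYES_00,URIBL_BLACK autolearn=no"

for status in (ham_delivered, ham_in_junk):
    print(sa_was_wrong(status, False), bayes_needs_training(status, False))
```

The first message is delivered correctly, so a move-out-of-Junk hook never sees it, yet Bayes was undecided and would benefit from training. The second is the only one the hook trains on, and there Bayes already had it right.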