On Wed, 11 Sep 2013 18:25:59 +0200 Mathieu R. wrote:

> Hello,
>
> Sorry for posting on both lists, spamassassin and dovecot: my question
> is about the dovecot antispam plugin, used to train spamassassin with
> sa-learn.
>
> I wonder if there is a way to confirm sa-learn is correctly fed by
> the antispam plugin.
>
> ...
>
> and here is what I got in /tmp/sa-learn-pipe.log:
>
> 10545-start (--spam)
> 10545-end
>
> For me, it's working, but when I run sa-learn --backup, I just get
> this:
>
> v 3 db_version # this must be the first line!!!
> v 0 num_spam
> v 0 num_nonspam
>
> it's probably because I'm using ***STANDARD-ANTI-UBE-TEST-EMAIL*** which
> probably teaches nothing to sa-learn,
It should still have been learned. Usually this kind of thing happens because different invocations of sa-learn look for the Bayes database in different places, for example because the plugin runs sa-learn as a different user, or with a different $HOME, from the one you use on the command line.

If I were you, I'd modify the script to run sa-learn with -D bayes and have it write stderr to a file. If you are attempting to use per-Unix-user databases, it might be useful to log $HOME as well.

I'm sceptical that the antispam plugin can learn enough ham this way. As I understand it, the only mail that gets learnt as ham will be false positives (mail moved out of the spam folder), and what ends up in the spam folder is decided by the overall SpamAssassin score, irrespective of the Bayes result. Bayes needs (by default) 200 spams and 200 hams before it even starts classifying, and many more for optimal results; I don't expect to get 200 FPs in the rest of my life.

Unless this is a high-volume server with a shared database, I'd suggest either learning a few thousand hams manually or implementing an unsure folder. You can also mitigate the problem by autotraining with a high ham threshold (bayes_auto_learn_threshold_nonspam), but then you really need to be careful to move all spam to the spam folder, since any spam scoring under that threshold will already have been autolearned as ham.
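A minimal sketch of that debugging change, written in Python rather than as the original shell script. It assumes the plugin's pipe backend passes the learning mode (e.g. --spam) as arguments and the message on stdin, like the sa-learn-pipe script whose log is quoted above; the function names are hypothetical. Each call logs its PID, UID and $HOME, and sa-learn's -D bayes debug output (which goes to stderr) is appended to the same log, so the database paths the plugin actually uses can be compared with those of a manual sa-learn run.

```python
#!/usr/bin/env python3
# Hypothetical debugging replacement for the antispam plugin's pipe script.
# Assumption: the plugin runs it with the learning mode (e.g. --spam or
# --ham) as arguments and the message on stdin.
import os
import subprocess
import sys
import tempfile

LOG_PATH = "/tmp/sa-learn-pipe.log"  # same log file as the original script


def build_command(mode_args, msg_path, sa_learn="sa-learn"):
    """sa-learn command line with Bayes debugging (-D bayes) switched on."""
    return [sa_learn, "-D", "bayes", *mode_args, msg_path]


def context_line(pid, mode_args, uid, home):
    """Log line recording who runs sa-learn and which $HOME it will see."""
    return f"{pid}-start ({' '.join(mode_args)}) uid={uid} HOME={home}\n"


def main(mode_args):
    pid = os.getpid()
    home = os.environ.get("HOME", "<unset>")
    with open(LOG_PATH, "a") as log:
        log.write(context_line(pid, mode_args, os.getuid(), home))
        log.flush()  # flush before the child process writes to the same file
        # Save the message to a temporary file, as the original script does.
        with tempfile.NamedTemporaryFile(delete=False) as msg:
            msg.write(sys.stdin.buffer.read())
            msg_path = msg.name
        try:
            # sa-learn's debug output goes to stderr; append it to the log.
            result = subprocess.run(build_command(mode_args, msg_path),
                                    stdout=log, stderr=log)
        finally:
            os.unlink(msg_path)
        log.write(f"{pid}-end (exit status {result.returncode})\n")
    return result.returncode


if __name__ == "__main__":
    if len(sys.argv) > 1:
        sys.exit(main(sys.argv[1:]))
    print("usage: sa-learn-pipe.py --spam|--ham < message", file=sys.stderr)
```

Used in place of the existing script, the -D bayes lines in the log should show which database files sa-learn opens; running `sa-learn -D bayes --backup` by hand and comparing the two is a quick way to spot a user or $HOME mismatch.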

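To see how far a database is from being used, a small hypothetical Python helper can read the "v" (magic) lines that `sa-learn --backup` prints, as in the output quoted above, and compare them with the defaults of bayes_min_spam_num and bayes_min_ham_num (200 each). The script and function names are illustrative, and it assumes the default thresholds have not been changed in the local configuration.

```python
#!/usr/bin/env python3
# Hypothetical helper: how far is a Bayes database from being used?
# Assumes the SpamAssassin default thresholds are in effect.
import sys

MIN_SPAM = 200  # SpamAssassin default for bayes_min_spam_num
MIN_HAM = 200   # SpamAssassin default for bayes_min_ham_num


def parse_counts(backup_text):
    """Return (num_spam, num_nonspam) from `sa-learn --backup` output."""
    counts = {}
    for line in backup_text.splitlines():
        fields = line.split()
        # Magic lines have the form: v <value> <name> [# comment]
        if len(fields) >= 3 and fields[0] == "v" and fields[1].isdigit():
            counts[fields[2]] = int(fields[1])
    return counts.get("num_spam", 0), counts.get("num_nonspam", 0)


def shortfall(num_spam, num_ham):
    """Messages still to be learned before Bayes starts classifying."""
    return max(0, MIN_SPAM - num_spam), max(0, MIN_HAM - num_ham)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: bayes_counts.py BACKUP_FILE", file=sys.stderr)
    else:
        with open(sys.argv[1], errors="replace") as f:
            spam, ham = parse_counts(f.read())
        need_spam, need_ham = shortfall(spam, ham)
        print(f"learned: {spam} spam, {ham} ham; "
              f"still needed: {need_spam} spam, {need_ham} ham")
```

For example, run `sa-learn --backup > /tmp/bayes-backup.txt` as the same user the plugin runs as, then pass that file to the script. With the backup output quoted above it would report 200 spam and 200 ham still needed.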