Re: Checking if sa-learn is actually learning
On Fri, 16 Oct 2015 20:59:52 -0500 Ryan Coleman wrote:

> How do I go about checking that my automated scripts that handle spam
> learning are actually learning? I have literally hundreds of emails a
> day that go into the “new” folder I have set up and it does not seem
> to be learning from them.
> ...
> sa-learn commands:
> [scans domains for specified folders and scans them]
> > /usr/bin/find /var/mail/vhosts/ -name '*.Spam.New*' -type d
> > -exec /usr/bin/sa-learn --no-sync --spam --progress {}* \;
> > /usr/bin/find /var/mail/vhosts/ -name '*.Spam.Suspected*' -type d
> > -exec /usr/bin/sa-learn --no-sync --spam --progress {}* \;

There are a few things wrong with this.

The * in {}* is at best superfluous and may be causing various problems. It wouldn't work at all with a POSIX-compliant shell.

Also, for a maildir folder foo you are running sa-learn separately on foo/, foo/cur, foo/new and foo/tmp. sa-learn understands maildir, so training on new and cur separately involves unnecessary parsing and extra invocations of sa-learn. You shouldn't be training on tmp at all, because you might get an incomplete email.

Also, I don't see anything about learning ham.

Once you've fixed your script, append the following:

    sa-learn -D bayes --dump magic >> /var/tmp/sa-debug 2>&1

and then let the script run as it normally would, from cron or whatever.

When you look at the output file, check that nspam is increasing as new spam is trained and that nspam and nham are both over 200.

Then check that delivery and training are using the same database. Look at the location of the bayes files in the debug output. Take a look at the mtime of the bayes journal file in the same directory, and check that it's updated during a mail delivery scan.
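As a rough illustration of the points above, the training commands and the nspam/nham check might look something like the sketch below. It is not a drop-in script: the variables SA_LEARN and MAIL_ROOT and the functions train_folders and magic_count are names made up for this example, and the awk parsing assumes the usual `--dump magic` layout, where the count is the third field and the counter name is the last.

```shell
#!/bin/sh
# Illustrative sketch only. SA_LEARN, MAIL_ROOT and both function names
# are introduced here; the defaults are the paths from the quoted commands.
SA_LEARN=${SA_LEARN:-/usr/bin/sa-learn}
MAIL_ROOT=${MAIL_ROOT:-/var/mail/vhosts}

# train_folders MODE PATTERN
# MODE is --spam or --ham. Each matching maildir folder is passed to
# sa-learn once, as the bare directory name with no trailing *.
train_folders() {
    find "$MAIL_ROOT" -type d -name "$2" \
        -exec "$SA_LEARN" --no-sync "$1" --progress {} \;
}

# magic_count NAME
# Reads "sa-learn --dump magic" output on stdin and prints the count
# (third field) of the line whose last field is NAME (nspam or nham).
magic_count() {
    awk -v name="$1" '$NF == name { print $3 }'
}

# Possible use from the cron script:
#   train_folders --spam '*.Spam.New*'
#   train_folders --spam '*.Spam.Suspected*'
#   "$SA_LEARN" --dump magic | magic_count nspam
```

Logging the nspam and nham values after each run makes it easy to see whether the counters move. A ham pattern is deliberately not shown, since the original post does not name a ham folder.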
Re: Checking if sa-learn is actually learning
On 2015-10-16 20:59 -0500, Ryan Coleman wrote:

> sa-learn commands:
> [scans domains for specified folders and scans them]
> > /usr/bin/find /var/mail/vhosts/ -name '*.Spam.New*' -type d -exec
> > /usr/bin/sa-learn --no-sync --spam --progress {}* \;
> > /usr/bin/find /var/mail/vhosts/ -name '*.Spam.Suspected*' -type d -exec
> > /usr/bin/sa-learn --no-sync --spam --progress {}* \;
>
> I swear I had issues in the past without having --no-sync, but is that causing
> it?

If you do the routine learning with --no-sync, you must have one run with --sync as well, maybe in a cron job. Or just run with --sync once at the end of this same script. That much is straightforward, and should be clear from the man/pod pages.

The part that caused me some trouble, and is somewhat underdocumented IMO, is the interaction of --sync with --force-expire. I'm afraid I can't help you with that because I took the extreme step of disabling expiration, and instead re-creating a fresh database monthly from the recent corpus which I keep around.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.
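The "run with --sync once at the end" suggestion could be sketched roughly as follows. The function learn_then_sync and the SA_LEARN variable are hypothetical names used only for this example; the sa-learn options are the ones already discussed in the thread.

```shell
#!/bin/sh
# Illustrative sketch only. SA_LEARN and learn_then_sync are names
# introduced for this example.
SA_LEARN=${SA_LEARN:-/usr/bin/sa-learn}

# learn_then_sync FOLDER...
# Trains every folder with --no-sync, then runs a single --sync after
# the last folder has been processed.
learn_then_sync() {
    for folder in "$@"; do
        "$SA_LEARN" --no-sync --spam "$folder"
    done
    "$SA_LEARN" --sync
}
```

Calling it with the spam folders found for one mailbox gives one --no-sync run per folder and exactly one --sync afterwards.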
Checking if sa-learn is actually learning
How do I go about checking that my automated scripts that handle spam learning are actually learning? I have literally hundreds of emails a day that go into the “new” folder I have set up and it does not seem to be learning from them.

OS: Ubuntu 14.04.3 LTS
MTA: Postfix 2.11.0-1ubuntu1
postgrey 1.34-12
spamassassin/spamc 3.4.0-1ubuntu2.1

sa-learn commands:
[scans domains for specified folders and scans them]
> /usr/bin/find /var/mail/vhosts/ -name '*.Spam.New*' -type d -exec
> /usr/bin/sa-learn --no-sync --spam --progress {}* \;
> /usr/bin/find /var/mail/vhosts/ -name '*.Spam.Suspected*' -type d -exec
> /usr/bin/sa-learn --no-sync --spam --progress {}* \;

I swear I had issues in the past without having --no-sync, but is that causing it?