Re: Is BAYES filtering working? Having doubts.

Bill Cole Tue, 29 Dec 2015 14:51:06 -0800

On 29 Dec 2015, at 13:24, RW wrote:

On Mon, 28 Dec 2015 23:42:03 -0500
Bill Cole wrote:

Using these facts, my learning script that runs as root and reads
from multiple real users' Maildirs does this to learn ham:

 for AFILE in $HAMS ; do formail < $AFILE ; done | sudo -H -u $SAUSER sa-learn --ham --mbox

Where $HAMS is the list of ham message files and $SAUSER is the user
handling the system-wide BayesDB. I use formail there just to give
each message a leading 'From ' line (i.e. mbox format) so that the
whole bunch can be piped into a single sa-learn invocation.


IIRC when you do that sa-learn just creates a temporary file and then
runs on that.

Yes, with the advantage of using
Mail::SpamAssassin::Util::secure_tmpfile() rather than whatever I happen
to roll up in a bit of Q&D shell that I never get around to reviewing
for edge cases...

The main reason to do something like that is to avoid the heavyweight
sudo & load of a Perl script for each message.

The alternative without formail would be to pipe each raw message into
its own sa-learn.
The alternative is to give it a directory.

Sure, one can reimplement Mail::SpamAssassin::Util::secure_tmpfile
and/or Mail::SpamAssassin::Util::secure_tmpdir and use that. One can
copy files from multiple user Maildirs and maybe error out before
cleaning up or maybe forget to set perms right or maybe make some
mistake I haven't thought of.

Or, I could use a tool that's been at least nominally open to review for
many years across many versions and which stands a strong chance of
having had at least one set of more competent eyes run across it looking
for flaws to fix. I'm lazy...
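
For comparison, a rough sketch of the copy-to-a-directory route described
above, with the permission and cleanup steps made explicit. The function
name stage_and_learn_ham is hypothetical, the script is assumed to run as
root, and mktemp -d is assumed to be available (it is common but not
POSIX). It does not handle being killed by a signal mid-run, which is
exactly the kind of edge case mentioned above.

```shell
# stage_and_learn_ham is a hypothetical helper; assumed to run as root.
# Usage: stage_and_learn_ham USER FILE...
stage_and_learn_ham() {
    sauser=$1
    shift
    # mktemp -d creates the directory with mode 0700, owned by the caller.
    stage=$(mktemp -d) || return 1
    status=0
    n=0
    for f in "$@"; do
        n=$((n + 1))
        # Copy rather than symlink: the training user may not be able to
        # read the original files.
        cp -- "$f" "$stage/$n" || { status=1; break; }
    done
    if [ "$status" -eq 0 ]; then
        # Hand the copies to the training user, then learn the directory.
        chown -R "$sauser" "$stage" &&
            sudo -H -u "$sauser" sa-learn --ham "$stage" || status=1
    fi
    # Clean up on both the success and the failure path.
    rm -rf -- "$stage"
    return "$status"
}

# Example (not run here):
#   stage_and_learn_ham "$SAUSER" $HAMS
```

This trades one temporary mbox file for one temporary directory, so the
sudo and Perl startup cost is still paid only once per batch.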

It can work out for itself
whether it's maildir or just a directory of files. If you need to train
an arbitrary selection of files, you could symlink them into a
temporary directory.

Not if the user you want to train as can't read the real files. Symlinks
don't confer permission to read their targets (that would be very bad.)

If you run spamd it's also possible to train via
spamc.

Yes. IF you run spamd and it's how your system-wide SA filtering is
done already, that's arguably the best way to do ad hoc (re)training
since you can be sure it's hitting the right DB and you can feed it in
parallel.
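
A minimal sketch of what feeding spamd in parallel might look like,
assuming spamd was started with --allow-tell and that spamc's -L and -u
options fit the local setup. The helper name learn_via_spamc and the
parallelism of 4 are illustrative choices, and xargs -0/-P are GNU/BSD
extensions rather than POSIX.

```shell
# learn_via_spamc is a hypothetical helper name, not part of SpamAssassin.
# Usage: learn_via_spamc ham|spam USER FILE...
learn_via_spamc() {
    ltype=$1
    sauser=$2
    shift 2
    # Nothing to do if no message files were given.
    [ "$#" -gt 0 ] || return 0
    # One spamc per raw message, up to 4 at a time (4 is arbitrary).
    # Each message goes in unmodified, so no mbox "From " escaping occurs.
    printf '%s\0' "$@" |
        xargs -0 -n 1 -P 4 sh -c 'spamc -u "$1" -L "$2" < "$3"' sh "$sauser" "$ltype"
}

# Example (not run here):
#   learn_via_spamc ham "$SAUSER" $HAMS
```

Whether -u is needed at all depends on how spamd maps users to Bayes
databases on the system in question.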

Personally I'd avoid the unforced use of mbox around Bayes without
being sure that "From-escaping" is taken account of. The problem is
that formail will replace a "From" at the beginning of a body line with
">From" which changes the msgid hash and prevents the correct
retraining of mail that was trained without going through formail -
e.g. the correction of autotraining.

An excellent point, which I had not considered. I'm mildly surprised
that sa-learn doesn't 's/^>From /From /' each message when disassembling
the mbox, but only mildly. It seems I've got a script to fix...
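
To illustrate the substitution mentioned above, here is a small sketch
of a filter that undoes the escaping on a single message outside of
sa-learn. The name unescape_from is hypothetical, and this is not
something SpamAssassin itself does.

```shell
# unescape_from is a hypothetical helper name, not part of SpamAssassin.
# Reads one message on stdin and writes it to stdout with the ">From "
# escaping removed.
unescape_from() {
    # Only lines that start with exactly ">From " are rewritten.
    sed 's/^>From /From /'
}

# Caveat: this is lossy. A line that really was ">From " in the original
# message also becomes "From ", so the original bytes (and therefore the
# hash) are not recovered in every case.

# Example:
#   printf '>From here on, escaped\n' | unescape_from
```

Because the round trip is not exact, skipping mbox altogether remains
the safer route.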

I just had a quick look and I can't see any support for this in
SpamAssassin. It's not a major problem, but in this case it's an easily
avoidable one.

Yes. Only a small fraction of messages need the escaping at all, but
it's enough to not use formail & mbox.

There's also the option of using inherited ACLs on Maildirs if they are
supported on the filesystem being used.
