On 29.12.2015 at 05:42, Bill Cole wrote:
On 28 Dec 2015, at 17:54, Peter L. Berghold wrote:

The script that I use to pull the messages out of a
spam bucket and invoke sa-learn runs as root, which has permission to read
from anywhere.  The complication is that amavis, unlike root, does not have
permission to read the Maildir files of ordinary users.

That said, I have some thoughts on how to solve that.

In case your ideas don't work out...

Useful facts: sa-learn reads stdin if you don't give it any file
arguments and it can take mbox format as input.

Better: write a script that collects the samples as root in a single folder, runs chown/chmod on them, and then calls "sa-learn" via "su" as the correct non-root user.
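
A minimal sketch of that idea, under stated assumptions: the helper name stage_samples, the staging path, the example Maildir paths and the "amavis" user are all placeholders, not taken from anyone's real setup. The su call in the comment assumes a Linux su that accepts -s and -c.

```shell
#!/bin/sh
# Sketch only: stage samples as root and hand them to the learning user.
# stage_samples and every path or user name below are illustrative assumptions.

# Usage: stage_samples DEST OWNER SRCDIR...
# Copies every regular file found directly in each SRCDIR into DEST,
# then makes the copies owned by OWNER and private to that owner.
stage_samples() {
    dest=$1; owner=$2; shift 2
    mkdir -p "$dest" || return 1
    n=0
    for src in "$@"; do
        for f in "$src"/*; do
            [ -f "$f" ] || continue
            n=$((n + 1))
            # Number the copies so equal file names from different users cannot collide.
            cp "$f" "$dest/sample-$n.eml" || return 1
        done
    done
    chown -R "$owner" "$dest" && chmod -R u+rwX,go-rwx "$dest"
}

# Example run as root (placeholder paths and user):
#   stage_samples /var/tmp/sa-stage/ham amavis /home/alice/Maildir/.Ham/cur /home/bob/Maildir/.Ham/cur
#   su -s /bin/sh -c 'sa-learn --max-size=0 --progress --ham /var/tmp/sa-stage/ham' amavis
```

Copying rather than moving leaves the users' Maildirs untouched, at the cost of temporary disk space.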

Using these facts, my learning script that runs as root and reads from
multiple real users' Maildirs does this to learn ham:

   for AFILE in $HAMS ; do formail < $AFILE ; done | sudo -H -u $SAUSER sa-learn --ham --mbox

Where $HAMS is the list of ham message files and $SAUSER is the user
handling the system-wide BayesDB. I use formail there just to give each
message a leading 'From ' line (i.e. mbox format) so that the whole
bunch can be piped into a single sa-learn invocation. The alternative
without formail would be to pipe each raw message into its own sa-learn.
If you don't have sudo installed or don't like letting root use it,
you can replicate the same effect with su in an uglier command line.

I don't get why you would "pipe each raw message into its own sa-learn".

I tried that and it's terribly slow, with no useful progress display.
You don't gain anything with "formail" except overhead.

sa-learn --max-size=0 --progress --spam /sample-folder/spam/
sa-learn --max-size=0 --progress --ham  /sample-folder/ham/

Both folders contain individual eml files, which don't need a leading 'From ' line, and sa-learn is able to display progress, including the estimated time to finish.
_________________________

yours:
for SAMPLE_FILE in "$SA_MILTER_HOME"/training/spam/{.,}*; do /usr/bin/formail < "$SAMPLE_FILE"; done | /usr/bin/sa-learn --dbpath "$BAYES_TEMP/bayes" --max-size=0 --no-sync --progress --spam --mbox

mine for a year now:
/usr/bin/sa-learn --dbpath "$BAYES_TEMP/bayes" --max-size=0 --no-sync --progress --spam "$SA_MILTER_HOME/training/spam/"
_________________________

Additionally, your version produces warnings like the one below and reports "Learned tokens from 16670 message(s) (16670 message(s) examined)", while with my version all 57337 messages are learned correctly.

Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/share/perl5/vendor_perl/Mail/SpamAssassin/HTML.pm line 260
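
To spot a count mismatch like this one, the number of sample files can be compared with the "examined" figure from the sa-learn summary line. The two helpers below are hypothetical names for a rough sketch. They assume the samples sit directly in the folder and that the summary line has the form quoted above.

```shell
#!/bin/sh
# Sketch only: count_samples and examined_count are hypothetical helpers.

# Print the number of regular files directly inside directory $1
# (subdirectories are not descended into; hidden files are counted).
count_samples() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Read sa-learn output on stdin and print the "examined" number from a line
# such as: Learned tokens from N message(s) (M message(s) examined)
examined_count() {
    sed -n 's/.*(\([0-9][0-9]*\) message(s) examined).*/\1/p'
}

# Example (placeholder path); the two numbers should agree:
#   count_samples /sample-folder/spam
#   sa-learn --max-size=0 --spam /sample-folder/spam/ | examined_count
```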

