On 29.12.2015 at 05:42, Bill Cole wrote:
On 28 Dec 2015, at 17:54, Peter L. Berghold wrote:

The script that I use to pull the messages out of a spam bucket and invoke sa-learn runs as root, which has permission to read from anywhere. The complication is that amavis does not have permission to read the Maildir files of ordinary users, as root does. That said, I have some thoughts on how to solve that.

In case your ideas don't work out... Useful facts: sa-learn reads stdin if you don't give it any file arguments, and it can take mbox format as input.
Better to write a script that, as root, collects the samples into a single folder, runs chown/chmod on them, and then calls "sa-learn" via "su" as the correct non-root user.
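A minimal sketch of that collect-then-learn approach, not a tested production script. The function name collect_samples, the staging path /var/tmp/sa-train, the Maildir glob and the $SAUSER variable are all assumptions made for illustration. The privileged steps are left commented out because they depend on the local setup.

```shell
#!/bin/sh
# Sketch only: collect_samples, the staging path and SAUSER are hypothetical.

# Copy every regular file from the given source directories into one staging
# directory. Each copy gets a unique numeric name so that files with the same
# name in different Maildirs do not overwrite each other.
# Usage: collect_samples DEST_DIR SRC_DIR...
# Prints the number of files copied.
collect_samples() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    n=0
    for dir in "$@"; do
        for f in "$dir"/*; do
            [ -f "$f" ] || continue
            n=$((n + 1))
            cp -- "$f" "$dest/$n.eml" || return 1
        done
    done
    echo "$n"
}

# Example run as root (paths and user are assumptions, adjust before use):
# collect_samples /var/tmp/sa-train/spam /home/*/Maildir/.Junk/cur
# chown -R "$SAUSER" /var/tmp/sa-train
# chmod -R u+rwX,go-rwx /var/tmp/sa-train
# su -s /bin/sh "$SAUSER" -c 'sa-learn --max-size=0 --progress --spam /var/tmp/sa-train/spam/'
```

The "-s /bin/sh" is there because service accounts often have no login shell. Drop it if the target user has one.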
Using these facts, my learning script, which runs as root and reads from multiple real users' Maildirs, does this to learn ham:

    for AFILE in $HAMS ; do formail < $AFILE ; done | sudo -H -u $SAUSER sa-learn --ham --mbox

where $HAMS is the list of ham message files and $SAUSER is the user handling the system-wide BayesDB. I use formail there just to give each message a leading 'From ' line (i.e. mbox format) so that the whole bunch can be piped into a single sa-learn invocation. The alternative without formail would be to pipe each raw message into its own sa-learn. If you don't have sudo installed or don't like letting root use it, you can replicate the same effect with su in an uglier command line.
I don't get why one would "pipe each raw message into its own sa-learn". I tried that and it's terribly slow, with no useful progress display. You don't gain anything with "formail" except overhead:

    sa-learn --max-size=0 --progress --spam /sample-folder/spam/
    sa-learn --max-size=0 --progress --ham /sample-folder/ham/

Both folders contain single eml files, which don't need a leading 'From ' line. sa-learn is able to display progress, including the estimated time to finish.
_________________________

yours:

    for SAMPLE_FILE in "$SA_MILTER_HOME"/training/spam/{.,}*; do /usr/bin/formail < "$SAMPLE_FILE"; done | /usr/bin/sa-learn --dbpath "$BAYES_TEMP/bayes" --max-size=0 --no-sync --progress --spam --mbox

mine for a year now:

    /usr/bin/sa-learn --dbpath "$BAYES_TEMP/bayes" --max-size=0 --no-sync --progress --spam "$SA_MILTER_HOME/training/spam/"

_________________________

Additionally, your version produces warnings like the one below and reports "Learned tokens from 16670 message(s) (16670 message(s) examined)", while with my version all 57337 messages are learned correctly:
Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/share/perl5/vendor_perl/Mail/SpamAssassin/HTML.pm line 260
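A shortfall like that is easy to miss, so here is a rough sketch of how one might compare the number of sample files with the count sa-learn reports. The helper names learned_count and sample_count are made up for this example, and the sed pattern assumes the summary line has exactly the form quoted above. File names containing newlines would throw the file count off.

```shell
#!/bin/sh
# Sketch only: learned_count and sample_count are hypothetical helper names.

# Read sa-learn output on stdin and print N from a summary line of the form
# "Learned tokens from N message(s) (M message(s) examined)".
learned_count() {
    sed -n 's/^Learned tokens from \([0-9][0-9]*\) message(s).*/\1/p'
}

# Print the number of regular files directly inside a directory,
# including dotfiles but not descending into subdirectories.
sample_count() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Possible use (commented out, since it needs sa-learn and real samples):
# out=$(sa-learn --max-size=0 --spam /sample-folder/spam/ 2>&1)
# [ "$(printf '%s\n' "$out" | learned_count)" = "$(sample_count /sample-folder/spam/)" ] \
#     || echo "not all samples were learned" >&2
```

Messages that were already learned earlier also lower the reported number, so a mismatch is only a hint to look closer, not proof of a problem.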