On 01.06.2016 at 15:39, RW wrote:
On Tue, 31 May 2016 14:58:05 -0700
Peter Carlson wrote:


# grab all the user folders
users=`find /var/spool/cyrus/mail -name SPAM -print`
...
    sa-learn --nosync --spam --progress --dir $inbox/SPAM
    sa-learn --nosync --ham --progress --dir $inbox

I've never used Cyrus, but my understanding is that it has one directory
per folder that holds both emails and metadata files. You appear to be
training on both.
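
To illustrate the point about metadata: a rough sketch of feeding only the message files to sa-learn. It assumes Cyrus stores each message as "<UID>." (digits followed by a dot) next to metadata files such as cyrus.index and cyrus.header, and that your find supports -maxdepth. The function name list_cyrus_messages is made up for this example.

```shell
# List only the message files in one Cyrus mailbox directory.
# Assumption: messages are named "<UID>." (digits plus a trailing dot);
# everything else in the directory (cyrus.index, cyrus.header, ...) is metadata.
list_cyrus_messages() {
    # -maxdepth 1 keeps subfolders such as SPAM out of the result.
    # The second -name test rejects names containing anything but digits and dots.
    find "$1" -maxdepth 1 -type f -name '[0-9]*.' ! -name '*[!0-9.]*'
}

# Possible use (not executed here):
#   list_cyrus_messages "$inbox/SPAM" | while IFS= read -r msg; do
#       sa-learn --spam "$msg"
#   done
```

Starting one sa-learn per message is slow; it is shown that way only to keep the sketch short.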

And even if that were not the case:

Blindly training every inbox as ham is a road straight to hell for Bayes, and in practice the same goes for blindly training every spam folder as spam. How is a setup like that supposed to produce sane results?

Every false positive gets trained as spam, so any future mail like it is a false positive again, and more and more similar mails start to look spammy.

Every spam that was not caught gets trained as ham, so more and more spam slips through, and all of it is trained as ham in turn.

You are playing the lottery on whether the user has looked at his inbox by that moment and moved the spam to the spam folder. If he is on vacation, you train all of his uncaught spam as ham.

Congratulations on building such a setup. Combine it with autolearning and then complain "Bayes filter marking everything as ham".

Bayes training needs to be done *carefully*, and then you get a nearly 100% hit rate. If you train it wrong, well, you get a lottery game at best.
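
One possible way to take the lottery out of it is to train only on mail a user has deliberately filed. The sketch below assumes hypothetical per-user folders named LearnSpam and LearnHam (these are not Cyrus defaults), the "<UID>." message naming, and a find with -maxdepth. The function names and the SA_LEARN variable are invented for the example. Treat it as a starting point, not a finished setup.

```shell
# Train only on mail that a user has explicitly sorted.
# LearnSpam / LearnHam are hypothetical folder names, not Cyrus defaults.
SA_LEARN=${SA_LEARN:-sa-learn}   # set SA_LEARN=echo for a dry run

train_folder() {
    kind=$1   # "spam" or "ham"
    dir=$2    # one Cyrus mailbox directory
    # Hand over message files only ("<UID>."), skipping cyrus.* metadata.
    # With "{} +" nothing is run when the folder holds no messages.
    find "$dir" -maxdepth 1 -type f -name '[0-9]*.' ! -name '*[!0-9.]*' \
        -exec "$SA_LEARN" --no-sync "--$kind" {} +
}

train_all() {
    root=$1   # top of the mail spool
    find "$root" -type d -name LearnSpam | while IFS= read -r d; do
        train_folder spam "$d"
    done
    find "$root" -type d -name LearnHam | while IFS= read -r d; do
        train_folder ham "$d"
    done
    # Sync the Bayes database once, after all folders are processed.
    "$SA_LEARN" --sync
}
```

Running it as SA_LEARN=echo first prints the sa-learn command lines instead of executing them, which makes it easy to check that no inbox and no metadata file ends up in the training set.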
