On 01.06.2016 at 15:39, RW wrote:
> On Tue, 31 May 2016 14:58:05 -0700 Peter Carlson wrote:
>> # grab all the user folders
>> users=`find /var/spool/cyrus/mail -name SPAM -print`
>> ...
>> sa-learn --nosync --spam --progress --dir $inbox/SPAM
>> sa-learn --nosync --ham --progress --dir $inbox
>
> I've never used Cyrus, but my understanding is that it has one directory per folder that holds both emails and metadata files. You appear to be training on both.
and even if not: blindly training every inbox as ham is a road straight to hell for bayes, and in reality the same goes for the spam folder. how does one imagine a sane result with such a setup?
you train every false positive as spam, so any future mail like it is again a false positive, and more and more similar mails become spammy
you train every spam that was not caught as ham, which leads to more and more mails not being caught, all of them trained as ham in turn
it is a lottery whether the user has looked at his inbox at that moment and moved spam to the spam folder; if he is on vacation you train all his uncaught spam as ham
congratulations on building such a setup; combine it with autolearning and then complain "Bayes filter marking everything as ham"
bayes training needs to be done *carefully* and then you get a nearly 100% hit rate; if you train it wrong, well, you get a lottery game at best
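
a minimal sketch of what more careful training could look like, not a tested setup. it assumes that users move mail by hand into dedicated training folders, and that Cyrus stores each message as a file named with its UID followed by a dot (for example "42.") next to its cyrus.* metadata files. the SA_LEARN variable and both function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only. Assumptions, not taken from the thread:
#  - message files are named "<uid>." and metadata files "cyrus.*"
#  - the folders passed in were sorted by the user on purpose
# SA_LEARN is a hypothetical override so the command can be swapped out.
SA_LEARN=${SA_LEARN:-sa-learn}

# List the message files directly inside one folder, skipping the
# cyrus.* metadata files and anything in subfolders.
list_messages() {
    find "$1" -maxdepth 1 -type f -name '[0-9]*.' -print
}

# Train the messages of one folder.
#   $1 = spam | ham
#   $2 = folder the user filled deliberately
train_folder() {
    case "$1" in
        spam|ham) ;;
        *) echo "usage: train_folder spam|ham DIR" >&2; return 2 ;;
    esac
    # Only message files are handed to sa-learn; if the folder has
    # none, find does not run the command at all.
    find "$2" -maxdepth 1 -type f -name '[0-9]*.' \
        -exec "$SA_LEARN" --no-sync "--$1" {} +
}
```

call train_folder only on folders that contain nothing but mail the user sorted deliberately, never on the inbox itself, and run "sa-learn --sync" once at the end.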