Reindl Harald [mailto:h.rei...@thelounge.net] wrote:
>> This is a mail gateway for multiple companies. I'm not supposed to read
>> e-mails on that, or picking mails that can be used for learning ham
>
> how did you then manage 1.4 Mio ham-samples in your biased corpus

Looks like this amavisd-spamassassin combo automatically learnt a lot of ham (which wasn't actually ham):

Feb 11 03:37:31 amavis[20024]: (20024-06) spam-tag, <no-re...@maiutazas.hu> -> <someb...@company.hu>, No, score=-0.099 tagged_above=-9999 required=4 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001] autolearn=ham

I never configured autolearning; I assume it came with this CentOS setup. man spamassassin says bayes_auto_learn has a default value of 1.

>> Without autolearning and without the help of the end-users, I can't build a
>> proper ham bayes database, can I?
> surely, or don't you and people around you which can help don't send and
> reveive mails?

I don't want to get into this "fight", but end-users have limited IT knowledge. They are 100% Outlook users (forwarding inline versus as an attachment always confuses them). If I really want this, I need a user-proof, one-click solution like Gmail's "spam" and "not spam" buttons, which magically save e-mails to the proper technical mailbox (which is then reviewed by the admins and used to train SA). With Outlook users and internal Exchange MTAs, my options are limited.

So, if I understood correctly, you all agree that the bayesian database is f***** up, so let's start with a new one, with autolearn turned off, and train SA from scratch with both ham and spam mails.

Best regards
Szabolcs
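
P.S. To get a feel for how much was autolearned as ham, a rough sketch like the one below could count the autolearn= verdicts in the amavis log. The function name count_autolearn and the regular expression are assumptions based only on the single spam-tag line quoted above; other amavis versions or log templates may format the line differently.

```python
import re
from collections import Counter

# Assumed pattern, derived from the one spam-tag log line quoted above.
AUTOLEARN_RE = re.compile(r"spam-tag,.*?autolearn=(\w+)")


def count_autolearn(lines):
    """Count autolearn verdicts (ham, spam, no, ...) in amavis spam-tag lines."""
    counts = Counter()
    for line in lines:
        match = AUTOLEARN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The log line from this mail, used as sample input.
sample = (
    "Feb 11 03:37:31 amavis[20024]: (20024-06) spam-tag, "
    "<no-re...@maiutazas.hu> -> <someb...@company.hu>, No, score=-0.099 "
    "tagged_above=-9999 required=4 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, "
    "DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001] autolearn=ham"
)

print(count_autolearn([sample]))
```

Any iterable of lines works as input, so an open log file can be passed directly; which file amavis logs to depends on the syslog setup on the box.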