RE: Train SA with e-mails 100% proven spams and next time it should be marked as spam

Horváth Szabolcs Tue, 13 Feb 2018 10:11:03 -0800

Reindl Harald [mailto:h.rei...@thelounge.net] wrote:
>> This is a mail gateway for multiple companies. I'm not supposed to read 
>> e-mails on that, or picking mails that can be used for learning ham
> 
> how did you then manage 1.4 Mio ham-samples in your biased corpus


It looks like this amavisd-spamassassin combo automatically learned a lot of 
messages as ham that weren't actually ham:

Feb 11 03:37:31 amavis[20024]: (20024-06) spam-tag, <no-re...@maiutazas.hu> -> 
<someb...@company.hu>, No, score=-0.099 tagged_above=-9999 required=4 
tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, 
HTML_MESSAGE=0.001] autolearn=ham

I never configured autolearning; I assume it came with this CentOS setup. Man 
spamassassin says bayes_auto_learn has a default value of 1.
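
To get a feel for how much was auto-learned, the amavis log could be scanned 
for the autolearn= token. This is only a minimal sketch in Python, assuming 
every spam-tag entry sits on one line and looks like the excerpt above; the 
function name count_autolearn is made up for illustration.

```python
import re
from collections import Counter

# Matches the "autolearn=<value>" token at the end of amavis spam-tag lines.
# Assumption: the format is the one shown in the log excerpt above.
# "autolearn_force=..." is not matched because "_" follows "autolearn" there.
AUTOLEARN_RE = re.compile(r"\bautolearn=(\w+)")


def count_autolearn(lines):
    """Count autolearn outcomes (ham, spam, no, ...) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = AUTOLEARN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it an open log file (for example open("/var/log/maillog"), a path that 
is only an assumption for this box) returns a Counter keyed by outcome, so 
counts["ham"] is the number of messages that were auto-learned as ham.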

>> Without autolearning and without the help of the end-users, I can't build a 
>> proper ham bayes database, can I?
> surely, or don't you and the people around you who can help send and 
> receive mails?

I don't want to get into this "fight", but the end users have limited IT 
knowledge. They are 100% Outlook users (forwarding inline versus forwarding 
as an attachment always confuses them).
If I really want this, I need a user-proof, one-click solution like Gmail's 
"spam" and "not spam" buttons, which magically save e-mails to the proper 
technical mailbox (which the admins then review and use to train SA).
With Outlook users and internal Exchange MTAs, my options are limited. 

So, if I understood correctly, you all agree that the Bayesian database is 
f***** up, so let's start with a new one, with autolearn turned off, and train 
SA from scratch with both ham and spam mails.
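
As I understand it, that plan boils down to a handful of sa-learn calls. Below 
is a rough Python sketch that only builds the command lines; the function name 
and the corpus directories are hypothetical, and I assume the commands have to 
run as whichever user owns the Bayes database that amavisd uses.

```python
import shlex


def retrain_commands(ham_dir, spam_dir):
    """Return the sa-learn invocations for a from-scratch Bayes retrain.

    Assumptions (not confirmed for this setup):
    - "bayes_auto_learn 0" has already been set in the SpamAssassin
      configuration so the fresh database is not auto-trained again;
    - ham_dir and spam_dir hold hand-verified messages, one per file.
    """
    return [
        ["sa-learn", "--clear"],           # wipe the existing Bayes database
        ["sa-learn", "--ham", ham_dir],    # learn the verified ham
        ["sa-learn", "--spam", spam_dir],  # learn the verified spam
        ["sa-learn", "--dump", "magic"],   # show the resulting nham/nspam counters
    ]


def as_shell_lines(commands):
    """Render the commands as shell lines for review before anything is run."""
    return [shlex.join(cmd) for cmd in commands]
```

Printing as_shell_lines(retrain_commands("/srv/corpus/ham", "/srv/corpus/spam")) 
gives four lines that can be checked by eye and then run by hand; the two 
paths are placeholders only.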

Best regards
  Szabolcs
