On 02/13/2018 11:45 AM, Horváth Szabolcs wrote:
Reindl Harald [mailto:h.rei...@thelounge.net] wrote:
I think I have no control over what is learnt automatically.
surely, don't do autolearning at all
This is a mail gateway for multiple companies. I'm not supposed to read e-mails
on it, or pick out mails that can be used for learning ham.
And I can't ask users to use a "ham" mailbox, because they are not IT experts,
sometimes they have problems even with simple mail forwarding.
If you aren't allowed to check specific emails with a suspicious subject
or that are reported as spam by your users, there's no way you can do
your job of accurately filtering email.
Without autolearning and without the help of the end-users, I can't build a
proper ham bayes database, can I?
SA's autolearning doesn't use the results from BAYES_* rules, since that
could make incorrect training even worse. You are going to have to
build local rules or get help from RBLs and other SA plugins to reach
the autolearning thresholds.
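
To make the point about BAYES_* concrete, here is a rough sketch of the
autolearn decision. It is a simplified model, not SpamAssassin's actual code:
the function name and the rule names and scores in the usage example are made
up for illustration, the thresholds 0.1 and 12.0 are the defaults of
bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam as far
as I recall (check your own local.cf), and the extra conditions the real
plugin applies are left out.

```python
# Simplified model of the autolearn decision. NOT SpamAssassin's real code.
# Thresholds are assumptions mirroring what I believe are the defaults.

def autolearn_decision(rule_scores, ham_threshold=0.1, spam_threshold=12.0):
    """Return 'ham', 'spam' or 'no' for a dict of {rule_name: score}."""
    # Leave out BAYES_* hits so Bayes is never trained on its own verdicts.
    score = sum(
        value for name, value in rule_scores.items()
        if not name.startswith("BAYES_")
    )
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "no"  # between the thresholds: the message is not learned


if __name__ == "__main__":
    # Hypothetical rule names and scores, for illustration only.
    print(autolearn_decision({"BAYES_99": 3.5, "LOCAL_PHRASE": 2.0}))
    print(autolearn_decision({"LOCAL_RBL": 5.0, "LOCAL_PHRASE": 4.0,
                              "LOCAL_URI": 3.5}))
```

In this model a message that only Bayes considers spammy lands between the
thresholds and is not learned, which is why other rules have to carry it there.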
With non-English email flow, it's more challenging. If no RBLs hit,
then you really must train your Bayes properly, which requires some way
to accurately determine the ham and spam. You must keep a copy of the
ham and spam corpora and be allowed to review suspicious email.
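
Once you have reviewed corpora, training is a matter of feeding them to
sa-learn. A minimal sketch, assuming the reviewed mail sits in two
directories; the paths and function names below are hypothetical, and
sa-learn has to run as the user that owns the Bayes database your scanner
actually uses.

```python
import subprocess


def sa_learn_commands(ham_dir, spam_dir):
    """Build the sa-learn invocations for a reviewed ham and spam corpus."""
    return [
        ["sa-learn", "--ham", ham_dir],
        ["sa-learn", "--spam", spam_dir],
    ]


def train(ham_dir, spam_dir, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in sa_learn_commands(ham_dir, spam_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical corpus locations; adjust to wherever reviewed mail is kept.
    train("/var/spool/corpus/ham", "/var/spool/corpus/spam")
```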
Can you set up a split copy of the email that redacts the recipient or
anonymizes it enough to allow for review? If not, your filtering is not
going to be accurate.
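
One way such a redacted copy could be produced is sketched below, using only
the Python standard library. The header list, the salt, and the function name
are my own assumptions, not anything SpamAssassin provides. Note that it only
rewrites recipient headers: Received lines and the body can still name the
recipient, so treat it as a starting point.

```python
import hashlib
from email import message_from_string

# Headers that usually carry the recipient; extend for your own MTA.
RECIPIENT_HEADERS = ("To", "Cc", "Bcc", "Delivered-To", "X-Original-To")


def redact_recipients(raw_message, salt="change-me"):
    """Return the message with recipient headers replaced by salted hashes."""
    msg = message_from_string(raw_message)
    for name in RECIPIENT_HEADERS:
        values = msg.get_all(name)
        if values is None:
            continue
        del msg[name]  # removes every occurrence of this header
        for value in values:
            data = (salt + str(value)).encode("utf-8", "replace")
            digest = hashlib.sha256(data).hexdigest()[:12]
            # The same recipient always maps to the same placeholder, so
            # per-recipient patterns stay visible without exposing the address.
            msg[name] = "redacted-%s@example.invalid" % digest
    return msg.as_string()


if __name__ == "__main__":
    sample = ("From: sender@example.com\n"
              "To: bob@example.org\n"
              "Subject: test\n"
              "\n"
              "Hello\n")
    print(redact_recipients(sample))
```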