Nick Bright wrote on 24.07.2018 at 01:38:

So I ask: what is the best practice for learning submissions when using site-wide bayes?

From what I learnt about best practice:

- Before implementing spam learning based on user submissions, find out how well your users can identify spam. For most users, spam is anything they do not want, but that is not the technical definition. If one user moves a newsletter he originally subscribed to into spam because he cannot figure out how to unsubscribe, while another user still wants that same newsletter, he will taint the shared bayes database. Unless all your users understand that spam, as far as the bayes database is concerned, is only mail that was never subscribed to and is sent by botnets, from forged senders, to address lists holding millions of entries, you should not rely on user submissions. Everyone gets unwanted email, but not all unwanted email should be learnt as spam. Many users will never understand the difference.

- Forwarding spam to some magic email address isn't best practice, because you have to extract the forwarded mail before you can feed it into sa-learn. Many users will use the wrong forwarding method, regardless of what you tell them.

- Best practice seems to be using IMAP folders. Create one "learn as spam" folder and one "learn as non-spam" folder for each user. Tell your users to move spam into the "learn as spam" folder and to copy non-spam to the "learn as non-spam" folder. Run a script on your mail server, started regularly by cron, that visits every user's folders, extracts every message from them and feeds it into sa-learn.
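To illustrate the forwarding point in the list above, here is a minimal Python sketch of what "extracting the forwarded mail" involves. It assumes the user forwarded the spam as an attachment (a message/rfc822 part); an inline forward no longer contains the original headers, so there is nothing useful to recover. The names extract_forwarded and sample_forward are made up for this example and are not part of any existing tool.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def _collect(part, found):
    """Gather message/rfc822 attachments without descending into them."""
    if part.get_content_type() == "message/rfc822":
        # The payload of a message/rfc822 part is a list holding one message.
        found.append(part.get_payload(0).as_bytes())
        return
    if part.is_multipart():
        for sub in part.get_payload():
            _collect(sub, found)


def extract_forwarded(raw: bytes) -> list:
    """Return the original messages attached to a forwarded report.

    An inline forward (no message/rfc822 part) yields an empty list.
    """
    found = []
    _collect(message_from_bytes(raw, policy=policy.default), found)
    return found


def sample_forward() -> bytes:
    """Build a made-up "forward as attachment" report for demonstration."""
    original = EmailMessage()
    original["From"] = "spammer@example.com"
    original["Subject"] = "Cheap pills"
    original.set_content("buy now")

    report = EmailMessage()
    report["Subject"] = "Fwd: Cheap pills"
    report.set_content("please learn this as spam")
    report.add_attachment(original)  # attached as message/rfc822
    return report.as_bytes()
```

Each element returned by extract_forwarded could then be handed to sa-learn. A report that was forwarded inline returns an empty list and has to be discarded, which is exactly the case you cannot prevent users from producing.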


Your IMAP server may support shared folders. With those, you need only one spam folder and one ham folder globally, shared among all users, but you get a privacy issue: users can see the messages other users put into these folders. For ham this is not acceptable, so shared folders are usually not the way to go. Even spam may be private to some users.

If you use Dovecot as your IMAP server, I can describe the learning mechanism on my system:

I use doveadm to enumerate the spam and ham folders of all users (doveadm mailbox list). Then, for every spam and ham folder, I use doveadm to read the messages (doveadm search, then doveadm fetch). The lines doveadm adds to its output are cut, and the message is fed to sa-learn. After a successful learn, the message is purged from the folder (doveadm expunge).
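The per-folder part of that loop could look roughly like the following Python sketch. It is an illustration and not the actual script: the function names are made up, and it assumes that doveadm search prints one "mailbox-guid uid" pair per line and that doveadm fetch with the text field prints a leading "text:" line and possibly a trailing form feed around the message. Check both against the output of your Dovecot version before relying on it.

```python
import subprocess


def strip_doveadm_framing(output: bytes) -> bytes:
    """Cut the lines doveadm adds around the raw message.

    Assumption: a leading "text:" line and an optional trailing form feed.
    """
    if output.startswith(b"text:\n"):
        output = output[len(b"text:\n"):]
    return output.rstrip(b"\x0c\n") + b"\n"


def learn_folder(user: str, folder: str, kind: str, run=subprocess.run) -> int:
    """Feed every message in one folder to sa-learn, then expunge it.

    kind is "spam" or "ham". Returns the number of messages learnt.
    The run parameter only exists so the function can be exercised
    without a live Dovecot installation.
    """
    def doveadm(*args):
        return run(["doveadm", *args], check=True, capture_output=True).stdout

    learnt = 0
    listing = doveadm("search", "-u", user, "mailbox", folder, "all")
    for line in listing.decode().splitlines():
        _guid, uid = line.split()
        raw = doveadm("fetch", "-u", user, "text",
                      "mailbox", folder, "uid", uid)
        # check=True raises if sa-learn fails, so the expunge below
        # is only reached after a successful learn.
        run(["sa-learn", "--" + kind],
            input=strip_doveadm_framing(raw), check=True)
        doveadm("expunge", "-u", user, "mailbox", folder, "uid", uid)
        learnt += 1
    return learnt
```

A cron job would call learn_folder once per user and folder, for example learn_folder("alice", "Junk", "spam"), with the user and folder names taken from doveadm mailbox list. Expunging each message right after it is learnt keeps the folder from being learnt twice if a later message fails.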

I educated my users to move verified spam to the "Junk" folder (this is what Thunderbird auto-creates, as far as I remember). For ham, I told them to copy verified non-spam into the "Junk/report-as-nonspam" folder I created for them. Copy, not move. I create both folders when the account is created.

Works reasonably well, but I only have a handful of users on my tiny mail server.

Alex
