Re: Best practice for learning submissions

Alex Woick Tue, 24 Jul 2018 04:55:54 -0700

Nick Bright wrote on 24.07.2018 at 01:38:

So I ask: what is the best practice for learning submissions when using site-wide bayes?

From what I learnt about best practice:

- before implementing spam learning based on user submissions, figure out how good your users are at identifying spam. For most users, spam is everything they do not want, but this is not the technical definition of spam. If one user starts moving a newsletter he originally subscribed to into spam because he cannot figure out how to unsubscribe, while another user still wants to get the same newsletter, he will taint the bayes database. Unless all your users really understand that spam, for the bayes database, is only the stuff that is sent by botnets, with forged senders, to addresses taken from databases containing millions of addresses, and that was never subscribed to, you should not rely on user submissions. Everyone gets unwanted email, but not all unwanted email should be learnt as spam by the bayes database. Many users will never understand the difference.

- forwarding spam to some magic email address isn't best practice, because you have to extract the forwarded mail before you can feed it into sa-learn. Many users will use the wrong forwarding method, regardless of what you tell them.

- best practice seems to be using imap folders. Create one "learn as spam" folder and one "learn as non-spam" folder for each user. Tell your users to move spam into the "learn as spam" folder and to copy non-spam to the "learn as non-spam" folder. Run a script on your mail server, started regularly by cron, that visits every user's folders. It extracts every message from these folders and feeds it into sa-learn.

Your imap server may support shared folders. If you use them, you need only one spam and one ham folder globally, shared between all users, but you have a privacy issue: users can see the messages put into these folders by other users. For ham submitted for learning this is not acceptable, so shared folders are usually not the way to go. Even spam may be private to some users.

In case you use dovecot as your imap server, I can describe the learning mechanism on my system:

I use doveadm to enumerate the spam and ham folders of all users (doveadm mailbox list). Then for every spam and ham folder, I use doveadm to "read" the folder (doveadm search, then doveadm fetch). Some lines added by doveadm are cut, then the message is fed to sa-learn. After a successful learn, the message is purged from the folder (doveadm expunge).
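The steps above could be sketched roughly as follows. This is not the script from my server, only an illustration: the function names are made up, the folder names are the ones mentioned in this mail, folder enumeration via doveadm mailbox list is left out, and the assumption that doveadm fetch output starts with a "text:" line and ends with a form feed should be checked against your dovecot version.

```python
import subprocess

SPAM_FOLDER = "Junk"                    # folder names as used in this mail
HAM_FOLDER = "Junk/report-as-nonspam"


def search_cmd(user, folder):
    # Lists "<mailbox-guid> <uid>" pairs for every message in the folder.
    return ["doveadm", "search", "-u", user, "mailbox", folder, "all"]


def fetch_cmd(user, folder, uid):
    return ["doveadm", "fetch", "-u", user, "text", "mailbox", folder, "uid", uid]


def expunge_cmd(user, folder, uid):
    return ["doveadm", "expunge", "-u", user, "mailbox", folder, "uid", uid]


def learn_cmd(kind):
    # sa-learn reads the message from stdin when no file is given.
    return ["sa-learn", "--spam" if kind == "spam" else "--ham"]


def parse_uids(search_output):
    # Keep the last column (the uid) of every non-empty line.
    return [line.split()[-1] for line in search_output.splitlines() if line.strip()]


def strip_doveadm_framing(raw):
    # Assumption: fetch output is "text:\n<message>" followed by a form feed.
    prefix = b"text:\n"
    if raw.startswith(prefix):
        raw = raw[len(prefix):]
    return raw.rstrip(b"\f\n") + b"\n"


def learn_folder(user, folder, kind):
    found = subprocess.run(search_cmd(user, folder), check=True,
                           capture_output=True, text=True).stdout
    for uid in parse_uids(found):
        raw = subprocess.run(fetch_cmd(user, folder, uid), check=True,
                             capture_output=True).stdout
        # check=True raises if sa-learn fails, so the expunge below
        # only happens after a successful learn.
        subprocess.run(learn_cmd(kind), input=strip_doveadm_framing(raw), check=True)
        subprocess.run(expunge_cmd(user, folder, uid), check=True)
```

A cron job would then call something like learn_folder("alice", SPAM_FOLDER, "spam") and learn_folder("alice", HAM_FOLDER, "ham") for every user ("alice" is a placeholder).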

I educated my users to move verified spam to the "Junk" folder (this is what Thunderbird auto-creates, as far as I remember). For ham, I told them they can copy verified non-spam into the "Junk/report-as-nonspam" folder I created for them. Copy, not move. I create both folders on account creation.
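Creating both folders for a new account can also be done with doveadm. A minimal sketch, assuming the "/" hierarchy separator used in the folder name above; the function names are hypothetical, and doveadm mailbox create reports an error if a folder already exists, so this is only meant for fresh accounts:

```python
import subprocess


def create_folders_cmd(user):
    # -s also subscribes the user to the new folders so mail clients show them.
    return ["doveadm", "mailbox", "create", "-u", user, "-s",
            "Junk", "Junk/report-as-nonspam"]


def create_learning_folders(user):
    # Raises CalledProcessError if doveadm exits with a non-zero status.
    subprocess.run(create_folders_cmd(user), check=True)
```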

Works reasonably well, but I only have a handful of users on my tiny mail server.


Alex
