It's been over a decade since I last used (or needed) Spambayes, but I have good memories of it and I really liked it a lot.
I'm currently working on an idea where I think Spambayes, or a Spambayes-like approach, may come in handy.

The setup is a local Outlook client (whatever the current version in Office 365 is) and an on-premises Exchange server. The OS is Windows 10. The mailbox is a shared mailbox, not a local PST file.

There is a mail folder with 10k emails, almost all of which have been manually categorized and labelled. Each email is actually a "container" email with the original email as a .msg attachment, as if you had used "forward as attachment", so that all original email headers are preserved. Additionally there is a second attachment, headers.txt, which contains the email headers of the original email.

The emails are labelled thus:

* Phishing (3k) -> emails with a direct security threat, like password stealing
* Spam (2k) -> typical junk mail that is not a direct security threat
* Graymail (3k) -> newsletters, mails from sales people, invitations for conferences... all somewhat relevant for our industry, but recipients just aren't interested. This is "it's not actually spam because I subscribed a long time ago and now I am too lazy to unsubscribe"
* False positive (0.5k) -> emails that were mistakenly reported as spam
* Uncategorized (1.5k) -> emails that have not yet been manually reviewed

I know that Spambayes works with just two buckets: spam and not-spam. Given the number of manually categorized emails I already have, how feasible would it be to write something similar but with 4 buckets, and to use the emails as training data? I am not aiming for 100% accuracy; even 80% is good enough. Maybe I could use 4 separate databases instead of just one?

Also good to know: I haven't written anything more than Hello World in Python, but I'm not afraid to learn. The machine I'm working on doesn't have any development tools, and I have no permission to install Python. I do have another machine where I can do whatever I want. It runs Windows 11 and also has Office, but for security reasons it is not allowed to access that Exchange mailbox. I guess I could export the folder to a PST and copy that over, but that wouldn't be allowed either, not for technical reasons but because of policy (PII and such).

Please let me pick your brains! If anything comes of it, I'll post my code on GitHub.

-- 
Met vriendelijke groeten / Kind regards / Med vänliga hälsningar
Amedee Van Gasse
ame...@vangasse.eu
amedee.be - in/amedee <https://linkedin.com/in/amedee>
+32 485 805 674
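
P.S. To make the four-bucket question concrete, something along these lines is what I have in mind. It is only a sketch: plain multinomial naive Bayes with add-one smoothing and one token-count table per label (so in effect "4 separate databases"). It is not Spambayes' own scoring scheme, and every name in it (MultiBucketBayes, tokenize, the label strings) is made up for illustration. It assumes the text of each original mail has already been extracted; getting the .msg attachments out of Outlook is a separate problem that is not covered here. The Uncategorized mails are the ones to classify, not a training bucket.

```python
import math
import re
from collections import Counter

# Hypothetical label names for the four manually reviewed buckets.
LABELS = ("phishing", "spam", "graymail", "false_positive")


def tokenize(text):
    """Lowercase the text and split it into crude word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class MultiBucketBayes:
    """Multinomial naive Bayes with one token-count table per label."""

    def __init__(self, labels=LABELS):
        self.labels = tuple(labels)
        self.token_counts = {label: Counter() for label in self.labels}
        self.doc_counts = Counter()
        self.vocabulary = set()

    def train(self, text, label):
        """Add one labelled mail to the counts for its bucket."""
        tokens = tokenize(text)
        self.token_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocabulary.update(tokens)

    def scores(self, text):
        """Return {label: probability}; the values sum to 1."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        # One extra slot so tokens never seen in training get a nonzero share.
        vocab_size = len(self.vocabulary) + 1
        log_scores = {}
        for label in self.labels:
            # Smoothed prior: how common is this bucket overall?
            log_p = math.log(
                (self.doc_counts[label] + 1) / (total_docs + len(self.labels))
            )
            total_tokens = sum(self.token_counts[label].values())
            for token in tokens:
                # Add-one (Laplace) smoothed token likelihood for this bucket.
                count = self.token_counts[label][token]
                log_p += math.log((count + 1) / (total_tokens + vocab_size))
            log_scores[label] = log_p
        # Convert log scores to probabilities without underflow.
        top = max(log_scores.values())
        weights = {l: math.exp(s - top) for l, s in log_scores.items()}
        norm = sum(weights.values())
        return {l: w / norm for l, w in weights.items()}

    def classify(self, text):
        """Return the label with the highest probability."""
        result = self.scores(text)
        return max(result, key=result.get)


if __name__ == "__main__":
    clf = MultiBucketBayes()
    clf.train("verify your password now or your account will be suspended", "phishing")
    clf.train("cheap pills buy now discount", "spam")
    clf.train("monthly newsletter industry conference invitation", "graymail")
    clf.train("meeting notes attached for project review", "false_positive")
    print(clf.classify("please verify your account password"))
```

With the real data, train() would be called once per reviewed mail and classify() once per Uncategorized mail; holding back part of the labelled mails for testing would show whether the 80% target is realistic.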
_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
https://mail.python.org/mailman/listinfo/spambayes-dev