CC'ing Richard Jones - Roundup guru and Reimar Bauer - MoinMoin guru. Reimar, I don't seem to have Marian Neagul's email handy. Can you forward this to him?
I've been trying (rather unsuccessfully) to figure out how to integrate a SpamBayes classifier into Roundup. Basically I know zilch about Roundup's code. You need to score form submissions (the easy part), save them for later retraining and allow misclassified submissions to be reinjected into the website (the hard parts). I had similar problems when I tried to incorporate SpamBayes into MoinMoin.

These sites generally treat all submissions as valid. Presuming we have a SpamBayes training database and a classifier we can talk to, it's a fairly easy task to score a submission and reject it if it looks like spam. Alas, if the submission is scored as spam, Roundup and MoinMoin have no convenient way to save the submission yet keep it sequestered so it doesn't turn up on the web.

It occurred to me yesterday that the SpamBayes POP3 proxy and IMAP filter solve the storage and classification problems for the specific case where you're talking those two email protocols. The only trick is that they are tied to POP3 and IMAP. Instead of email I need some other way to get a "message" into and out of the classifier/database manager.

Given an arbitrary form submission I should be able to convert it to a MIME message (file uploads map to attachments) and hand it off to a standalone SpamBayes server for scoring and storage. If the submission is originally marked as spam (or unsure) but is later deemed okay, I should be able to convert the MIME message back into the necessary bits for resubmission. If the submission is originally marked as ham but is later deemed to be spam, the regular Roundup or MoinMoin facility for deleting tickets, pages or attachments would get rid of it.

Alas, sb_server.py and sb_imapfilter.py don't seem to share a lot of code (save for using Dibbler to build the web user interface). Is that true? It seems the user interface, classifier bits and storage should be essentially identical.
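To make the form-to-MIME round trip above concrete, here is a minimal sketch, assuming Python 3's standard email package. The function names form_to_mime and mime_to_form and the X-Form-Field header are made up for illustration; nothing in Roundup, MoinMoin or SpamBayes defines them.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def form_to_mime(fields, uploads=()):
    """Pack a form submission into one multipart/mixed MIME message.

    fields  -- dict mapping form field name to its text value
    uploads -- iterable of (filename, content_type, data_bytes) tuples
    """
    msg = EmailMessage()
    msg.make_mixed()
    for name, value in fields.items():
        part = EmailMessage()
        part.set_content(value)
        # Hypothetical header recording which form field this part came from.
        part["X-Form-Field"] = name
        msg.attach(part)
    for filename, content_type, data in uploads:
        maintype, _, subtype = content_type.partition("/")
        # File uploads map to ordinary MIME attachments.
        msg.add_attachment(data, maintype=maintype, subtype=subtype,
                           filename=filename)
    return msg


def mime_to_form(raw_bytes):
    """Recover (fields, uploads) from a message built by form_to_mime."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    fields, uploads = {}, []
    for part in msg.iter_parts():
        name = part["X-Form-Field"]
        if name is not None:
            # set_content() normalizes line endings and appends a final
            # newline; strip it so short field values compare equal.
            fields[str(name)] = part.get_content().rstrip("\n")
        elif part.get_filename():
            uploads.append((part.get_filename(),
                            part.get_content_type(),
                            part.get_payload(decode=True)))
    return fields, uploads
```

The bytes from form_to_mime(...).as_bytes() are what would be handed to the standalone server for scoring and storage. If a stored submission is later deemed okay, mime_to_form turns it back into fields and uploads for resubmission. Note that this sketch does not preserve trailing newlines or CRLF line endings in field values.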
All that should be different between sb_server.py and sb_imapfilter.py is the way you transmit messages to and from external systems:

                                    sink
                                      ^
                                      |
    +------------------+        +----------+
    |                  |        |          |
    |       Core       |<------>| Protocol |
    |      Server      |        | Adapter  |
    |                  |        |          |
    +------------------+        +----------+
             ^                        ^
             |                        |
             v                      source
     web & msg storage

For POP3 the source would be the email client and the sink would be the real POP3 server. For IMAP the source and sink would be the IMAP server. For websites the source and sink would be the web site (Roundup, MoinMoin, etc). The data sent from the protocol adapter to the core server would be MIME messages. The data sent back to the protocol adapter would simply be score info (ham, spam, unsure, perhaps raw scores).

Any ideas on the shortest route to a core server that provides the user, training and storage interfaces? Start from scratch? Rip the POP3 stuff out of sb_server.py? Rip the IMAP stuff out of sb_imapfilter.py? I'd really hate to reinvent the wheel since we seem to have two wheels already. Once that core server is available, adapting to different environments should be possible by plugging in specific protocol adapters.

Thx,

Skip
_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev