> Alas, sb_server.py and sb_imapfilter.py don't seem to share a lot of code
> (save for using Dibbler to build the web user interface). Is that true?

Somewhat. Unfortunately, at the time I originally wrote sb_imapfilter I didn't use IMAP myself, and so when deciding whether the IMAP solution should be a 'filter' (i.e. periodically connect to an IMAP server and classify messages) or a proxy (i.e. intercept connections to the server and classify on the fly) I went with the majority vote. In many ways a proxy would be the simpler solution, and would certainly resemble sb_server a lot more (I've written an IMAP proxy for another project, based on the sb_server POP3 proxy, and there is a lot of overlap). OTOH, there are advantages (particularly training) to the filter method.

> It seems the user interface, classifier bits and storage should be
> essentially identical.

The user interface is nearly identical. The shared part is in UserInterface.py, with the separate subclasses in ProxyUI.py (POP3) and ImapUI.py. The majority of the code in ImapUI.py deals with presenting a list of folders from an IMAP server to the user, to select which should be scanned for messages to classify/train (this probably wouldn't be necessary with a proxy). The majority of the code in ProxyUI.py deals with the browser-based training interface (which the IMAP filter doesn't have - you just put messages in the appropriate folders on the server).

The classifier and storage bits are pretty much identical (storage.py and FileCorpus.py respectively).

> Any ideas on the shortest route to a core server that provides the user,
> training and storage interfaces? Start from scratch? Rip the POP3 stuff
> out of sb_server.py? Rip the IMAP stuff out of sb_imapfilter.py? I'd
> really hate to reinvent the wheel since we seem to have two wheels
> already.
> Once that core server is available, adapting to different environments
> should be possible by plugging in specific protocol adapters

Definitely don't start with sb_imapfilter.py - it's basically a scanner, not an on-demand classifier.

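To make the scanner vs. on-demand distinction concrete, here is a rough sketch. Nothing in it is SpamBayes code: toy_score, scan_folder and on_retrieve are made-up names, the folder is a plain dict, and the "classifier" is a trivial keyword check standing in for the real one.

```python
# Hypothetical stand-ins only; none of these names exist in SpamBayes.

def toy_score(text):
    """Pretend classifier: 1.0 if the text mentions 'viagra', else 0.0."""
    return 1.0 if "viagra" in text.lower() else 0.0


def scan_folder(folder, already_seen):
    """Filter style: called periodically, walks a whole folder and
    classifies whatever has not been seen on an earlier pass."""
    results = {}
    for msg_id, text in folder.items():
        if msg_id not in already_seen:
            results[msg_id] = toy_score(text)
            already_seen.add(msg_id)
    return results


def on_retrieve(text):
    """Proxy style: called once per message as the client fetches it,
    and returns the message with a score header prepended."""
    return "X-Toy-Score: %.2f\n%s" % (toy_score(text), text)
```

The filter has to keep track of what it has already looked at between passes; the proxy only ever sees one message at a time, at the moment the mail client asks for it.
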
Probably the best place to start would be with the State class in sb_server.py. There are some POP3-specific parts in there, but personally I would be happy if they were abstracted out (e.g. a State class and a POP3ProxyState subclass). I could do that (promptly ;)). What you then have are:

 * State.bayes (the classifier)
 * State.hamCorpus, State.spamCorpus, State.unknownCorpus (storage of 'messages' - untrained messages in unknownCorpus, and trained messages (expiring) in ham/spamCorpus).
 * Training via moving messages between corpora.

Once you've got something that looks like a message, you can do something like sb_server's onRetr for classification and storage (I've cut bits that probably aren't relevant; messageText is the raw message and state is the State instance):

"""
import email
import spambayes.message

msg = email.message_from_string(messageText,
                                _class=spambayes.message.SBHeaderMessage)
msg.setId(state.getNewMessageName())

# Now find the spam disposition and add the header.
(prob, clues) = state.bayes.spamprob(msg.tokenize(), evidence=True)
msg.addSBHeaders(prob, clues)
cls = msg.GetClassification()
state.RecordClassification(cls, prob)

# Cache the message. Write the message into the Unknown cache.
makeMessage = state.unknownCorpus.makeMessage
message = makeMessage(msg.getId(), msg.as_string())
state.unknownCorpus.addMessage(message)
"""

For the user interface, you can just create a UserInterface.UserInterface subclass (needs a Home page/method, and an __init__ method). Actually, you probably want ProxyUI.ProxyUserInterface as-is, with a different set of options to offer in the configuration pages (the parm_ini_map and adv_map used in the __init__). (There would be a "No POP3 proxies running" message on the main page, but you could ignore that or subclass appropriately.)

It's a long time since I've worked with the browser interface code, but I'm pretty sure that this would give you what you want.

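Going back to the State / POP3ProxyState split and the move-between-corpora training mentioned above, a rough sketch of the shape might be as follows. This is not the real sb_server code, and the refactoring itself hasn't been done: only the names State, POP3ProxyState, bayes and the three corpus attributes come from the discussion. ToyClassifier, the dict-based corpora, the train method and the servers attribute are all assumptions for illustration.

```python
# Illustrative only: simplified stand-ins, not the real sb_server classes.

class ToyClassifier:
    """Hypothetical classifier that just counts trained messages."""

    def __init__(self):
        self.nham = 0
        self.nspam = 0

    def learn(self, text, is_spam):
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1


class State:
    """Protocol-independent core: a classifier plus three corpora."""

    def __init__(self):
        self.bayes = ToyClassifier()
        # Plain dicts (message id -> text) stand in for the real corpora.
        self.hamCorpus = {}
        self.spamCorpus = {}
        self.unknownCorpus = {}

    def train(self, msg_id, is_spam):
        """Train by moving a message out of the unknown corpus."""
        text = self.unknownCorpus.pop(msg_id)
        target = self.spamCorpus if is_spam else self.hamCorpus
        target[msg_id] = text
        self.bayes.learn(text, is_spam)


class POP3ProxyState(State):
    """Everything POP3-specific lives in the subclass."""

    def __init__(self, servers=()):
        State.__init__(self)
        # Hypothetical attribute: (host, port) pairs being proxied.
        self.servers = list(servers)
```

A different protocol adapter would then get its own small State subclass and leave the classifier, corpora and training untouched.
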
Cheers,
Tony
_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev