> However, I don't seem to have experienced any of the problems that
> others have found, so I'm happy about that!
That's because we've fixed all other problems <wink>.

> Firstly, people tend to delete the junk email folder, and this kills
> SB cold. Can it be tweaked a bit to just recreate the folder if it's
> missing, or does the design prevent this? If it's theoretically
> possible then I can have a go, since it's my itch, but I don't want to
> setup the python 222 environment and so forth if it's known that this
> won't work.

There's no reason I can think of that this wouldn't work. (I'm not sure
what you mean by "python 222" - if you mean version 2.2.2 of Python,
note that Python 2.2 (actually 2.2.0) is the *minimum* requirement, and
the recommended version is Python 2.4.2.)

If you have problems with this, feel free to ask for help - probably
the [email protected] list would be a better place (a message with
"missing junk e-mail folder" as the subject probably gets written off
by many as someone having deleted their junk folder and needing to be
pointed to the FAQ).

> Secondly, I deploy SB via a repackaged MSI, and this works well, but
> I'd really like to send out a partially pretrained package. Has anyone
> done any work in this area before?

I believe Skip Montanaro considered this quite some time ago (looking
into which messages should be included, etc), and I think that InBoxer,
which was originally based on the SpamBayes Outlook plug-in, includes an
optional pretrained database. I'm not aware of any others.

Really, I would advise against this. It only takes a very small number
of messages (less than a day's email for most users) for SpamBayes to
become effective. By doing all the training yourself, you're (a)
learning how to do ongoing training, which is important, and (b)
learning *your* email stream, not a general one. (It's possible that
(b) isn't as much of a problem in your case, since corporate (or
government) email might all look alike (from a statistical point of
view) anyway.)

If you do want to do it, the only hard job is picking which email to
use.
Once you've done that, put it in Ham/Spam folders, and train away. Then
just include the default_bayes_database.db and
default_message_database.db in the installation, putting them into the
user's data directory.

If you want the training to be as effective as possible, you might want
to switch off some of the tokenizing options to avoid getting spurious
clues that relate specifically to whoever actually received the email.
The comments in tokenizer.py outline which ones should be avoided
("mixed corpora" is the phrase to watch out for).

Ask if you want more help with this.

=Tony.Meyer

--
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.

_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
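
The copy step described above (placing the two pretrained database
files into the user's data directory) could be scripted as part of a
repackaged installer. What follows is only a minimal sketch, assuming
modern Python 3 rather than the Python 2.x versions discussed in this
thread. The function name install_pretrained and both directory
arguments are hypothetical; only the two database file names come from
the message. Leaving existing files untouched is a choice made for this
sketch so that a user's own training is never overwritten; it is not
something SpamBayes itself is known to do.

```python
import shutil
from pathlib import Path

# File names as given in the message above.
DATABASE_FILES = (
    "default_bayes_database.db",
    "default_message_database.db",
)


def install_pretrained(source_dir, data_dir):
    """Copy the pretrained databases from source_dir into data_dir.

    Both arguments are hypothetical paths supplied by the installer.
    Files that already exist in data_dir are skipped.
    Returns the list of file names that were actually copied.
    """
    source_dir = Path(source_dir)
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for name in DATABASE_FILES:
        target = data_dir / name
        if target.exists():
            # Assumption of this sketch: never clobber existing training.
            continue
        shutil.copy2(source_dir / name, target)
        copied.append(name)
    return copied
```

Running it a second time against the same data directory copies nothing,
so it should be safe to call on every install or upgrade.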
