> 1) Using a pickle dbm with sb_imapfilter.py is regularly
> resulting in a corrupt database within days of wiping it out and starting
> over. I can get about a week out of the database before it corrupts
> and fails with an assertion error.
This is 1.0.1 sb_imapfilter, yes? It would be worth giving CVS sb_imapfilter a go; it should be vastly improved. I've tried to copy most bugfixes over to the 1.0.x branch, but that hasn't been possible where there are large changes.

I also heard today that using Python 2.4 helps, which I suspect means there is a problem handling malformed messages. If using Python 2.4 is easy to do, then it would be worth doing.

> 2) I've been trying to get the mysql option to work for
> sb_imapfilter.py on and off for a couple months, but I am still stuck:
>
> First off, regardless of what iteration I try, I cannot seem
> to specify any DSN other than the default. When I try to specify
> a custom DSN, something happens in the code when it parses the
> values so that the user field is blank, so that the result is
> user '@localhost' tries to log onto mysql without success.

I believe this is caused by a known bug. It's fixed in CVS for 1.1, but hasn't been backported. If you like, I can backport it so that the fix is in 1.0.2. I believe you can work around it by putting a space at the start of the DSN.

> Upon giving credentials to the default DSN used by the
> script, I can actually get sb_imapfilter.py to train on a
> sample of spam and ham successfully, but immediately afterwards,
> when I try to actually run sb_imapfilter.py to filter my inbox, it
> fails with the dreaded "Token seen in more spam than spam trained."
> assertion error:

If you have the patience, try doing this:

0. Clear the ham & spam training folders.
1. Put one (more) message in each of the ham and spam training folders.
2. Run sb_imapfilter.py -t.
3. Do a 'select * from spambayes where word="saved_state"' query against the database, and check that the values are the same as the number of messages in the folders (i.e. 1,1, then 2,2, then 3,3, ...).
4. Repeat from 1.

It would help to know whether it dies quickly (e.g. after a single message) or not.
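The comparison in step 3 can be partly automated. What follows is a minimal Python sketch, not part of SpamBayes. It assumes the query results have already been fetched as (word, spam count, ham count) tuples, and that the "saved_state" row holds the training totals in those same two columns. The function name check_training_counts is hypothetical. It also checks the invariant behind the assertion error: no token should have been seen in more spam (or ham) than the total number trained.

```python
# Hypothetical consistency check for the SpamBayes training database.
# Assumption: rows were fetched as (word, spam_count, ham_count) tuples,
# and the "saved_state" row stores the trained totals in those columns.

STATE_KEY = "saved_state"  # assumption: key name as written in the query above


def check_training_counts(rows, spam_msgs, ham_msgs):
    """Return a list of problems found; an empty list means all is well.

    rows      -- iterable of (word, spam_count, ham_count) tuples
    spam_msgs -- number of messages in the spam training folder
    ham_msgs  -- number of messages in the ham training folder
    """
    counts = {word: (nspam, nham) for word, nspam, nham in rows}
    if STATE_KEY not in counts:
        return ["no %r row found" % STATE_KEY]
    total_spam, total_ham = counts.pop(STATE_KEY)

    problems = []
    # Step 3: the saved totals should equal the folder sizes.
    if (total_spam, total_ham) != (spam_msgs, ham_msgs):
        problems.append("saved state is %d,%d but folders hold %d,%d"
                        % (total_spam, total_ham, spam_msgs, ham_msgs))

    # The assertion's invariant: a token cannot appear in more spam
    # (or ham) messages than were trained in total.
    for word, (nspam, nham) in sorted(counts.items()):
        if nspam > total_spam:
            problems.append("%r seen in %d spam, only %d trained"
                            % (word, nspam, total_spam))
        if nham > total_ham:
            problems.append("%r seen in %d ham, only %d trained"
                            % (word, nham, total_ham))
    return problems


if __name__ == "__main__":
    # Made-up rows illustrating a database that would trip the assertion.
    rows = [("saved_state", 2, 2), ("viagra", 3, 0), ("meeting", 0, 2)]
    for problem in check_training_counts(rows, 2, 2):
        print(problem)
```

Running this after each round of steps 1 to 3 would show the first point at which the totals or token counts go wrong.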
If you get to high numbers and it's still working, try adding multiple messages at a time, and see if the counts still match. I assume that sb_imapfilter always finishes without error, and isn't interrupted while training?

=Tony.Meyer

-- 
Please always include the list ([email protected]) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.

_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
