> OK, I went to re-train from scratch. I removed hammie.db,
> message_info_database.db, and statistics_database.db from
> Documents and Settings/Owner/Application Data/SpamBayes/Proxy.
> I was going along fine all day, training new messages, then I
> went to review some additional messages (by right clicking on
> the tray icon). It pulled up a ton more messages than
> I was expecting, so I discarded all except 8 of them. Then I
> went to the home page and it says:
> Database only has 7 good and 1 spam - you should consider performing
> additional training.
>
> Apparently the reason it pulled up a ton of messages, was
> because all of a sudden it decided that it hadn't trained on
> them already, even though it had.

Were they definitely the same messages? That wouldn't fit with my
suspicions (below).

> So the question is, what
> did I do before the error occurred, that might have caused
> spambayes to suddenly not remember any previous training.
> The answer -- the only thing i did was to modify the
> configuration, so it would put the string "spam," in the "To:"
> and "Subject:" headers.
>
> So is modifying the configuration supposed to undo all the
> prior training?

No.

> If not, any guesses on why this happened?

There is a known (fixed in CVS) bug with 1.0.1 that means that if you
make any configuration changes via the web interface then many options
are reset to their default values until the next time you restart
SpamBayes. This has had two effects reported so far: in the first, the
columns in the review page reset to just "Subject" and "From", and in
the second, the cache directories reset, so if you are using
non-defaults for those, then mail will be put in the wrong place and the
review page will present mail from there, instead of the correct place.

If this does turn out to be another case of this bug, then maybe we
should move 1.0.2 up a bit to next week, since it's starting to crop up
fairly often, and putting the release out earlier would be less work
than continually identifying this bug.

> Even though the home page says:
> Total emails trained: Spam: 1 Ham: 7
> Database only has 7 good and 1 spam - you should consider
> performing additional training.
>
> If you click on the "more statistics" link, it says:
> SpamBayes has processed 124 messages - 27 (22%) good, 54
> (44%) spam and 35
> (28%) unsure.
> 30 messages were manually classified as good (1 was a false
> positive). 32 messages were manually classified as spam (4
> were false negatives). 9 unsure messages were manually
> identified as good, and 14 as spam.
>
> so apparently something is corrupted again? any ideas?

Hmm.
The "more statistics" page uses the 'message_info_database.db' database,
rather than hammie.db, which holds the token counts themselves. It
sounds like the same message_info db is always used, but different token
databases.

[Later]

Thinking about this more, I think this is a case of the bug described
above, because I'm guessing in your configuration file you have
'statistics_database.db' set as the token database (a hangover from a
poor choice (mine) in a version a while back) and so when you change the
configuration, you change the database that's being used.

There are two workarounds for this (until 1.0.2 is out, which has the
bug fix):

1. Change the configuration to use "hammie.db" as the "Storage file
   name". Since that's the default, the reset-to-default bug won't
   have any effect on this.

2. Always stop and restart SpamBayes after making any changes on the
   configuration page.

=Tony.Meyer

--
Please always include the list ([email protected]) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.

_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
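
For anyone who wants to check whether workaround 1 applies to them, here is a rough sketch that reads a configuration file's text and reports options set to something other than the default, since those are the ones the reset-to-default bug can affect. The section name "Storage", the option name "persistent_storage_file", and the helper name options_at_risk are assumptions made for illustration; only the "hammie.db" default and the "Storage file name" label come from the message above, so check the names against your own configuration file before relying on this.

```python
import configparser

# Assumed mapping of (section, option) -> default value. Only the
# "hammie.db" default is stated in the message above; the section and
# option names are illustrative guesses.
ASSUMED_DEFAULTS = {
    ("Storage", "persistent_storage_file"): "hammie.db",
}


def options_at_risk(ini_text, defaults=ASSUMED_DEFAULTS):
    """Return {(section, option): (configured, default)} for every
    option whose configured value differs from the assumed default."""
    # Disable interpolation so '%' characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    at_risk = {}
    for (section, option), default in defaults.items():
        if parser.has_option(section, option):
            value = parser.get(section, option).strip()
            if value != default:
                at_risk[(section, option)] = (value, default)
    return at_risk


# Hypothetical configuration resembling the one guessed at above.
sample = """
[Storage]
persistent_storage_file: statistics_database.db
"""

for (section, option), (value, default) in options_at_risk(sample).items():
    print(f"[{section}] {option} is {value!r}; default is {default!r}")
```

An empty result means the listed options already match their defaults, so a reset would not switch the token database. Anything reported is a candidate for either changing back to the default or remembering to restart SpamBayes after each configuration change.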
