> i had been using version 0.3 of spambayes for a long time
> (XP/outlook express) and it was working fairly well. i
> recently upgraded to 1.0.1,

Wow - that's quite a jump!

> and now i get a ton of false positives (including the
> confirmation and welcome messages from this mailing list !!)
> probably close to 20% of my valid emails are being marked
> as spam.

With such a large jump, the easiest solution, particularly if you want to
take advantage of the various improvements in SpamBayes over that time,
would be to retrain from scratch. Mistake-based training (cf.
<http://entrian.com/sbwiki/TrainingIdeas>) should result in high accuracy
(certainly higher than you're getting right now) with only a few dozen
messages trained.

> here is the info from the Message Clues page (the "Clues" link)
> from one of the false positives. it is marked as 99.7% spam
> probability! and it appears that it is counting the "spam,"
> string in the 'subject' and 'to' fields as part of the reason
> to consider it spam (?), even though those were added by Spambayes.

The latter is a known bug that will be fixed in 1.1 (it's fixed in CVS).
Those added "spam," strings are, unfortunately, very strong clues in this
example message.

I suspect that one of the reasons for the sudden change may be that you
were using the experimental ham/spam imbalance option that SpamBayes used
to include, which is completely gone these days. Suddenly not using that
could have quite an impact.

> although 0.754605 7 8

This is a concern - you have trained 7 ham messages and 8 spam messages
with "although" in them, and yet the score is definitely spam. The most
probable cause for this is the training imbalance (1299:3644 or ~1:2.8),
although that doesn't really seem all that bad. (Maybe those counts are
out? Given the number of database problems that have been fixed since 0.3,
there's a moderate chance that the database is in a shoddy state.)
Generally a roughly balanced database is better.

=Tony.Meyer

-- 
Please always include the list ([email protected]) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.
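
To make the effect of the training imbalance on the "although" clue more concrete, here is a rough sketch of the Robinson-style per-token estimate that, as far as I understand it, the SpamBayes classifier uses. Treat all of it as assumption rather than the actual SpamBayes code: the function name is made up for illustration, 0.45 and 0.5 are what I believe the default unknown_word_strength and unknown_word_prob values to be, and the thread does not say which side of the 1299:3644 split is ham. Taking 3644 as the ham count and 1299 as the spam count happens to reproduce the 0.754605 shown on the clues page, so that is the assignment used here.

```python
def token_spam_prob(ham_count, spam_count, n_ham, n_spam,
                    unknown_word_strength=0.45, unknown_word_prob=0.5):
    """Robinson-style spam probability for a single token (illustrative sketch).

    ham_count / spam_count: trained messages of each kind containing the token.
    n_ham / n_spam: total trained messages of each kind.
    The two keyword defaults are assumed, not read from a real configuration.
    """
    n = ham_count + spam_count
    if n == 0:
        # A token never seen in training just gets the prior.
        return unknown_word_prob

    # Normalise by the size of each corpus so the counts are comparable.
    ham_ratio = ham_count / n_ham
    spam_ratio = spam_count / n_spam
    raw = spam_ratio / (ham_ratio + spam_ratio)

    # Pull the raw estimate towards the prior; the pull weakens as n grows.
    return (unknown_word_strength * unknown_word_prob + n * raw) / (
        unknown_word_strength + n)


# Counts from the clue line "although 0.754605 7 8", with the assumed
# assignment of 3644 ham and 1299 spam messages trained.
imbalanced = token_spam_prob(7, 8, n_ham=3644, n_spam=1299)

# The same token counts against a hypothetical balanced database.
balanced = token_spam_prob(7, 8, n_ham=2000, n_spam=2000)

print(f"imbalanced: {imbalanced:.6f}")
print(f"balanced:   {balanced:.6f}")
```

Under these assumptions, a word seen in 7 ham and 8 spam messages ends up looking quite spammy only because each spam sighting is weighted more heavily when far fewer spam messages have been trained. With equal totals, the same counts land close to neutral.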
