Mark Hammond wrote:
I don't get a lot of ham, and currently have 55 ham and 580 spam in my Spambayes database. Despite this, it seems to be working admirably. It is, however, very sensitive to just one spam mistakenly put into the ham base, which then completely upsets the filtering.

That is high relative to the conventional wisdom, but I'm questioning the correctness of that wisdom. Check out this thread, which should give you a reasonable idea:

http://mail.python.org/pipermail/spambayes-dev/2003-November/001578.html

Perhaps it's time to re-evaluate that statement?

Google also shows anecdotal reports of poor results after an imbalance as low as 2:1, so I don't think it would be responsible to re-evaluate that statement until clear evidence was presented to the contrary.

So if the perceived wisdom is that I need to balance the ratios, what should I do? Send myself ham? Or not use spam from my unsure folder for training? Or get more friends???

regards,
Mike
_______________________________________________
SpamBayes@python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html