On Sun, July 29, 2007 20:25, David Abrahams wrote:
>
> on Fri Jul 27 2007, skip-AT-pobox.com wrote:
>
>> Brendon> I've just started using spambayes again after a while away from
>> Brendon> it. Now, 3 days in, I notice that I've trained on far more
>> Brendon> spam than ham. (Total emails trained: Spam: *432* Ham: *64*) I
>> Brendon> seem to remember that this was previously my experience in the
>> Brendon> past.
>>
>> Are you training on every message you receive or just the mistakes? Most
>> people generally only train on the mistakes and unsures. Your ratio is
>> about 7:1. That's a bit high.
>
> Even training only on mistakes and unsures, I have had a steadily
> increasing ratio for months. I almost never see a misclassified ham
> and only very rarely a ham about which the system is unsure. It's
> unsure about spam every day.

I have the same experience:

[EMAIL PROTECTED] { ~ }$ ./spamstats
Spam: 2415 Ham: 651

That's 3.7:1, and it's increasing. Nonetheless, I have never seen a false
positive. I only train on mistakes and unsures.

Most of my email is to/from the same 50 people or so. Most of the time they
write messages longer than 50 words, and almost all of them are in Dutch.
The very few times I saw a spam classified as ham, it had Dutch nonsense
words in it.

I would agree that in theory having equal amounts of ham and spam would be
better. However, in my particular case there are significant factors that
mitigate the need for a 1:1 ratio. I'm also claiming that my particular
situation cannot be used to draw general conclusions, and that Your Mileage
May Vary(tm).

-- 
Amedee

_______________________________________________
SpamBayes@python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
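
The ratios mentioned in this thread are easy to check by hand. As a minimal sketch in Python, using only the counts quoted above (432 spam and 64 ham for Brendon, 2415 spam and 651 ham for Amedee): the helper name `training_ratio` is hypothetical, and this is not the `spamstats` script, whose contents are not shown in the thread.

```python
def training_ratio(spam: int, ham: int) -> float:
    """Return the spam-to-ham ratio of trained messages (hypothetical helper)."""
    if ham <= 0:
        raise ValueError("ham count must be positive")
    return spam / ham


# Trained-message counts as quoted in the thread: (spam, ham).
counts = {
    "Brendon": (432, 64),
    "Amedee": (2415, 651),
}

for name, (spam, ham) in counts.items():
    # Print each ratio in the "N:1" form used in the discussion.
    print(f"{name}: {training_ratio(spam, ham):.2f}:1")
```

The first pair works out to 6.75, which matches the "about 7:1" estimate, and the second rounds to the 3.7:1 figure given in the reply.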