On Sun, July 29, 2007 20:25, David Abrahams wrote:
>
> on Fri Jul 27 2007, skip-AT-pobox.com wrote:
>
>>     Brendon> I've just started using spambayes again after a while away from
>>     Brendon> it.  Now, 3 days in, I notice that I've trained on far more
>>     Brendon> spam than ham.  (Total emails trained: Spam: *432* Ham: *64*) I
>>     Brendon> seem to remember that this was previously my experience in the
>>     Brendon> past.
>>
>> Are you training on every message you receive or just the mistakes?  Most
>> people generally only train on the mistakes and unsures.  Your ratio is
>> about 7:1.  That's a bit high.
>
> Even training only on mistakes and unsures, I have had a steadily
> increasing ratio for months.  I almost never see a misclassified ham
> and only very rarely a ham about which the system is unsure.  It's
> unsure about spam every day.

I have the same experience:

[EMAIL PROTECTED] { ~ }$ ./spamstats
 Spam: 2415 Ham: 651

That's 3.7:1, and it's increasing. Nonetheless, I have never seen a false
positive. I only train on mistakes and unsures.
Most of my email is to/from the same 50 or so people; they usually write
messages longer than 50 words, almost always in Dutch.
The very few times I saw a spam classified as ham, it contained Dutch
nonsense words.
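
The "train only on mistakes and unsures" regime can be sketched to show why the
trained ratio drifts toward spam: if ham is almost never misclassified or
unsure, ham training events stay rare. This is a minimal illustration, not
SpamBayes code; the cutoffs (0.20/0.90) and all function names are
hypothetical choices for the sketch.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for this sketch: a score below HAM_CUTOFF is called
# ham, a score above SPAM_CUTOFF is called spam, anything else is "unsure".
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


@dataclass
class TrainingCounts:
    spam: int = 0
    ham: int = 0

    def ratio(self) -> float:
        """Trained spam per trained ham (e.g. 3.7 means 3.7:1)."""
        return self.spam / self.ham if self.ham else float("inf")


def classify(score: float) -> str:
    """Map a spam score in [0, 1] to a verdict using the cutoffs above."""
    if score < HAM_CUTOFF:
        return "ham"
    if score > SPAM_CUTOFF:
        return "spam"
    return "unsure"


def should_train(score: float, true_label: str) -> bool:
    """Train only on mistakes (wrong verdict) and unsures."""
    verdict = classify(score)
    return verdict == "unsure" or verdict != true_label


def train_on_errors(messages, counts: TrainingCounts) -> TrainingCounts:
    """Update counts for each (score, true_label) pair that needs training."""
    for score, label in messages:
        if should_train(score, label):
            if label == "spam":
                counts.spam += 1
            else:
                counts.ham += 1
    return counts


# Usage: a classifier that is rarely wrong or unsure about ham trains little ham.
msgs = [(0.05, "ham"), (0.95, "spam"), (0.50, "spam"),
        (0.60, "spam"), (0.15, "spam"), (0.30, "ham")]
counts = train_on_errors(msgs, TrainingCounts())
print(counts.spam, counts.ham)                      # 3 1
print(round(TrainingCounts(2415, 651).ratio(), 1))  # 3.7
```

Correctly classified messages never reach the training step, so the trained
ratio reflects where the classifier struggles, not the mix of incoming mail.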

I agree that, in theory, having equal amounts of ham and spam would be
better. In my particular case, however, there are significant factors that
reduce the need for a 1:1 ratio. I would also stress that my particular
situation cannot be used to draw general conclusions, and that Your
Mileage May Vary(tm).
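
On the theoretical side of the 1:1 question: as far as I understand it, the
SpamBayes classifier follows Gary Robinson's approach and divides each token's
counts by the number of trained messages in that class before combining them.
The sketch below models that under stated assumptions (the function name and
the smoothing defaults s=0.45, x=0.5 are assumptions here, not taken from this
thread). It shows why an imbalance is partly compensated, though not entirely:
the smoothing still weighs raw counts, so a class with few trained messages
gives weaker evidence per token.

```python
def token_spamprob(spamcount: int, hamcount: int, nspam: int, nham: int,
                   s: float = 0.45, x: float = 0.5) -> float:
    """Smoothed spam probability for one token (simplified sketch).

    spamcount/hamcount: trained spam/ham messages containing the token.
    nspam/nham: total trained spam/ham messages.
    s, x: assumed strength and prior for Robinson-style smoothing.
    """
    n = spamcount + hamcount
    if n == 0:
        return x  # unseen token: fall back to the prior
    # Normalise by class size so that training more spam than ham does not
    # by itself make every token look spammy.
    spamratio = spamcount / max(nspam, 1)
    hamratio = hamcount / max(nham, 1)
    prob = spamratio / (spamratio + hamratio)
    # Shrink toward the prior x; tokens seen in few messages move less.
    return (s * x + n * prob) / (s + n)


# A token in 10% of trained spam and 10% of trained ham, with 4:1 training:
print(round(token_spamprob(40, 10, 400, 100), 3))  # 0.5
# Raw counts alone would have suggested 40 / (40 + 10) = 0.8.
```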

-- 
Amedee



_______________________________________________
SpamBayes@python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
