Dave> Good to know. I suppose there's no reason not to do this with a
Dave> cron job, if you're that confident in it.
Well, I do kill my fetchmail process first, so sb_bnfilter isn't trying to
read the database while tte is trying to write it.
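For a cron job, the stop/train/restart sequence could be wrapped up along
these lines. This is only a sketch: run_with_fetcher_stopped and its
arguments are made-up names, and the three commands are whatever you
already type by hand. The point is just that the fetcher gets restarted
even if training fails.

```python
import subprocess


def run_with_fetcher_stopped(stop_cmd, train_cmd, start_cmd, run=subprocess.run):
    """Stop the mail fetcher, run training, then restart the fetcher.

    Each command is an argv list (hypothetical; supply your own).
    `run` defaults to subprocess.run and is a parameter only so the
    sequence can be exercised without touching real processes.
    Returns the exit status of the training command.
    """
    # Stop the fetcher so nothing reads the database during training.
    run(stop_cmd, check=True)
    try:
        # Training may fail; report its exit status rather than raising.
        return run(train_cmd, check=False).returncode
    finally:
        # Always bring the fetcher back, even if training blew up.
        run(start_cmd, check=True)
```

For example, stop_cmd might be ["fetchmail", "--quit"] and start_cmd plain
["fetchmail"] (assuming your fetchmail is configured to run as a daemon),
with train_cmd being your usual tte command line.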
Dave> OK... but what will happen if the real ratio of ham to spam is
Dave> more like 412:379 and I pass a simple ratio of 3:2?
All the RATIO tells it is how many spams and hams to score in one shot. 3:2
means (if I recall correctly) that it will pick the next three spams and
the next two hams to score. It will then check their scores. Any that are
scored correctly won't be visited in the next round. I believe those that
are scored incorrectly are used to update the training database at that
point.
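In rough Python, the behaviour described above looks something like the
following. This is a sketch of that description, not the actual tte code:
train_rounds, score, train, and the cutoff values are all assumptions made
for illustration, with score(msg) returning a spam probability and
train(msg, is_spam) updating the database.

```python
def train_rounds(spams, hams, score, train, n_spam=3, n_ham=2,
                 spam_cutoff=0.9, ham_cutoff=0.2, max_rounds=10):
    """Score messages in n_spam:n_ham batches, training on the misses.

    Returns (rounds_run, messages_still_pending).
    """
    pending_spam, pending_ham = list(spams), list(hams)
    rounds = 0
    while (pending_spam or pending_ham) and rounds < max_rounds:
        rounds += 1
        missed_spam, missed_ham = [], []
        si = hi = 0
        while si < len(pending_spam) or hi < len(pending_ham):
            # Take the next n_spam spams and n_ham hams "in one shot".
            batch_spam = pending_spam[si:si + n_spam]
            batch_ham = pending_ham[hi:hi + n_ham]
            si += n_spam
            hi += n_ham
            # Score the whole batch first, then check the scores.
            wrong_spam = [m for m in batch_spam if score(m) < spam_cutoff]
            wrong_ham = [m for m in batch_ham if score(m) > ham_cutoff]
            # Incorrectly scored messages update the training database.
            for m in wrong_spam:
                train(m, True)
            for m in wrong_ham:
                train(m, False)
            missed_spam += wrong_spam
            missed_ham += wrong_ham
        # Correctly scored messages are not visited in the next round.
        pending_spam, pending_ham = missed_spam, missed_ham
    return rounds, len(pending_spam) + len(pending_ham)
```

Note that nothing here depends on the ratio matching the real mix of ham
and spam: once one pile runs out, the batches simply continue with
whatever is left in the other.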
Dave> I guess I'm saying that the ratio argument is good for training
Dave> some specific ratio of hams and spams... but does anyone really
Dave> want to train a specific ratio? What's the use case? If you've
Dave> supplied the ratio argument to make it easy for people to train
Dave> everything in an unbalanced set, it's not a very good way of
Dave> getting there.
Maybe, but it works for me.
Dave> Unfortunately, I want to keep my email address and my server, so
Dave> unless Google is going to make their spam blocking technology
Dave> public it means SB is going to have to take on the whole job.
I still use [EMAIL PROTECTED] as my visible identity. You could have
[EMAIL PROTECTED] forward to Gmail and then use POP3 to pick up your
mail from their server. That way you keep your normal email client and
your boost-consulting address. You would lose the IMAP capability, but when
you're on the road you can still use the Gmail web interface to read your
mail. (You'll want to keep an eye on Gmail's spam classification accuracy
as well.)
Skip