Careful. If I read the user's initial message correctly, what she calls
a false negative most of us would call a false positive, i.e. a ham
message identified as spam or potential spam. As Kenny points out, a few
false negatives are a common annoyance. But false positives can be a
more serious problem, since their presence forces you to slog through
rivers of spam looking for good messages you might otherwise miss.
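
To pin down the terminology, here is a minimal Python sketch. The function name and the plain "ham"/"spam" labels are my own illustration, not anything taken from SpamBayes, and the sketch ignores the "unsure" category.

```python
def classify_error(actual, predicted):
    """Name the outcome for one message; both labels must be 'ham' or 'spam'."""
    for label in (actual, predicted):
        if label not in ("ham", "spam"):
            raise ValueError("unknown label: %r" % (label,))
    if actual == predicted:
        return "correct"
    if actual == "ham":
        # Good mail flagged as spam: the costly kind of mistake.
        return "false positive"
    # Spam that slipped through as ham: the common annoyance.
    return "false negative"


print(classify_error("ham", "spam"))
print(classify_error("spam", "ham"))
```

The first call is the case I think the original poster is describing; the second is what Kenny means by a false negative.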

Bob


> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Kenny Pitt
> Sent: Tuesday, October 11, 2005 10:08 AM
> To: mgleich
> Cc: [email protected]
> Subject: Re: [Spambayes] rebuild database?
> 
> 
> On 10/10/05, mgleich <[EMAIL PROTECTED]> wrote:
> > I've just realized that although my database is 536kb and that is not
> > so large, it is composed of 702 spam and 110 ham.  I gather this is
> > extremely unbalanced and may explain why I'm getting false negatives.
> 
> Actually, 7 to 1 is really not an unusually high imbalance. 
> We've seen reports from people who have 100 to 1 or higher imbalances.
> 
> If you are getting false positives then imbalance is the most 
> common cause. A few false negatives are not uncommon, though, 
> because spam is constantly changing. If a relatively high 
> percentage of your spam is coming in as false negatives, then 
> you might have an imbalance problem. The best way to tell for 
> sure is to see the spam clues for one of the false negatives, 
> which you can generate from the SpamBayes menu.
> 
> > Do I need to begin from scratch?  If so, do I just delete the db file
> > and will Spambayes just create a new one?
> 
> For a 7 to 1 imbalance, I would usually say there is no need 
> to begin from scratch. However, SpamBayes learns quickly so 
> it shouldn't hurt to start over and see what happens. Since 
> you know the size of your DB, you've obviously located the 
> file. You will probably see two files with the *.db 
> extension: one is the training data and the other contains 
> information about the messages that have been processed. Just 
> close Outlook, delete these 2 files, then restart Outlook and 
> SpamBayes should recreate the databases.
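
If you would rather keep the old databases around than delete them outright, the same step can be sketched in Python as a rename. This is only an illustration under stated assumptions: the function name is made up, the directory is whatever folder you found the *.db files in, and I am assuming that a renamed file is no longer picked up, so the effect matches the deletion described above. Close Outlook first, as in the instructions above.

```python
from pathlib import Path


def set_aside_db_files(data_dir):
    """Rename every *.db file in data_dir to *.db.bak; return the new paths.

    data_dir is a placeholder for the folder that holds the two database
    files; this sketch does not try to locate that folder for you.
    """
    moved = []
    for db_file in sorted(Path(data_dir).glob("*.db")):
        backup = db_file.with_name(db_file.name + ".bak")
        # On Windows, rename() raises an error if the .bak file already exists.
        db_file.rename(backup)
        moved.append(backup)
    return moved
```

Renaming the files back to *.db (again with Outlook closed) should restore the old training if the fresh start turns out worse.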
_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
