>> Retraining from scratch is probably the best option. 
>
> So how do you do that? I have about a 400:1 ham to spam ratio at
> the moment.

You can either delete the existing databases (default_bayes_database.db and
default_message_database.db) manually, or use the "Training" tab (or,
probably, the wizard). The databases are in the 'data directory';
SpamBayes->SpamBayes Manager->Advanced->Show Data Directory will open it.
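If you'd rather script the manual route, here is a minimal Python sketch. It
assumes you paste in the folder that "Show Data Directory" opens (the path
below is a placeholder, not a real default), and presumably Outlook should be
closed first so the files aren't held open. reset_databases() is a
hypothetical helper, not part of SpamBayes.

```python
from pathlib import Path

# Placeholder: replace with the folder opened by
# SpamBayes Manager -> Advanced -> Show Data Directory.
DATA_DIR = Path(r"C:\path\to\spambayes\data")

# The two database files named above.
DB_FILES = ("default_bayes_database.db", "default_message_database.db")


def reset_databases(data_dir: Path) -> list:
    """Delete whichever of the two database files exist; return the names removed."""
    removed = []
    for name in DB_FILES:
        db = data_dir / name
        if db.exists():
            db.unlink()  # permanently deletes the file
            removed.append(name)
    return removed


if __name__ == "__main__":
    print(reset_databases(DATA_DIR))
```

SpamBayes should then start with empty databases, which is what the
train-on-mistakes approach below wants anyway.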

If you use the "Training" tab, just be sure to tick the "Rebuild entire
database" box.

The training method that we recommend is to start with little or no
training, and train only on misclassified and unsure messages using the
"Delete as Spam" and "Recover from Spam" buttons.  This generally gives the
best results.  One other thing that can help avoid imbalance is adjusting
the ham/spam thresholds ("Filtering" tab) once you have the classifier
reasonably trained.
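To make the "train only on mistakes and unsures" idea concrete, here is a
rough Python sketch. ToyClassifier, bucket() and train_on_mistakes() are
hypothetical stand-ins, not the SpamBayes API, and the threshold values are
only illustrative. It assumes scores run from 0 (ham) to 1 (spam), with
anything between the ham and spam thresholds counted as unsure. Exactly how
SpamBayes handles a score sitting on a threshold is also an assumption here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyClassifier:
    """Hypothetical stand-in: just records what it was trained on."""
    trained: list = field(default_factory=list)

    def learn(self, msg_id: str, is_spam: bool) -> None:
        self.trained.append((msg_id, is_spam))


def bucket(score: float, ham_threshold: float, spam_threshold: float) -> str:
    """Three-way decision using the ham/spam thresholds ("Filtering" tab)."""
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "unsure"


def train_on_mistakes(clf, msg_id, score, truly_spam,
                      ham_threshold=0.15, spam_threshold=0.90):
    """Train only when the verdict was wrong or unsure.

    Training as spam is roughly what "Delete as Spam" does, and training
    as ham is roughly what "Recover from Spam" does.
    """
    verdict = bucket(score, ham_threshold, spam_threshold)
    if verdict != ("spam" if truly_spam else "ham"):
        clf.learn(msg_id, truly_spam)
    return verdict


clf = ToyClassifier()
train_on_mistakes(clf, "m1", 0.05, truly_spam=False)  # correct ham: not trained
train_on_mistakes(clf, "m2", 0.50, truly_spam=True)   # unsure: trained as spam
train_on_mistakes(clf, "m3", 0.95, truly_spam=False)  # false positive: trained as ham
print(clf.trained)  # [('m2', True), ('m3', False)]
```

Because correctly filed mail is never trained, a 400:1 flow of ham doesn't
turn into a 400:1 training set; the databases only grow when the filter
needs correcting.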

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this. 

_______________________________________________
spambayes at python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
