On Wed, May 24, 2006 12:48, [EMAIL PROTECTED] said:
>
>     Amedee> However, English is not my mother language and most of my
>     Amedee> correspondence is in Dutch. As a consequence, most common
>     Amedee> English words are quite uncommon for me. The result is that
>     Amedee> common English words will score a bit above 0.5. Perhaps not
>     Amedee> much, but enough to be significant after a while.
>
> Thanks, I didn't realize that. Do you have an example in your training
> database you can share with us (both message and word scores) where you
> think the English disclaimer text has tipped the scales and caused a ham
> message to later be scored as spam? If you simply train on one or two of
> those misclassified hams, does the problem go away? How skewed is your
> training database (number of spams vs number of hams)? Have you
> considered throwing out your current training database and starting fresh?
>
> One thing that might help is to further break messages which score as spam
> into "low" and "high" spam. Based on my current settings that gives me
> these four categories:
>
>     ham         0.00-0.14
>     unsure      0.15-0.59
>     low spam    0.60-0.74
>     high spam   0.75-1.00
>
> High spam is tossed without further consideration. Ham is sorted in the
> appropriate mailbox by procmail. Unsure and low spam messages each wind
> up in their own mailboxes for further consideration. I train on most
> unsure messages but only train on lospams which are actually ham.
>
> My suspicion is that if you have ham messages which are erroneously
> winding up as spam they are at the very low end of the spam scale. It
> might be sufficient to move your spam threshold up a bit so they are more
> likely to land in the unsure category.
>
> Skip

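For readers who want to try the four-way split quoted above, here is a minimal Python sketch. The band edges come from Skip's message; the names `BANDS`, `categorize` and `should_train` are hypothetical and are not part of SpamBayes. Two assumptions are made: each band's upper edge is taken as the next band's lower edge (so a score such as 0.145 counts as ham), and "train on most unsure messages" is simplified to "train on all unsure messages".

```python
# Band edges are taken from Skip's message. Everything else here
# (names, structure) is a hypothetical illustration, not SpamBayes API.
BANDS = [
    (0.15, "ham"),       # 0.00-0.14
    (0.60, "unsure"),    # 0.15-0.59
    (0.75, "low spam"),  # 0.60-0.74
]


def categorize(score):
    """Map a spam probability in [0.0, 1.0] to one of the four bands."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    for upper, name in BANDS:
        # Assumption: the upper edge of each band is exclusive.
        if score < upper:
            return name
    return "high spam"   # 0.75-1.00


def should_train(category, is_ham):
    """Simplified version of the training policy Skip describes.

    Unsure messages are always trained on (Skip says "most"), low spam
    is trained on only when it is really ham, and clear ham or high
    spam is left alone.
    """
    if category == "unsure":
        return True
    if category == "low spam":
        return is_ham
    return False


if __name__ == "__main__":
    for s in (0.05, 0.40, 0.62, 0.90):
        print(s, categorize(s))
```

Moving the spam threshold up, as suggested at the end of the quoted message, would correspond to raising the 0.60 edge so that borderline hams land in "unsure" instead of "low spam".
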
Skip, I think you have hit the mark there. I already use something like
your lospam/hiham. I have 5 categories:

    high ham, low ham, unsure, low spam, high spam

The high ham/spam respectively go to procmail or /dev/null.
And indeed, the misclassified hams all wind up in unsure or low spam.

-- 
Amedee
_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
