On Tue, May 23, 2006 19:04, [EMAIL PROTECTED] said:
>
>     Amedee> I have noticed that a lot of spam contains disclaimer-ish text.
>     Amedee> If I train spambayes with "disclaimed" ham, I fear this will
>     Amedee> "pollute" the sb database. The result might be that any email
>     Amedee> with a disclaimer-ish text will get a relatively high ham score.
>     Amedee> At the moment, I don't see a solution for this possible problem.
>     Amedee> I *could* not train on disclaimed ham, but if most of my
>     Amedee> correspondents have such boilerplates, training spambayes won't
>     Amedee> be very efficient.
>
> That depends. Most common English words (most of the words in disclaimers
> are probably pretty common) should probably score around 0.5 and thus not be
> used in ranking messages, e.g.:

Interesting. However, English is not my native language, and most of my
correspondence is in Dutch. As a consequence, most common English words are
quite uncommon for me. The result is that common English words will score a
bit above 0.5. Perhaps not by much, but enough to become significant after a
while.

-- 
Disclaimer: By sending an email to ANY of my addresses you are agreeing that:
1. I am by definition, "the intended recipient"
2. All information in the email is mine to do with as I see fit and make such
   financial profit, political mileage, or good joke as it lends itself to.
   In particular, I may quote it on usenet.
3. I may take the contents as representing the views of your company.
4. This overrides any disclaimer or statement of confidentiality that may be
   included on your message.
_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
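
To make the scoring argument in this thread concrete, here is a minimal sketch of a Robinson-style smoothed token probability, the general scheme SpamBayes is based on. It is not SpamBayes code: the function names are hypothetical, the counts are invented for illustration, and the constants (0.5, 0.45, 0.1) are assumed here to correspond to the classifier's unknown-word probability, unknown-word strength, and minimum probability strength. In this scheme a score above 0.5 leans towards spam and a score below 0.5 leans towards ham.

```python
def token_spam_prob(spam_count, ham_count, n_spam, n_ham,
                    unknown_prob=0.5, unknown_strength=0.45):
    """Smoothed spam probability for one token (illustrative sketch).

    spam_count / ham_count: number of trained spam / ham messages
    containing the token. n_spam / n_ham: total trained messages.
    The two keyword defaults are assumptions, not verified settings.
    """
    spam_ratio = spam_count / n_spam if n_spam else 0.0
    ham_ratio = ham_count / n_ham if n_ham else 0.0
    n = spam_count + ham_count
    if n == 0 or spam_ratio + ham_ratio == 0.0:
        # Never-seen token: fall back to the neutral prior.
        return unknown_prob
    raw = spam_ratio / (spam_ratio + ham_ratio)
    # Pull the raw estimate towards the prior; the pull weakens as n grows.
    return (unknown_strength * unknown_prob + n * raw) / (unknown_strength + n)


def is_used_in_ranking(prob, min_strength=0.1):
    """A token counts only if it is far enough from the neutral 0.5."""
    return abs(prob - 0.5) >= min_strength


if __name__ == "__main__":
    # A word equally common in spam and ham stays neutral and is ignored.
    balanced = token_spam_prob(900, 900, 1000, 1000)
    print(balanced, is_used_in_ranking(balanced))

    # Mostly-Dutch ham: the same English word is rarer in ham than in spam,
    # so its score drifts above 0.5 and it starts to influence ranking.
    skewed = token_spam_prob(300, 150, 1000, 1000)
    print(skewed, is_used_in_ranking(skewed))
```

With these made-up counts, the balanced word is dropped from ranking while the skewed one is kept, which is the effect described above for a mailbox whose ham is mostly non-English. Whether it matters in practice depends on how many such tokens end up among the strongest clues for a given message.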
