    Jamie> since the decoy text is completely non-commercial in nature, it
    Jamie> seems to be polluting my index and making detection less
    Jamie> accurate.  With OCR, will this continue to be an issue?

Sure, if the decoy text actually turns out to be relevant from a scoring
standpoint.  By default the SpamBayes classifier only considers tokens
(words) which score <= 0.4 or >= 0.6.  My guess is that most of the words in
the decoy text score close to 0.5, so they aren't even considered.
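
As a rough illustration of that cutoff, here is a minimal sketch.  It is
not the actual SpamBayes code: the function name select_clues and the
per-token probabilities are made up for the example, and only the 0.4/0.6
band comes from the default described above.

```python
def select_clues(token_probs, low=0.4, high=0.6):
    """Keep only tokens whose spam probability is <= low or >= high.

    Hypothetical helper for illustration; tokens strictly inside the
    (low, high) band are treated as uninformative and dropped.
    """
    return {tok: p for tok, p in token_probs.items() if p <= low or p >= high}


# Made-up probabilities: two strong clues, one borderline clue, and
# three "decoy" words that sit near 0.5.
sample = {
    "viagra": 0.99,
    "meeting": 0.05,
    "offer": 0.60,
    "river": 0.52,
    "walked": 0.48,
    "the": 0.50,
}

clues = select_clues(sample)
print(sorted(clues))  # ['meeting', 'offer', 'viagra']
```

The decoy-like words ("river", "walked", "the") never reach the scoring
step in this sketch, which is why they would not be expected to move the
final score much.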

Skip
_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
