On 28 May 2007, Loren Wilton told this:

>>  13 FUZZY_OCR  BODY: Mail contains an image with common spam
>>                text inside
>>                Words found:
>>                "target" in 1 lines
>>                "service" in 1 lines
>>                "stock" in 2 lines
>>                "price" in 2 lines
>>                "company" in 1 lines
>>                "recommendation" in 1 lines
>>                (12 word occurrences found)
>>
>> I'm rather disturbed by the +13 score. Surely *no* single test should
>> be able to add *nearly three times* my spam threshold of +5 to the score
>> of a single mail? Is there a way to threshold the thing so that it will
>> cap scores at +4.5 or something more sane?
>
> The FuzzyOCR score is a cumulation of the various subtests it hits.
> There are a handful of configuration options that can set scores,
> multipliers, and limits for various things.
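A rough illustration of the complaint: if every spammy word occurrence adds to the total, the score has no ceiling, whereas the cap being asked for would simply clamp it. This is a minimal Python sketch under stated assumptions; the constants and function names are hypothetical, picked only so the uncapped total reproduces the +13 in the report, and they are not FuzzyOCR's real configuration options, defaults, or scoring code.

```python
# Hypothetical sketch: NOT FuzzyOCR's real scoring code or configuration.
# BASE_SCORE and PER_WORD_SCORE are made-up values chosen so the uncapped
# total reproduces the +13 from the report quoted above.
BASE_SCORE = 4.0
PER_WORD_SCORE = 0.75


def uncapped_score(word_occurrences: int) -> float:
    """Every spammy-word occurrence adds to the total; there is no ceiling."""
    return BASE_SCORE + PER_WORD_SCORE * word_occurrences


def capped_score(word_occurrences: int, cap: float = 4.5) -> float:
    """Same cumulative score, but never allowed to exceed `cap`."""
    return min(uncapped_score(word_occurrences), cap)


if __name__ == "__main__":
    occurrences = 12  # "(12 word occurrences found)" in the report
    print(uncapped_score(occurrences))  # 13.0
    print(capped_score(occurrences))    # 4.5
```

With the cap at +4.5, a single image can push a mail close to a +5 threshold but never over it on its own, so some other rule has to agree before the mail is tagged.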
Yeah, but there isn't an upper bound :/

> While FuzzyOCR creates what seem to be amazingly high scores, few
> people report much in the way of FP problems.

I'm being ridiculously picky here, because I get almost no legitimate
email containing images of any kind, let alone images with spammy words
in them: I just don't like things, however smart, that reduce
SpamAssassin to a one-shot-and-you're-dead system.

(And, let's be blunt, the pure this-word-is-spammy recognition part of
FuzzyOCR is much less smart than the Bayesian system already present in
SA: FuzzyOCR should really use the Bayesian system to determine the
spamminess of words, I suppose...)

-- 
`On a scale of one to ten of usefulness, BBC BASIC was several points
 ahead of the competition, scoring a relatively respectable zero.'
                                                  --- Peter Corlett