On 28 May 2007, Loren Wilton told this:

>>  13 FUZZY_OCR              BODY: Mail contains an image with common spam
>>                            text inside
>>                            Words found:
>>                            "target" in 1 lines
>>                            "service" in 1 lines
>>                            "stock" in 2 lines
>>                            "price" in 2 lines
>>                            "company" in 1 lines
>>                            "recommendation" in 1 lines
>>                            (12 word occurrences found)
>>
>> I'm rather disturbed by the +13 score. Surely *no* single test should
>> be able to add *nearly three times* my spam threshold of +5 to the score
>> of a single mail? Is there a way to threshold the thing so that it will
>> cap scores at +4.5 or something more sane?
>
> The FuzzyOCR score is a cumulation of the various subtests it hits.
> There are a handful of configuration options that can set scores,
> multipliers, and limits for various things.

Yeah, but there isn't an upper bound :/
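
To make concrete the sort of upper bound I mean, here is a minimal Python
sketch. It is not FuzzyOCR's code: the function and every parameter name are
made up for illustration, and the base-plus-per-hit formula is only an
assumption about how a cumulative score might be built up.

```python
def capped_score(base_score, per_hit_score, hits, required_hits, cap):
    """Return a cumulative word-hit score that never exceeds `cap`.

    Every name here is hypothetical; none is a real FuzzyOCR option.
    """
    # Below the required number of hits the test does not fire at all.
    if hits < required_hits:
        return 0.0
    # Assumed accumulation: a base score plus a fixed increment per extra hit.
    raw = base_score + per_hit_score * (hits - required_hits)
    # The missing piece: clamp the total so one test cannot decide the verdict.
    return min(raw, cap)
```

With a cap of 4.5 and a threshold of 5, a mail would still need at least one
other rule to hit before being classed as spam, however many spammy words
the OCR finds.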

> While FuzzyOCR creates what seem to be amazingly high scores, few
> people report much in the way of FP problems.

I'm being ridiculously picky here, because I get almost no legitimate
email containing images of any kind, let alone images with spammy words
in them: I just don't like things, however smart, that reduce
SpamAssassin to a one-shot-and-you're-dead system. (And, let's be blunt,
the pure this-word-is-spammy recognition part of FuzzyOCR is much less
smart than the Bayesian system already present in SA: FuzzyOCR should
really use the Bayesian system to determine the spamminess of words, I
suppose...)

-- 
`On a scale of one to ten of usefulness, BBC BASIC was several points ahead
 of the competition, scoring a relatively respectable zero.' --- Peter Corlett
