Loren Wilton wrote:
>> 13 FUZZY_OCR BODY: Mail contains an image with common
>> spam text inside
>> Words found:
>> "target" in 1 lines
>> "service" in 1
>> lines
>> "stock" in 2 lines
>> "price" in 2 lines
>> "company" in 1 lines
>> "recommendation" in 1
>> lines
>> (12 word occurrences found)
>>
>> I'm rather disturbed by the +13 score. Surely *no* single test should
>> be able to add *nearly three times* my spam threshold of +5 to the score
>> of a single mail? Is there a way to threshold the thing so that it will
>> cap scores at +4.5 or something more sane?
>
> The FuzzyOCR score is a cumulation of the various subtests it hits.
> There are a handful of configuration options that can set scores,
> multipliers, and limits for various things.

That is correct; users should read the configuration file and understand
what can be controlled. But look at the report: it says "12 word... found"
but it only lists 8 words (counting repetitions), so it looks like the
score is wrong anyway.

Sorry for taking the topic further away from the original problem with
Util::wrap(), which is the cause of the lost formatting in the report. It
doesn't take the newlines into account and gets its character count wrong,
so the end result is the mess you see after it adds line breaks based on
that character count.

-- 
René Berber
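
P.S. For anyone who wants to check the arithmetic, here is a minimal Python sketch that tallies the per-word figures from the quoted report and compares them with the total the report claims. The report text is retyped by hand with the broken line wraps undone, and the function name `tally` is only illustrative; none of this is FuzzyOCR code. It also assumes each "in N lines" figure can be read as N occurrences, which may undercount if a word appears more than once on a line.

```python
import re

# Quoted report, retyped by hand with the broken wraps undone (assumption:
# this is what the report looked like before it was re-wrapped).
REPORT = """\
"target" in 1 lines
"service" in 1 lines
"stock" in 2 lines
"price" in 2 lines
"company" in 1 lines
"recommendation" in 1 lines
(12 word occurrences found)
"""


def tally(report):
    """Return (sum of per-word line counts, total claimed by the report)."""
    # Each per-word entry looks like: "word" in N lines
    listed = sum(int(n) for n in re.findall(r'"[^"]+" in (\d+) lines', report))
    # The summary line looks like: (N word occurrences found)
    claimed = int(re.search(r"\((\d+) word occurrences found\)", report).group(1))
    return listed, claimed


listed, claimed = tally(REPORT)
print(f"listed: {listed}, claimed: {claimed}")  # listed: 8, claimed: 12
```

The two numbers differ by 4, which is the discrepancy mentioned above.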