decoder wrote:
Hello there,

I have improved the original OcrPlugin (found at
http://wiki.apache.org/spamassassin/OcrPlugin) so that it supports
fuzzy matching. This way, OCR recognition errors or intentional
obfuscations in the text no longer prevent a match. It works by
calculating a relative distance between the pattern (a word from a
given word list) and each line of the recognized input. The plugin
also uses dynamic scoring: the more words match, the higher the
score (this can be adjusted in the source).
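
To illustrate the idea (this is not the plugin's code, which is written in Perl), here is a minimal Python sketch of relative-distance matching and dynamic scoring. It assumes Levenshtein edit distance divided by the pattern length as the "relative distance" and slides a pattern-sized window over each line. The function names, the 0.25 threshold and the score values are made up for the example, and the plugin's real formula may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]


def best_relative_distance(pattern: str, line: str) -> float:
    """Smallest edit distance between the pattern and any pattern-sized
    window of the line, divided by the pattern length (assumed metric)."""
    pattern, line = pattern.lower(), line.lower()
    n = len(pattern)
    if n == 0:
        raise ValueError("pattern must not be empty")
    if len(line) <= n:
        return levenshtein(pattern, line) / n
    windows = (line[i:i + n] for i in range(len(line) - n + 1))
    return min(levenshtein(pattern, w) for w in windows) / n


def count_hits(words, lines, threshold=0.25):
    """Number of list words that fuzzily match at least one OCR line.
    The threshold is a hypothetical value, not taken from the plugin."""
    return sum(
        1 for w in words
        if any(best_relative_distance(w, l) <= threshold for l in lines)
    )


def dynamic_score(hits, min_hits=2, base=4.0, per_extra=1.0):
    """More matched words give a higher score (all values hypothetical)."""
    if hits < min_hits:
        return 0.0
    return base + (hits - min_hits) * per_extra


ocr_lines = ["buy v1agra now", "online pharmacv"]
hits = count_hits(["viagra", "stock", "pharmacy"], ocr_lines)
print(hits, dynamic_score(hits))
```

A fixed-width window keeps the sketch short, but it handles inserted or dropped characters poorly; an approximate substring search would be more tolerant.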

You can find a full description and an example in the wiki under:

http://wiki.apache.org/spamassassin/FuzzyOcrPlugin


Ideas for improvements or criticism are always welcome :)


Best regards,


Chris

Hello again,


I just released a new version which incorporates all the suggestions
made here on the mailing list. Changelog:

* Added scoring for wrong content-type
* Added scoring for broken gif images
* Added configuration for helper applications
* Added autodisable_score feature to disable the OCR engine if the
message already has enough points
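
As a rough illustration of the autodisable_score idea (not the plugin's actual Perl code), the check amounts to skipping the expensive OCR step once the message score has reached a configured limit. Only the option name comes from the changelog; the function name and the choice of a strict comparison are assumptions.

```python
def should_run_ocr(current_score: float, autodisable_score: float) -> bool:
    """Return True if OCR is still worth running for this message.

    Assumption: OCR is skipped once the score accumulated by other rules
    has reached autodisable_score; the plugin's exact comparison may differ.
    """
    return current_score < autodisable_score


print(should_run_ocr(3.0, 10.0))   # low score so far, run OCR
print(should_run_ocr(12.5, 10.0))  # already over the limit, skip OCR
```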


You can now obtain the plugin as a tarball; the download URL is at the
end of the wiki page (http://wiki.apache.org/spamassassin/FuzzyOcrPlugin).

All new options in the config file, especially score adjustments for
the new features, are explained there as well as in the sample cf file.

Hi,

I get the following warnings when linting:

[29661] warn: config: warning: description exists for non-existent rule FUZZY_OCR_CORRUPT_IMG
[29661] warn: config: warning: description exists for non-existent rule FUZZY_OCR_WRONG_CTYPE
[29661] warn: lint: 2 issues detected, please rerun with debug enabled for more information
