decoder wrote:
decoder wrote:
Hello there,
I have extended the original OcrPlugin (found at
http://wiki.apache.org/spamassassin/OcrPlugin) with fuzzy matching.
This way, OCR recognition errors and intentional obfuscation of the
text no longer make recognition impossible. Matching is done with a
relative distance calculation between each pattern (a word from a given
word list) and each line of the recognized text. The plugin also uses
dynamic scoring: the more words match, the higher the score (this can
be adjusted in the source).
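A minimal sketch of the matching and scoring described above, in Python. This is not the plugin's code. It assumes that the "relative distance" is the smallest edit (Levenshtein) distance between a word and any substring of the OCR'd line, divided by the word's length, and that a word counts as matched when that ratio is at or below a threshold. `threshold`, `min_matches`, `base` and `per_extra` are hypothetical knobs standing in for the adjustable scoring mentioned above.

```python
from typing import Iterable, List


def substring_edit_distance(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between `pattern` and any substring of `text`.

    Ordinary edit-distance DP, except that row 0 is all zeros, so a match
    may start anywhere in `text` at no cost (Sellers' approximate matching).
    """
    prev = [0] * (len(text) + 1)
    for i in range(1, len(pattern) + 1):
        cur = [i] + [0] * len(text)
        for j in range(1, len(text) + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # skip a pattern character
                         cur[j - 1] + 1,      # skip a text character
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    return min(prev)  # the match may also end anywhere in `text`


def relative_distance(word: str, line: str) -> float:
    """Edit distance relative to the (non-empty) word's length, case-insensitive."""
    word, line = word.lower(), line.lower()
    return substring_edit_distance(word, line) / len(word)


def matched_words(lines: Iterable[str], wordlist: Iterable[str],
                  threshold: float = 0.3) -> List[str]:
    """Words from `wordlist` that approximately occur in at least one line."""
    lines = list(lines)
    return [w for w in wordlist
            if any(relative_distance(w, ln) <= threshold for ln in lines)]


def dynamic_score(n_matches: int, min_matches: int = 2,
                  base: float = 1.0, per_extra: float = 0.5) -> float:
    """More matched words -> higher score; below `min_matches`, no score."""
    if n_matches < min_matches:
        return 0.0
    return base + per_extra * (n_matches - min_matches)


if __name__ == "__main__":
    ocr_lines = ["Buy V1AGRA now", "cheap c1alis here"]
    hits = matched_words(ocr_lines, ["viagra", "cialis", "mortgage"])
    print(hits, dynamic_score(len(hits)))  # ['viagra', 'cialis'] 1.0
```

Normalizing by word length keeps the tolerance proportional: one wrong character in a six-letter word is treated the same as two in a twelve-letter word.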
You can find a full description and an example in the wiki at:
http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
Ideas for improvements or criticism are always welcome :)
Best regards,
Chris
Hello again,
I just released a new version that incorporates all the suggestions made
here on the mailing list. Changelog:
* Added scoring for wrong content-type
* Added scoring for broken GIF images
* Added configuration for helper applications
* Added autodisable_score feature to disable the OCR engine if the
message already has enough points
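The three new checks in the changelog could look roughly like this. It is a Python illustration under assumptions, not the plugin's code. The content-type check compares the declared type with the file's magic bytes. A GIF counts as broken when it lacks the 0x3B trailer byte, which is only a simple heuristic. `autodisable_score` is assumed to skip OCR once the message's current score reaches it. The rule names are the ones that appear in the lint output in the reply; the helper function names are hypothetical.

```python
from typing import List, Optional

# Magic-byte prefixes for a few common image formats.
MAGIC = {
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def sniff_type(data: bytes) -> Optional[str]:
    """Guess the real image type from its leading bytes (None if unknown)."""
    for prefix, mime in MAGIC.items():
        if data.startswith(prefix):
            return mime
    return None


def gif_looks_truncated(data: bytes) -> bool:
    """Heuristic: a complete GIF ends with the trailer byte 0x3B (';')."""
    return not data.endswith(b"\x3b")


def check_image(declared_ctype: str, data: bytes) -> List[str]:
    """Return the (assumed) rule names that would fire for one image part."""
    hits = []
    actual = sniff_type(data)
    declared = declared_ctype.split(";")[0].strip().lower()  # drop parameters
    if actual is not None and actual != declared:
        hits.append("FUZZY_OCR_WRONG_CTYPE")
    if actual == "image/gif" and gif_looks_truncated(data):
        hits.append("FUZZY_OCR_CORRUPT_IMG")
    return hits


def should_run_ocr(current_score: float, autodisable_score: float) -> bool:
    """Assumed semantics: skip the costly OCR step once the threshold is reached."""
    return current_score < autodisable_score


if __name__ == "__main__":
    gif = b"GIF89a" + b"\x00" * 10 + b";"
    print(check_image("image/jpeg", gif))  # ['FUZZY_OCR_WRONG_CTYPE']
    print(should_run_ocr(12.0, 10.0))      # False
```

Checking the score first lets a message that is already clearly spam skip the expensive external OCR tools entirely.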
You can now get the plugin as a tarball; the download URL is at the
end of the wiki page (http://wiki.apache.org/spamassassin/FuzzyOcrPlugin).
All new options in the config file, especially the score adjustments for
the new features, are explained there as well as in the sample .cf file.
Hi
I get the following warnings when linting:
[29661] warn: config: warning: description exists for non-existent rule FUZZY_OCR_CORRUPT_IMG
[29661] warn: config: warning: description exists for non-existent rule FUZZY_OCR_WRONG_CTYPE
[29661] warn: lint: 2 issues detected, please rerun with debug enabled for more information
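One way to narrow this down is to scan the .cf file for `describe` lines whose rule name has no definition. The following is a minimal Python sketch, assuming rules are defined with one of the common SpamAssassin rule-type keywords listed in `RULE_KEYWORDS`. `undefined_descriptions` is a hypothetical helper, not part of SpamAssassin. It does not evaluate conditional blocks such as `ifplugin ... endif`, so a rule defined inside one still counts as defined.

```python
from typing import List, Set

# Common SpamAssassin rule-type keywords (assumed sufficient for this file).
RULE_KEYWORDS = ("header", "body", "rawbody", "full", "uri", "meta", "mimeheader")


def undefined_descriptions(cf_text: str) -> List[str]:
    """Rule names that have a `describe` line but no rule definition."""
    defined: Set[str] = set()
    described: List[str] = []
    for raw in cf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        keyword, name = parts[0].lower(), parts[1]
        if keyword in RULE_KEYWORDS:
            defined.add(name)
        elif keyword == "describe":
            described.append(name)
    return [n for n in described if n not in defined]


SAMPLE_CF = """\
# hypothetical excerpt; the eval function name is made up for illustration
body     FUZZY_OCR  eval:hypothetical_ocr_check()
describe FUZZY_OCR  placeholder description
describe FUZZY_OCR_CORRUPT_IMG  placeholder description
describe FUZZY_OCR_WRONG_CTYPE  placeholder description
"""

if __name__ == "__main__":
    print(undefined_descriptions(SAMPLE_CF))
    # ['FUZZY_OCR_CORRUPT_IMG', 'FUZZY_OCR_WRONG_CTYPE']
```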