decoder wrote:
Hello there,

I have improved the original OcrPlugin (found at
http://wiki.apache.org/spamassassin/OcrPlugin) so that it supports
fuzzy matching. That way, mistakes made by the OCR or intentional
obfuscation in the text no longer make recognition impossible. This
is done with a relative distance calculation between the pattern (a
word from a given word list) and a line of the recognized input. The
plugin also uses dynamic scoring (more matched words mean a higher
score; this can be adjusted in the source).
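
To illustrate the idea described above, here is a minimal Python sketch of fuzzy matching with a relative distance and a score that grows with the number of matched words. It is not the plugin's code: the choice of Levenshtein distance, the sliding-window comparison, the function names, the threshold of 0.25, and the score values are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def relative_distance(word: str, line: str) -> float:
    """Best edit distance between `word` and any window of the line that has
    the same length as `word`, divided by len(word).

    Whitespace is stripped first so that "s a l e" style spacing is ignored
    (an assumption of this sketch).
    """
    if not word:
        raise ValueError("word must not be empty")
    word = word.lower()
    text = "".join(ch for ch in line.lower() if not ch.isspace())
    if len(text) <= len(word):
        return levenshtein(word, text) / len(word)
    best = min(levenshtein(word, text[i:i + len(word)])
               for i in range(len(text) - len(word) + 1))
    return best / len(word)


def score_text(lines, wordlist, threshold=0.25, base_score=4.0, extra_score=1.0):
    """Return (score, matched_words).

    Hypothetical scoring: the first matched word gives `base_score`,
    every further matched word adds `extra_score`.
    """
    hits = [w for w in wordlist
            if any(relative_distance(w, line) <= threshold for line in lines)]
    if not hits:
        return 0.0, hits
    return base_score + extra_score * (len(hits) - 1), hits


if __name__ == "__main__":
    ocr_lines = ["V1AGRA for s a l e", "buy ch3ap med1cine now"]
    words = ["viagra", "cheap", "medicine", "mortgage"]
    print(score_text(ocr_lines, words))
```

Dividing the distance by the word length means a single OCR error weighs less in a long word than in a short one, which is why one threshold can serve the whole word list.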

You can find a full description and an example in the wiki under:

http://wiki.apache.org/spamassassin/FuzzyOcrPlugin


Ideas for improvement and criticism are always welcome :)


Best regards,


Chris

A new beta is available (2.2-beta1).

It includes a fix for a bug with JPEG content types reported by
Matthias Keller. Other changes:

- Debug file handling removed; instead, the temp files are not
deleted when in debug mode (verbose > 1).
- Logfile support; all debug messages go there
- Many more debug messages
- Error handling/logging (thanks to Ron Bender for pointing that out)
- Added the necessary priority line to the cf file (thanks to Mark
Martinec and others for reminding me about that)

Please note that this is a beta... so you should probably try it out
in non-production environments first before blaming me ;D

Hi Chris,

Wanted to report back: it's all working nicely and smoothly so far.
And thanks to your plugin, an onslaught of about 30 image spams within one minute was blocked efficiently yesterday. Especially with my much-extended word list, most of them get blocked accurately. My only concern is the varying results from gocr, which nobody has been able to help me with. I've tried 3 different gocr 0.40 versions and none seems to be as good as yours. You don't happen to have the source of your version somewhere so I could try it?

I've got one request though: when announcing a new version, could you post under a new subject instead of replying to the old one, maybe with a subject like "FuzzyOcr v2.2-beta1 released"? In my thread-sorted view I always have to go looking for the message in an old thread...

By the way, I've subscribed to your list, though I feel general discussion about your plugin should happen here, whereas support inquiries and the like fit nicely on the separate one...

Matt
