Matus UHLAR - fantomas wrote:
> does it push the extracted text back to SA so it could be used by e.g. bayes? This is how it imho should be used.
> (and imho the same for .pdf and/or .doc - extract text _and_ images from it, call OCR for images...)
That is a question that was very frequently asked around here, and that's why I also included it in the FuzzyOcr FAQ:
"If you take a look at the actual results of the OCR engines used, then you'll see that the output suffers from a lot of noise. Hence, it is not suited for common word analysis like bayes, and FuzzyOcr uses a special fuzzy matching algorithm to find the words"
Also, the SA plugin architecture is not designed to modify the message in any way, so you cannot push the text back into the normal processing pipeline.
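To illustrate why noisy OCR output defeats exact word analysis while approximate matching still works, here is a minimal Python sketch. It is not FuzzyOcr's actual algorithm; it simply slides a window over the OCR text and accepts a hit when the Levenshtein distance to the target word is small. The function names and the `max_ratio` threshold of 0.25 are assumptions made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_contains(word: str, ocr_text: str, max_ratio: float = 0.25) -> bool:
    """Return True if some window of ocr_text is within the allowed
    edit distance of word. max_ratio is a hypothetical tuning knob:
    the fraction of the word's characters that may be wrong."""
    word = word.lower()
    text = ocr_text.lower()
    n = len(word)
    if n == 0 or len(text) < n:
        return False
    limit = int(n * max_ratio)
    return any(edit_distance(word, text[i:i + n]) <= limit
               for i in range(len(text) - n + 1))


if __name__ == "__main__":
    noisy = "buy v1agra n0w"
    # Exact token lookup (what a word-based classifier would see) misses it.
    print("viagra" in noisy.split())
    # Approximate matching tolerates the single misread character.
    print(fuzzy_contains("viagra", noisy))
```

A token-based classifier such as bayes would see "v1agra" as a new, unrelated token each time the OCR engine misreads a different character, whereas the distance check above still matches it against a short word list.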
As to image spam in general: Yes, it has dropped dramatically and I haven't seen any actually for quite a long time now. I hope that my tool is one reason that this annoying technique is gone now :D
Best regards, Chris