On Tue, 16 Oct 2018 15:48:34 +0200 Matus UHLAR - fantomas wrote:
> >On Tue, 16 Oct 2018 11:49:54 +0700 Olivier wrote:
> >> One of my holdback with FuzzyOCR is that you have to provide an
> >> independent word list, while we have a very good tool to analyze
> >> text contents: SpamAssassin itself. So I would much prefer
> >> FuzzyOCR to feed the OCR'ed text back to SA for further analysis
> >> (the way pdfAssassin is working).
>
> On 16.10.18 13:34, RW wrote:
> >That works as long as the OCR remains very accurate. What happened
> >before was that the deployment of OCR led spammers to make their
> >text much less readable.
>
> I think that original reason was that available OCR programs were not
> reliable enough.
>
> I have tested gocr, ocrad and tesseract some >10 years ago, with not
> very satisfying results, gocr being best at that time.
>
> Since then, google took tesseract and made it much better.
>
> I believe that currently it would be viable to push ocr output to
> spamassassin for processing with bayes and other rules.

Bayes might work, but I wouldn't like to see the OCR output added to the body text, because corrupted text could look like obfuscation.