On Tue, 16 Oct 2018 15:48:34 +0200
Matus UHLAR - fantomas wrote:

> >On Tue, 16 Oct 2018 11:49:54 +0700 Olivier wrote:  
> >> One of my reservations about FuzzyOCR is that you have to provide
> >> an independent word list, while we have a very good tool to analyze
> >> text contents: SpamAssassin itself. So I would much prefer
> >> FuzzyOCR to feed the OCR'ed text back to SA for further analysis
> >> (the way pdfAssassin works).
> 
> On 16.10.18 13:34, RW wrote:
> >That works as long as the OCR remains very accurate. What happened
> >before was that the deployment of OCR led spammers to make their
> >text much less readable.
> 
> I think the original reason was that the available OCR programs were
> not reliable enough.
> 
> I tested gocr, ocrad and tesseract more than 10 years ago, with not
> very satisfying results; gocr was the best at that time.
> 
> Since then, Google took tesseract and made it much better.
> 
> I believe that it would currently be viable to push OCR output to
> SpamAssassin for processing with Bayes and other rules.


Bayes might work, but I wouldn't like to see the OCR output added to
the body text, because corrupted text could look like obfuscation.
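
To illustrate the distinction, here is a minimal Python sketch, not
SpamAssassin code. All names in it (OCR_PREFIX, tokenize, bayes_tokens,
body_rule_text) are hypothetical, and the "ocr:" prefix is an assumed
convention, not something SA defines. The idea is that OCR output gets
its own token namespace for a Bayes-style classifier, while body rules
never see it.

```python
import re

# Hypothetical marker for OCR-derived tokens; not a SpamAssassin convention.
OCR_PREFIX = "ocr:"


def tokenize(text):
    """Split text into lowercase word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bayes_tokens(body_text, ocr_text):
    """Tokens for a Bayes-style classifier.

    OCR tokens are prefixed so a misread word such as "v1agra" is learned
    as an OCR artifact rather than confused with the same string in the
    real body.
    """
    tokens = tokenize(body_text)
    tokens += [OCR_PREFIX + t for t in tokenize(ocr_text)]
    return tokens


def body_rule_text(body_text, ocr_text):
    """Text exposed to body rules: the OCR output is deliberately left out,
    so recognition errors cannot trigger obfuscation rules."""
    return body_text
```

With this split, a noisy OCR result only influences the statistical
classifier, which can learn how much weight such tokens deserve.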
