Try the traineddata from tessdata_best: https://github.com/tesseract-ocr/tessdata_best
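
In case it helps, here is a rough sketch of how the swap could be scripted. It downloads spa.traineddata from tessdata_best into a separate folder and reruns the command from the report with `--tessdata-dir` pointing at that folder. The raw-file URL (in particular the `main` branch name), the local folder name `tessdata_best`, and the helper names `fetch_best_model` and `build_command` are my assumptions for illustration, not something taken from the report.

```python
import subprocess
import urllib.request
from pathlib import Path

# Assumption: the repository serves raw files from a branch named "main".
BEST_URL = "https://github.com/tesseract-ocr/tessdata_best/raw/main/{lang}.traineddata"


def fetch_best_model(lang: str, tessdata_dir: Path) -> Path:
    """Download <lang>.traineddata from tessdata_best unless it is already there."""
    tessdata_dir.mkdir(parents=True, exist_ok=True)
    target = tessdata_dir / f"{lang}.traineddata"
    if not target.exists():
        urllib.request.urlretrieve(BEST_URL.format(lang=lang), str(target))
    return target


def build_command(image: str, output_base: str, tessdata_dir: Path, lang: str = "spa") -> list:
    """Same invocation as in the report, plus --tessdata-dir for the new model."""
    return [
        "tesseract", image, output_base,
        "--tessdata-dir", str(tessdata_dir),
        "--oem", "1",
        "-l", lang,
    ]


if __name__ == "__main__":
    data_dir = Path("tessdata_best")  # hypothetical local folder
    fetch_best_model("spa", data_dir)
    # Requires tesseract on PATH; writes output.txt next to the script.
    subprocess.run(build_command("ImageDoc.png", "output", data_dir), check=True)
```

Keeping the best model in its own folder leaves the installed tessdata untouched, so the two results are easy to compare.
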

On Friday, September 22, 2017 at 2:24:08 AM UTC-5, Subrato Namata wrote:
>
> Environment
>
> Windows Setup: tesseract-ocr-setup-4.0.0-alpha.20170804.exe
> Spanish Trained Data:
> https://github.com/tesseract-ocr/tessdata/raw/4.00/spa.traineddata
> Command Used to OCR:
> tesseract.exe ImageDoc.png output --oem 1 -l spa
> Where ImageDoc.png is a Spanish Scanned Document
> output is the text file output of OCRed text
>
> - Tesseract Version: 4.0
> - Platform: Windows version 64 Bit
>
> Current Behavior:
>
> In Spanish, character ‘o’ is recognized incorrectly as some round symbol.
> Attached input file is ImageDoc.png and Error screenshot
>
> [image: spanish]
> <https://user-images.githubusercontent.com/12831051/30733359-45541566-9f94-11e7-8bb1-e8027c2efc0e.png>
> [image: imagedoc]
> <https://user-images.githubusercontent.com/12831051/30733369-4d785ab8-9f94-11e7-9ff4-7f594f72a8dc.png>
>
> Expected Behavior:
>
> Character ‘o’ should be recognized correctly.
>

