The Wiki page offers more info: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#updated-data-files-for-version-400-september-15-2017
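
For anyone who wants to repeat the original command against one of the other data sets mentioned in this thread, here is a minimal Python sketch. It only builds the download URL and the command line. The helper names (`traineddata_url`, `build_command`, `run_ocr`) are made up for illustration, and the URL pattern is simply copied from the `tessdata` link in the original report, so the branch name for `tessdata_best` or `tessdata_fast` is an assumption you should check on GitHub. `--tessdata-dir` is the standard Tesseract option for pointing at a directory of `.traineddata` files.

```python
import shutil
import subprocess
from pathlib import Path

GITHUB_ORG = "https://github.com/tesseract-ocr"


def traineddata_url(repo: str, branch: str, lang: str) -> str:
    """Build a raw-download URL following the pattern used in the report.

    The branch name is NOT taken from the thread for tessdata_best or
    tessdata_fast; verify it on GitHub before relying on it.
    """
    return f"{GITHUB_ORG}/{repo}/raw/{branch}/{lang}.traineddata"


def build_command(image, output_base, lang, tessdata_dir, exe="tesseract"):
    """Return the argument list for the OCR call from the report,
    plus --tessdata-dir so a separately downloaded data file is used."""
    return [
        exe, str(image), str(output_base),
        "--tessdata-dir", str(tessdata_dir),
        "--oem", "1",          # LSTM engine, as in the original command
        "-l", lang,
    ]


def run_ocr(image, output_base, lang, tessdata_dir, exe="tesseract"):
    """Run Tesseract and return the path of the text file it writes."""
    cmd = build_command(image, output_base, lang, tessdata_dir, exe)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
    return Path(f"{output_base}.txt")


if __name__ == "__main__":
    # "main" is an assumed branch name; "best_data" is a hypothetical folder
    # into which spa.traineddata has already been downloaded.
    print(traineddata_url("tessdata_best", "main", "spa"))
    print(" ".join(build_command("ImageDoc.png", "output", "spa", "best_data")))
```

Running the same image once with each data directory and comparing the two text files should show whether the 'o' problem depends on the model.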

On Sunday, September 24, 2017 at 9:56:29 AM UTC-5, Quan Nguyen wrote:
>
> It depends on your needs. There are also fast traineddata:
>
> https://github.com/tesseract-ocr/tessdata_fast
>
> It looks that many languages are represented.
>
> On Saturday, September 23, 2017 at 12:38:46 PM UTC-5, Subrato Namata wrote:
>>
>> Thanks Quan Nguyen. My initial results show that the issue is gone. Let
>> me try with few more samples.
>> Additionally, are these the best trained data of tesseract available for
>> all the other languages and we must be using these only ?
>>
>> On Saturday, 23 September 2017 00:02:51 UTC+5:30, Quan Nguyen wrote:
>>>
>>> Try best traineddata:
>>>
>>> https://github.com/tesseract-ocr/tessdata_best
>>>
>>> On Friday, September 22, 2017 at 2:24:08 AM UTC-5, Subrato Namata wrote:
>>>>
>>>> Environment
>>>>
>>>> Windows Setup: tesseract-ocr-setup-4.0.0-alpha.20170804.exe
>>>> Spanish Trained Data:
>>>> https://github.com/tesseract-ocr/tessdata/raw/4.00/spa.traineddata
>>>> Command Used to OCR:
>>>> tesseract.exe ImageDoc.png output --oem 1 -l spa
>>>> Where ImageDoc.png is a Spanish Scanned Document
>>>> output is the text file output of OCRed text
>>>>
>>>> - Tesseract Version: 4.0
>>>> - Platform: Windows version 64 Bit
>>>>
>>>> Current Behavior:
>>>>
>>>> In Spanish, character ‘o’ is recognized incorrectly as some round
>>>> symbol. Attached input file is ImageDoc.png and Error screenshot
>>>>
>>>> [image: spanish]
>>>> <https://user-images.githubusercontent.com/12831051/30733359-45541566-9f94-11e7-8bb1-e8027c2efc0e.png>
>>>> [image: imagedoc]
>>>> <https://user-images.githubusercontent.com/12831051/30733369-4d785ab8-9f94-11e7-9ff4-7f594f72a8dc.png>
>>>>
>>>> Expected Behavior:
>>>>
>>>> Character ‘o’ should be recognized correctly.