The Wiki page offers more info:

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#updated-data-files-for-version-400-september-15-2017

On Sunday, September 24, 2017 at 9:56:29 AM UTC-5, Quan Nguyen wrote:
>
> It depends on your needs. There are also fast traineddata:
>
> https://github.com/tesseract-ocr/tessdata_fast
>
> It looks like many languages are represented.
>
> On Saturday, September 23, 2017 at 12:38:46 PM UTC-5, Subrato Namata wrote:
>>
>> Thanks, Quan Nguyen. My initial results show that the issue is gone. Let 
>> me try with a few more samples.
>> Additionally, is this the best trained data available for Tesseract for 
>> all the other languages, and should we use only this?
>>
>>
>>
>> On Saturday, 23 September 2017 00:02:51 UTC+5:30, Quan Nguyen wrote:
>>>
>>> Try the best traineddata:
>>>
>>> https://github.com/tesseract-ocr/tessdata_best
>>>
>>> On Friday, September 22, 2017 at 2:24:08 AM UTC-5, Subrato Namata wrote:
>>>>
>>>> Environment
>>>>
>>>> Windows Setup: tesseract-ocr-setup-4.0.0-alpha.20170804.exe
>>>> Spanish Trained Data: 
>>>> https://github.com/tesseract-ocr/tessdata/raw/4.00/spa.traineddata
>>>> Command Used to OCR:
>>>> tesseract.exe ImageDoc.png output --oem 1 -l spa
>>>> where ImageDoc.png is a scanned Spanish document and
>>>> output is the base name of the text file that receives the OCRed text
>>>>
>>>>    - Tesseract Version: 4.0
>>>>    - Platform: Windows version 64 Bit
>>>>
>>>> Current Behavior:
>>>>
>>>> In Spanish text, the character ‘o’ is incorrectly recognized as a round 
>>>> symbol. The input file ImageDoc.png and a screenshot of the error are attached.
>>>>
>>>> [image: spanish] 
>>>> <https://user-images.githubusercontent.com/12831051/30733359-45541566-9f94-11e7-8bb1-e8027c2efc0e.png>
>>>> [image: imagedoc] 
>>>> <https://user-images.githubusercontent.com/12831051/30733369-4d785ab8-9f94-11e7-9ff4-7f594f72a8dc.png>
>>>>
>>>>
>>>>
>>>>
>>>> Expected Behavior:
>>>>
>>>> Character ‘o’ should be recognized correctly.
>>>>
>>>
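
The suggestion in this thread (keep the same command, but point Tesseract at a different set of traineddata) can be sketched as a small helper that builds the command line. This is only an illustration under stated assumptions: the function name build_tesseract_cmd and the local folder name "tessdata_best" (a directory into which spa.traineddata from the tessdata_best repository has been downloaded) are hypothetical, and the executable is assumed to be on PATH as "tesseract". The --tessdata-dir, --oem and -l options are standard Tesseract command-line options.

```python
from typing import List, Optional


def build_tesseract_cmd(
    image: str,
    output_base: str,
    lang: str = "spa",
    tessdata_dir: Optional[str] = None,
    oem: int = 1,
) -> List[str]:
    """Build a Tesseract command line like the one used in this thread.

    tessdata_dir is optional; when given, it is passed via --tessdata-dir so
    that Tesseract loads <lang>.traineddata from that folder instead of the
    default tessdata location.
    """
    cmd = ["tesseract", image, output_base]
    if tessdata_dir is not None:
        # Hypothetical folder holding the downloaded traineddata file(s).
        cmd += ["--tessdata-dir", tessdata_dir]
    # --oem 1 selects the LSTM engine, as in the original command.
    cmd += ["--oem", str(oem), "-l", lang]
    return cmd


# Same command as in the original report, but reading spa.traineddata
# from a local "tessdata_best" folder (assumed name).
cmd = build_tesseract_cmd("ImageDoc.png", "output", tessdata_dir="tessdata_best")
print(" ".join(cmd))
```

To actually run the OCR, the resulting list can be passed to subprocess.run(cmd, check=True), provided Tesseract is installed and the traineddata file is present in the chosen folder.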
