Thanks, Quan Nguyen. My initial results show that the issue is gone. Let me 
try with a few more samples.
Additionally, is this the best trained data available for Tesseract in all 
the other languages as well, and should we be using only these files?
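
For reference, here is a minimal sketch of how the same run might be repeated 
against the tessdata_best file. It assumes spa.traineddata from that 
repository has been downloaded into a local folder; the folder name 
"tessdata_best" and the helper build_tesseract_command are hypothetical and 
used only for illustration. The options themselves (--tessdata-dir, --oem, -l) 
are the standard tesseract command-line options.

```python
def build_tesseract_command(image, output_base, tessdata_dir, lang="spa", oem=1):
    """Build the argument list for one tesseract run.

    tessdata_dir is the folder holding <lang>.traineddata (assumed here to be
    a local copy of the tessdata_best file).
    """
    return [
        "tesseract",
        str(image),          # input image, e.g. ImageDoc.png
        str(output_base),    # tesseract appends .txt to this base name
        "--tessdata-dir", str(tessdata_dir),
        "--oem", str(oem),   # 1 = LSTM engine, as in the original command
        "-l", lang,
    ]


if __name__ == "__main__":
    cmd = build_tesseract_command("ImageDoc.png", "output", "tessdata_best")
    # Print the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids quoting 
problems with Windows paths that contain spaces.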



On Saturday, 23 September 2017 00:02:51 UTC+5:30, Quan Nguyen wrote:
>
> Try best traineddata:
>
> https://github.com/tesseract-ocr/tessdata_best
>
> On Friday, September 22, 2017 at 2:24:08 AM UTC-5, Subrato Namata wrote:
>>
>> Environment
>>
>> Windows Setup: tesseract-ocr-setup-4.0.0-alpha.20170804.exe
>> Spanish Trained Data: 
>> https://github.com/tesseract-ocr/tessdata/raw/4.00/spa.traineddata
>> Command Used to OCR:
>> tesseract.exe ImageDoc.png output --oem 1 -l spa
>> Where ImageDoc.png is a Spanish Scanned Document
>> output is the text file output of OCRed text
>>
>>    - Tesseract Version: 4.0
>>    - Platform: Windows version 64 Bit
>>
>> Current Behavior:
>>
>> In Spanish, character ‘o’ is recognized incorrectly as some round symbol. 
>> Attached input file is ImageDoc.png and Error screenshot
>>
>> [image: spanish] 
>> <https://user-images.githubusercontent.com/12831051/30733359-45541566-9f94-11e7-8bb1-e8027c2efc0e.png>
>> [image: imagedoc] 
>> <https://user-images.githubusercontent.com/12831051/30733369-4d785ab8-9f94-11e7-9ff4-7f594f72a8dc.png>
>>
>>
>>
>>
>> Expected Behavior:
>>
>> Character ‘o’ should be recognized correctly.
>>
>

