[tesseract-ocr] Re: Difference trained data for Chinese

Yang Yu Sun, 13 Aug 2017 18:12:04 -0700

Thanks! That is pretty clear.


On Friday, August 11, 2017 at 8:42:00 PM UTC+8, shree wrote:
>
> Please see https://github.com/tesseract-ocr/tessdata/issues/72 
>
>
>
> On Friday, August 11, 2017 at 2:26:55 PM UTC+5:30, Yang Yu wrote:
>>
>> Good day!
>>
>> I have recently been using Tesseract (4.0 alpha) for Chinese OCR, and it 
>> works really well. Now I want to pick the best model to use, but I have 
>> found several versions. What is the difference between them?
>>
>> 1. chi_sim from 
>> https://github.com/tesseract-ocr/tesseract/wiki/Data-Files (around 50M)
>> 2. chi_sim from 
>> https://github.com/tesseract-ocr/tessdata/tree/master/best (around 13M)
>> 3. chi_sim_vert from 
>> https://github.com/tesseract-ocr/tessdata/tree/master/best (around 13M)
>> 4. HanS from https://github.com/tesseract-ocr/tessdata/tree/master/best 
>> (around 16M)
>>
>> All of them work, but the results are slightly different. In my own 
>> evaluation #4 is the best, but I don't have any insight into why.
>>
>> I would appreciate any help.
>>
>
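
The informal comparison described in the question (run each model on the same
pages and see which output is closest to the truth) can be sketched as a
character error rate (CER) ranking. This is only an illustration, not the
method the poster used: the helper names (run_tesseract, edit_distance, cer,
rank_models) are hypothetical, and run_tesseract assumes the tesseract
command-line tool is installed and that the chosen .traineddata files sit in
the directory passed as tessdata_dir.

```python
import subprocess


def run_tesseract(image_path, lang, tessdata_dir):
    """Hypothetical helper: OCR one image with one model via the tesseract CLI."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang,
         "--tessdata-dir", tessdata_dir],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def rank_models(reference, outputs):
    """Return (cer, model_name) pairs sorted from best to worst."""
    return sorted((cer(reference, text), name) for name, text in outputs.items())
```

To use it, transcribe a few sample pages by hand, collect each model's output
for those pages into a dict keyed by model name, and pass both to rank_models.
The model with the lowest CER on your own material is the one to prefer; a
ranking on a handful of pages is only indicative.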

