[tesseract-ocr] Difference trained data for Chinese

Yang Yu Fri, 11 Aug 2017 01:57:10 -0700

Good day!

Recently I have been using Tesseract (4.0 alpha) for Chinese OCR, and it works 
really well. Now I want to pick the best model to use, but I have found several 
versions. What is the difference between them?


1. chi_sim from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files (around 50 MB)
2. chi_sim from https://github.com/tesseract-ocr/tessdata/tree/master/best (around 13 MB)
3. chi_sim_vert from https://github.com/tesseract-ocr/tessdata/tree/master/best (around 13 MB)
4. HanS from https://github.com/tesseract-ocr/tessdata/tree/master/best (around 16 MB)

All of them work, but the results are slightly different. In my own 
evaluation #4 is the best, but I don't have any insight into why.
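
For anyone who wants to repeat such a comparison, here is a minimal sketch of 
one way to score the models. It is not the evaluation used above. It assumes a 
test image with a hand-typed ground-truth transcription, and it measures the 
character error rate (edit distance divided by ground-truth length, whitespace 
ignored). The helper names, the image path, and the tessdata directories are 
hypothetical; only the tesseract command-line options -l and --tessdata-dir 
and the "stdout" output base are taken from the tool itself.

```python
import subprocess


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def char_error_rate(truth: str, ocr_text: str) -> float:
    """Edit distance over ground-truth length, with all whitespace removed."""
    truth = "".join(truth.split())
    ocr_text = "".join(ocr_text.split())
    if not truth:
        raise ValueError("ground truth is empty")
    return edit_distance(truth, ocr_text) / len(truth)


def run_tesseract(image_path: str, lang: str, tessdata_dir: str) -> str:
    """Run the tesseract CLI on one image and return the recognized text."""
    result = subprocess.run(
        ["tesseract", "--tessdata-dir", tessdata_dir,
         image_path, "stdout", "-l", lang],
        capture_output=True, encoding="utf-8", check=True,
    )
    return result.stdout


# Hypothetical usage: one directory per downloaded model set.
# candidates = [("chi_sim", "tessdata_wiki"), ("chi_sim", "tessdata_best"),
#               ("HanS", "tessdata_best")]
# truth = open("page1.gt.txt", encoding="utf-8").read()
# for lang, directory in candidates:
#     text = run_tesseract("page1.png", lang, directory)
#     print(lang, directory, char_error_rate(truth, text))
```

A lower rate is better. A single page is rarely representative, so averaging 
over several pages typical of the target documents gives a fairer ranking.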

I would appreciate any help.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/8cc88ed2-99c3-445e-b758-83ade0f680aa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
