Good day!

I have recently been using Tesseract (4.0 alpha) for Chinese OCR, and it works 
really well. Now I want to pick the best model to use, but I have found several 
versions. What is the difference between them?

1. chi_sim from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files (around 50 MB)
2. chi_sim from https://github.com/tesseract-ocr/tessdata/tree/master/best (around 13 MB)
3. chi_sim_vert from https://github.com/tesseract-ocr/tessdata/tree/master/best (around 13 MB)
4. HanS from https://github.com/tesseract-ocr/tessdata/tree/master/best (around 16 MB)

All of them work, but the results differ slightly. In my own evaluation #4 
gives the best results, but I don't have any insight into why.
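
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of one way to score the models. It is not the exact evaluation I ran. It assumes the tesseract binary is on the PATH, that each model sits in its own directory as a .traineddata file, and that a hand-typed ground-truth transcription is available. The directory names, file names, and helper function names are hypothetical.

```python
import subprocess


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """Edit distance divided by reference length, ignoring all whitespace."""
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference text is empty")
    return edit_distance(ref, hyp) / len(ref)


def run_tesseract(image_path, lang, tessdata_dir):
    """Run the tesseract CLI and return the recognized text.

    Assumption: <tessdata_dir>/<lang>.traineddata exists.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout",
         "-l", lang, "--tessdata-dir", tessdata_dir],
        stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8")


# Hypothetical usage (paths are placeholders, not real locations):
# truth = open("page1.gt.txt", encoding="utf-8").read()
# for lang, directory in [("chi_sim", "tessdata_wiki"),
#                         ("chi_sim", "tessdata_best"),
#                         ("chi_sim_vert", "tessdata_best"),
#                         ("HanS", "tessdata_best")]:
#     text = run_tesseract("page1.png", lang, directory)
#     print(directory, lang, char_error_rate(truth, text))
```

Whitespace is stripped before scoring because Chinese text has no word spaces, and the models may differ in where they insert them. A lower score means fewer character errors.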

I would appreciate any help.

