Good day! I have recently been using Tesseract (4.0 alpha) for Chinese OCR, and it works really well. Now I want to pick the best model to use, but I have found several versions. What is the difference between them?
1. chi_sim from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files (around 50 MB)
2. chi_sim from https://github.com/tesseract-ocr/tessdata/tree/master/best (around 13 MB)
3. chi_sim_vert from https://github.com/tesseract-ocr/tessdata/tree/master/best (around 13 MB)
4. HanS from https://github.com/tesseract-ocr/tessdata/tree/master/best (around 16 MB)

All of them work, but the results are slightly different. In my own evaluation, #4 gives the best results, but I don't have any insight into why. I would appreciate any help.
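For anyone who wants to repeat this kind of comparison, here is a minimal sketch of one way to score the models: character error rate (edit distance divided by reference length) against a hand-typed ground truth. It does not call Tesseract itself; it assumes you have already saved the text produced by each model. The function names and the sample strings are hypothetical and only for illustration, and this is not necessarily how my own evaluation was done.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert/delete/substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance over reference length, ignoring all whitespace."""
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def rank_models(reference: str, outputs: dict) -> list:
    """Return (model_name, error_rate) pairs, lowest error rate first."""
    scores = {name: char_error_rate(reference, text)
              for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical ground truth and OCR outputs, made up for illustration.
    truth = "你好世界"
    ocr_outputs = {
        "model_a": "你好世畀",   # one wrong character
        "model_b": "你好世界",   # exact match
    }
    for name, rate in rank_models(truth, ocr_outputs):
        print(name, rate)
```

Whitespace is stripped before scoring because spacing in Chinese OCR output often differs between models without affecting the recognized characters; drop that step if spacing matters for your use case.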