Dear all, I'm a freelance software developer from Vietnam. I am currently working on improving the Tesseract OCR training data for the Vietnamese language, and I am having some trouble training new data for it. My questions are below:
1. Could someone share the process and the tools that Google used to make the .tif/.box files, along with guidelines on how to use those tools, if possible?
2. Did Google add Vietnamese fonts to the current training data for Vietnamese? If yes, could someone let me know how to check which fonts were used?
3. Could someone share some of the .tif/.box files that Google made and included in the current training data for Vietnamese? I would like to know what the standards for those .tif/.box files are (font size, image resolution, etc.).

Thank you very much for taking the time to answer my questions.

Best regards.

