Hello guys.
I want to add a new language script to Tesseract OCR, and I am researching how to prepare the training data. I have the following questions:

1. Is there an automatic tool that generates the langdata training_text and wordlist files from a large text corpus?
2. Is there any documentation on preparing the text data, with an explanation of each text data file?

I looked at the langdata/jpn/ directory and saw several files there, but I have no idea what these files are for or how to create files like them. What rules should I follow when creating langdata files?

