Hello guys.


I want to add a new language script to Tesseract OCR and am researching the 
training data.


I would like to know the following:

   1. Is there an automatic tool that generates the langdata training_text and 
   wordlist files from a large body of text?
   2. Is there any documentation on preparing the text data, with an explanation 
   of the text data files? I looked at the langdata/jpn/ directory and saw 
   several files, but I have no idea what these files are or how to create 
   files like them. What rules should I follow when creating langdata files?
