I am trying to generate a .traineddata file myself, and I have some questions about the training procedure.

We can find langdata and tessdata on GitHub. Is there an official document describing how to convert langdata into the final .traineddata? I don't mean the basic procedure in wiki/TrainingTesseract, but the exact way to reproduce the official .traineddata files. I guess the released lang.traineddata files are generated by tesstrain.sh, but I can't find the script parameters, such as the fonts used, for any language. The important text2image tool has a lot of parameters; surely the official release cannot use a single set of parameters for all languages? I'm not sure.

Can anyone guide me on how to reproduce, or nearly reproduce, the official .traineddata? I assume effort went into tuning the parameters for the official training, and I don't want to reinvent the wheel.

BTW, the reason I want to generate the traineddata myself is that I only need to recognize a subset of the language's characters, so training a lighter package should greatly reduce recognition time.

Thanks in advance.

Regards,
Chen

