I am trying to generate .traineddata myself, and I have some questions about 
the training procedure.

We can find langdata and tessdata on GitHub. Is there an official document 
describing how to convert langdata into the final .traineddata? I don't mean 
the basic procedure in wiki/TrainingTesseract, but the exact way to 
reproduce the official .traineddata. I guess the released lang.traineddata 
files are generated by tesstrain.sh, but I can't find the script parameters 
used for each language, such as the fonts. The important text2image tool also 
has a lot of parameters, and surely the official releases cannot use a single 
set of parameters for all languages? I'm not sure. Can anyone guide me on how 
to reproduce, or nearly reproduce, the official .traineddata? A lot of effort 
must already have gone into tuning these training parameters, and I don't 
want to reinvent the wheel. BTW, the reason I want to generate the 
traineddata myself is that I only need to recognize a subset of the 
language's characters, so training a lighter package could greatly reduce 
recognition time. Thanks in advance.

Regards,
Chen

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/f7d75d73-a146-49c8-9636-8daac31b6f6b%40googlegroups.com.