[tesseract-ocr] Tesseract training details

Chen Sun, 22 Nov 2015 10:37:05 -0800

I am trying to generate a .traineddata file myself, and I have some
questions about the training procedure.

langdata and tessdata are both available on GitHub. Is there an official
document describing how to convert langdata into the final .traineddata? I
don't mean the basic procedure in wiki/TrainingTesseract, but the exact
steps to reproduce the official .traineddata files. I guess the released
lang.traineddata files are generated by tesstrain.sh, but I can't find the
script parameters, such as the fonts used, for any language. The important
text2image tool has a lot of parameters, and I assume the official releases
can't have used a single set of parameters for all languages, though I'm
not sure. Can anyone guide me on how to reproduce, or nearly reproduce, the
official .traineddata?

I think a lot of effort must have gone into tuning these parameters during
training, and I don't want to reinvent the wheel. BTW, the reason I want to
generate the traineddata myself is that I only need to recognize a subset
of a language's characters, so training a lighter package could greatly
reduce recognition time. Thanks in advance.

Regards,
Chen
