I just noticed that language-specific.sh lists the valid fonts for each language. I think I should use all of the fonts listed there for a given language.
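To make that idea concrete, here is a minimal sketch (not the official recipe) of how a tesstrain.sh command line could be assembled from a per-language font list. The font names, directory paths, and the helper name build_tesstrain_command are placeholders made up for illustration; the real font names would be copied from language-specific.sh. The sketch assumes --fontlist takes a single '+'-separated string; how the script parses that option has differed between versions, so check the usage text of your own copy.

```python
import shlex
from typing import List


def build_tesstrain_command(
    lang: str,
    fonts: List[str],
    langdata_dir: str,
    tessdata_dir: str,
    output_dir: str,
    script: str = "tesstrain.sh",
) -> List[str]:
    """Assemble an argument list for tesstrain.sh (hypothetical helper).

    Assumption: --fontlist accepts one '+'-separated string of font names.
    """
    if not fonts:
        raise ValueError("at least one font name is required")
    return [
        script,
        "--lang", lang,
        "--langdata_dir", langdata_dir,
        "--tessdata_dir", tessdata_dir,
        "--fontlist", "+".join(fonts),  # every font for this language
        "--output_dir", output_dir,
    ]


if __name__ == "__main__":
    # Placeholder fonts; take the real list from language-specific.sh.
    example_fonts = ["Example Serif", "Example Sans Bold"]
    cmd = build_tesstrain_command(
        "eng", example_fonts, "./langdata", "./tessdata", "./output"
    )
    # Print a shell-quoted command so it can be inspected before running.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without the training tools installed.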
Regards,
Chen

On Monday, November 23, 2015 at 2:36:51 AM UTC+8, Chen wrote:
> I am trying to generate the .traineddata files myself, and I have some
> questions about the training procedure.
>
> We can find langdata and tessdata on GitHub. Is there an official document
> describing how to convert langdata into the final .traineddata? I don't mean
> the basic procedure in wiki/TrainingTesseract, but the exact way to
> reproduce the official .traineddata files. I guess the released
> lang.traineddata files are generated by tesstrain.sh, but I can't find the
> script parameters, such as the fonts used, for any language. The important
> text2image tool has a lot of parameters, and the official release surely
> cannot use a single set of parameters for every language, right? I'm not
> sure. Can anyone guide me on how to reproduce, or nearly reproduce, the
> official .traineddata? I assume a lot of effort went into tuning the
> training parameters, and I don't want to reinvent the wheel.
>
> By the way, the reason I want to generate the .traineddata myself is that I
> only need to recognize a subset of the language's characters, so training a
> lighter package could greatly reduce recognition time.
>
> Thanks in advance.
>
> Regards,
> Chen

