Yes, there is a method for rendering synthetic training data from a training text and fonts: the text2image program and the tesstrain.sh script.
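As a rough sketch of what a single rendering step could look like, the Python snippet below only assembles a text2image command line and prints it; it does not run anything. The font name, fonts directory, and language code are placeholders I am assuming for illustration, and the helper function is hypothetical, so adjust them to your setup. text2image writes a .tif image and a matching .box file for the given output base.

```python
import shlex


def build_text2image_cmd(text_file, font_name, fonts_dir, lang="chi_sim", exp=0):
    """Return the argument list for one text2image run (nothing is executed)."""
    # Output base follows the common lang.fontname.expN naming pattern.
    font_tag = font_name.replace(" ", "")
    outputbase = f"{lang}.{font_tag}.exp{exp}"
    return [
        "text2image",
        f"--text={text_file}",
        f"--outputbase={outputbase}",
        f"--font={font_name}",
        f"--fonts_dir={fonts_dir}",
    ]


# Assumed placeholders: the font name and fonts directory are examples only.
cmd = build_text2image_cmd("all_char.txt", "Noto Sans CJK SC", "/usr/share/fonts")
print(shlex.join(cmd))
```

Repeating this for several fonts gives training images in more than one typeface; tesstrain.sh automates that loop along with the later training steps.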
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-%E2%80%93-tesstrain.sh

Which version of tesseract are you using? I would suggest that you try the latest version built from GitHub with the Chinese traineddata and then do the training.

ShreeDevi

On Fri, Jun 16, 2017 at 3:06 PM, Richard Foo wrote:
> Dear all,
>
> I am new to tesseract. When I train a language with a large character set
> like Chinese, I have no idea at which step I should use the char set (over
> 7000 chars) I prepared. Currently, I treat it as a training set by
> converting all_char.txt to tiff files. Therefore, I have image training
> data in a single font which can be used for making box files.
>
> p.s: are there any methods (software) for rendering synthetic training
> data from text, other than scanning or printing?
>
> thanks,
>
> Richard

