Thank you for your reply, Shree. I've looked at the training_text and the list of fonts, and I will try again. Before I start my next training from scratch, I'd like to ask a few questions:
1. Does a training_text with more characters give better training results? Is there an upper limit?
2. Do more fonts give better training results?
3. I noticed that the official training_text contains not only Chinese characters but also English letters and digits. If I run recognition with a command like this:

    tesseract.exe test.png c:\dir\test -l eng+chi_sim

would it be better for me to train on a training_text containing only Chinese characters?

On Tuesday, October 30, 2018 at 2:43:05 AM UTC+8, shree wrote:
>
> https://github.com/tesseract-ocr/langdata_lstm/tree/master/chi_sim
>
> On Mon, 29 Oct 2018, 14:41 Shree Devi Kumar, <[email protected]> wrote:
>
>> Please look at the langdata_lstm repo, specifically the chi_sim folder.
>> It has the training_text as well as the list of fonts used for LSTM training.
>>
>> On Mon, 29 Oct 2018, 05:40 bruce, <[email protected]> wrote:
>>
>>> Recently I've been using Tesseract to train my chi_sim language. I want to
>>> train a chi_sim.traineddata that is better than the official one.
>>> I generated a training text of 82,915 characters and trained it with 7
>>> common fonts.
>>> After 4,434,207 iterations the training error rate is below 0.016%, but the
>>> recognition quality is much worse than that of the official traineddata.
>>>
>>> So I'm confused...
>>>
>>> How can I improve the quality of training?
>>> Do I need more training data, or more training fonts? What is the right
>>> amount?
>>> I would like to know what training data and which fonts were used for the
>>> official traineddata.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/a7acc320-67f6-42b3-b2c8-99d3db6de7e6%40googlegroups.com
>>> For more options, visit https://groups.google.com/d/optout.
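To put numbers on questions 1 and 3, one quick check is to profile my own training_text next to the one in langdata_lstm/chi_sim: how many characters it has, how many are distinct, and how much of it is Han versus Latin letters and digits. The following Python sketch is only my own illustration, not anything from Tesseract's training tools. The function names are hypothetical, and treating only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as "Han" is a simplifying assumption. It also shows how a Han-only training_text could be filtered out of a mixed one.

```python
from collections import Counter
from typing import Dict, List


def classify(ch: str) -> str:
    """Rough bucket for one character (hypothetical helper, not Tesseract logic)."""
    if ch.isspace():
        return "space"
    if "\u4e00" <= ch <= "\u9fff":  # assumption: basic CJK Unified Ideographs block only
        return "han"
    if ch.isascii() and ch.isalpha():
        return "latin"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"  # punctuation, full-width forms, rarer CJK blocks, etc.


def profile(text: str) -> Dict[str, int]:
    """Count non-space characters, distinct characters, and per-bucket totals."""
    chars = [c for c in text if not c.isspace()]
    buckets = Counter(classify(c) for c in chars)
    return {
        "total": len(chars),
        "distinct": len(set(chars)),
        "han": buckets["han"],
        "latin": buckets["latin"],
        "digit": buckets["digit"],
        "other": buckets["other"],
    }


def han_only_lines(text: str) -> List[str]:
    """Keep lines whose non-space characters are all basic-block Han ideographs."""
    kept = []
    for line in text.splitlines():
        chars = [c for c in line if not c.isspace()]
        if chars and all(classify(c) == "han" for c in chars):
            kept.append(line)
    return kept


if __name__ == "__main__":
    sample = "中文训练文本\nTesseract 4.0 测试 123\n识别效果"
    print(profile(sample))
    print(han_only_lines(sample))
```

In practice I would read the real training_text file with encoding="utf-8" and pass its contents to profile(); the "distinct" count is probably more telling than the total, since it shows how many different characters the model actually sees.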

