OK. Thanks for your reply.

On Tuesday, September 19, 2017 at 5:06:57 PM UTC+8, shree wrote:
>
> Ray is the only one who would know those details.
>
> Please see
> https://github.com/tesseract-ocr/tesseract/issues/590#issuecomment-322020794
> for his comment regarding fine-tuning.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Tue, Sep 19, 2017 at 2:28 PM, <[email protected]> wrote:
>
>> Does fine-tuning update all the parameters in all of the layers?
>>
>> We need to add lots of mathematical symbols and some other special
>> symbols. Maybe we should train from scratch instead?
>>
>> What char error and iteration count should we expect if we train
>> chi_sim (Simplified Chinese) from scratch?
>>
>> On Tuesday, September 19, 2017 at 4:49:30 PM UTC+8, shree wrote:
>>>
>>> As per comments by Ray, for fine-tuning, or for adding or removing a
>>> few characters, the number of iterations should be limited to 3000 or so.
>>>
>>> It probably won't get to 0.2% error, but you might have better results.
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Tue, Sep 19, 2017 at 2:00 PM, <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am training my own traineddata model for the chi_sim language
>>>> using fine-tuning. My training data contains some mathematical
>>>> symbols, such as "∞", "β", "△" and so on, which the official
>>>> chi_sim.traineddata model cannot recognize.
>>>>
>>>> So we replaced the content of the chi_sim.training_text file with
>>>> our own training data.
>>>>
>>>> Then we ran the training command:
>>>>
>>>> training/lstmtraining --model_output ~/tesstutorial/trainspecial/special \
>>>>   --continue_from ~/tesstutorial/trainspecial/chi_sim.lstm \
>>>>   --traineddata ~/tesstutorial/trainspecial/chi_sim/chi_sim.traineddata \
>>>>   --old_traineddata tessdata/best/chi_sim.traineddata \
>>>>   --train_listfile ~/tesstutorial/trainspecial/chi_sim.training_files.txt \
>>>>   --max_iterations 400000
>>>>
>>>> With this command, after 400000 iterations the char error is about
>>>> 0.2% and the word error is about 4.2%.
>>>> The error rate has started to oscillate and no longer goes down, so
>>>> we stopped training and exported the traineddata model.
>>>>
>>>> When we tested the exported traineddata model, its accuracy was
>>>> unsatisfactory: lower than that of the model provided on the official
>>>> website (the tesseract GitHub site).
>>>>
>>>> We hope our trained model can match the recognition accuracy of the
>>>> official one. How can we further improve the accuracy of the model?
>>>>
>>>> Does anyone know the details of how the official language models
>>>> were trained, such as the number of iterations, the lowest char error
>>>> and word error, the value of the learning_rate, and so on?
>>>>
>>>> If you know any of this information, please share some tips.
>>>>
>>>> Thank you.
>>>>
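For reference, the advice quoted in this thread (limit fine-tuning to roughly 3000 iterations) can be tried by changing only `--max_iterations` in the original command. The following is a minimal sketch under that assumption, not a confirmed recipe: the paths and flags are copied from the original post, while the `TRAIN_DIR`, `MAX_ITERATIONS`, `DRY_RUN`, and `CMD_PREVIEW` variables are hypothetical helpers added here for illustration. By default it only prints the command.

```shell
#!/bin/sh
# Sketch only: same flags as the command in the original post, with the
# iteration cap lowered to ~3000 as suggested in the thread.
# TRAIN_DIR, MAX_ITERATIONS, DRY_RUN and CMD_PREVIEW are hypothetical helpers.
TRAIN_DIR="${TRAIN_DIR:-$HOME/tesstutorial/trainspecial}"
MAX_ITERATIONS="${MAX_ITERATIONS:-3000}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, anything else = run it

# Build the argument list once so it can be printed or executed unchanged.
set -- training/lstmtraining \
  --model_output "$TRAIN_DIR/special" \
  --continue_from "$TRAIN_DIR/chi_sim.lstm" \
  --traineddata "$TRAIN_DIR/chi_sim/chi_sim.traineddata" \
  --old_traineddata tessdata/best/chi_sim.traineddata \
  --train_listfile "$TRAIN_DIR/chi_sim.training_files.txt" \
  --max_iterations "$MAX_ITERATIONS"

CMD_PREVIEW="$*"

if [ "$DRY_RUN" = "1" ]; then
  echo "$CMD_PREVIEW"   # show what would be run
else
  "$@"                  # actually start fine-tuning
fi
```

Run it with `DRY_RUN=0` from the tesseract source directory to start training; whether 3000 iterations is enough when many new symbols are added is exactly the open question in this thread.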
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/13d485ee-a59b-487c-adfe-efc3af123855%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

