Also, how many lines should the training_text have for good results? My character set contains no more than 100 characters in total.
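To make the question concrete, here is a minimal sketch for checking how many lines and distinct characters a training_text actually contains, and which characters appear least often. It assumes the text is plain UTF-8; the function name `summarize_training_text` and the sample string are made up for illustration and are not part of Tesseract.

```python
from collections import Counter


def summarize_training_text(text: str) -> dict:
    """Report non-empty line count, distinct characters, and the rarest ones."""
    # Ignore blank lines so they do not inflate the line count.
    lines = [line for line in text.splitlines() if line.strip()]
    # Count every non-whitespace character across all lines.
    counts = Counter(ch for line in lines for ch in line if not ch.isspace())
    # Sort by frequency, then by character, so the result is deterministic.
    rarest = sorted(counts.items(), key=lambda item: (item[1], item[0]))[:5]
    return {
        "lines": len(lines),
        "distinct_chars": len(counts),
        "rarest": rarest,
    }


if __name__ == "__main__":
    # Hypothetical sample; replace with the contents of the real training_text.
    sample = "交易金额 100\n交易时间 2019\n"
    print(summarize_training_text(sample))
```

To run it on the real file, read the text first, for example with `open("../training_data/chi_sim_tuned.txt", encoding="utf-8").read()`, using the path from the tesstrain.sh command quoted further down. Characters that show up in the `rarest` list with very low counts are the ones most likely to be under-represented.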
On Tue, Mar 26, 2019 at 9:50 AM 易鑫 <[email protected]> wrote:

> Okay. Thank you very much.
> But will overfitting happen with 36000 iterations?
>
> On Mon, Mar 25, 2019 at 11:43 PM Shree Devi Kumar <[email protected]> wrote:
>
>> 36000 iterations, error rate 0.1
>>
>> OCR output attached
>>
>> On Mon, Mar 25, 2019 at 6:09 PM Shree Devi Kumar <[email protected]> wrote:
>>
>>> Try replacing a layer - you may need larger training_text and more
>>> iterations
>>>
>>> lstmtraining --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_layer \
>>> --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
>>> --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>> --append_index 5 --net_spec '[Lfx192 O1c1]' \
>>> --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>>> --max_iterations 30000
>>>
>>> On Mon, Mar 25, 2019 at 4:14 PM 易鑫 <[email protected]> wrote:
>>>
>>>> Hello, everyone:
>>>> I have been working on training eng + chi_sim for several days, but one
>>>> urgent issue confuses me. I asked this question before but did not get
>>>> a good reply, so I am asking again. Sorry for disturbing you.
>>>>
>>>> My steps are as follows:
>>>>
>>>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>>>> --training_text ../training_data/chi_sim_tuned.txt \
>>>> --langdata_dir ../langdata --tessdata_dir ./tessdata --lang chi_sim \
>>>> --linedata_only --noextract_font_properties --exposures "0" \
>>>> --workspace_dir ./share/workspace/tmp \
>>>> --save_box_tiff \
>>>> --fontlist "NSimSun" \
>>>> "Times New Roman" \
>>>> "Arial Unicode MS" \
>>>> "SimSun" \
>>>> "Merchant Copy" \
>>>> "Merchant Copy Doublesize" \
>>>> "Noto Sans CJK SC" \
>>>> "Noto Sans Mono CJK SC" \
>>>> --output_dir ~/tesstutorial/chi_sim_train \
>>>> --overwrite
>>>>
>>>> mkdir -p ~/tesstutorial/chi_sim_tuned_from_chi_sim
>>>>
>>>> combine_tessdata -e ../tessdata_best/chi_sim.traineddata \
>>>> ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm
>>>>
>>>> lstmtraining --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned \
>>>> --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
>>>> --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>>> --old_traineddata ../tessdata_best/chi_sim.traineddata \
>>>> --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>>>> --max_iterations 3000
>>>>
>>>> lstmtraining --stop_training \
>>>> --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned_checkpoint \
>>>> --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>>> --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned.traineddata
>>>>
>>>> The train_text file is in the attachment.
>>>>
>>>> What confuses me is that the result contains some characters that are
>>>> not in the train_text file (only chi_sim characters have the problem;
>>>> eng is OK).
>>>>
>>>> Can anyone help me? Thanks a lot.
>>>> I have also uploaded the image and result file. Thanks in advance.
>>>>
>>>> Thank you.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/4af9e1d1-218a-4a36-8a77-1b4619b53205%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>> --
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/tesseract-ocr. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAPiKE2387Fy_FOZ7%2BbcF9eTW_A5npXK-kLcGVPzTbt0_7s5BQA%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.

