Thanks for your advice, I will try.

On Tue, Mar 19, 2019 at 10:01 PM Shree Devi Kumar <[email protected]> wrote:
> You are using a number of Japanese, Korean and Traditional Chinese fonts
> for training. Try without them.
>
> On Tue, Mar 19, 2019 at 4:19 PM 易鑫 <[email protected]> wrote:
>
>> Hello everyone,
>>
>> I want to recognize the characters in the table (you can find it in the
>> attached file). In the past I only recognized English letters, and the
>> result was pretty good, but now I want to recognize English letters plus
>> Chinese characters, so I retrained the model. Here are my commands:
>>
>> 1) src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>>      --training_text ../training_data/chi_sim_tuned.txt \
>>      --langdata_dir ../langdata --tessdata_dir ./tessdata --lang chi_sim \
>>      --linedata_only --noextract_font_properties --exposures "0" \
>>      --fontlist "AR PL UKai CN" \
>>      "AR PL UKai HK" \
>>      "AR PL UKai TW" \
>>      "AR PL UKai TW MBE" \
>>      "AR PL UMing CN Light" \
>>      "AR PL UMing HK Light" \
>>      "AR PL UMing TW Light" \
>>      "AR PL UMing TW MBE Light" \
>>      "NSimSun" \
>>      "Noto Sans CJK JP" \
>>      "Noto Sans CJK JP Bold" \
>>      "Noto Sans CJK JP Heavy" \
>>      "Noto Sans CJK JP Light" \
>>      "Noto Sans CJK JP Medium" \
>>      "Noto Sans CJK JP Semi-Light" \
>>      "Noto Sans CJK JP Ultra-Light" \
>>      "Noto Sans CJK KR" \
>>      "Noto Sans CJK KR Bold" \
>>      "Noto Sans CJK KR Heavy" \
>>      "Noto Sans CJK KR Light" \
>>      "Noto Sans CJK KR Medium" \
>>      "Noto Sans CJK KR Semi-Light" \
>>      "Noto Sans CJK KR Ultra-Light" \
>>      "Noto Sans CJK SC" \
>>      "Noto Sans CJK SC Bold" \
>>      "Noto Sans CJK SC Heavy" \
>>      "Noto Sans CJK SC Light" \
>>      "Noto Sans CJK SC Medium" \
>>      "Noto Sans CJK SC Semi-Light" \
>>      "Noto Sans CJK SC Ultra-Light" \
>>      "Noto Sans CJK TC" \
>>      "Noto Sans CJK TC Bold" \
>>      "Noto Sans CJK TC Heavy" \
>>      "Noto Sans CJK TC Light" \
>>      "Noto Sans CJK TC Medium" \
>>      "Noto Sans CJK TC Semi-Light" \
>>      "Noto Sans CJK TC Ultra-Light" \
>>      "Noto Sans Mono CJK JP" \
>>      "Noto Sans Mono CJK JP Bold" \
>>      "Noto Sans Mono CJK KR" \
>>      "Noto Sans Mono CJK KR Bold" \
>>      "Noto Sans Mono CJK SC" \
>>      "Noto Sans Mono CJK SC Bold" \
>>      "Noto Sans Mono CJK TC" \
>>      "Noto Sans Mono CJK TC Bold" \
>>      "SimSun" \
>>      "WenQuanYi Zen Hei Medium" \
>>      "WenQuanYi Zen Hei Mono Medium" \
>>      --output_dir ~/tesstutorial/chi_sim_train
>>
>> 2) mkdir -p ~/tesstutorial/chi_sim_tuned_from_chi_sim
>>
>> 3) combine_tessdata -e ../tessdata_best/chi_sim.traineddata \
>>      ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm
>>
>> 4) lstmtraining --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned \
>>      --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
>>      --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>      --old_traineddata ../tessdata_best/chi_sim.traineddata \
>>      --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>>      --max_iterations 10000
>>
>> 5) lstmtraining --stop_training \
>>      --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned_checkpoint \
>>      --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>      --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned.traineddata
>>
>> The result is not good. Strangest of all, *the result contains some
>> Chinese characters that do not exist in the training_text file*. I
>> really cannot understand this. Can someone help me? Thanks a lot.
>>
>> The training_text file and the result are also in the attached file.
>>
>> Sorry for my poor English.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/bd740b98-3c0c-4216-88ba-0eb72cdcf3ee%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/bd740b98-3c0c-4216-88ba-0eb72cdcf3ee%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
> --
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
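The suggestion to train without the Japanese, Korean and Traditional Chinese fonts can be applied to the original font list mechanically. The following is a minimal bash sketch, assuming that a font belongs to those groups exactly when its name contains JP, KR, TC, HK or TW as a separate word; the variable names `fonts`, `drop_re` and `sim_fonts` are illustrative only and are not part of tesstrain.sh. With the list from step 1 it keeps the CN/SC fonts plus NSimSun, SimSun and the two WenQuanYi fonts.

```shell
#!/usr/bin/env bash
# Minimal sketch: filter the step 1 font list down to Simplified Chinese fonts.
# Assumption: a font is Japanese, Korean or Traditional Chinese exactly when
# its name contains JP, KR, TC, HK or TW as a separate word.

fonts=(
  "AR PL UKai CN"
  "AR PL UKai HK"
  "AR PL UKai TW"
  "AR PL UKai TW MBE"
  "AR PL UMing CN Light"
  "AR PL UMing HK Light"
  "AR PL UMing TW Light"
  "AR PL UMing TW MBE Light"
  "NSimSun"
  "Noto Sans CJK JP"
  "Noto Sans CJK JP Bold"
  "Noto Sans CJK JP Heavy"
  "Noto Sans CJK JP Light"
  "Noto Sans CJK JP Medium"
  "Noto Sans CJK JP Semi-Light"
  "Noto Sans CJK JP Ultra-Light"
  "Noto Sans CJK KR"
  "Noto Sans CJK KR Bold"
  "Noto Sans CJK KR Heavy"
  "Noto Sans CJK KR Light"
  "Noto Sans CJK KR Medium"
  "Noto Sans CJK KR Semi-Light"
  "Noto Sans CJK KR Ultra-Light"
  "Noto Sans CJK SC"
  "Noto Sans CJK SC Bold"
  "Noto Sans CJK SC Heavy"
  "Noto Sans CJK SC Light"
  "Noto Sans CJK SC Medium"
  "Noto Sans CJK SC Semi-Light"
  "Noto Sans CJK SC Ultra-Light"
  "Noto Sans CJK TC"
  "Noto Sans CJK TC Bold"
  "Noto Sans CJK TC Heavy"
  "Noto Sans CJK TC Light"
  "Noto Sans CJK TC Medium"
  "Noto Sans CJK TC Semi-Light"
  "Noto Sans CJK TC Ultra-Light"
  "Noto Sans Mono CJK JP"
  "Noto Sans Mono CJK JP Bold"
  "Noto Sans Mono CJK KR"
  "Noto Sans Mono CJK KR Bold"
  "Noto Sans Mono CJK SC"
  "Noto Sans Mono CJK SC Bold"
  "Noto Sans Mono CJK TC"
  "Noto Sans Mono CJK TC Bold"
  "SimSun"
  "WenQuanYi Zen Hei Medium"
  "WenQuanYi Zen Hei Mono Medium"
)

# Matches one of the region tags as a whole word inside a space-padded name.
drop_re=' (JP|KR|TC|HK|TW) '

sim_fonts=()
for f in "${fonts[@]}"; do
  if [[ " $f " =~ $drop_re ]]; then
    continue  # Japanese, Korean or Traditional Chinese font: skip it
  fi
  sim_fonts+=("$f")
done

# Print one font per line so the result can be checked by hand before training.
printf '%s\n' "${sim_fonts[@]}"

# Hypothetical use in step 1 (all other options as in the original command):
#   src/training/tesstrain.sh ... --fontlist "${sim_fonts[@]}" \
#     --output_dir ~/tesstutorial/chi_sim_train
```

Because the original command passes each font as a separate quoted argument after `--fontlist`, expanding the array as `"${sim_fonts[@]}"` keeps names with spaces intact in the same way.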

