Thanks for your advice, I will try.
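
One way to apply the advice quoted below might be to filter the font list before passing it to --fontlist. The following is only a minimal sketch: the helper name filter_sc_fonts is hypothetical, and it assumes that the region tags (JP, KR, TC, HK, TW) always appear as separate words in the font names, as they do in the original command. The sample list is a subset of the fonts from that command.

```shell
# Hypothetical helper (not part of Tesseract): drop every font whose name
# contains one of the region tags JP, KR, TC, HK or TW as a separate word.
filter_sc_fonts() {
  grep -Ev '(^| )(JP|KR|TC|HK|TW)( |$)'
}

# A subset of the fonts from the original command, one name per line.
fonts='AR PL UKai CN
AR PL UKai HK
AR PL UKai TW MBE
NSimSun
Noto Sans CJK JP Bold
Noto Sans CJK KR
Noto Sans CJK SC
Noto Sans CJK SC Bold
Noto Sans CJK TC Heavy
Noto Sans Mono CJK SC
SimSun
WenQuanYi Zen Hei Medium'

# Keep only the fonts without a JP/KR/TC/HK/TW tag and print them.
kept=$(printf '%s\n' "$fonts" | filter_sc_fonts)
printf '%s\n' "$kept"
```

The printed names could then be quoted one by one after --fontlist. Whether the remaining untagged fonts (for example WenQuanYi Zen Hei) suit the target documents is a separate judgment call.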

On Tue, Mar 19, 2019 at 10:01 PM Shree Devi Kumar <[email protected]> wrote:

> You are using a number of Japanese, Korean and Traditional Chinese fonts
> for training. Try without them.
>
> On Tue, Mar 19, 2019 at 4:19 PM 易鑫 <[email protected]> wrote:
>
>> Hello, everyone:
>>     I want to recognize the characters in a table (you can find it in the
>> attached file). In the past I only recognized English letters, and the
>> result was pretty good, but now I want to recognize English letters plus
>> Chinese characters, so I retrained the model. Here are my commands:
>>
>> 1) src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>>   --training_text ../training_data/chi_sim_tuned.txt \
>>   --langdata_dir ../langdata --tessdata_dir ./tessdata --lang chi_sim \
>>   --linedata_only --noextract_font_properties --exposures "0" \
>> --fontlist  "AR PL UKai CN" \
>>   "AR PL UKai HK" \
>>   "AR PL UKai TW" \
>>   "AR PL UKai TW MBE" \
>>   "AR PL UMing CN Light" \
>>   "AR PL UMing HK Light" \
>>   "AR PL UMing TW Light" \
>>   "AR PL UMing TW MBE Light" \
>>   "NSimSun" \
>>   "Noto Sans CJK JP" \
>>   "Noto Sans CJK JP Bold" \
>>   "Noto Sans CJK JP Heavy" \
>>   "Noto Sans CJK JP Light" \
>>   "Noto Sans CJK JP Medium" \
>>   "Noto Sans CJK JP Semi-Light" \
>>   "Noto Sans CJK JP Ultra-Light" \
>>   "Noto Sans CJK KR" \
>>   "Noto Sans CJK KR Bold" \
>>   "Noto Sans CJK KR Heavy" \
>>   "Noto Sans CJK KR Light" \
>>   "Noto Sans CJK KR Medium" \
>>   "Noto Sans CJK KR Semi-Light" \
>>   "Noto Sans CJK KR Ultra-Light" \
>>   "Noto Sans CJK SC" \
>>   "Noto Sans CJK SC Bold" \
>>   "Noto Sans CJK SC Heavy" \
>>   "Noto Sans CJK SC Light" \
>>   "Noto Sans CJK SC Medium" \
>>   "Noto Sans CJK SC Semi-Light" \
>>   "Noto Sans CJK SC Ultra-Light" \
>>   "Noto Sans CJK TC" \
>>   "Noto Sans CJK TC Bold" \
>>   "Noto Sans CJK TC Heavy" \
>>   "Noto Sans CJK TC Light" \
>>   "Noto Sans CJK TC Medium" \
>>   "Noto Sans CJK TC Semi-Light" \
>>   "Noto Sans CJK TC Ultra-Light" \
>>   "Noto Sans Mono CJK JP" \
>>   "Noto Sans Mono CJK JP Bold" \
>>   "Noto Sans Mono CJK KR" \
>>   "Noto Sans Mono CJK KR Bold" \
>>   "Noto Sans Mono CJK SC" \
>>   "Noto Sans Mono CJK SC Bold" \
>>   "Noto Sans Mono CJK TC" \
>>   "Noto Sans Mono CJK TC Bold" \
>>   "SimSun" \
>>   "WenQuanYi Zen Hei Medium" \
>>   "WenQuanYi Zen Hei Mono Medium" \
>> --output_dir ~/tesstutorial/chi_sim_train
>>
>> 2) mkdir -p ~/tesstutorial/chi_sim_tuned_from_chi_sim
>> 3) combine_tessdata -e ../tessdata_best/chi_sim.traineddata \
>>   ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm
>> 4) lstmtraining \
>>   --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned \
>>   --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
>>   --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>   --old_traineddata ../tessdata_best/chi_sim.traineddata \
>>   --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
>>   --max_iterations 10000
>> 5) lstmtraining --stop_training \
>>   --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned_checkpoint \
>>   --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
>>   --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned.traineddata
>>
>> The result is not good. Strangest of all, *the result contains some
>> Chinese characters that do not exist in the training_text file*. I
>> really cannot understand this.
>> Can someone help me? Thanks a lot.
>>
>> The training_text file and the result are also in the attached file.
>>
>> Sorry for my poor English.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/bd740b98-3c0c-4216-88ba-0eb72cdcf3ee%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/bd740b98-3c0c-4216-88ba-0eb72cdcf3ee%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>

