You are using a number of Japanese, Korean and Traditional Chinese fonts for
training a chi_sim (Simplified Chinese) model. Try without them.
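
As a rough sketch of what "without them" could look like, the snippet below
filters the font list from the quoted command. It assumes that a JP, KR, TC,
HK or TW tag in the font name is enough to identify the Japanese, Korean and
Traditional Chinese fonts; that is a naming heuristic, not something
tesstrain.sh checks. The variable names all_fonts and sc_fonts are made up
for this example.

```shell
# Font names copied from the --fontlist in the quoted command, one per line.
all_fonts='AR PL UKai CN
AR PL UKai HK
AR PL UKai TW
AR PL UKai TW MBE
AR PL UMing CN Light
AR PL UMing HK Light
AR PL UMing TW Light
AR PL UMing TW MBE Light
NSimSun
Noto Sans CJK JP
Noto Sans CJK JP Bold
Noto Sans CJK JP Heavy
Noto Sans CJK JP Light
Noto Sans CJK JP Medium
Noto Sans CJK JP Semi-Light
Noto Sans CJK JP Ultra-Light
Noto Sans CJK KR
Noto Sans CJK KR Bold
Noto Sans CJK KR Heavy
Noto Sans CJK KR Light
Noto Sans CJK KR Medium
Noto Sans CJK KR Semi-Light
Noto Sans CJK KR Ultra-Light
Noto Sans CJK SC
Noto Sans CJK SC Bold
Noto Sans CJK SC Heavy
Noto Sans CJK SC Light
Noto Sans CJK SC Medium
Noto Sans CJK SC Semi-Light
Noto Sans CJK SC Ultra-Light
Noto Sans CJK TC
Noto Sans CJK TC Bold
Noto Sans CJK TC Heavy
Noto Sans CJK TC Light
Noto Sans CJK TC Medium
Noto Sans CJK TC Semi-Light
Noto Sans CJK TC Ultra-Light
Noto Sans Mono CJK JP
Noto Sans Mono CJK JP Bold
Noto Sans Mono CJK KR
Noto Sans Mono CJK KR Bold
Noto Sans Mono CJK SC
Noto Sans Mono CJK SC Bold
Noto Sans Mono CJK TC
Noto Sans Mono CJK TC Bold
SimSun
WenQuanYi Zen Hei Medium
WenQuanYi Zen Hei Mono Medium'

# Assumption: a JP, KR, TC, HK or TW word in the name marks a font that is
# not Simplified Chinese. Drop every line that contains such a word.
sc_fonts=$(printf '%s\n' "$all_fonts" | grep -Ev ' (JP|KR|TC|HK|TW)( |$)')

# Print the names that survive the filter.
printf '%s\n' "$sc_fonts"
```

The names that remain (the CN, SC, SimSun and WenQuanYi entries) would then
be the ones to pass to --fontlist, with the rest of the tesstrain.sh command
left as it was. Whether each of those fonts is actually a good fit for your
table images is something you would still need to check.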

On Tue, Mar 19, 2019 at 4:19 PM 易鑫 <[email protected]> wrote:

> Hello everyone,
>     I want to recognize the characters in a table (you can find it in
> the attached file). In the past I only recognized English letters, and the
> result was pretty good, but now I want to recognize
> English letters plus Chinese characters, so I retrained the model. Here is
> my command:
>
> 1)src/training/tesstrain.sh --fonts_dir /usr/share/fonts --training_text
> ../training_data/chi_sim_tuned.txt \
> --langdata_dir ../langdata --tessdata_dir ./tessdata --lang chi_sim
> --linedata_only --noextract_font_properties  --exposures "0" \
> --fontlist  "AR PL UKai CN" \
>   "AR PL UKai HK" \
>   "AR PL UKai TW" \
>   "AR PL UKai TW MBE" \
>   "AR PL UMing CN Light" \
>   "AR PL UMing HK Light" \
>   "AR PL UMing TW Light" \
>   "AR PL UMing TW MBE Light" \
>   "NSimSun" \
>   "Noto Sans CJK JP" \
>   "Noto Sans CJK JP Bold" \
>   "Noto Sans CJK JP Heavy" \
>   "Noto Sans CJK JP Light" \
>   "Noto Sans CJK JP Medium" \
>   "Noto Sans CJK JP Semi-Light" \
>   "Noto Sans CJK JP Ultra-Light" \
>   "Noto Sans CJK KR" \
>   "Noto Sans CJK KR Bold" \
>   "Noto Sans CJK KR Heavy" \
>   "Noto Sans CJK KR Light" \
>   "Noto Sans CJK KR Medium" \
>   "Noto Sans CJK KR Semi-Light" \
>   "Noto Sans CJK KR Ultra-Light" \
>   "Noto Sans CJK SC" \
>   "Noto Sans CJK SC Bold" \
>   "Noto Sans CJK SC Heavy" \
>   "Noto Sans CJK SC Light" \
>   "Noto Sans CJK SC Medium" \
>   "Noto Sans CJK SC Semi-Light" \
>   "Noto Sans CJK SC Ultra-Light" \
>   "Noto Sans CJK TC" \
>   "Noto Sans CJK TC Bold" \
>   "Noto Sans CJK TC Heavy" \
>   "Noto Sans CJK TC Light" \
>   "Noto Sans CJK TC Medium" \
>   "Noto Sans CJK TC Semi-Light" \
>   "Noto Sans CJK TC Ultra-Light" \
>   "Noto Sans Mono CJK JP" \
>   "Noto Sans Mono CJK JP Bold" \
>   "Noto Sans Mono CJK KR" \
>   "Noto Sans Mono CJK KR Bold" \
>   "Noto Sans Mono CJK SC" \
>   "Noto Sans Mono CJK SC Bold" \
>   "Noto Sans Mono CJK TC" \
>   "Noto Sans Mono CJK TC Bold" \
>   "SimSun" \
>   "WenQuanYi Zen Hei Medium" \
>   "WenQuanYi Zen Hei Mono Medium" \
> --output_dir ~/tesstutorial/chi_sim_train
>
> 2)mkdir -p ~/tesstutorial/chi_sim_tuned_from_chi_sim
> 3)combine_tessdata -e ../tessdata_best/chi_sim.traineddata
> ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm
> 4)lstmtraining --model_output
> ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned \
> --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
> --traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
> --old_traineddata ../tessdata_best/chi_sim.traineddata \
> --train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
> --max_iterations 10000
> 5)lstmtraining --stop_training --continue_from
> ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned_checkpoint  \
>            --traineddata
> ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata --model_output
> ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned.traineddata
>
> The result is not good. Strangest of all, *the result contains some
> Chinese characters that do not exist in the training_text file*. I really
> cannot understand this.
> Can someone help me? Thanks a lot.
>
> The training_text file and the result are also in the attached file.
>
> Sorry for my poor English.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/bd740b98-3c0c-4216-88ba-0eb72cdcf3ee%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/bd740b98-3c0c-4216-88ba-0eb72cdcf3ee%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>


-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

