Hello, I think you are missing the "--fontlist" argument. The script below worked for Japanese. Even if you want to train with all the fonts listed in language-specific.sh, I would still suggest passing "--fontlist" explicitly.

tesstrain.sh \
  --fonts_dir /usr/share/fonts/ \
  --lang jpn \
  --linedata_only \
  --noextract_font_properties \
  --langdata_dir /langdata \
  --tessdata_dir /tessdata \
  --output_dir ~/tesstutorial/horizon \
  --fontlist "TakaoExGothic" "TakaoExMincho"

Please keep in mind that the fonts you want to use should also be listed in language-specific.sh. You may also want to look at the VERTICAL_FONT section to avoid ending up with vertically aligned text: vertical text is needed for Japanese or Chinese, but not for Korean.

On Wed, Dec 5, 2018 at 3:08 AM Zdenko Podobny <zde...@gmail.com> wrote:

> Do you use the scripts from the master repository? There were some updates
> after the 4.0 release...
>
> Zdenko
>
>
> On Wed, Dec 5, 2018 at 8:19, SEUNGGWANSHIN <tmdrhsl...@gmail.com> wrote:
>
>> Hello guys,
>>
>> I'm training tesseract-lstm with
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>> and I have a problem using "tesstrain.sh".
>>
>> When creating the training data, that page uses tesstrain.sh this way:
>>
>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>>   --noextract_font_properties --langdata_dir ../langdata \
>>   --tessdata_dir ./tessdata --output_dir ~/tesstutorial/engtrain
>>
>> So my command is below.
>>
>> tesstrain.sh --fonts_dir /usr/share/fonts \
>>   --lang kor \
>>   --linedata_only \
>>   --noextract_font_properties \
>>   --langdata_dir ../langdata-master \
>>   --tessdata_dir tessdata/tessdata_fast/ \
>>   --output_dir kortrain
>>
>> My language is "kor", not "eng".
>> When I executed this script, I got an unknown error like this:
>>
>> === Starting training for language 'kor'
>> /usr/local/bin/language-specific.sh: line 1125: FONTS: unbound variable
>>
>> I checked this error line in language-specific.sh:
>>
>> 1124   kor ) MEAN_COUNT="20"
>> 1125         WORD_DAWG_FACTOR=0.015
>> 1126         NUMBER_DAWG_FACTOR=0.05
>> 1127         TRAINING_DATA_ARGUMENTS+=" --infrequent_ratio=10000"
>> 1128         TRAINING_DATA_ARGUMENTS+=" --desired_bigrams="
>> 1129         GENERATE_WORD_BIGRAMS=0
>> 1130         FILTER_ARGUMENTS="--charset_filter=kor --segmenter_lang=kor"
>> 1131         test -z "$FONTS" && FONTS=( "${KOREAN_FONTS[@]}" ) ;;
>>
>> 312   KOREAN_FONTS=( \
>> 313       "Arial Unicode MS" \
>> 314       "Arial Unicode MS Bold" \
>> 315       "Baekmuk Batang Patched" \
>> 316       "Baekmuk Batang" \
>> 317       "Baekmuk Dotum" \
>> 318       "Baekmuk Gulim" \
>> 319       "Baekmuk Headline" \
>> 320       )
>>
>> I installed the Korean fonts properly using ttf_mscorefonts_installer, etc.,
>> but I don't know why this error happens.
>>
>> Can anyone help me?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/2b7bbf45-4240-411b-bd4a-87c46fdcea5a%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
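
A note on the "FONTS: unbound variable" message quoted above: that wording is what a shell prints when the nounset option (set -u) is on and a script expands a variable that was never set. Assuming that is what happens at the test -z "$FONTS" line in the listing, the sketch below reproduces the failure outside tesstrain.sh and shows why supplying a font list would avoid it. It is only an illustration, not the actual tesstrain.sh code: a plain string stands in for the FONTS array so that it runs in any POSIX shell, it assumes that --fontlist ends up populating FONTS, and the names unguarded_status, guarded_result and supplied_result are made up for this example.

```shell
#!/bin/sh
# Sketch only: reproduces the suspected failure mode outside tesstrain.sh.
# A plain string stands in for the FONTS array used by language-specific.sh.

# 1) Unguarded test, in the style of line 1131 of the quoted listing.
#    It runs in a subshell so the nounset failure cannot end this script.
unguarded_status=0
( set -u; unset FONTS; test -z "$FONTS" && FONTS="Baekmuk Batang" ) 2>/dev/null \
  || unguarded_status=$?

# 2) Guarded test: ${FONTS:-} expands to an empty string when FONTS is unset,
#    so the default font is applied instead of the shell aborting.
guarded_result=$(
  set -u
  unset FONTS
  test -z "${FONTS:-}" && FONTS="Baekmuk Batang"
  printf '%s' "$FONTS"
)

# 3) A value supplied by the caller (assumed here to be what --fontlist
#    provides) is already set, so the default is never applied.
supplied_result=$(
  set -u
  FONTS="TakaoExGothic"
  test -z "${FONTS:-}" && FONTS="Baekmuk Batang"
  printf '%s' "$FONTS"
)

echo "unguarded exit status: $unguarded_status"
echo "guarded default:       $guarded_result"
echo "supplied value kept:   $supplied_result"
```

Case 1 ends with a non-zero status, while cases 2 and 3 both complete. Changing "$FONTS" to "${FONTS:-}" in a local copy of language-specific.sh might therefore be an alternative to passing --fontlist, but that is an untested suggestion and the updated scripts in the master repository may already handle it differently.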