jpn.config in langdata/jpn loads jpn_vert as a sublanguage:
tessedit_load_sublangs jpn_vert
You can try training without that line.
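A minimal sketch of one way to try that, assuming your langdata checkout is at ../langdata as in your command. The function name strip_sublangs is only for illustration and is not part of Tesseract. It deletes the directive and keeps a .bak copy of the original file so you can restore it:

```shell
# Hypothetical helper (not part of Tesseract): remove the sublanguage
# directive from a config file. sed keeps the original as <file>.bak.
strip_sublangs() {
    # $1 = path to a config file such as ../langdata/jpn/jpn.config
    sed -i.bak '/^[[:space:]]*tessedit_load_sublangs/d' "$1"
}

# Example call (path assumed from the command in the quoted message):
# strip_sublangs ../langdata/jpn/jpn.config
```

To undo it, move jpn.config.bak back over jpn.config.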
Also look at the settings for jpn in training/language_specific.sh.
You may need to change the following as well:
# The following fonts will be rendered vertically in phase I.
VERTICAL_FONTS=( \
"TakaoExGothic" \ # for jpn
"TakaoExMincho" \ # for jpn
"AR PL UKai Patched" \ # for chi_tra
"AR PL UMing Patched Light" \ # for chi_tra
"Baekmuk Batang Patched" \ # for kor
)
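As an illustration only: if you do not want Japanese text rendered vertically, one possible edit is to drop the two jpn entries from the list. This is my assumption about the kind of change that may be needed, not a tested fix, and the layout below is tidied up compared with the file:

```shell
# Assumed edit: jpn fonts removed, so only chi_tra and kor fonts
# remain in the vertical rendering list.
VERTICAL_FONTS=(
    "AR PL UKai Patched"         # for chi_tra
    "AR PL UMing Patched Light"  # for chi_tra
    "Baekmuk Batang Patched"     # for kor
)
```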
ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Mon, Apr 3, 2017 at 4:22 PM, <[email protected]> wrote:
> Hi,
>
> I'm trying to create training data for Japanese (jpn.traineddata).
>
> I ran 'tesstrain.sh' with the '--linedata_only' option, and the script
> finished (return code 0).
> But the log file contains some error messages (repeated 22 times).
>
> ```
> $ ../tesseract-ocr/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>     --lang jpn --linedata_only --noextract_font_properties \
>     --langdata_dir ../langdata --tessdata_dir /usr/local/share \
>     --output_dir ~/work/jpntrain
> ```
>
>
> ---
> [Sun Apr 2 07:42:30 UTC 2017] /usr/local/bin/tesseract /tmp/tmp.pwcwGMb5hs/jpn/jpn.IPAPMincho.exp0.tif /tmp/tmp.pwcwGMb5hs/jpn/jpn.IPAPMincho.exp0 lstm.train ../langdata/jpn/jpn.config
> [Sun Apr 2 07:42:30 UTC 2017] /usr/local/bin/tesseract /tmp/tmp.pwcwGMb5hs/jpn/jpn.IPAGothic.exp0.tif /tmp/tmp.pwcwGMb5hs/jpn/jpn.IPAGothic.exp0 lstm.train ../langdata/jpn/jpn.config
> Error opening data file /usr/local/share/tessdata/jpn_vert.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to the
> parent directory of your "tessdata" directory.
> Failed loading language 'jpn_vert'
> ---
>
> It seems that 'tesstrain.sh' requires 'jpn_vert.traineddata', but this
> file is not provided in the tessdata repository.
>
> How can I get this file? Or can I substitute 'jpn.traineddata' for
> 'jpn_vert.traineddata'?
>
>
> I've found that there is a 'jpn_vert' directory in the langdata repository,
> but it contains only some config files.
>
>
> Thanks.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/c776398d-0b2f-483d-a9ec-63476eaf0586%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>