It looks like your langdata dir (../training) does not have the script
unicharset files for the Sinhala and Latin scripts:

Failed to load script unicharset from:../training/Latin.unicharset

Failed to load script unicharset from:../training/Sinhala.unicharset
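
A quick way to confirm this before rerunning tesstrain.sh is to check the
langdata dir for the files named in the log. The sketch below is only an
illustration: missing_langdata_files is a made-up helper name, not part of
Tesseract, and the file list is taken from the "Failed to load" and "Failed to
read" lines in your output (radical-stroke.txt also appears there). As far as I
know, these files sit at the top level of the tesseract-ocr langdata
repositories, so copying them from there into ../training should be enough.

```shell
#!/bin/sh
# Hypothetical helper (not a Tesseract tool): print every expected file
# that is missing from the given langdata directory.
# Usage: missing_langdata_files LANGDATA_DIR FILE...
missing_langdata_files() {
  dir=$1
  shift
  for f in "$@"; do
    # Print the full path only when the file does not exist.
    [ -f "$dir/$f" ] || printf '%s\n' "$dir/$f"
  done
}

# Example call, using the paths from the log in this thread:
#   missing_langdata_files ../training \
#     Latin.unicharset Sinhala.unicharset radical-stroke.txt
```

If the call prints nothing, all of the listed files are present; otherwise each
printed path is a file that still needs to be placed in the langdata dir.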



On Sun, 30 Sep 2018, 18:27 Shandigutt, <[email protected]> wrote:

> Hi,
>
> I attempted to create training data using the command below:
>
> ./src/training/tesstrain.sh --fonts_dir ../Support/font --lang sin \
>   --linedata_only --noextract_font_properties --langdata_dir ../training \
>   --tessdata_dir ../tessdata_best --output_dir ../training/sintrain \
>   --fontlist "BhashitaComplex" \
>   --training_text ../training/sin/sin.training_text
>
>
> I could capture only a part of the log output. Highlights are extracted
> below,
>
> Word started with a combiner:0xddc
>
> Normalization failed for string 'ො'
>
> Word started with a combiner:0xdca
>
> Word started with a combiner:0x200d
>
> Normalization failed for string '්‍ය'
>
> Word started with a combiner:0xdcf
>
> Normalization failed for string 'ා'
>
>
> Wrote unicharset file /tmp/sin-2018-09-29.aN0/sin.unicharset
>
> [Sat Sep 29 21:33:19 UTC 2018] /usr/local/bin/set_unicharset_properties -U
> /tmp/sin-2018-09-29.aN0/sin.unicharset -O
> /tmp/sin-2018-09-29.aN0/sin.unicharset -X
> /tmp/sin-2018-09-29.aN0/sin.xheights --script_dir=../training
>
> Loaded unicharset of size 114 from file
> /tmp/sin-2018-09-29.aN0/sin.unicharset
>
> Setting unichar properties
>
> Setting script properties
>
> Failed to load script unicharset from:../training/Latin.unicharset
>
> Failed to load script unicharset from:../training/Sinhala.unicharset
>
> Warning: properties incomplete for index 3 = ස
>
> Warning: properties incomplete for index 4 = ී
>
> Warning: properties incomplete for index 5 = ග
>
>
> === Constructing LSTM training data ===
>
> Creating new directory ../training/sintrain
>
> [Sun Sep 30 05:32:18 UTC 2018] /usr/local/bin/combine_lang_model
> --input_unicharset /tmp/sin-2018-09-29.aN0/sin.unicharset --script_dir
> ../training --words ../training/sin/sin.wordlist --numbers
> ../training/sin/sin.numbers --puncs ../training/sin/sin.punc --output_dir
> ../training/sintrain --lang sin --pass_through_recoder
>
> Loaded unicharset of size 114 from file
> /tmp/sin-2018-09-29.aN0/sin.unicharset
>
> Setting unichar properties
>
> Setting script properties
>
> Failed to load script unicharset from:../training/Latin.unicharset
>
> Failed to load script unicharset from:../training/Sinhala.unicharset
>
> Warning: properties incomplete for index 3 = ස
>
> Warning: properties incomplete for index 4 = ී
>
> Warning: properties incomplete for index 5 = ග
>
>
>
> Warning: properties incomplete for index 112 = ෴
>
> Warning: properties incomplete for index 113 = ෲ
>
> Config file is optional, continuing...
>
> Failed to read data from: ../training/sin/sin.config
>
> Failed to read data from: ../training/radical-stroke.txt
>
> Error reading radical code table ../training/radical-stroke.txt
>
>
> === Moving lstmf files for training data ===
>
> Moving /tmp/sin-2018-09-29.aN0/sin.BhashitaComplex.exp0.lstmf to
> ../training/sintrain
>
>
> Created starter traineddata for language 'sin'
>
>
>
> Run lstmtraining to do the LSTM training for language 'sin'
>
>
> For the full capture of the log please find the attached file
>
> Tesseract version I use,
>
> tesseract --version
>
> tesseract 4.0.0-beta.4-158-g02f9d
>
>  leptonica-1.77.0
>
>   libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib
> 1.2.11
>
>  Found AVX512BW
>
>  Found AVX512F
>
>  Found AVX2
>
>  Found AVX
>
>  Found SSE
>
>
> OS details,
>
> Linux ip-172-31-13-179 4.15.0-1021-aws #21-Ubuntu SMP Tue Aug 28 10:23:07
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
>
> Please let me know what has gone wrong.
>
> Thanks
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/590c5444-0006-4816-baf1-35042d443d31%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/590c5444-0006-4816-baf1-35042d443d31%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

