I meant the Sinhala script, not "Signals".

Sorry about the autocorrect error from my phone.
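
A rough way to check whether the langdata directory has the files named in the quoted log below is sketched here. The function name check_langdata_files is made up for illustration, and the file list is simply taken from the "Failed to load" and "Failed to read" lines in that log, so treat it as a starting point rather than a complete list of what tesstrain.sh needs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): report which script-level
# files are missing from a langdata directory.
# The file names below come from the "Failed to load/read" lines in the log.
check_langdata_files() {
    dir="$1"
    missing=0
    for f in Latin.unicharset Sinhala.unicharset radical-stroke.txt; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    # Returns 0 when every file is present, 1 otherwise.
    return "$missing"
}
```

Calling it as `check_langdata_files ../training` should match the --langdata_dir value used in the original command. Any file it reports as missing would need to be copied into that directory from a langdata checkout before rerunning tesstrain.sh.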

On Sun, 30 Sep 2018, 19:33 Shree Devi Kumar, <[email protected]> wrote:

> Looks like your langdata dir does not have the script unicharset files for
> Signals and Latin scripts.
>
> Failed to load script unicharset from:../training/Latin.unicharset
>
> Failed to load script unicharset from:../training/Sinhala.unicharset
>
>
>
> On Sun, 30 Sep 2018, 18:27 Shandigutt, <[email protected]> wrote:
>
>> Hi,
>>
>> I attempted to create training data using the command below:
>>
>> ./src/training/tesstrain.sh --fonts_dir ../Support/font --lang sin \
>>   --linedata_only --noextract_font_properties --langdata_dir ../training \
>>   --tessdata_dir ../tessdata_best --output_dir ../training/sintrain \
>>   --fontlist "BhashitaComplex" \
>>   --training_text ../training/sin/sin.training_text
>>
>>
>> I could capture only a part of the log output. Highlights are extracted
>> below,
>>
>> Word started with a combiner:0xddc
>>
>> Normalization failed for string 'ො'
>>
>> Word started with a combiner:0xdca
>>
>> Word started with a combiner:0x200d
>>
>> Normalization failed for string '්‍ය'
>>
>> Word started with a combiner:0xdcf
>>
>> Normalization failed for string 'ා'
>>
>>
>> Wrote unicharset file /tmp/sin-2018-09-29.aN0/sin.unicharset
>>
>> [Sat Sep 29 21:33:19 UTC 2018] /usr/local/bin/set_unicharset_properties
>> -U /tmp/sin-2018-09-29.aN0/sin.unicharset -O 
>> /tmp/sin-2018-09-29.aN0/sin.unicharset
>> -X /tmp/sin-2018-09-29.aN0/sin.xheights --script_dir=../training
>>
>> Loaded unicharset of size 114 from file /tmp/sin-2018-09-29.aN0/sin.unicharset
>>
>> Setting unichar properties
>>
>> Setting script properties
>>
>> Failed to load script unicharset from:../training/Latin.unicharset
>>
>> Failed to load script unicharset from:../training/Sinhala.unicharset
>>
>> Warning: properties incomplete for index 3 = ස
>>
>> Warning: properties incomplete for index 4 = ී
>>
>> Warning: properties incomplete for index 5 = ග
>>
>>
>> === Constructing LSTM training data ===
>>
>> Creating new directory ../training/sintrain
>>
>> [Sun Sep 30 05:32:18 UTC 2018] /usr/local/bin/combine_lang_model
>> --input_unicharset /tmp/sin-2018-09-29.aN0/sin.unicharset --script_dir
>> ../training --words ../training/sin/sin.wordlist --numbers
>> ../training/sin/sin.numbers --puncs ../training/sin/sin.punc --output_dir
>> ../training/sintrain --lang sin --pass_through_recoder
>>
>> Loaded unicharset of size 114 from file /tmp/sin-2018-09-29.aN0/sin.unicharset
>>
>> Setting unichar properties
>>
>> Setting script properties
>>
>> Failed to load script unicharset from:../training/Latin.unicharset
>>
>> Failed to load script unicharset from:../training/Sinhala.unicharset
>>
>> Warning: properties incomplete for index 3 = ස
>>
>> Warning: properties incomplete for index 4 = ී
>>
>> Warning: properties incomplete for index 5 = ග
>>
>>
>>
>> Warning: properties incomplete for index 112 = ෴
>>
>> Warning: properties incomplete for index 113 = ෲ
>>
>> Config file is optional, continuing...
>>
>> Failed to read data from: ../training/sin/sin.config
>>
>> Failed to read data from: ../training/radical-stroke.txt
>>
>> Error reading radical code table ../training/radical-stroke.txt
>>
>>
>> === Moving lstmf files for training data ===
>>
>> Moving /tmp/sin-2018-09-29.aN0/sin.BhashitaComplex.exp0.lstmf to
>> ../training/sintrain
>>
>>
>> Created starter traineddata for language 'sin'
>>
>>
>>
>> Run lstmtraining to do the LSTM training for language 'sin'
>>
>>
>> For the full log, please see the attached file.
>>
>> Tesseract version I use,
>>
>> tesseract --version
>>
>> tesseract 4.0.0-beta.4-158-g02f9d
>>
>>  leptonica-1.77.0
>>
>>   libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib
>> 1.2.11
>>
>>  Found AVX512BW
>>
>>  Found AVX512F
>>
>>  Found AVX2
>>
>>  Found AVX
>>
>>  Found SSE
>>
>>
>> OS details,
>>
>> Linux ip-172-31-13-179 4.15.0-1021-aws #21-Ubuntu SMP Tue Aug 28 10:23:07
>> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>>
>>
>> Please let me know what has gone wrong.
>>
>> Thanks
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/590c5444-0006-4816-baf1-35042d443d31%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>

