Sinhala script, not "Signals". Sorry about the wrong autocorrect on my phone.
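Before rerunning tesstrain.sh, you can confirm the diagnosis by checking the directory passed as --langdata_dir for the files the log could not read. The snippet below is a minimal bash sketch and is not part of Tesseract. The function name check_langdata is hypothetical. It assumes, as the log paths suggest, that Latin.unicharset, Sinhala.unicharset and radical-stroke.txt are expected directly inside that directory:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a Tesseract tool): report which of the files that
# set_unicharset_properties / combine_lang_model failed to read in the log
# are absent from a langdata directory.
check_langdata() {
  local dir="$1"
  local missing=0
  local f
  # File names taken from the "Failed to load/read" lines in the log.
  for f in Latin.unicharset Sinhala.unicharset radical-stroke.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  # Exit status 0 if everything is present, 1 if anything is missing.
  return "$missing"
}

# Example: check the directory used as --langdata_dir in the command below.
# check_langdata ../training || echo "copy the missing files into ../training"
```

If it reports missing files, copying them from a checkout of the tesseract-ocr langdata repository into ../training should let the training tools find them on the next run.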
On Sun, 30 Sep 2018, 19:33 Shree Devi Kumar, <[email protected]> wrote:

> Looks like your langdata dir does not have the script unicharset files for Signals and Latin scripts.
>
> Failed to load script unicharset from:../training/Latin.unicharset
>
> Failed to load script unicharset from:../training/Sinhala.unicharset
>
>
> On Sun, 30 Sep 2018, 18:27 Shandigutt, <[email protected]> wrote:
>
>> Hi,
>>
>> I attempted to create training data using the below command,
>>
>> ./src/training/tesstrain.sh --fonts_dir ../Support/font --lang sin --linedata_only \
>> --noextract_font_properties --langdata_dir ../training \
>> --tessdata_dir ../tessdata_best --output_dir ../training/sintrain --fontlist "BhashitaComplex" --training_text ../training/sin/sin.training_text
>>
>> I could capture only a part of the log output. Highlights are extracted below,
>>
>> Word started with a combiner:0xddc
>> Normalization failed for string 'ො'
>> Word started with a combiner:0xdca
>> Word started with a combiner:0x200d
>> Normalization failed for string '්ය'
>> Word started with a combiner:0xdcf
>> Normalization failed for string 'ා'
>>
>> Wrote unicharset file /tmp/sin-2018-09-29.aN0/sin.unicharset
>> [Sat Sep 29 21:33:19 UTC 2018] /usr/local/bin/set_unicharset_properties -U /tmp/sin-2018-09-29.aN0/sin.unicharset -O /tmp/sin-2018-09-29.aN0/sin.unicharset -X /tmp/sin-2018-09-29.aN0/sin.xheights --script_dir=../training
>> Loaded unicharset of size 114 from file /tmp/sin-2018-09-29.aN0/sin.unicharset
>> Setting unichar properties
>> Setting script properties
>> Failed to load script unicharset from:../training/Latin.unicharset
>> Failed to load script unicharset from:../training/Sinhala.unicharset
>> Warning: properties incomplete for index 3 = ස
>> Warning: properties incomplete for index 4 = ී
>> Warning: properties incomplete for index 5 = ග
>>
>> === Constructing LSTM training data ===
>>
>> Creating new directory ../training/sintrain
>> [Sun Sep 30 05:32:18 UTC 2018] /usr/local/bin/combine_lang_model --input_unicharset /tmp/sin-2018-09-29.aN0/sin.unicharset --script_dir ../training --words ../training/sin/sin.wordlist --numbers ../training/sin/sin.numbers --puncs ../training/sin/sin.punc --output_dir ../training/sintrain --lang sin --pass_through_recoder
>> Loaded unicharset of size 114 from file /tmp/sin-2018-09-29.aN0/sin.unicharset
>> Setting unichar properties
>> Setting script properties
>> Failed to load script unicharset from:../training/Latin.unicharset
>> Failed to load script unicharset from:../training/Sinhala.unicharset
>> Warning: properties incomplete for index 3 = ස
>> Warning: properties incomplete for index 4 = ී
>> Warning: properties incomplete for index 5 = ග
>>
>> Warning: properties incomplete for index 112 = ෴
>> Warning: properties incomplete for index 113 = ෲ
>> Config file is optional, continuing...
>> Failed to read data from: ../training/sin/sin.config
>> Failed to read data from: ../training/radical-stroke.txt
>> Error reading radical code table ../training/radical-stroke.txt
>>
>> === Moving lstmf files for training data ===
>>
>> Moving /tmp/sin-2018-09-29.aN0/sin.BhashitaComplex.exp0.lstmf to ../training/sintrain
>>
>> Created starter traineddata for language 'sin'
>>
>> Run lstmtraining to do the LSTM training for language 'sin'
>>
>> For the full capture of the log please find the attached file
>>
>> Tesseract version I use,
>>
>> tesseract --version
>> tesseract 4.0.0-beta.4-158-g02f9d
>> leptonica-1.77.0
>> libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
>> Found AVX512BW
>> Found AVX512F
>> Found AVX2
>> Found AVX
>> Found SSE
>>
>> OS details,
>>
>> Linux ip-172-31-13-179 4.15.0-1021-aws #21-Ubuntu SMP Tue Aug 28 10:23:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>>
>> Please let me know what has gone wrong.
>>
>> Thanks
>>
>> --
>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/590c5444-0006-4816-baf1-35042d443d31%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.

