Adding more details to my query,

*My tesseract version:*

tesseract 4.0.0-beta.4-74-gd8237
 leptonica-1.77.0
  libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
 Found SSE

*My OS details:*

tharaka@tharaka-laptop-ubuntu:/tmp/sin-2018-09-01.E4T$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic

Thanks

On Tuesday, September 4, 2018 at 12:11:50 AM UTC+3, Shandigutt wrote:
>
> Hi,
>
> I'm currently training Tesseract for a new language, following the Tesseract wiki training guidelines
> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00>.
>
> After building Tesseract from source and installing it, I first created my own langdata set.
>
> Then I created training data and eval data using the tesstrain.sh script.
>
> Then I tried to create a starter traineddata file using combine_lang_model. I used the command below:
>
> *./build/src/training/combine_lang_model --input_unicharset ../training/sintrain/sin/sin.unicharset --script_dir ../langdata --words ../langdata/sin/sin.wordlist --puncs ../langdata/sin/sin.punc --numbers ../langdata/sin/sin.numbers --output_dir ../training/combined_sin --version_str 1.0 --lang sin*
>
> For the word list, punctuation and numbers, I pointed the command at the langdata I created myself. For the unicharset, I used the file that was generated when creating the training data.
> But I got the following error output:
>
> *Loaded unicharset of size 90 from file ../training/sintrain/sin/sin.unicharset*
> *Setting unichar properties*
> *Setting script properties*
> *Warning: properties incomplete for index 4 = ී*
> *Warning: properties incomplete for index 6 = ි*
> *Warning: properties incomplete for index 11 = ු*
> *Warning: properties incomplete for index 15 = ්*
> *Warning: properties incomplete for index 30 = ූ*
> *Warning: properties incomplete for index 44 = ්ර*
> *Warning: properties incomplete for index 79 = ්ය*
> *Warning: properties incomplete for index 82 = ක්*
> *Warning: properties incomplete for index 89 = ර්*
> *Error writing unicharset!!*
>
> Can somebody assist me with this?
>
> Thanks
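
In case it helps with diagnosis, here is a minimal Python 3 sketch for inspecting the unicharset entries named in the warnings. It is my own illustration and not part of Tesseract: the helper names `flagged_unichars` and `describe` are hypothetical, and it uses only the standard `re` and `unicodedata` modules. It parses the warning lines from the output quoted above and prints each code point with its Unicode general category and name, which should show whether the flagged entries are standalone combining signs or clusters that contain one. It does not by itself explain the final `Error writing unicharset!!` line.

```python
import re
import unicodedata

# Warning lines copied from the combine_lang_model output quoted above.
LOG = """\
Warning: properties incomplete for index 4 = ී
Warning: properties incomplete for index 6 = ි
Warning: properties incomplete for index 11 = ු
Warning: properties incomplete for index 15 = ්
Warning: properties incomplete for index 30 = ූ
Warning: properties incomplete for index 44 = ්ර
Warning: properties incomplete for index 79 = ්ය
Warning: properties incomplete for index 82 = ක්
Warning: properties incomplete for index 89 = ර්
"""

# Captures the unicharset index and the unichar string that follows "= ".
WARNING_RE = re.compile(r"properties incomplete for index (\d+) = (\S+)")


def flagged_unichars(log_text):
    """Return (index, unichar) pairs, one per 'properties incomplete' warning."""
    return [(int(m.group(1)), m.group(2)) for m in WARNING_RE.finditer(log_text)]


def describe(unichar):
    """Return (code point, general category, name) for each code point in unichar."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in unichar
    ]


if __name__ == "__main__":
    for index, unichar in flagged_unichars(LOG):
        for code_point, category, name in describe(unichar):
            print(f"index {index:>2}: {code_point} {category} {name}")
```

To run it against a full log instead of the pasted excerpt, the `LOG` string could be replaced with the contents of a file holding the captured command output.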

