Hi, I'm currently training Tesseract for a new language, following the Tesseract wiki training guidelines <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00>.
After building Tesseract from source and installing it, I first created my own langdata set. Then I created the training data and eval data using the tesstrain.sh script. Next I tried to create a starter traineddata file using the combine_lang_model tool, with the command below:

*./build/src/training/combine_lang_model --input_unicharset ../training/sintrain/sin/sin.unicharset --script_dir ../langdata --words ../langdata/sin/sin.wordlist --puncs ../langdata/sin/sin.punc --numbers ../langdata/sin/sin.numbers --output_dir ../training/combined_sin --version_str 1.0 --lang sin*

For the word list, punctuation and numbers, I pointed to the langdata I created myself; for the unicharset, I used the file that was generated when creating the training data. However, I got the following error output:

*Loaded unicharset of size 90 from file ../training/sintrain/sin/sin.unicharset*
*Setting unichar properties*
*Setting script properties*
*Warning: properties incomplete for index 4 = ී*
*Warning: properties incomplete for index 6 = ි*
*Warning: properties incomplete for index 11 = ු*
*Warning: properties incomplete for index 15 = ්*
*Warning: properties incomplete for index 30 = ූ*
*Warning: properties incomplete for index 44 = ්ර*
*Warning: properties incomplete for index 79 = ්ය*
*Warning: properties incomplete for index 82 = ක්*
*Warning: properties incomplete for index 89 = ර්*
*Error writing unicharset!!*

Can somebody help me with this?

Thanks

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
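The combine_lang_model step in the post can be wrapped in a small pre-flight check. One possible cause of *Error writing unicharset!!* (an assumption, not confirmed in this thread) is that the directory passed to --output_dir does not exist: combine_lang_model appears to create only the final <output_dir>/<lang> subdirectory, not any missing parent directories. The sketch below is a bash function (the name run_combine_lang_model and its argument order are hypothetical) that verifies the input files exist, creates the full output path with mkdir -p, and then runs the tool with the same flags as in the post.

```shell
# Hypothetical helper: pre-flight checks around combine_lang_model.
# Only the combine_lang_model flags come from the post; everything else
# (function name, argument order) is illustrative.
# Usage: run_combine_lang_model BIN LANG_CODE LANGDATA_DIR UNICHARSET OUTPUT_DIR
run_combine_lang_model() {
  local bin=$1 lang=$2 langdata=$3 unicharset=$4 output_dir=$5
  local words="$langdata/$lang/$lang.wordlist"
  local puncs="$langdata/$lang/$lang.punc"
  local numbers="$langdata/$lang/$lang.numbers"

  # Fail early, with a clear message, if any input file is missing.
  local f
  for f in "$unicharset" "$words" "$puncs" "$numbers"; do
    if [ ! -f "$f" ]; then
      echo "missing input file: $f" >&2
      return 1
    fi
  done

  # Assumption: combine_lang_model does not create missing parent
  # directories, so create the full <output_dir>/<lang> path up front.
  mkdir -p "$output_dir/$lang" || return 1
  if [ ! -w "$output_dir/$lang" ]; then
    echo "output directory is not writable: $output_dir/$lang" >&2
    return 1
  fi

  # Same invocation as in the post, with paths taken from the arguments.
  "$bin" \
    --input_unicharset "$unicharset" \
    --script_dir "$langdata" \
    --words "$words" \
    --puncs "$puncs" \
    --numbers "$numbers" \
    --output_dir "$output_dir" \
    --version_str 1.0 \
    --lang "$lang"
}
```

Sourced into bash, it would be called with the paths from the post, for example `run_combine_lang_model ./build/src/training/combine_lang_model sin ../langdata ../training/sintrain/sin/sin.unicharset ../training/combined_sin`. The function only adds checks; if the error persists with the output directory in place, the cause lies elsewhere.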

