Hi,

I'm training Tesseract for a new language, following the Tesseract wiki 
training guidelines 
<https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00>.

After building Tesseract from source and installing it, I first created my 
own langdata set. 

Then I created the training data and eval data using the tesstrain.sh script.

Then I tried to create a starter traineddata file using combine_lang_model, 
with the following command:

*./build/src/training/combine_lang_model --input_unicharset 
../training/sintrain/sin/sin.unicharset --script_dir ../langdata --words 
../langdata/sin/sin.wordlist --puncs ../langdata/sin/sin.punc --numbers 
../langdata/sin/sin.numbers --output_dir ../training/combined_sin 
--version_str 1.0 --lang sin*

In this command, the word list, punctuation, and numbers files come from the 
langdata I created myself, and the unicharset file is the one that was 
generated when creating the training data. But I got the following error 
output:

*Loaded unicharset of size 90 from file 
../training/sintrain/sin/sin.unicharset*
*Setting unichar properties*
*Setting script properties*
*Warning: properties incomplete for index 4 = ී*
*Warning: properties incomplete for index 6 = ි*
*Warning: properties incomplete for index 11 = ු*
*Warning: properties incomplete for index 15 = ්‌*
*Warning: properties incomplete for index 30 = ූ*
*Warning: properties incomplete for index 44 = ්‍ර*
*Warning: properties incomplete for index 79 = ්‍ය*
*Warning: properties incomplete for index 82 = ක්‍*
*Warning: properties incomplete for index 89 = ර්‍*
*Error writing unicharset!!*
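
For reference, here is a minimal sketch of how the same invocation could be wrapped in a shell script that checks the inputs before running. The paths and flags are the ones from my command above; the variable names and the `missing_inputs` and `run_combine` helpers are only my own illustration and are not part of Tesseract. The `mkdir -p` step rests on an assumption I have not confirmed, namely that a missing output directory might be one possible reason for the write failure.

```shell
#!/usr/bin/env bash
# Sketch only: variable and helper names are hypothetical, not part of Tesseract.

lang_code="sin"
langdata_dir="../langdata"
unicharset="../training/sintrain/${lang_code}/${lang_code}.unicharset"
output_dir="../training/combined_${lang_code}"

# Print every argument that is not an existing regular file, one per line.
missing_inputs() {
  local f
  for f in "$@"; do
    [ -f "$f" ] || printf '%s\n' "$f"
  done
}

# Check the inputs, create the output directory, then run combine_lang_model
# with the same flags as in the command above.
run_combine() {
  local missing
  missing="$(missing_inputs \
    "$unicharset" \
    "$langdata_dir/$lang_code/$lang_code.wordlist" \
    "$langdata_dir/$lang_code/$lang_code.punc" \
    "$langdata_dir/$lang_code/$lang_code.numbers")"
  if [ -n "$missing" ]; then
    printf 'Missing input files:\n%s\n' "$missing" >&2
    return 1
  fi

  # Assumption (unconfirmed): the output directory may need to exist already.
  mkdir -p "$output_dir" || return 1

  ./build/src/training/combine_lang_model \
    --input_unicharset "$unicharset" \
    --script_dir "$langdata_dir" \
    --words "$langdata_dir/$lang_code/$lang_code.wordlist" \
    --puncs "$langdata_dir/$lang_code/$lang_code.punc" \
    --numbers "$langdata_dir/$lang_code/$lang_code.numbers" \
    --output_dir "$output_dir" \
    --version_str 1.0 \
    --lang "$lang_code"
}
```

Calling `run_combine` from the Tesseract source directory would then either list the missing input files or run the tool.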

Can somebody help me with this?

Thanks
