Adding more details to my query:

*My Tesseract version:*
tesseract 4.0.0-beta.4-74-gd8237
 leptonica-1.77.0
  libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
 Found SSE

*My OS details:*
tharaka@tharaka-laptop-ubuntu:/tmp/sin-2018-09-01.E4T$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic
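
One thing that may be worth ruling out for the final "Error writing unicharset!!" line is a path problem. Below is a minimal sketch that checks the paths passed to combine_lang_model in the quoted command. It assumes the tool does not create a missing --output_dir on its own, which the log does not confirm. The function name check_combine_paths is hypothetical and not part of Tesseract.

```python
import os
from pathlib import Path


def check_combine_paths(input_files, script_dir, output_dir):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Every input file given on the command line should exist.
    for name in input_files:
        if not Path(name).is_file():
            problems.append(f"missing input file: {name}")
    # --script_dir should be an existing directory.
    if not Path(script_dir).is_dir():
        problems.append(f"missing script directory: {script_dir}")
    # Assumption: --output_dir must already exist and be writable.
    out = Path(output_dir)
    if not out.is_dir():
        problems.append(f"output directory does not exist: {output_dir}")
    elif not os.access(out, os.W_OK | os.X_OK):
        problems.append(f"output directory is not writable: {output_dir}")
    return problems


if __name__ == "__main__":
    # Paths copied from the command in the quoted message.
    issues = check_combine_paths(
        input_files=[
            "../training/sintrain/sin/sin.unicharset",
            "../langdata/sin/sin.wordlist",
            "../langdata/sin/sin.punc",
            "../langdata/sin/sin.numbers",
        ],
        script_dir="../langdata",
        output_dir="../training/combined_sin",
    )
    for issue in issues:
        print(issue)
    if not issues:
        print("no path problems found")
```

It would need to be run from the same directory as the combine_lang_model command, since all the paths are relative.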

Thanks
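
For the "properties incomplete" warnings in the quoted log, it can help to see which code points each flagged unicharset entry consists of. This is a minimal sketch using Python's standard unicodedata module. The helper name describe is hypothetical, and the sequence used for index 44 is my reading of the log rather than something taken from the unicharset file itself.

```python
import unicodedata


def describe(text):
    """Return (code point, general category, Unicode name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


if __name__ == "__main__":
    # Some of the entries named in the warnings, keyed by unicharset index.
    flagged = {
        4: "\u0dd3",
        6: "\u0dd2",
        11: "\u0dd4",
        30: "\u0dd6",
        # Assumption: index 44 is al-lakuna + zero width joiner + RA.
        44: "\u0dca\u200d\u0dbb",
    }
    for index, entry in flagged.items():
        for code_point, category, name in describe(entry):
            print(index, code_point, category, name)
```

The single-character entries in the sample are Sinhala vowel signs with general category Mn (nonspacing mark). Whether that is the reason their properties are reported as incomplete is not something this sketch can show.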

On Tuesday, September 4, 2018 at 12:11:50 AM UTC+3, Shandigutt wrote:
>
> Hi,
>
> I'm currently training Tesseract for a new language, following the 
> Tesseract wiki training guidelines 
> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00>.
>
> After building Tesseract from source and installing it, I first created 
> my own langdata set. 
>
> Then I created training data and eval data using the tesstrain.sh script.
>
> Then I tried to create a starter traineddata file using the 
> combine_lang_model tool. I used the following command:
>
> *./build/src/training/combine_lang_model --input_unicharset 
> ../training/sintrain/sin/sin.unicharset --script_dir ../langdata --words 
> ../langdata/sin/sin.wordlist --puncs ../langdata/sin/sin.punc --numbers 
> ../langdata/sin/sin.numbers --output_dir ../training/combined_sin 
> --version_str 1.0 --lang sin*
>
> In this command I pointed to the langdata I created myself for the word 
> list, punctuation and numbers, and to the unicharset file that was 
> generated when creating the training data. But I got the following error 
> output:
>
> *Loaded unicharset of size 90 from file 
> ../training/sintrain/sin/sin.unicharset*
> *Setting unichar properties*
> *Setting script properties*
> *Warning: properties incomplete for index 4 = ී*
> *Warning: properties incomplete for index 6 = ි*
> *Warning: properties incomplete for index 11 = ු*
> *Warning: properties incomplete for index 15 = ්‌*
> *Warning: properties incomplete for index 30 = ූ*
> *Warning: properties incomplete for index 44 = ්‍ර*
> *Warning: properties incomplete for index 79 = ්‍ය*
> *Warning: properties incomplete for index 82 = ක්‍*
> *Warning: properties incomplete for index 89 = ර්‍*
> *Error writing unicharset!!*
>
> Can somebody help me with this?
>
> Thanks
>
