You are using an old version of Tesseract. Please use the latest version
from GitHub.

Make sure you remove/uninstall the old version first.

Your error is related to the radical-stroke file in langdata. Make sure you
use the latest version of the langdata repo.

>Invalid format in radical table at line 4: 3400    1.4
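To see exactly what the parser is rejecting, it can help to print the line named in the error. This is only a minimal sketch: show_line is a hypothetical helper, and the path assumes the table is a file named radical-stroke.txt at the top of the directory passed as --script_dir (../langdata in your command). repr() is used so that tabs and spaces in the line are visible.

```python
from pathlib import Path


def show_line(path, lineno):
    """Return repr() of the 1-based line `lineno` of `path`, or None if the file is shorter."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        for current, line in enumerate(handle, start=1):
            if current == lineno:
                # repr() exposes tabs and other whitespace in the line
                return repr(line.rstrip("\n"))
    return None


if __name__ == "__main__":
    # Assumed location: radical-stroke.txt inside the --script_dir directory.
    table = Path("../langdata/radical-stroke.txt")
    if table.exists():
        print(show_line(table, 4))
    else:
        print(f"{table} not found; adjust the path to your langdata checkout")
```

If the printed line looks well formed, the mismatch is more likely between the file and the old binary than in the file itself.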

On Mon 6 Aug, 2018, 9:41 AM Shandigutt, <[email protected]> wrote:

> Hi,
>
> I am trying to train Tesseract for the Sinhala language. I was following
> the training guidelines
> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#creating-starter-traineddata>
> in the GitHub wiki. I get an error at the 4th step, "Creating Starter
> Traineddata". This is the command I executed:
>
> training/combine_lang_model --input_unicharset
> ../training/sin/sin.unicharset --script_dir ../langdata --words
> ../langdata/sin/sin.wordlist --puncs ../langdata/sin/sin.punc --numbers
> ../langdata/sin/sin.numbers --output_dir ../training/combined_sin
> --version_str 1.0 --lang sin
>
> I get the following output:
>
> Loaded unicharset of size 94 from file ../training/sin/sin.unicharset
> Setting unichar properties
> Setting script properties
> Warning: properties incomplete for index 4 = ී
> Warning: properties incomplete for index 6 = ි
> Warning: properties incomplete for index 11 = ු
> Warning: properties incomplete for index 15 = ්‌
> Warning: properties incomplete for index 33 = ූ
> Warning: properties incomplete for index 52 = ්‍ර
> Warning: properties incomplete for index 56 = ්‍ය
> Warning: properties incomplete for index 87 = ක්‍
> Warning: properties incomplete for index 93 = ර්‍
> Config file is optional, continuing...
> Null char=2
> Invalid format in radical table at line 4: 3400    1.4
> Creation of encoded unicharset failed!!
> Error writing recoder!!
> Reducing Trie to SquishedDawg
> Reducing Trie to SquishedDawg
> Reducing Trie to SquishedDawg
>
> For more information, I have attached my sin.unicharset and sin.config
> files.
>
> I use the Tesseract version below:
>
> tesseract -v
> tesseract 4.00.00dev-696-geba0ae3
>  leptonica-1.74.4
>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib
> 1.2.8
>
>  Found SSE
>
> I use the OS below:
>
> uname -a
> Linux shandigutt-laptop-ubuntu 4.4.0-130-generic #156-Ubuntu SMP Thu Jun
> 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> I would appreciate it if somebody could help me with this.
>
> Thanks
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/84872636-f425-4cc0-b228-00e7a3f5b6a3%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/84872636-f425-4cc0-b228-00e7a3f5b6a3%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

