You are using an old version of Tesseract. Please use the latest version from GitHub.
Make sure you remove/uninstall the old version first. Your error is related to the radical-stroke file in langdata, so make sure you are also using the latest version of the langdata repo.

> Invalid format in radical table at line 4: 3400 1.4

On Mon 6 Aug, 2018, 9:41 AM Shandigutt, <[email protected]> wrote:

> Hi,
>
> I am trying to train Tesseract for Sinhala language. I was following training guidelines
> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#creating-starter-traineddata>
> mentioned in Github wiki. I get an error with reference to the 4th step
> which is "Creating Starter Traineddata". Please find the below command I executed,
>
> training/combine_lang_model --input_unicharset ../training/sin/sin.unicharset --script_dir ../langdata --words ../langdata/sin/sin.wordlist --puncs ../langdata/sin/sin.punc --numbers ../langdata/sin/sin.numbers --output_dir ../training/combined_sin --version_str 1.0 --lang sin
>
> I get the following output,
>
> Loaded unicharset of size 94 from file ../training/sin/sin.unicharset
> Setting unichar properties
> Setting script properties
> Warning: properties incomplete for index 4 = ී
> Warning: properties incomplete for index 6 = ි
> Warning: properties incomplete for index 11 = ු
> Warning: properties incomplete for index 15 = ්
> Warning: properties incomplete for index 33 = ූ
> Warning: properties incomplete for index 52 = ්ර
> Warning: properties incomplete for index 56 = ්ය
> Warning: properties incomplete for index 87 = ක්
> Warning: properties incomplete for index 93 = ර්
> Config file is optional, continuing...
> Null char=2
> Invalid format in radical table at line 4: 3400 1.4
> Creation of encoded unicharset failed!!
> Error writing recoder!!
> Reducing Trie to SquishedDawg
> Reducing Trie to SquishedDawg
> Reducing Trie to SquishedDawg
>
> For more information I have attached my sin.unicharset file and sin.config files.
>
> I use below Tesseract version,
>
> tesseract -v
> tesseract 4.00.00dev-696-geba0ae3
>  leptonica-1.74.4
>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
>
>  Found SSE
>
> I use below OS,
>
> uname -a
> Linux shandigutt-laptop-ubuntu 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> Appreciate if somebody can please help me on this.
>
> Thanks
>
> --
> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/84872636-f425-4cc0-b228-00e7a3f5b6a3%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
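As a rough aid for the two checks suggested at the top of this reply, here is a minimal Python sketch. It assumes the first line of `tesseract -v` looks like the one quoted above (`tesseract 4.00.00dev-696-geba0ae3`) and that the three failure messages from the quoted `combine_lang_model` log are the ones worth flagging. The function names `parse_tesseract_version` and `find_fatal_lines` are hypothetical helpers, not part of Tesseract, and the `4.1.1` input in the usage note is only an illustrative release-style string.

```python
import re

# Assumed shape of the first line of `tesseract -v`, based only on the
# output quoted in this thread: "tesseract 4.00.00dev-696-geba0ae3".
VERSION_RE = re.compile(
    r"^tesseract\s+(?P<release>\d+(?:\.\d+)*)"        # e.g. 4.00.00
    r"(?P<tag>[A-Za-z]+)?"                             # e.g. dev
    r"(?:-(?P<commits>\d+)-g(?P<commit>[0-9a-f]+))?"   # e.g. -696-geba0ae3
)


def parse_tesseract_version(output):
    """Split the first line of `tesseract -v` text into its parts."""
    lines = output.strip().splitlines()
    match = VERSION_RE.match(lines[0].strip()) if lines else None
    if match is None:
        raise ValueError("unrecognised version output")
    info = match.groupdict()
    # A pre-release tag or a git suffix suggests a development snapshot.
    info["is_snapshot"] = info["tag"] is not None or info["commit"] is not None
    return info


# Messages copied from the log in the question. Treating exactly these as
# fatal is an assumption of this sketch, not a documented list.
FATAL_MARKERS = (
    "Invalid format in radical table",
    "Creation of encoded unicharset failed",
    "Error writing recoder",
)


def find_fatal_lines(log_text):
    """Return the log lines that contain one of the failure messages."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in FATAL_MARKERS)
    ]
```

Calling `parse_tesseract_version` on the version text from the question reports a snapshot build, while a plain release-style string such as `tesseract 4.1.1` does not. After reinstalling Tesseract and updating langdata, `find_fatal_lines` should return an empty list for the new `combine_lang_model` output if the radical-table error is gone.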

