Hi,

*I was trying to run the lstmtraining script using the command below:*
./build/src/training/lstmtraining --debug_interval 100 \
  --traineddata ../training/sintrain/sin/sin.traineddata \
  --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
  --model_output /media/shandigutt/UUI/training/base --learning_rate 20e-4 \
  --train_listfile ../training/sintrain/sin.training_files.txt \
  --eval_listfile ../training/sineval/sin.training_files.txt \
  --max_iterations 5000 &> /media/shandigutt/UUI/training/basetrain.log

*I got the following output,*

Warning: given outputs 111 not equal to unicharset of 90.
Num outputs,weights in Series:
  1,36,0,1:1, 0
Num outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys48:48, 12480
  Lfx96:96, 55680
  Lrx96:96, 74112
  Lfx256:256, 361472
  Fc90:90, 23130
Total weights = 527034
Built network:[1,36,0,1[C3,3Ft16]Mp3,3Lfys48Lfx96Lrx96Lfx256Fc90] from request [1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]
Training parameters:
  Debug interval = 100, weights = 0.1, learning rate = 0.002, momentum=0.5
null char=2
Loaded 106/106 pages (1-106) of document ../training/sintrain/sin.BhashitaComplex.exp0.lstmf
Loaded 106/106 pages (1-106) of document ../training/sineval/sin.BhashitaComplex.exp0.lstmf
Encoding of string failed!
Failure bytes: ffffffe0 ffffffb7 ffffff8a ffffffe0 ffffffb6 ffffffaf 20 ffffffe0 ffffffb7 ffffff83 ffffffe0 ffffffb6 ffffff82 ffffffe0 ffffffb7 ffffff83 ffffffe0 ffffffb7 ffffff8a ffffffe0 ffffffb6 ffffff9a ffffffe0 ffffffb7 ffffff98 ffffffe0 ffffffb6 ffffffad ffffffe0 ffffffb6 ffffffba ffffffe0 ffffffb7 ffffff9a 20 ffffffe0 ffffffb7 ffffff84 ffffffe0 ffffffb6 ffffffb8 ffffffe0 ffffffb7 ffffff94 20 ffffffe0 ffffffb7 ffffff80 ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffba 20 ffffffe0 ffffffb7 ffffff84 ffffffe0 ffffffb7 ffffff90 ffffffe0 ffffffb6 ffffff9a ffffffe0 ffffffb7 ffffff92 20 ffffffe0 ffffffb6 ffffffba 2e 20 ffffffe0 ffffffb7 ffffff83 ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffff82 ffffffe0 ffffffb7 ffffff84 ffffffe0 ffffffb6 ffffffbd ffffffe0 ffffffb6 ffffffba ffffffe0 ffffffb7 ffffff9a 20 ffffffe0 ffffffb6 ffffffb8 ffffffe0 ffffffb7 ffffff99 ffffffe0 ffffffb6 ffffffb8 20 ffffffe0 ffffffb6 ffffff8d 2c 20 ffffffe0 ffffffb6 ffffff8e 2c 20 ffffffe0 ffffffb6 ffffff8f 2c 20 ffffffe0 ffffffb6 ffffff90 20 ffffffe0 ffffffb6 ffffffba ffffffe0 ffffffb6 ffffffb1 20 ffffffe0 ffffffb6 ffffff85 ffffffe0 ffffffb6 ffffff9a ffffffe0 ffffffb7 ffffff8a ffffffe0 ffffffb7 ffffff82 ffffffe0 ffffffb6 ffffffbb 20 ffffffe0 ffffffb7 ffffff83 ffffffe0 ffffffb7 ffffff84 ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffad 20 ffffffe0 ffffffb7 ffffff81 ffffffe0 ffffffb6 ffffffb6 ffffffe0 ffffffb7 ffffff8a ffffffe0 ffffffb6 ffffffaf 20 ffffffe0 ffffffb6 ffffff89 ffffffe0 ffffffb6 ffffffad ffffffe0 ffffffb7 ffffff8f ffffffe0 ffffffb6 ffffffb8 20 ffffffe0 ffffffb7 ffffff80 ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffbb ffffffe0 ffffffb7 ffffff85 20 ffffffe0 ffffffb6 ffffffba 2e 20 ffffffe0 ffffffb6 ffffff92 20 ffffffe0 ffffffb6 ffffffb1 ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb7 ffffff83 ffffffe0 ffffffb7 ffffff8f 20 ffffffe0 ffffffb6 ffffffaf ffffffe0 ffffffb7 ffffff9d 2c 20 ffffffe0 ffffffb6 ffffff8d 2c 20 ffffffe0 ffffffb6 ffffff8e 2c 20 ffffffe0 ffffffb6 
ffffff8f 2c 20 ffffffe0 ffffffb6 ffffff90
Can't encode transcription: 'ශබ්ද සංස්කෘතයේ හමු විය හැකි ය. සිංහලයේ මෙම ඍ, ඎ, ඏ, ඐ යන අක්ෂර සහිත ශබ්ද ඉතාම විරළ ය. ඒ නිසා දෝ, ඍ, ඎ, ඏ, ඐ' in language ''
Encoding of string failed!
Failure bytes: ffffffe0 ffffffb7 ffffff8a ffffffe0 ffffffb7 ffffff80 ffffffe0 ffffffb6 ffffffbb 2c 20 ffffffe0 ffffffb6 ffffff8a ffffffe0 ffffffb6 ffffffad ffffffe0 ffffffb6 ffffffb1 2c 20 ffffffe0 ffffffb6 ffffff8a ffffffe0 ffffffb6 ffffffa2 ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffb4 ffffffe0 ffffffb7 ffffff8a ffffffe0 ffffffb6 ffffffad ffffffe0 ffffffb7 ffffff94 ffffffe0 ffffffb7 ffffff80 2c 20 ffffffe0 ffffffb6 ffffff8a ffffffe0 ffffffb6 ffffffa7 20 ffffffe0 ffffffb7 ffffff80 ffffffe0 ffffffb7 ffffff90 ffffffe0 ffffffb6 ffffffb1 ffffffe0 ffffffb7 ffffff92 20 ffffffe0 ffffffb7 ffffff80 ffffffe0 ffffffb6 ffffffa0 ffffffe0 ffffffb6 ffffffb1 20 ffffffe0 ffffffb6 ffffff8a ffffffe0 ffffffb6 ffffffb1 ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffaf ffffffe0 ffffffb7 ffffff8a 20 ffffffe0 ffffffb6 ffffffb6 ffffffe0 ffffffb7 ffffff8a ffffffe0 ffffffb6 ffffffbd ffffffe0 ffffffb6 ffffffba ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffa7 ffffffe0 ffffffb6 ffffffb1 ffffffe0 ffffffb7 ffffff8a ffffffe0 ffffffb6 ffffff9c ffffffe0 ffffffb7 ffffff9a 20 ffffffe0 ffffffb6 ffffff8a ffffffe0 ffffffb6 ffffffbb ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffba 20 ffffffe0 ffffffb6 ffffffb4 ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb7 ffffff85 ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffb6 ffffffe0 ffffffb6 ffffffb3 ffffffe0 ffffffb7 ffffff80 20 ffffffe0 ffffffb6 ffffff91 ffffffe0 ffffffb6 ffffffb1 20 ffffffe0 ffffffb6 ffffff8a ffffffe0 ffffffb7 ffffff85 ffffffe0 ffffffb6 ffffff9f 20 ffffffe0 ffffffb6 ffffff9a ffffffe0 ffffffb7 ffffff98 ffffffe0 ffffffb6 ffffffad ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffba ffffffe0 ffffffb7 ffffff9a ffffffe0 ffffffb6 ffffffad ffffffe0 ffffffb7 ffffff8a 20 ffffffe0 ffffffb6 ffffff87 ffffffe0
ffffffb6 ffffffad ffffffe0 ffffffb7 ffffff94 ffffffe0 ffffffb7 ffffff85 ffffffe0 ffffffb6 ffffffad ffffffe0 ffffffb7 ffffff8a 20 ffffffe0 ffffffb7 ffffff80 ffffffe0 ffffffb6 ffffffb1 ffffffe0 ffffffb7 ffffff94 20 ffffffe0 ffffffb6 ffffff87 ffffffe0 ffffffb6 ffffffad 2e 20 ffffffe0 ffffffb6 ffffff92 ffffffe0 ffffffb6 ffffffaf ffffffe0 ffffffb6 ffffffab ffffffe0 ffffffb7 ffffff8a ffffffe0 ffffffb6 ffffffa9 ffffffe0 ffffffb7 ffffff99 ffffffe0 ffffffb6 ffffffb1 ffffffe0 ffffffb7 ffffff8a
Can't encode transcription: 'ඊසාන, ඊනියා, ඊශ්වර, ඊතන, ඊජිප්තුව, ඊට වැනි වචන ඊනිද් බ්ලයිටන්ගේ ඊරිය පිළිබඳව එන ඊළඟ කෘතියේත් ඇතුළත් වනු ඇත. ඒදණ්ඩෙන්' in language ''

This kept repeating for many sentences, endlessly, until the log file grew very large. Can somebody explain what this issue is? In my command I was using the newly created traineddata file from when I created the training data. At the beginning it prints "Warning: given outputs 111 not equal to unicharset of 90.", which I think is the problem. If you need any more files from my data set for analysis, please let me know.

For more info,

*My tesseract version:*

tesseract 4.0.0-beta.4-74-gd8237
leptonica-1.77.0
libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
Found SSE

*My OS details,*

shandigutt@shandigutt-laptop-ubuntu:/tmp/sin-2018-09-01.E4T$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic

Thanks
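
A note for anyone reading the log: the "Failure bytes" values look like the UTF-8 bytes of the transcription, each printed as a sign-extended 32-bit number (so `ffffffe0` would be the byte `0xe0`). Below is a minimal sketch, assuming that reading is right, for turning such a dump back into text and code points so they can be compared by hand against the unicharset. The function name `decode_failure_bytes` is my own and is not part of Tesseract.

```python
def decode_failure_bytes(dump: str) -> str:
    """Turn a 'Failure bytes:' hex dump from the log back into text.

    Assumption: each token is one UTF-8 byte printed as a sign-extended
    32-bit value, so only the low 8 bits carry information.
    """
    raw = bytes(int(token, 16) & 0xFF for token in dump.split())
    # A dump may begin or end in the middle of a character, so replace
    # undecodable bytes instead of raising an exception.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # A short excerpt taken from the first failure in the log.
    sample = "ffffffe0 ffffffb6 ffffff8d 2c 20 ffffffe0 ffffffb6 ffffff8e"
    text = decode_failure_bytes(sample)
    print(text)
    print([f"U+{ord(ch):04X}" for ch in text])
```

Pasting a whole "Failure bytes" line into `sample` gives the code points of the text that could not be encoded, which makes it easier to check whether those characters are listed in the unicharset at all.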

