> Warning: given outputs 111 not equal to unicharset of 90.

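Your starter traineddata has a unicharset of 90, but in your --net_spec you
have specified the number of outputs as 111 (O1c111). lstmtraining ignores the
111 and builds the output layer with 90 outputs instead; see "Fc90" in the
"Built network" line of your log.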
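
The "Num outputs,weights" lines in your log show which output size was actually used. The sketch below recomputes them. It assumes the usual parameter counts: a fully connected layer has one weight per input per output plus one bias per output, and an LSTM has 4 gates, each fed the layer input, its own previous output and a bias. These formulas reproduce every number in your log, including the total. The function names are made up for this example. For comparison, an O1c111 output layer would have needed 256 x 111 + 111 = 28527 weights, not the 23130 reported for Fc90.

```python
def fc_weights(n_in: int, n_out: int) -> int:
    """Fully connected layer: n_in * n_out weights plus one bias per output."""
    return n_in * n_out + n_out


def lstm_weights(n_in: int, n_out: int) -> int:
    """LSTM: 4 gates, each sees the input, the previous output and a bias."""
    return 4 * n_out * (n_in + n_out + 1)


# Layers from the net_spec in the question, with the 90 outputs lstmtraining used.
LAYERS = [
    ("[C3,3Ft16]", fc_weights(3 * 3, 16)),  # C3,3 stacks a 3x3 patch into 9 values, Ft16 maps them to 16
    ("Lfys48", lstm_weights(16, 48)),       # Mp3,3 keeps the depth of 16
    ("Lfx96", lstm_weights(48, 96)),
    ("Lrx96", lstm_weights(96, 96)),
    ("Lfx256", lstm_weights(96, 256)),
    ("Fc90", fc_weights(256, 90)),          # output layer sized to the unicharset of 90
]

for name, weights in LAYERS:
    print(f"{name}: {weights}")
print("Total weights =", sum(w for _, w in LAYERS))
```

Each printed value matches the corresponding line of your log (160, 12480, 55680, 74112, 361472, 23130, total 527034).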

> Encoding of string failed!

It means that some of the characters in the displayed string are NOT in the
unicharset of your starter traineddata.
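
To see which character tripped the encoder, you can turn the "Failure bytes" back into text. This is a minimal sketch. It assumes, as the log suggests, that each hex word is one UTF-8 byte printed as a sign-extended char, so 0xe0 shows up as ffffffe0. It also assumes the dump starts at the first character that could not be encoded. The function names are hypothetical.

```python
import unicodedata
from typing import List


def decode_failure_bytes(dump: str) -> str:
    """Turn hex words such as 'ffffffe0 ffffffb7 ffffff8a 20' back into text.

    Only the low 8 bits of each word are the real UTF-8 byte; the leading
    ffffff comes from sign extension of a negative char value.
    """
    raw = bytes(int(word, 16) & 0xFF for word in dump.split())
    return raw.decode("utf-8", errors="replace")


def describe(text: str) -> List[str]:
    """List each code point with its Unicode name, to spot the offending one."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in text]


dump = "ffffffe0 ffffffb7 ffffff8a ffffffe0 ffffffb6 ffffffaf 20"
for line in describe(decode_failure_bytes(dump)):
    print(line)
```

For the first error in your log, the leading words decode to U+0DCA (the al-lakuna sign) followed by U+0DAF (ද) and a space. That is the "්ද" of "ශබ්ද", which fits the transcription printed right after it.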

The errors seem to come from lines in your eval set. It looks like those lines
contain some characters that are not in your training data.
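
A rough way to check this is to compare the text each set was generated from. The sketch below assumes you still have the plain-text ground truth for both sets. It works on single code points, while Tesseract's unicharset for Indic scripts such as Sinhala can hold whole multi-code-point clusters. So an empty result here does not guarantee every cluster is covered; it only catches characters that never appear in training at all.

```python
from collections import Counter
from typing import Dict


def chars_missing_from_training(train_text: str, eval_text: str) -> Dict[str, int]:
    """Count non-space code points that occur in eval_text but never in train_text."""
    known = set(train_text)
    missing = Counter(ch for ch in eval_text if ch not in known and not ch.isspace())
    return dict(missing)
```

For example, call `chars_missing_from_training(open("train.txt", encoding="utf-8").read(), open("eval.txt", encoding="utf-8").read())`, where train.txt and eval.txt are placeholders for your own files. The rare vowels ඍ, ඎ, ඏ, ඐ that appear in one of the failing eval lines are good candidates to look for.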

It is also possible that these lines don't follow the Sinhala normalization
rules.
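
A first pass over the eval text is to look for lines that are not in Unicode NFC form. One example is a two-part Sinhala vowel sign stored as its parts (ෙ + ්) instead of the single code point ේ. This is only a sketch. Tesseract's training tools apply their own script-specific checks, which go beyond plain NFC, so a clean result here does not prove a line will encode. The function name is hypothetical.

```python
import unicodedata
from typing import Iterable, List


def non_nfc_lines(lines: Iterable[str]) -> List[int]:
    """Return the indexes of lines that change under Unicode NFC normalization."""
    return [i for i, line in enumerate(lines)
            if unicodedata.normalize("NFC", line) != line]


samples = [
    "\u0d9a\u0dda",        # ක + ේ (composed vowel sign): already NFC
    "\u0d9a\u0dd9\u0dca",  # ක + ෙ + ් (decomposed): NFC composes it to ේ
]
print(non_nfc_lines(samples))
```

Lines it reports can be rewritten with `unicodedata.normalize("NFC", line)` before regenerating the .lstmf files.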

On Sat, Sep 8, 2018 at 11:59 PM, Shandigutt <strider...@gmail.com> wrote:

> Hi,
>
> *I was trying to run lstmtraining using the command below:*
>
> ./build/src/training/lstmtraining --debug_interval 100 \
>   --traineddata ../training/sintrain/sin/sin.traineddata \
>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>   --model_output /media/shandigutt/UUI/training/base \
>   --learning_rate 20e-4 \
>   --train_listfile ../training/sintrain/sin.training_files.txt \
>   --eval_listfile ../training/sineval/sin.training_files.txt \
>   --max_iterations 5000 &> /media/shandigutt/UUI/training/basetrain.log
>
> *I got the following output:*
>
> Warning: given outputs 111 not equal to unicharset of 90.
> Num outputs,weights in Series:
>   1,36,0,1:1, 0
> Num outputs,weights in Series:
>   C3,3:9, 0
>   Ft16:16, 160
> Total weights = 160
>   [C3,3Ft16]:16, 160
>   Mp3,3:16, 0
>   Lfys48:48, 12480
>   Lfx96:96, 55680
>   Lrx96:96, 74112
>   Lfx256:256, 361472
>   Fc90:90, 23130
> Total weights = 527034
> Built network:[1,36,0,1[C3,3Ft16]Mp3,3Lfys48Lfx96Lrx96Lfx256Fc90] from
> request [1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]
> Training parameters:
>   Debug interval = 100, weights = 0.1, learning rate = 0.002, momentum=0.5
> null char=2
> Loaded 106/106 pages (1-106) of document ../training/sintrain/sin.BhashitaComplex.exp0.lstmf
> Loaded 106/106 pages (1-106) of document ../training/sineval/sin.BhashitaComplex.exp0.lstmf
> Encoding of string failed! Failure bytes: ffffffe0 ffffffb7 ffffff8a
> ffffffe0 ffffffb6 ffffffaf 20 ffffffe0 ffffffb7 ffffff83 ffffffe0 ffffffb6
> ffffff82 ffffffe0 ffffffb7 ffffff83 ffffffe0 ffffffb7 ffffff8a ffffffe0
> ffffffb6 ffffff9a ffffffe0 ffffffb7 ffffff98 ffffffe0 ffffffb6 ffffffad
> ffffffe0 ffffffb6 ffffffba ffffffe0 ffffffb7 ffffff9a 20 ffffffe0 ffffffb7
> ffffff84 ffffffe0 ffffffb6 ffffffb8 ffffffe0 ffffffb7 ffffff94 20 ffffffe0
> ffffffb7 ffffff80 ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffba 20
> ffffffe0 ffffffb7 ffffff84 ffffffe0 ffffffb7 ffffff90 ffffffe0 ffffffb6
> ffffff9a ffffffe0 ffffffb7 ffffff92 20 ffffffe0 ffffffb6 ffffffba 2e 20
> ffffffe0 ffffffb7 ffffff83 ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6
> ffffff82 ffffffe0 ffffffb7 ffffff84 ffffffe0 ffffffb6 ffffffbd ffffffe0
> ffffffb6 ffffffba ffffffe0 ffffffb7 ffffff9a 20 ffffffe0 ffffffb6 ffffffb8
> ffffffe0 ffffffb7 ffffff99 ffffffe0 ffffffb6 ffffffb8 20 ffffffe0 ffffffb6
> ffffff8d 2c 20 ffffffe0 ffffffb6 ffffff8e 2c 20 ffffffe0 ffffffb6 ffffff8f
> 2c 20 ffffffe0 ffffffb6 ffffff90 20 ffffffe0 ffffffb6 ffffffba ffffffe0
> ffffffb6 ffffffb1 20 ffffffe0 ffffffb6 ffffff85 ffffffe0 ffffffb6 ffffff9a
> ffffffe0 ffffffb7 ffffff8a ffffffe0 ffffffb7 ffffff82 ffffffe0 ffffffb6
> ffffffbb 20 ffffffe0 ffffffb7 ffffff83 ffffffe0 ffffffb7 ffffff84 ffffffe0
> ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffad 20 ffffffe0 ffffffb7 ffffff81
> ffffffe0 ffffffb6 ffffffb6 ffffffe0 ffffffb7 ffffff8a ffffffe0 ffffffb6
> ffffffaf 20 ffffffe0 ffffffb6 ffffff89 ffffffe0 ffffffb6 ffffffad ffffffe0
> ffffffb7 ffffff8f ffffffe0 ffffffb6 ffffffb8 20 ffffffe0 ffffffb7 ffffff80
> ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffbb ffffffe0 ffffffb7
> ffffff85 20 ffffffe0 ffffffb6 ffffffba 2e 20 ffffffe0 ffffffb6 ffffff92 20
> ffffffe0 ffffffb6 ffffffb1 ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb7
> ffffff83 ffffffe0 ffffffb7 ffffff8f 20 ffffffe0 ffffffb6 ffffffaf ffffffe0
> ffffffb7 ffffff9d 2c 20 ffffffe0 ffffffb6 ffffff8d 2c 20 ffffffe0 ffffffb6
> ffffff8e 2c 20 ffffffe0 ffffffb6 ffffff8f 2c 20 ffffffe0 ffffffb6 ffffff90
> Can't encode transcription: 'ශබ්ද සංස්කෘතයේ හමු විය හැකි ය. සිංහලයේ මෙම ඍ,
> ඎ, ඏ, ඐ යන අක්ෂර සහිත ශබ්ද ඉතාම විරළ ය. ඒ නිසා දෝ, ඍ, ඎ, ඏ, ඐ' in language
> ''
> Encoding of string failed! Failure bytes: ffffffe0 ffffffb7 ffffff8a
> ffffffe0 ffffffb7 ffffff80 ffffffe0 ffffffb6 ffffffbb 2c 20 ffffffe0
> ffffffb6 ffffff8a ffffffe0 ffffffb6 ffffffad ffffffe0 ffffffb6 ffffffb1 2c
> 20 ffffffe0 ffffffb6 ffffff8a ffffffe0 ffffffb6 ffffffa2 ffffffe0 ffffffb7
> ffffff92 ffffffe0 ffffffb6 ffffffb4 ffffffe0 ffffffb7 ffffff8a ffffffe0
> ffffffb6 ffffffad ffffffe0 ffffffb7 ffffff94 ffffffe0 ffffffb7 ffffff80 2c
> 20 ffffffe0 ffffffb6 ffffff8a ffffffe0 ffffffb6 ffffffa7 20 ffffffe0
> ffffffb7 ffffff80 ffffffe0 ffffffb7 ffffff90 ffffffe0 ffffffb6 ffffffb1
> ffffffe0 ffffffb7 ffffff92 20 ffffffe0 ffffffb7 ffffff80 ffffffe0 ffffffb6
> ffffffa0 ffffffe0 ffffffb6 ffffffb1 20 ffffffe0 ffffffb6 ffffff8a ffffffe0
> ffffffb6 ffffffb1 ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffaf
> ffffffe0 ffffffb7 ffffff8a 20 ffffffe0 ffffffb6 ffffffb6 ffffffe0 ffffffb7
> ffffff8a ffffffe0 ffffffb6 ffffffbd ffffffe0 ffffffb6 ffffffba ffffffe0
> ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffa7 ffffffe0 ffffffb6 ffffffb1
> ffffffe0 ffffffb7 ffffff8a ffffffe0 ffffffb6 ffffff9c ffffffe0 ffffffb7
> ffffff9a 20 ffffffe0 ffffffb6 ffffff8a ffffffe0 ffffffb6 ffffffbb ffffffe0
> ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffba 20 ffffffe0 ffffffb6 ffffffb4
> ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb7 ffffff85 ffffffe0 ffffffb7
> ffffff92 ffffffe0 ffffffb6 ffffffb6 ffffffe0 ffffffb6 ffffffb3 ffffffe0
> ffffffb7 ffffff80 20 ffffffe0 ffffffb6 ffffff91 ffffffe0 ffffffb6 ffffffb1
> 20 ffffffe0 ffffffb6 ffffff8a ffffffe0 ffffffb7 ffffff85 ffffffe0 ffffffb6
> ffffff9f 20 ffffffe0 ffffffb6 ffffff9a ffffffe0 ffffffb7 ffffff98 ffffffe0
> ffffffb6 ffffffad ffffffe0 ffffffb7 ffffff92 ffffffe0 ffffffb6 ffffffba
> ffffffe0 ffffffb7 ffffff9a ffffffe0 ffffffb6 ffffffad ffffffe0 ffffffb7
> ffffff8a 20 ffffffe0 ffffffb6 ffffff87 ffffffe0 ffffffb6 ffffffad ffffffe0
> ffffffb7 ffffff94 ffffffe0 ffffffb7 ffffff85 ffffffe0 ffffffb6 ffffffad
> ffffffe0 ffffffb7 ffffff8a 20 ffffffe0 ffffffb7 ffffff80 ffffffe0 ffffffb6
> ffffffb1 ffffffe0 ffffffb7 ffffff94 20 ffffffe0 ffffffb6 ffffff87 ffffffe0
> ffffffb6 ffffffad 2e 20 ffffffe0 ffffffb6 ffffff92 ffffffe0 ffffffb6
> ffffffaf ffffffe0 ffffffb6 ffffffab ffffffe0 ffffffb7 ffffff8a ffffffe0
> ffffffb6 ffffffa9 ffffffe0 ffffffb7 ffffff99 ffffffe0 ffffffb6 ffffffb1
> ffffffe0 ffffffb7 ffffff8a
> Can't encode transcription: 'ඊසාන, ඊනියා, ඊශ්වර, ඊතන, ඊජිප්තුව, ඊට වැනි
> වචන ඊනිද් බ්ලයිටන්ගේ ඊරිය පිළිබඳව එන ඊළඟ කෘතියේත් ඇතුළත් වනු ඇත. ඒදණ්ඩෙන්'
> in language ''
>
> *It kept repeating this for many sentences, endlessly, until the log file grew
> very large. Can somebody explain to me what this issue is? In my command I
> used the traineddata file that was newly created when I generated the training
> data. At the beginning it outputs "Warning: given outputs 111 not equal to
> unicharset of 90.", which I think is the problem. If you need any more files
> from my data set for analysis, please let me know.*
>
> For more info:
> *My tesseract version:*
> tesseract 4.0.0-beta.4-74-gd8237
>  leptonica-1.77.0
>   libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
>  Found SSE
>
> *My OS details:*
> shandigutt@shandigutt-laptop-ubuntu:/tmp/sin-2018-09-01.E4T$ lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description: Ubuntu 18.04.1 LTS
> Release: 18.04
> Codename: bionic
>
> Thanks
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/13045376-a205-4698-b7b5-dd6f3f6b1093%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>



-- 

____________________________________________________________
Bhajan - Kirtan - Aarti (Hindu devotional songs) @ http://bhajans.ramparivar.com

