> I have images and manually corrected Text with line coordinates. From those, I've generated .box files;
What method did you use to generate the .box files? Please also provide the image that belongs to the box file so that it can be tested.

On Thu, Apr 18, 2019 at 6:09 PM <[email protected]> wrote:

> Dear reader,
> I want to improve devanagari recognition.
> I have images and manually corrected Text with line coordinates.
> From those, I've generated .box files;
> see attached file which produces the error above.
>
> Complete error Message from lstmtrain:
> »Encoding of string failed! Failure bytes: 9 32 37 38 ffffffe0 ffffffa4
> ffffff98 ffffffe0 ffffffa5 ffffff8d ffffffe0 ffffffa4 ffffffa8 ffffffe0
> ffffffa5 ffffff87 ffffffe0 ffffffa4 ffffffb6 ffffffe0 fffff...
> Can't encode transcription: 'श्रीगणेशायनमः ।। अलिकुलमण्डितगण्डं
> प्रत्यूहतिमिरमार्त्तण्डं सिन्दूरारुणशुण्डं देवंवेतण्डमुण्डमवलम्बे १ वि
> 278घ्नेश्वरायवरदायसुरप्रियाय लम्बोदरा...
>
> ...
> ...«
>
> .lstmf-Files are generated using »tesseract $tiff $box --tessdata-dir
> ~/tessdata_best -l script/Devanagari lstm.train«
>
> training is run by
> »combine_tessdata -u ~/tessdata_best/script/Devanagari.traineddata
> /tmp/Deva.trta
> mkdir /tmp/deva
> ls -1 *.lstmf >/tmp/list.txt
> lstmtraining --model_output /tmp/deva --continue_from /tmp/Deva.trta.lstm
> --traineddata ~/tessdata_best/script/Devanagari.traineddata
> --train_listfile /tmp/list.txt«
>
> I have double-checked that only characters from
> Devanagari.traineddata.lstm-unicharset are in the .box files.
> No tabs, no control characters.
>
> But the "9" from the error message above sounds like tab...?
>
> Any ideas?
>
> Kind regards, Jochen
>
> PS: latest tesseract 4.1.0-rc1; tessdata_best: commit 95593f0b017280...
--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
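
On the question of whether the leading "9" in the failure bytes is a tab: a quick way to check is to decode the dump as UTF-8. The snippet below is only a sketch. The helper name `decode_failure_bytes` is made up for illustration, and it assumes that values such as `ffffffe0` are sign-extended bytes, so only the low 8 bits are kept. The truncated tail of the dump (`ffffffe0 fffff...`) is left out rather than guessed.

```python
def decode_failure_bytes(dump: str) -> str:
    """Decode a space-separated hex dump like the one in the error message.

    Assumption: values such as ffffffe0 are sign-extended single bytes,
    so each token is masked down to its low 8 bits before decoding.
    """
    raw = bytes(int(token, 16) & 0xFF for token in dump.split())
    return raw.decode("utf-8", errors="replace")


# Complete tokens copied from the quoted error message.
# The truncated trailing part ("ffffffe0 fffff...") is intentionally omitted.
dump = (
    "9 32 37 38 "
    "ffffffe0 ffffffa4 ffffff98 "
    "ffffffe0 ffffffa5 ffffff8d "
    "ffffffe0 ffffffa4 ffffffa8 "
    "ffffffe0 ffffffa5 ffffff87 "
    "ffffffe0 ffffffa4 ffffffb6"
)

text = decode_failure_bytes(dump)

# Print each character with its code point so invisible ones stand out.
for ch in text:
    print(f"U+{ord(ch):04X} {ch!r}")
```

Under that assumption, the first decoded character is U+0009 (a tab), followed by the digits "278" and then Devanagari characters. That lines up with the "278घ्नेश्वराय..." fragment in the quoted transcription, so a tab does seem to be present just before "278" in the text that reached lstmtraining, even if it is not visible in the .box file in an editor.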

