> I have images and manually corrected text with line coordinates. From
> those, I've generated .box files;

What method did you use for generating the .box files?

Please also provide the image that belongs to the box file, so that the error can be reproduced and tested.
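
In the meantime, the failure bytes in the quoted error message can be decoded to see which part of the transcription could not be encoded. The following is only a minimal Python sketch, not part of Tesseract: the helper name decode_failure_bytes is made up for illustration, and it assumes the values are sign-extended hex bytes (so "ffffffe0" stands for 0xe0) of UTF-8 text. The truncated tail of the dump ("ffffffe0 fffff...") is left out because it is incomplete.

```python
# Failure bytes copied from the quoted lstmtraining error message.
# The truncated tail ("ffffffe0 fffff...") is omitted on purpose.
RAW_DUMP = (
    "9 32 37 38 ffffffe0 ffffffa4 ffffff98 ffffffe0 ffffffa5 ffffff8d "
    "ffffffe0 ffffffa4 ffffffa8 ffffffe0 ffffffa5 ffffff87 "
    "ffffffe0 ffffffa4 ffffffb6"
)


def decode_failure_bytes(dump: str) -> str:
    """Turn a hex dump of (possibly sign-extended) bytes into text.

    Assumption: each token is one byte printed as a signed char in hex,
    so only the lowest 8 bits carry information.
    """
    data = bytes(int(token, 16) & 0xFF for token in dump.split())
    # errors="replace" keeps the sketch usable on dumps cut mid-character.
    return data.decode("utf-8", errors="replace")


text = decode_failure_bytes(RAW_DUMP)
print(repr(text))                      # repr makes control characters visible
print([hex(ord(ch)) for ch in text])   # code point of every decoded character
```

Under these assumptions the decoded string starts with a tab (0x09) followed by the ASCII digits "278" and then Devanagari letters, which matches the "278" visible in the quoted transcription.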

On Thu, Apr 18, 2019 at 6:09 PM <[email protected]> wrote:

> Dear reader,
> I want to improve Devanagari recognition.
> I have images and manually corrected text with line coordinates.
> From those, I've generated .box files;
> see the attached file, which produces the error above.
>
> Complete error message from lstmtraining:
> »Encoding of string failed! Failure bytes: 9 32 37 38 ffffffe0 ffffffa4
> ffffff98 ffffffe0 ffffffa5 ffffff8d ffffffe0 ffffffa4 ffffffa8 ffffffe0
> ffffffa5 ffffff87 ffffffe0 ffffffa4 ffffffb6 ffffffe0 fffff...
> Can't encode transcription: 'श्रीगणेशायनमः ।। अलिकुलमण्डितगण्डं
> प्रत्यूहतिमिरमार्त्तण्डं सिन्दूरारुणशुण्डं देवंवेतण्डमुण्डमवलम्बे १ वि
> 278घ्नेश्वरायवरदायसुरप्रियाय लम्बोदरा...
>
> ...
> ...«
>
> The .lstmf files are generated using »tesseract $tiff $box --tessdata-dir
> ~/tessdata_best -l script/Devanagari lstm.train«
>
> Training is run with
> »combine_tessdata -u ~/tessdata_best/script/Devanagari.traineddata
> /tmp/Deva.trta
> mkdir /tmp/deva
> ls -1 *.lstmf >/tmp/list.txt
> lstmtraining --model_output /tmp/deva --continue_from /tmp/Deva.trta.lstm
> --traineddata ~/tessdata_best/script/Devanagari.traineddata
> --train_listfile /tmp/list.txt«
>
> I have double-checked that only characters from
> Devanagari.traineddata.lstm-unicharset are in the .box files.
> No tabs, no control characters.
>
> But the "9" in the error message above looks like a tab character...?
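
One way to double-check this is to scan the box files programmatically instead of by eye. The sketch below is hedged: find_control_chars is a hypothetical helper, the sample data is invented, and it assumes each box line has the form "symbol left bottom right top page" with single spaces between fields. It reports every line whose symbol field contains a control character such as a tab.

```python
import unicodedata


def find_control_chars(box_text: str):
    """Return (line_number, symbol) pairs for box lines whose symbol
    contains a control character (Unicode category Cc, e.g. a tab).

    Assumption: the last five space-separated fields of a line are
    left, bottom, right, top and page; everything before is the symbol.
    """
    hits = []
    for line_no, line in enumerate(box_text.split("\n"), start=1):
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # blank or malformed line, ignored in this sketch
        symbol = parts[0]
        if any(unicodedata.category(ch) == "Cc" for ch in symbol):
            hits.append((line_no, symbol))
    return hits


# Invented sample: the second line has a tab as its symbol.
sample = "व 10 20 30 40 0\n\t 30 20 31 40 0\nि 31 20 45 40 0\n"
print(find_control_chars(sample))
```

To use it on a real file, read the file as UTF-8 and pass its contents to the function; whether a reported tab is an intended marker or a stray character depends on how the box file was produced.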
>
> Any ideas?
>
> Kind regards, Jochen
>
> PS: latest tesseract 4.1.0-rc1; tessdata_best: commit 95593f0b017280...
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/3f411945-e3d5-4b70-bce6-b33e2aab7bfc%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>


-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

