Re: [tesseract-ocr] Re: Tesseract error while combine_lang_model

Shree Devi Kumar Wed, 08 Apr 2020 20:33:25 -0700

Devanagari.unicharset, Latin.unicharset and radical-stroke.txt

The script unicharsets are useful for setting character properties. For most
scripts they are already available in langdata_lstm. I don't think they
are mandatory for LSTM training, but copying them once lets you avoid the
warning messages.


radical-stroke.txt is used only for CJK languages, but Tesseract checks for
it during the training process, so you need to make it available.
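
A quick pre-flight check for these files can be sketched as below. This is
only an illustration: the helper name missing_langdata_files and the idea of
one flat langdata folder are my assumptions, not part of the Tesseract
tooling, and the file list is simply the three names mentioned above.

```python
from pathlib import Path

# File names taken from this thread; adjust the list for your own scripts.
REQUIRED_FILES = ["Devanagari.unicharset", "Latin.unicharset", "radical-stroke.txt"]


def missing_langdata_files(langdata_dir, required=REQUIRED_FILES):
    """Return the names in `required` that are not regular files in `langdata_dir`.

    Hypothetical helper: it only checks for presence and does not validate
    the contents of the files.
    """
    base = Path(langdata_dir)
    return [name for name in required if not (base / name).is_file()]
```

Calling missing_langdata_files("path/to/langdata") before training returns an
empty list when all three files are in place, and otherwise the names that
still need to be copied from langdata_lstm.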

For Chhattisgarhi, if you are training for text written in Devanagari, I would
suggest training from script/Devanagari.traineddata rather than English.

Please note that if you are starting from scratch, you don't need a
starting traineddata. If you use one, you are fine-tuning.

Finally, you need to use the correct mode for Indic languages with
unicharset_extractor. Your unicharset should have Unicode codepoints, not
aksharas (consonant plus vowel sign combinations).
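
To make the codepoint versus akshara distinction concrete, here is a small
Python sketch (not part of the Tesseract tools; the function name codepoints
is just for illustration). It splits the akshara "कि" into the two codepoints
that should appear as separate unicharset entries. As far as I recall, the
relevant unicharset_extractor option is --norm_mode, but please confirm the
exact value against its help output.

```python
import unicodedata


def codepoints(text):
    """Return a (U+XXXX, Unicode name) pair for every codepoint in `text`."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


# One akshara: consonant KA followed by the vowel sign I.
akshara = "\u0915\u093F"

for code, name in codepoints(akshara):
    print(code, name)
# U+0915 DEVANAGARI LETTER KA
# U+093F DEVANAGARI VOWEL SIGN I
```

If your unicharset contains "कि" as a single entry, it was extracted in a
mode that combines graphemes, which is not what you want here.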
