In the language file srp_latn.traineddata (Serbian Latin) there is a line
tessedit_load_sublangs srp
which means that Tesseract also loads the srp (Serbian Cyrillic) language file.

As a result, some of the text is recognized as Cyrillic, even if the
original text contains no Cyrillic script at all!

Can this option be disabled in any way, or could new language files be
provided without the "load sublangs" part?

(Older versions of this language file did not have this line.)
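
One workaround I can imagine, as an untested sketch only: unpack the language file with combine_tessdata -u, remove the line from the extracted config component, and write the config back with combine_tessdata -o. The helper below covers just the middle step on the config text. The function name strip_sublangs and the second option in the sample are made up for illustration, and I do not know whether the rest of the model behaves well once the line is gone.

```python
def strip_sublangs(config_text: str) -> str:
    """Return the config text without any tessedit_load_sublangs line."""
    kept = [
        line
        for line in config_text.splitlines()
        # Drop only the option that pulls in the secondary language.
        if not line.strip().startswith("tessedit_load_sublangs")
    ]
    return "\n".join(kept) + "\n"


# Sample config text; "some_other_option 1" is a hypothetical placeholder,
# not a real Tesseract parameter.
sample = "tessedit_load_sublangs srp\nsome_other_option 1\n"
cleaned = strip_sublangs(sample)
print(cleaned, end="")
```

All other lines of the config are kept as they are, so only the sub-language loading would change.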

Thank you.
