Kannada characters are built up from several components, and in many fonts these components do not touch, leaving a gap between them. A further source of separation is inherent in characters such as ಕೀ ಕೇ ಕೋ and in compound characters. Tesseract does not treat the contents of a training box as a single character; it recognizes them as more than one, and in such cases the output is distorted. Could the developers please modify Tesseract so that whatever lies inside a box used for training a language file is treated as one character and is not split in the output, even when there is a gap that Tesseract's internal vertical scan would pick up?
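To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is not Tesseract code and does not use any Tesseract API. The names `Rect`, `contains_center`, `union` and `group_by_training_box` are made up for illustration, and I am assuming rectangles are given as (left, bottom, right, top), in the same order as the coordinates in a box file. Every separate component whose centre falls inside a training box is merged into one glyph rectangle instead of being reported as separate characters.

```python
from typing import List, Tuple

# Hypothetical rectangle type: (left, bottom, right, top).
Rect = Tuple[int, int, int, int]


def contains_center(box: Rect, comp: Rect) -> bool:
    """Return True if the centre of `comp` lies inside `box`."""
    cx = (comp[0] + comp[2]) / 2
    cy = (comp[1] + comp[3]) / 2
    return box[0] <= cx <= box[2] and box[1] <= cy <= box[3]


def union(rects: List[Rect]) -> Rect:
    """Smallest rectangle that covers every rectangle in `rects`."""
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )


def group_by_training_box(boxes: List[Rect], components: List[Rect]) -> List[Rect]:
    """Merge all components inside each training box into one glyph.

    Components separated by a gap are still treated as one character,
    as long as their centres fall inside the same training box.
    """
    grouped = []
    for box in boxes:
        inside = [c for c in components if contains_center(box, c)]
        if inside:
            grouped.append(union(inside))
    return grouped


# One training box around a two-part character, plus an unrelated
# component further to the right that belongs to no box.
boxes = [(10, 10, 50, 40)]
components = [(10, 10, 28, 40), (33, 12, 50, 38), (60, 10, 70, 40)]
print(group_by_training_box(boxes, components))
```

With the sample values above, the two parts with a gap between them come out as a single rectangle covering the whole box, and the component outside the box is left out.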

