I am starting to use Tesseract to OCR scanned documents. These documents comprise forms, letters, and diagrams.
I have noticed that underlined text does not appear to be recognized. Reading through the postings, it is unclear whether underlined text is supported for all fonts (it may be limited to fixed fonts). If support is limited, what is the best way to handle underlines?

"An Overview of the Tesseract OCR Engine" states that "The line finding algorithm is designed so that a skewed page can be recognized without having to deskew, thus saving loss of image quality". However, most of the posts identify deskewing as a prerequisite preprocessing step. Could someone please elaborate on the use cases where it is needed?

Thanks,
viraf

