[tesseract-ocr] Handling underlines / skew

viraf Tue, 09 Feb 2016 15:57:10 -0800

I am starting to use Tesseract to OCR scanned documents.  These documents
consist of forms, letters, and diagrams.


I have noticed that underlined text does not appear to be recognized.
Reading through the postings, it is unclear whether underlined text is
supported for all fonts (it may be limited to fixed-width fonts).  If it is
not, what is the best way to handle underlines?

"An Overview of the Tesseract OCR Engine" states that "The line
finding algorithm is designed so that a skewed page can be recognized
without having to deskew, thus saving loss of image quality".  However, most
of the posts identify deskewing as a prerequisite preprocessing step.  Could
someone please elaborate on the use cases where deskewing is needed?

Thanks

-- viraf
