I have a page image that's mostly an illustration, with a line of text
above and a three-line italicized caption below the (rectangular)
illustration. It has a clockwise skew of about 1 degree (nothing
unusual, or so I thought).

Well, the top sentence gets recognized and the illustration is skipped,
but the caption below the illustration is ALSO skipped.

However, when I deskew the TIFF with "convert -deskew 40%",
everything gets recognized (with some "routine" glyph-level
misrecognition): tesseract does attempt to recognize the italicized
caption below.

What does one make of this? Do we have to deskew the image outside of
tesseract before running it? How much skew (in degrees) can it
tolerate?
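For anyone who wants to deskew outside tesseract in their own pipeline,
here is a minimal sketch of one way to estimate the skew angle first. It
assumes the page has already been binarized into a NumPy array (nonzero =
ink) and uses a projection-profile search, approximating rotation by a
small-angle shear. This is a generic technique, not necessarily what
ImageMagick's -deskew does internally, and the function names
(estimate_skew, make_skewed_page) are hypothetical.

```python
import numpy as np


def estimate_skew(binary, max_angle=5.0, step=0.1):
    """Estimate page skew in degrees with a projection-profile search.

    Assumes `binary` is a 2-D array where nonzero pixels are ink.
    A positive result means text lines slope downward to the right
    (clockwise skew as displayed, since image y grows downward).
    """
    ys, xs = np.nonzero(binary)
    if xs.size == 0:
        return 0.0  # blank page: nothing to align
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step / 2, step):
        # Small-angle approximation: shear ink pixels instead of rotating.
        rows = np.round(ys - xs * np.tan(np.radians(angle))).astype(int)
        profile = np.bincount(rows - rows.min())  # ink count per row
        # Well-aligned text lines give tall, narrow peaks, hence large jumps.
        score = float(np.sum(np.diff(profile).astype(float) ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle


def make_skewed_page(angle_deg, width=400, height=300, line_gap=30):
    """Hypothetical test page: dashed 3-px 'text lines' sloped by angle_deg."""
    img = np.zeros((height, width), dtype=np.uint8)
    slope = np.tan(np.radians(angle_deg))
    for y0 in range(20, height - 40, line_gap):
        for x in range(20, width - 20):
            if (x // 6) % 2 == 0:  # gaps stand in for spaces between words
                y = int(round(y0 + x * slope))
                img[y:y + 3, x] = 1
    return img


print(round(estimate_skew(make_skewed_page(1.0)), 1))
```

Rotating the page by the negated estimate (for instance with ImageMagick's
-rotate, where positive degrees turn the image clockwise) should then
straighten it before it is handed to tesseract.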

Among other things, this post is a heads-up, because the failure is
somewhat silent: partially missing text may go unnoticed (by humans)
until late in the project, or even after the end product is in use.

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.