Thanks, I will definitely use that. One more thing that document should mention is the Stroke Width Transform (SWT), which can improve OCR recognition on images that have a lot of non-text content.
Here's an example of SWT in action <https://github.com/tleyden/open-ocr/wiki/Stroke-Width-Transform-In-Action>.

On Sun, Jun 22, 2014 at 11:07 AM, Robert Komar <[email protected]> wrote:

> On Fri, 20 Jun 2014, Traun Leyden wrote:
>
>> Thanks, this is really useful. (and shame on me for not
>> RTFM'ing a bit more first)
>> That document mentions to make sure the orientation/skew
>> is straight, but does not give any hints on how to
>> actually do this in an automated fashion. Any tips?
>
> You can use imagemagick's "convert" utility to deskew
> images. For example:
>
>     convert <skewed_image> -deskew 40 <deskewed_image>
>
> It works pretty well for text-only images. Embedded
> images within the text tend to mess it up, though.
>
> Rob Komar

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CACSSHCHARjKMDk25Wa0N%3DJN%2BXija1BhZ7L-eNQiXRmAQenyh%3Dw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
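P.S. On automating the deskew step: the `convert ... -deskew 40` command quoted above could be wrapped in a small script so it runs over a whole directory before the images go to Tesseract. This is only a rough sketch. The function names `deskew_command` and `deskew_directory` and the `*.png` pattern are made up for illustration, and it assumes ImageMagick's `convert` is on the PATH (newer ImageMagick releases may expose it as `magick` instead).

```python
import subprocess
from pathlib import Path


def deskew_command(src, dst, threshold="40"):
    """Build the argument list for: convert <src> -deskew <threshold> <dst>."""
    return ["convert", str(src), "-deskew", str(threshold), str(dst)]


def deskew_directory(in_dir, out_dir, pattern="*.png"):
    """Deskew every image in in_dir matching pattern, writing into out_dir.

    Hypothetical helper: returns the list of output paths it wrote.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(in_dir).glob(pattern)):
        dst = out / src.name
        # check=True raises CalledProcessError if convert exits non-zero.
        subprocess.run(deskew_command(src, dst), check=True)
        written.append(dst)
    return written
```

Something like `deskew_directory("scans", "deskewed")` would then produce the straightened copies. Rob's caveat still applies: pages with embedded images may come out worse, so it is probably worth spot-checking the output.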

