That was an answer with general thoughts only. The images you've shown are ones that can simply be found on the internet. To suggest a more detailed processing pipeline I would need real sample images, and would probably have to ask more questions. Depending on that, you can start with binarization and connected-component (CC) labeling, or you can jump right to region cropping.
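To make the "binarization and CC labeling" starting point a bit more concrete, here is a minimal sketch in plain Python with no dependencies, so the individual steps stay visible. It rests on several assumptions that are not from this thread: dark text on a light background, an 8-bit grayscale image given as a list of rows, and Otsu's method as the thresholding choice (one option among several). The function names are made up for illustration. In practice OpenCV's `cv2.threshold` and `cv2.connectedComponentsWithStats` do the same job much faster.

```python
from collections import deque


def otsu_threshold(gray):
    """Return the gray level (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray, threshold):
    """Mark dark pixels (<= threshold) as foreground (True)."""
    return [[value <= threshold for value in row] for row in gray]


def component_boxes(mask, min_area=1):
    """Find 8-connected foreground components.

    Returns bounding boxes as (left, top, right, bottom), inclusive,
    in raster-scan order of each component's first pixel.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            left = right = x
            top = bottom = y
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if area >= min_area:  # drop specks smaller than min_area
                boxes.append((left, top, right, bottom))
    return boxes


def crop(gray, box):
    """Cut the rectangle described by an inclusive bounding box."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in gray[top:bottom + 1]]
```

A typical call sequence would be `t = otsu_threshold(gray)`, then `boxes = component_boxes(binarize(gray, t), min_area=20)`, then `crop(gray, box)` for each box. On real documents the individual components are usually characters or fragments, so nearby boxes would still need to be merged into lines or blocks before each region is handed to the OCR engine.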
Tons of good resources are out there, and which ones fit also depends on what you really need. For binarization and CC labeling I'd suggest (at the risk of being criticized by others):

- First, you need to read some classics. "Digital Image Processing" by Gonzalez and Woods: the sections "Thresholding" and "Extraction of Connected Components", and the adjacent sections.
- Second, a tool to quickly get down to trying recipes: OpenCV.
  http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html#threshold
  and
  http://docs.opencv.org/3.0-beta/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html

Most layout analysis methods (and document image analysis methods in general) are published as scientific papers. These might be outdated, but they are sufficient to begin your journey through the papers:

- "Geometric Layout Analysis Techniques for Document Image Understanding: a Review" - 1998 - Cattoni, Coianiz
- "Document Structure Analysis Algorithms" - 2003 - Mao, Rosenfeld, Kanungo

Best regards,
Dmitri Silaev
www.CustomOCR.com

On Mon, Jun 1, 2015 at 7:43 PM, S Kirkwood <[email protected]> wrote:

> Thank you for the response Dmitri.
>
> It is reassuring to know that this can be done. From your description it
> seems as though the first step would be to use some blob detection method
> to find the different regions within a picture. Then, run Tess on the
> regions that I have found, which should give me a better result than
> running it over the entire image. However, I am uncertain of where
> to proceed from here, as I am not well versed in this subject area. Do you
> know of any good resources I could use in order to learn more about the
> methods that I would need to use?
>
> Thanks,
> Scott
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/65a85130-eec3-484f-8c9d-625341da3597%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

