My previous answer was only a general outline. The images you've shown are
ones that can be found on the internet. To suggest a more detailed
processing pipeline I would need real sample images and would probably have
to ask more questions. Depending on the answers, you could start with
binarization and connected component (CC) labeling, or jump straight to
region cropping.
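To make the binarization, CC labeling and region cropping steps concrete,
here is a minimal pure-Python sketch. It assumes a grayscale page given as a
2D list of 0..255 values with dark text on a light background; the fixed
threshold of 128 and the helper names binarize, label_components and crop are
illustrative only, not part of any library. In OpenCV 3.x the labeling step is
available natively as cv2.connectedComponentsWithStats.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Return 1 for dark (ink) pixels and 0 for background pixels.

    Assumes dark text on a light background; 128 is only a placeholder threshold.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def label_components(binary):
    """Label 8-connected foreground regions with a breadth-first search.

    Returns (labels, boxes): labels is a 2D list of component ids (0 = background)
    and boxes maps each id to its inclusive bounding box (top, left, bottom, right).
    """
    h = len(binary)
    w = len(binary[0]) if h else 0
    labels = [[0] * w for _ in range(h)]
    boxes = {}
    next_id = 0
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or labels[y][x]:
                continue  # background, or already part of a labeled component
            next_id += 1
            labels[y][x] = next_id
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                # Check the 3x3 neighborhood; the center pixel is already labeled.
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
            boxes[next_id] = (top, left, bottom, right)
    return labels, boxes


def crop(gray, box):
    """Cut out the region given by an inclusive (top, left, bottom, right) box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in gray[top:bottom + 1]]


# Example: one short dark stroke near the top left, another at the right edge.
page = [
    [255, 0, 255, 255, 255],
    [255, 0, 255, 255, 40],
    [255, 255, 255, 255, 40],
]
labels, boxes = label_components(binarize(page))
print(boxes)  # {1: (0, 1, 1, 1), 2: (1, 4, 2, 4)}
print(crop(page, boxes[2]))  # [[40], [40]]
```

Each cropped region could then be passed to Tesseract on its own. On real
pages you would usually group nearby character components into words, lines
or blocks before cropping; that grouping step is not shown here.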

There are tons of good resources out there, and which ones suit you also
depends on what you really need.

For binarization and CC labeling I'd suggest the following (at the risk of
being criticized by others):
- First, read some classics: "Digital Image Processing" by Gonzalez and
  Woods, in particular the sections "Thresholding" and "Extraction of
  Connected Components" and the sections around them.
- Second, get a tool for quickly trying out recipes: OpenCV. See
  http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html#threshold
  and
  http://docs.opencv.org/3.0-beta/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html
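The thresholding material in Gonzalez and Woods includes Otsu's method,
which picks a global threshold automatically by maximizing the between-class
variance of the gray-level histogram. Below is a minimal pure-Python sketch,
assuming a 2D list of integer gray values in 0..255; the name otsu_threshold
is illustrative. In OpenCV the same method is selected by adding the
cv2.THRESH_OTSU flag to cv2.threshold.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold t for a 2D list of ints in 0..255.

    Pixels <= t form one class and pixels > t the other; t maximizes the
    between-class variance w0 * w1 * (mu0 - mu1)**2, computed here with raw
    counts instead of probabilities (this scales the score but not the argmax).
    """
    hist = [0] * 256
    for row in gray:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


# Example: dark ink values near 10, paper values near 200.
page = [[10, 10, 200, 200], [10, 12, 198, 200]]
t = otsu_threshold(page)  # 12
ink = [[1 if px <= t else 0 for px in row] for row in page]
print(t, ink)  # 12 [[1, 1, 0, 0], [1, 1, 0, 0]]
```

For unevenly lit photos a single global threshold is often not enough, and
a local (adaptive) threshold may work better; which one to use again depends
on your actual images.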

Most layout analysis methods (and document image analysis methods in
general) are published as scientific papers. The surveys below may be
outdated, but they are sufficient to begin your journey through the
literature:
- "Geometric Layout Analysis Techniques for Document Image Understanding: a
Review" - 1998 - Cattoni, Coianiz
- "Document Structure Analysis Algorithms" - 2003 - Mao, Rosenfeld, Kanungo

Best regards,
Dmitri Silaev
www.CustomOCR.com

On Mon, Jun 1, 2015 at 7:43 PM, S Kirkwood <[email protected]> wrote:

> Thank you for the response Dmitri.
>
> It is reassuring to know that this can be done.  From your description it
> seems as though the first step would be to use some blob detection method
> to find the different regions within a picture.   Then, run Tess on the
> regions that I have found, which should give me a better result than
> running it over the entire image.  However, I am uncertain of where
> to proceed from here, as I am not well versed in this subject area.  Do you
> know of any good resources I could use in order to learn more about the
> methods that I would need to use?
>
> Thanks,
> Scott
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/65a85130-eec3-484f-8c9d-625341da3597%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/65a85130-eec3-484f-8c9d-625341da3597%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

