OCRopus provides page segmentation algorithms that do just this. The C++ and Python interface is called ISegmentPage. The command-line tool is called ocropus-pseg (you probably need to run ocropus-binarize on the image first). It outputs a color image that assigns a different color to each region. There are multiple algorithms implementing page segmentation, and not all of them work for all page types.
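Once you have a color-coded segmentation image, getting one crop per region is mostly bookkeeping. Below is a minimal, hypothetical sketch in plain Python. It is not part of OCRopus and does not use its API. It assumes the segmentation image has already been loaded as rows of (R, G, B) tuples, that white (255, 255, 255) is background, and that every other distinct color marks exactly one region. The helper names region_boxes and crop are made up for this example.

```python
from typing import Dict, List, Tuple, TypeVar

Color = Tuple[int, int, int]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive
T = TypeVar("T")

# Assumption: background pixels are pure white in the segmentation image.
BACKGROUND: Color = (255, 255, 255)


def region_boxes(pixels: List[List[Color]], background: Color = BACKGROUND) -> Dict[Color, Box]:
    """Return the bounding box of every non-background color, in order of first appearance."""
    boxes: Dict[Color, List[int]] = {}
    for y, row in enumerate(pixels):
        for x, color in enumerate(row):
            if color == background:
                continue
            if color not in boxes:
                boxes[color] = [x, y, x + 1, y + 1]
            else:
                b = boxes[color]
                b[0] = min(b[0], x)
                b[1] = min(b[1], y)
                b[2] = max(b[2], x + 1)
                b[3] = max(b[3], y + 1)
    return {c: (b[0], b[1], b[2], b[3]) for c, b in boxes.items()}


def crop(image: List[List[T]], box: Box) -> List[List[T]]:
    """Cut the box out of the original (grayscale or color) page image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


if __name__ == "__main__":
    W, R, B = (255, 255, 255), (255, 0, 0), (0, 0, 255)
    seg = [
        [W, R, R, W, W, W],
        [W, R, R, W, B, W],
        [W, W, W, W, B, B],
        [W, W, W, W, W, W],
    ]
    for color, box in region_boxes(seg).items():
        print(color, box)
    # (255, 0, 0) (1, 0, 3, 2)
    # (0, 0, 255) (4, 1, 6, 3)
```

Each crop can then be passed to the recognizer on its own, and its text stored per region. Two caveats under these assumptions: two separate blocks painted the same color would be merged into one box, and the actual color encoding of the ocropus-pseg output should be checked before relying on the white-background convention.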
Tom

On May 12, 8:39 am, dialer <[email protected]> wrote:
> I am new to this. I want to be able to perform this, and I wonder if
> there are any APIs or useful utils which I can use to accomplish
> this:
>
> I want to perform what I call 'region analysis' (not sure if it
> is the correct terminology) on images. Basically, given any image,
> there may be a few regions with characters on them. A
> region is basically an arbitrarily sized rectangular area; each region
> is made up of a cluster of words, and one region is separated from
> another region by white space.
>
> Basically, I want to split the images into regions and then perform
> OCR on each region. The characters found in each region are related
> information, so I would like to store the information found in
> each region separately.
>
> Is there a way I could accomplish this?
>
> Thank you very much for reading.
>
> --
> You received this message because you are subscribed to the Google Groups
> "ocropus" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group
> at http://groups.google.com/group/ocropus?hl=en.
