I'm new to Tesseract, so please excuse my naiveté. I'm trying to OCR some newspaper headlines, but I don't need the text in the body of the articles. The headline is, of course, set in much larger type and a different font. Running Tesseract in the default page segmentation mode usually does a good job of recognizing the body text but a poor job on the headline. I'm thinking that if I can separate out the blocks for the headline, the body text, and any nearby images, I could run recognition on the headline block alone, and it might work better (and faster). I can always position the headline at the top left of the image, so it will be first in reading order. I've tried to read through the code to figure out how to focus only on the headline block, but I'm a little lost. Will GetComponentImages work? Am I barking up the wrong tree?
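For what it's worth, here is a minimal sketch of the "pick the headline block" step, under some assumptions. In the C++ API, `TessBaseAPI::GetComponentImages(tesseract::RIL_BLOCK, true, NULL, NULL)` returns the block bounding boxes as a Leptonica `Boxa`, and `SetRectangle(left, top, width, height)` followed by `GetUTF8Text()` restricts recognition to one region. The selection logic in between is written in plain Python for brevity; the `Box` type and both helper functions are hypothetical, and the heuristic (nearest block to the top-left corner) simply relies on the headline always being placed there.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    """Hypothetical bounding box in pixels, origin at the top-left of the page.

    Mirrors the x/y/w/h fields of a Leptonica BOX.
    """
    x: int
    y: int
    w: int
    h: int


def pick_headline_block(blocks: Sequence[Box]) -> Box:
    """Return the block whose top-left corner is nearest the image origin.

    Assumes the headline was positioned at the top left of the scan, so it
    is the block closest to (0, 0).
    """
    if not blocks:
        raise ValueError("no blocks found on the page")
    # Squared distance of each block's top-left corner from (0, 0);
    # ties go to the block that starts higher on the page.
    return min(blocks, key=lambda b: (b.x * b.x + b.y * b.y, b.y))


def padded(box: Box, pad: int, page_w: int, page_h: int) -> Box:
    """Optionally grow a box by `pad` pixels on each side, clamped to the page,
    so the region handed to recognition is not cropped too tightly."""
    left = max(0, box.x - pad)
    top = max(0, box.y - pad)
    right = min(page_w, box.x + box.w + pad)
    bottom = min(page_h, box.y + box.h + pad)
    return Box(left, top, right - left, bottom - top)


if __name__ == "__main__":
    # Made-up block boxes: a body column, a wide headline, another column.
    blocks = [Box(620, 40, 300, 900), Box(30, 25, 1200, 140), Box(30, 200, 560, 700)]
    headline = pick_headline_block(blocks)
    print(headline)  # Box(x=30, y=25, w=1200, h=140)
    print(padded(headline, 10, 1280, 1000))  # Box(x=20, y=15, w=1220, h=160)
```

The chosen box's `x`, `y`, `w`, `h` would then go to `SetRectangle` so only the headline region is recognized. Using distance from the corner rather than sorting purely by `y` keeps a narrow column that happens to start a few pixels higher from winning over a wide headline further left.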
Any help would be appreciated. Thanks!

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

