Finding the first block of text?

Kurt Marek Tue, 09 Jul 2013 21:03:32 -0700

I'm new to Tesseract, so please excuse my naiveté. I'm trying to OCR some 
newspaper headlines, and I don't need the text in the body of the articles. 
The headline is set in much larger type and a different font. Running 
Tesseract in the default page segmentation mode usually does a good job on 
the main body text but a poor job on the headline.

I'm thinking that if I can separate out the blocks for the headline, the 
body text, and any nearby images, I could run recognition on the headline 
alone, and it might work better (and faster). I can always position the 
headline at the top left of the image, so it will be first in reading 
order. I've tried to read through the code to figure out how to focus on 
only the headline block, but I'm a little lost. Will GetComponentImages 
work? Am I barking up the wrong tree?
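
To make the idea concrete, here is a rough sketch of what I have in mind. It 
assumes the tesserocr Python bindings and Pillow, which are my choice for 
illustration only, and it assumes the headline really is the top-most, 
left-most text block. The helper names pick_headline_box and ocr_headline 
are made up for this sketch. I have not verified that it improves accuracy 
on headlines.

```python
def pick_headline_box(boxes):
    """Return the box nearest the top-left corner, or None if there are none.

    Each box is a dict with "x", "y", "w", "h" keys, the shape tesserocr
    uses for component bounding boxes. Top-most wins; ties go to left-most.
    """
    if not boxes:
        return None
    return min(boxes, key=lambda b: (b["y"], b["x"]))


def ocr_headline(image_path, lang="eng"):
    """Recognize only the first text block of the image (assumed headline)."""
    # Imported lazily so pick_headline_box works without tesserocr installed.
    from PIL import Image
    from tesserocr import PyTessBaseAPI, RIL

    image = Image.open(image_path)
    with PyTessBaseAPI(lang=lang) as api:
        api.SetImage(image)
        # Layout analysis only: each entry is (image, box, block_id, para_id).
        components = api.GetComponentImages(RIL.BLOCK, True)
        box = pick_headline_box([c[1] for c in components])
        if box is None:
            return ""
        # Restrict recognition to the chosen block's rectangle.
        api.SetRectangle(box["x"], box["y"], box["w"], box["h"])
        return api.GetUTF8Text().strip()
```

Calling ocr_headline("page.png") would then return just the text of that 
block, where "page.png" is a placeholder file name. If an image or rule sits 
above the headline, the top-left assumption breaks and a different selection 
rule (for example, the tallest text lines) would be needed.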



Any help would be appreciated.

Thanks!

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

