Hi,
I'm trying to extract fields from a document; specifically, I'm
looking to detect all horizontal lines. I've considered using a raw
tool like a Hough transform, but I think that a proper page layout
analysis should be able to distinguish a standalone horizontal line
from, for instance, a table.
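As a point of comparison, a "raw" line detector can be as simple as a
run-length scan over a binarized page. The sketch below is a
hypothetical illustration in plain Python. It is not a Hough transform
and not part of Tesseract's API. It assumes the page has already been
binarized into rows of 0/1 pixels (1 = ink), and the function names and
the min_len / max_thickness thresholds are illustrative choices only.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive coordinates


def horizontal_runs(img: List[List[int]], min_len: int) -> List[Tuple[int, int, int]]:
    """Return (y, x0, x1) for every horizontal run of ink at least min_len pixels long."""
    runs = []
    for y, row in enumerate(img):
        x, w = 0, len(row)
        while x < w:
            if row[x]:
                start = x
                while x < w and row[x]:
                    x += 1
                if x - start >= min_len:
                    runs.append((y, start, x - 1))
            else:
                x += 1
    return runs


def merge_runs(runs: List[Tuple[int, int, int]], max_gap_rows: int = 1) -> List[Box]:
    """Group runs on nearby rows that overlap horizontally into bounding boxes.

    Assumes runs are ordered by row, as horizontal_runs produces them.
    """
    boxes: List[List[int]] = []
    for y, x0, x1 in runs:
        for b in boxes:
            if y - b[3] <= max_gap_rows and x0 <= b[2] and x1 >= b[0]:
                b[0], b[2], b[3] = min(b[0], x0), max(b[2], x1), y
                break
        else:
            boxes.append([x0, y, x1, y])
    return [(b[0], b[1], b[2], b[3]) for b in boxes]


def find_horizontal_lines(img: List[List[int]], min_len: int = 50,
                          max_thickness: int = 5) -> List[Box]:
    """Keep only long, thin boxes; tall solid regions are not treated as lines."""
    boxes = merge_runs(horizontal_runs(img, min_len))
    return [b for b in boxes if b[3] - b[1] + 1 <= max_thickness]


# Usage: a 10x20 page with one 2-pixel-thick line and one short stroke.
page = [[0] * 20 for _ in range(10)]
for y in (3, 4):
    for x in range(2, 18):
        page[y][x] = 1
for x in range(0, 4):
    page[7][x] = 1
print(find_horizontal_lines(page, min_len=10))
```

A scan like this cannot tell a table's ruling lines from standalone
fill-in lines; that distinction is what a layout analysis step would
have to add.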

Can I do this with Tesseract or the tools in its surrounding community?
Where would you start?

For example, in the document below all of the horizontal lines have
been detected, and each one has a green bounding box around it. I used
a commercial tool here, but I'd rather use Tesseract if I can.

http://ec2-50-17-56-157.compute-1.amazonaws.com/contracts/auto-sale-contract-bound.png

Thanks in advance,
Patrick.
