Hi, I'm trying to extract fields from a document, and specifically I want to detect all of its horizontal lines. I've considered using a raw tool like a Hough transform, but I think a proper page layout analysis should be able to tell a standalone horizontal line apart from, for instance, the border of a table.
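To be concrete about what I mean by a "raw" approach, here is a minimal sketch. It is not a Hough transform, just a simpler run-length scan that I'm using as a stand-in, and it assumes the page has already been binarized into rows of 0/1 pixels. The function name `find_horizontal_runs` and the `min_length` parameter are made up for illustration. It finds long horizontal runs but has no notion of layout, so a table border and a form-field rule look identical to it.

```python
from typing import List, Sequence, Tuple

# A detected run: (row index, first column, last column), columns inclusive.
Run = Tuple[int, int, int]


def find_horizontal_runs(image: Sequence[Sequence[int]], min_length: int) -> List[Run]:
    """Return every horizontal run of dark pixels at least min_length wide.

    `image` is assumed to be a binarized page: a sequence of rows,
    where a truthy value marks a dark pixel.
    """
    runs: List[Run] = []
    for y, row in enumerate(image):
        start = None  # column where the current dark run began, if any
        for x, pixel in enumerate(row):
            if pixel:
                if start is None:
                    start = x
            elif start is not None:
                # The run ended at column x - 1; keep it if long enough.
                if x - start >= min_length:
                    runs.append((y, start, x - 1))
                start = None
        # Handle a run that extends to the right edge of the row.
        if start is not None and len(row) - start >= min_length:
            runs.append((y, start, len(row) - 1))
    return runs


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 0],
        [0, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],
    ]
    for run in find_horizontal_runs(page, min_length=4):
        print(run)
```

On a real scan this would also need to tolerate skew and lines more than one pixel thick, which is part of why I'd rather lean on layout analysis than build all of that myself.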
Can I do this with Tesseract or the surrounding community of tools? Where would you start? For instance, in this document all of the horizontal lines have been detected, and you can see a green bounding box around each of them. I'm using a commercial tool in this instance, but I'd rather use Tesseract if I can. http://ec2-50-17-56-157.compute-1.amazonaws.com/contracts/auto-sale-contract-bound.png

Thanks in advance,
Patrick

