I work for a real estate data modeling company, and we produce several
thousand images daily that we need OCR'd. As of today, I'm using a C++
wrapper to run the ocropus binary and grab the output, and then I'm
just searching through those results looking for strings like bbox x1
y1 x2 y2 to get my data. The problem with this is that if the scanned
image is a few pixels off, then my program won't return the correct
values. What I need to do is give ocropus a certain bounding box and
then have it return the OCR'd content (if any) of that box. Is this
possible? Ideally, I would give ocropus 20-30 bounding boxes and get
back the data I need from the images. At the moment, I'm just allowing
for a ten-pixel tolerance on the areas returned in the hOCR text,
which is working, but not as well as I would like.
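
To make the current workaround concrete, here is a rough sketch of one alternative to a fixed pixel tolerance: keep every hOCR element whose box lies mostly inside a target region, measured by overlap fraction. This is post-processing of the hOCR output only, not an ocropus feature. The names Box, Word, parseHocr, textInRegion and minCovered are made up for illustration, and the regex assumes simple leaf spans of the form <span ... title="bbox x1 y1 x2 y2">text</span> with no nested tags.

```cpp
#include <algorithm>
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical types for this sketch; not part of ocropus or any library.
struct Box { int x1, y1, x2, y2; };
struct Word { Box box; std::string text; };

// Area of a box; zero when the box is empty or inverted.
long long boxArea(const Box& b) {
    return static_cast<long long>(std::max(0, b.x2 - b.x1)) *
           std::max(0, b.y2 - b.y1);
}

// Fraction of the word's own area that falls inside the region (0.0 to 1.0).
double coveredFraction(const Box& word, const Box& region) {
    Box inter{std::max(word.x1, region.x1), std::max(word.y1, region.y1),
              std::min(word.x2, region.x2), std::min(word.y2, region.y2)};
    long long a = boxArea(word);
    return a == 0 ? 0.0 : static_cast<double>(boxArea(inter)) / a;
}

// Pull (bbox, text) pairs out of hOCR. Assumption: leaf spans only, with the
// bbox stored in the title attribute and no child tags inside the span.
std::vector<Word> parseHocr(const std::string& hocr) {
    static const std::regex re(
        R"rx(<span[^>]*title=["'][^"']*bbox (\d+) (\d+) (\d+) (\d+)[^"']*["'][^>]*>([^<]*)</span>)rx");
    std::vector<Word> words;
    for (std::sregex_iterator it(hocr.begin(), hocr.end(), re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        Box b{std::stoi(m[1].str()), std::stoi(m[2].str()),
              std::stoi(m[3].str()), std::stoi(m[4].str())};
        words.push_back(Word{b, m[5].str()});
    }
    return words;
}

// Join, in document order, the text of every element that is at least
// minCovered inside the region. A shift of a few pixels changes the overlap
// fraction only slightly, so the match survives it.
std::string textInRegion(const std::vector<Word>& words, const Box& region,
                         double minCovered = 0.5) {
    assert(minCovered > 0.0 && minCovered <= 1.0);
    std::string out;
    for (const Word& w : words) {
        if (coveredFraction(w.box, region) >= minCovered) {
            if (!out.empty()) out += ' ';
            out += w.text;
        }
    }
    return out;
}
```

With 20-30 regions per image, the hOCR would be parsed once and textInRegion called once per region. The 0.5 threshold is an arbitrary starting point and would need tuning against real scans.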
