I just realized that I posted this in the wrong group. I apologize
for that. I started out thinking Ocropus was necessary, but ended up
using tesseract.

I solved this problem by writing another program to cut out the
regions I need to OCR and then calling tesseract on each individual
image through a piped system command. Very crude, but I don't have a
lot of time. If anyone has any other ideas, I'd love to hear them.
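For anyone wanting to try the same thing, here is a rough sketch of that crop-and-pipe approach in Python. The original program isn't shown, so the specifics are assumptions: I use ImageMagick's `convert` for the cropping and tesseract's stdin/stdout mode (available in 3.03+) for the piping; any crop tool would do.

```python
import subprocess

def crop_geometry(box):
    """Turn an (x1, y1, x2, y2) box into an ImageMagick -crop
    geometry string of the form WxH+X+Y."""
    x1, y1, x2, y2 = box
    return "%dx%d+%d+%d" % (x2 - x1, y2 - y1, x1, y1)

def ocr_region(image_path, box):
    """Crop one region out of the image with ImageMagick's `convert`
    and pipe the crop straight into tesseract, returning its text.
    Assumes `convert` and a tesseract that reads stdin (3.03+)."""
    crop = subprocess.Popen(
        ["convert", image_path, "-crop", crop_geometry(box), "png:-"],
        stdout=subprocess.PIPE)
    ocr = subprocess.run(
        ["tesseract", "stdin", "stdout"],
        stdin=crop.stdout, capture_output=True, text=True)
    crop.stdout.close()
    crop.wait()
    return ocr.stdout.strip()
```

With 20-30 boxes per image you'd just loop `ocr_region` over the list; spawning one tesseract per region is slow but simple.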

On Mar 16, 3:23 pm, dataintelligence <[email protected]> wrote:
> I work for a real estate data modeling company, and we produce several
> thousand images daily that we need OCR'd. As of today, I'm using a C++
> wrapper to run the ocropus binary and grab the output, and then I'm
> just searching through those results looking for strings like bbox x1
> y1 x2 y2 to get my data. The problem with this is that if the
> scanned-in image is a few pixels off, then my program won't return the correct
> values. What I need to do is give ocropus a certain bounding box and
> then have it return the ocr'd content (if any) of that box. Is this
> possible? Ideally, I would give ocropus 20-30 bounding boxes and get
> back the data I need from the images. At the moment, I'm just allowing
> for a ten pixel tolerance on the areas returned in the hOCR text,
> which is working, but not as well as I would like.
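The tolerance matching described in the quoted message can be sketched like this; the hOCR snippet format and the regex are my assumptions about what the string search looks like, not the poster's actual code.

```python
import re

# hOCR marks each element with a title attribute such as:
#   <span class='ocrx_word' title='bbox 112 45 210 78'>TEXT</span>
# This regex pulls out the four bbox coordinates and the element text.
BBOX_RE = re.compile(
    r"title=['\"]bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>([^<]*)<")

def find_in_region(hocr, target, tol=10):
    """Return the text of hOCR elements whose bbox lies within `tol`
    pixels (per coordinate) of the target (x1, y1, x2, y2) box --
    the "ten pixel tolerance" approach from the post."""
    hits = []
    for m in BBOX_RE.finditer(hocr):
        box = tuple(int(v) for v in m.group(1, 2, 3, 4))
        if all(abs(a - b) <= tol for a, b in zip(box, target)):
            hits.append(m.group(5))
    return hits
```

A real hOCR parser (e.g. walking the tree with an XML library) would be more robust than regex matching, but this mirrors the string-search approach the question describes.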

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.