Thanks for the tip. I'll start using that function.

On Mar 17, 10:47 pm, nguyenq wrote:
> tessnet2 has a method that accepts as an argument a Rectangle object
> that defines the region you want to recognize. There's no need to
> generate subimages.
>
> tessnet2.Tesseract.DoOCR(image, rect)
>
> On Mar 17, 10:47 am, dataintelligence wrote:
>
> > I just realized that I have put this in the wrong group. I apologize
> > for that. I started out thinking Ocropus was necessary, but
> > ended up using tesseract.
> >
> > I solved this problem by writing another program that cuts out the
> > regions I need to OCR and then calling tesseract on each individual
> > image through a piped system command. Very crude, but I don't have a
> > lot of time. If anyone has any other ideas I would love to hear them.
> >
> > On Mar 16, 3:23 pm, dataintelligence wrote:
> >
> > > I work for a real estate data modeling company, and we produce several
> > > thousand images daily that we need OCR'd. As of today, I'm using a C++
> > > wrapper to run the ocropus binary and grab the output, and then I'm
> > > just searching through those results for strings like "bbox x1
> > > y1 x2 y2" to get my data. The problem with this is that if the scanned-in
> > > image is a few pixels off, my program won't return the correct
> > > values. What I need to do is give ocropus a certain bounding box and
> > > have it return the OCR'd content (if any) of that box. Is this
> > > possible? Ideally, I would give ocropus 20-30 bounding boxes and get
> > > back the data I need from the images. At the moment, I'm just allowing
> > > for a ten-pixel tolerance on the areas returned in the hOCR text,
> > > which is working, but not as well as I would like.
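The original poster describes searching hOCR output for "bbox x1 y1 x2 y2" strings and allowing a ten-pixel tolerance. Below is a minimal Python sketch of that matching step. The function names, the regular expression, and the rule that a word counts when its box lies entirely inside the target region grown by the tolerance on every side are all assumptions for illustration; none of them are OCRopus or tesseract APIs. A regex is also not an HTML parser, so hOCR with nested markup inside words would need a real parser.

```python
import re
from typing import List, NamedTuple


class Box(NamedTuple):
    x1: int
    y1: int
    x2: int
    y2: int


class Word(NamedTuple):
    box: Box
    text: str


# Simplified (hypothetical) pattern for elements such as
#   <span class='ocrx_word' title='bbox 12 12 80 38; x_wconf 91'>Parcel</span>
# Only the four bbox numbers and the element's direct text are captured.
_BBOX_RE = re.compile(
    r"""title=['"]bbox (\d+) (\d+) (\d+) (\d+)[^'"]*['"][^>]*>([^<]*)<"""
)


def parse_hocr_words(hocr: str) -> List[Word]:
    """Return every element that carries a bbox and some direct text."""
    words = []
    for m in _BBOX_RE.finditer(hocr):
        text = m.group(5).strip()
        if text:  # container elements (lines, paragraphs) have no direct text
            box = Box(*(int(g) for g in m.group(1, 2, 3, 4)))
            words.append(Word(box, text))
    return words


def inside_with_tolerance(word: Box, region: Box, tol: int) -> bool:
    """True if `word` lies inside `region` grown by `tol` pixels on every side."""
    return (word.x1 >= region.x1 - tol and word.y1 >= region.y1 - tol
            and word.x2 <= region.x2 + tol and word.y2 <= region.y2 + tol)


def text_in_region(words: List[Word], region: Box, tol: int = 10) -> str:
    """Join, in hOCR order, the text of all words that fall inside `region`."""
    return " ".join(w.text for w in words
                    if inside_with_tolerance(w.box, region, tol))


# Made-up hOCR fragment for illustration only.
hocr = (
    "<span class='ocr_line' title='bbox 10 10 300 40'>"
    "<span class='ocrx_word' title='bbox 12 12 80 38; x_wconf 91'>Parcel</span> "
    "<span class='ocrx_word' title='bbox 90 12 160 38'>1234</span></span>"
    "<span class='ocrx_word' title='bbox 400 12 480 38'>Owner</span>"
)
words = parse_hocr_words(hocr)
# The target region is drawn a few pixels off from the scanned words.
print(text_in_region(words, Box(15, 15, 165, 35)))  # → Parcel 1234
```

Growing the target box, rather than requiring each corner to match within the tolerance, lets one region collect every word inside it even when the scan is shifted by a few pixels; it does not compensate for rotation or skew.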
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.

