Take a look at TessBaseAPI::TesseractRect(). This is basically a
convenience method which wraps up the calls for you.
In the first step, set the image you want to work on (SetImage()).
All you need is a pointer to your image data, the dimensions of your
image (width, height), the size of one pixel in bytes (which is 3 for
the image you uploaded) and the number of bytes in one line of your
image (= size_of_one_pixel * width_of_the_image, as long as the rows
are not padded).
In the second step, call SetRectangle(), passing the coordinates of
the upper left corner of your ROI followed by the width and the height
of the ROI (before that, you should check that the ROI does not
exceed the dimensions of the source image).
The last step is to call GetUTF8Text(), which returns your result
string as a char pointer. You might reconsider converting your images
to grayscale as well.
I got a good result on your image after I grayscaled it in Gimp and
saved it as BMP:
https://docs.google.com/leaf?id=0B2ifXewLRYsdMjAyNTAwZTctZDgyZi00NWM3LWFlZTYtYWJmYjEwZDZkMTA3&hl=de
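
To make the ROI check from the second step a bit more concrete, here
is a minimal sketch. The Roi struct and the helpers BytesPerLine() and
ClampRoi() are names I made up for illustration and are not part of
Tesseract. It also assumes tightly packed rows, so if your rows are
padded (BMP rows often are), pass the real stride instead.

```cpp
#include <algorithm>

// Hypothetical helper type for illustration; not part of the Tesseract API.
struct Roi {
    int left;
    int top;
    int width;
    int height;
};

// Bytes in one image row, ASSUMING tightly packed rows (no padding).
inline int BytesPerLine(int image_width, int bytes_per_pixel) {
    return image_width * bytes_per_pixel;
}

// Clamp a requested ROI so that it lies entirely inside the image.
// Returns a zero-sized ROI if the request does not overlap the image.
inline Roi ClampRoi(Roi r, int image_width, int image_height) {
    int x0 = std::max(0, std::min(r.left, image_width));
    int y0 = std::max(0, std::min(r.top, image_height));
    int x1 = std::max(0, std::min(r.left + r.width, image_width));
    int y1 = std::max(0, std::min(r.top + r.height, image_height));
    return Roi{x0, y0, x1 - x0, y1 - y0};
}

// Sketch of the three steps with an already initialized
// tesseract::TessBaseAPI instance `api` (not compiled here):
//   api.SetImage(data, w, h, bpp, BytesPerLine(w, bpp));
//   Roi roi = ClampRoi(requested, w, h);
//   api.SetRectangle(roi.left, roi.top, roi.width, roi.height);
//   char* text = api.GetUTF8Text();  // free with delete[] when done
```

For a ROI covering the left half of the picture you would request
something like Roi{0, 0, w / 2, h} and let ClampRoi() trim it if needed.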

8flm6


On 30 Jun., 18:00, "[email protected]" <[email protected]> wrote:
> This SetRectangle() method is intriguing.  Could you give me an
> example on how to implement it?  95% of the new meters are on the left
> half of the picture.
>
> Thanks!
>
> On Jun 29, 1:53 pm, 8flm6 <[email protected]> wrote:
>
> > Hello,
>
> > The Tesseract API provides a SetRectangle() method, to limit the
> > character recognition to a certain area.
> > If all of your images look nearly the same (new electric meter on the
> > lower left side and the old on the right),
> > you could define a static region of interest which generously covers
> > the number you'd like to read on every image.
> > If every image looks different, you will likely need a more elaborate
> > algorithm which finds the ROIs first,
> > and then passes the coordinates to Tesseract. In the end you
> > could apply a regular expression to your reading results to
> > filter the number you're searching for, something like
> > '/[0-9]{2} [0-9]{3} [0-9]{3}/', if the number always has the
> > same format as the one in the picture you uploaded. Hope you'll
> > find a solution!
>
> > 8flm6
>
> > On 29 Jun., 13:32, "[email protected]" <[email protected]> wrote:
>
> > > Update: on a batch of 60 meters, I was able to get 46 meters
> > > recognized.
>
> > > First I ran a batch that runs tesseract on every .tif, and names the
> > > output <picture name>.txt.
> > > Then, I simply wrote a batch script to compare a text file of known
> > > meter numbers against every tesseract output file using findstr.
> > > The results show up as <picture name>.tif:<picture name>.txt.
>
> > > Is there any way to optimize the pictures to make the text easier to
> > > read before processing?  I tried converting to grayscale last night,
> > > but it actually hurt the results.  The meters that don't come across
> > > all seem to have minimal glare problems.
>
> > > At any rate, in the trials, I have already saved myself a ton of time,
> > > and for that I am happy.  Where's the donate button?
>
> > > On Jun 28, 1:30 pm, "[email protected]" <[email protected]> wrote:
>
> > > > Scenario:  We have 7000+ electric meters being changed out, and while
> > > > changing them out we are taking a picture of the new meter beside the
> > > > old meter to capture the previous reading.  We are looking for a way
> > > > to extract the meter number from all 7000 pictures programmatically.
> > > > I have gotten as far as creating a batch script to run tesseract for
> > > > all files in a folder, and create output txt files for all of the
> > > > images.  Within these images I see a bunch of garbled text, and
> > > > eventually I find the meter number.  My question, can I extract just
> > > > that meter number out of the images programmatically?  I have a list
> > > > of all 7000 meter numbers, and considered maybe making a dictionary
> > > > file of just these.  Would that possibly work?  Can tesseract be set
> > > > to ignore anything that isn't a dictionary match?
>
> > > > Sample meter file: http://deangrell.com/CIMG0005.tif
>
> > > > The meter number we are trying to read is on the left, 76 207 799.
> > > > Everything pulls across, even the "SANAGAMO" on the bottom of the
> > > > right meter.  This software is truly impressive, I just need to find a
> > > > way to focus it on the meter numbers.
>
> > > > Any help at all would be appreciated!

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
