Hello,
The Tesseract API provides a SetRectangle() method to limit character
recognition to a certain area of the image.
If all of your images look nearly the same (new electric meter on the
lower left side and the old one on the right),
you could define a static region of interest (ROI) that generously covers
the number you'd like to read on every image.
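SetRectangle() takes the region as left, top, width and height in pixels. If the photos are not all the same resolution, a static ROI is easier to maintain as fractions of the image size. Below is a minimal Python sketch that converts such fractions into a pixel rectangle clamped to the image bounds. The names `Rect`, `static_roi` and `METER_NUMBER_ROI` are hypothetical, and the fraction values are placeholders that would have to be tuned against a few sample photos.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Pixel rectangle in the order SetRectangle() expects."""
    left: int
    top: int
    width: int
    height: int


def static_roi(img_width: int, img_height: int,
               frac_left: float, frac_top: float,
               frac_width: float, frac_height: float) -> Rect:
    """Convert an ROI given as fractions (0..1) of the image size to pixels.

    Width and height are clamped so the rectangle never extends past the
    image edge, even if frac_left + frac_width > 1.
    """
    fractions = (frac_left, frac_top, frac_width, frac_height)
    if not all(0.0 <= f <= 1.0 for f in fractions):
        raise ValueError("ROI fractions must lie in [0, 1]")
    left = round(frac_left * img_width)
    top = round(frac_top * img_height)
    width = min(round(frac_width * img_width), img_width - left)
    height = min(round(frac_height * img_height), img_height - top)
    return Rect(left, top, width, height)


# Hypothetical ROI: left half of the photo, lower 60 percent.
METER_NUMBER_ROI = (0.0, 0.4, 0.5, 0.6)
```

For example, `static_roi(2000, 1500, *METER_NUMBER_ROI)` gives `Rect(0, 600, 1000, 900)`. The resulting values could be passed to SetRectangle() through whichever binding you use (tesserocr's `PyTessBaseAPI`, for instance, exposes `SetRectangle(left, top, width, height)`, assuming that binding suits your setup), or used to crop each image before running the command-line tesseract.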
If every image looks different, you will likely need a more elaborate
algorithm that finds the ROIs first
and then passes their coordinates to Tesseract. Finally, you
could apply a regular expression to your recognition
results to filter out the number you're searching for, something like
'/[0-9]{2} [0-9]{3} [0-9]{3}/' if the number always has the same
format as the one in the picture you uploaded. Hope you'll find a
solution!
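As a minimal Python sketch of that filtering step, the pattern above can be applied to each OCR output file's text with Python's `re` module (written without the surrounding slashes). The `\b` word boundaries are an addition to the suggested pattern, so that it does not match inside a longer run of digits; `find_meter_numbers` is a hypothetical helper name.

```python
import re
from typing import List

# Two digits, three digits, three digits, separated by single spaces,
# e.g. "76 207 799". The \b anchors (added to the suggested pattern) keep it
# from matching inside a longer run of digits.
METER_NUMBER = re.compile(r"\b[0-9]{2} [0-9]{3} [0-9]{3}\b")


def find_meter_numbers(ocr_text: str) -> List[str]:
    """Return every substring of the OCR output that looks like a meter number."""
    return METER_NUMBER.findall(ocr_text)
```

Because the pattern requires single spaces, a reading where Tesseract dropped or doubled a space will be missed; loosening it to `\b[0-9]{2} ?[0-9]{3} ?[0-9]{3}\b` is one possible trade-off if that happens often.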
8flm6
On 29 Jun., 13:32, "[email protected]" <[email protected]> wrote:
> Update: on a batch of 60 meters, I was able to get 46 meters
> recognized.
>
> First I ran a batch script that runs tesseract on every .tif and names the
> output <picture name>.txt.
> Then, I simply wrote a batch script to compare a text file of known
> meter numbers against every tesseract output file using findstr.
> The results show up as <picture name>.tif:<picture name>.txt.
>
> Is there any way to optimize the pictures to make the text easier to
> read before processing? I tried converting to grayscale last night,
> but it actually hurt the results. The meters that don't come across
> all seem to have minimal glare problems.
>
> At any rate, in the trials, I have already saved myself a ton of time,
> and for that I am happy. Where's the donate button?
> On Jun 28, 1:30 pm, "[email protected]" <[email protected]> wrote:
>
> > Scenario: We have 7000+ electric meters being changed out, and while
> > changing them out we are taking a picture of the new meter beside the
> > old meter to capture the previous reading. We are looking for a way
> > to extract the meter number from all 7000 pictures programmatically.
> > I have gotten as far as creating a batch script to run tesseract for
> > all files in a folder, and create output txt files for all of the
> > images. Within these images I see a bunch of garbled text, and
> > eventually I find the meter number. My question: can I extract just
> > that meter number out of the images programmatically? I have a list
> > of all 7000 meter numbers, and considered maybe making a dictionary
> > file of just these. Would that possibly work? Can tesseract be set
> > to ignore anything that isn't a dictionary match?
>
> > Sample meter file: http://deangrell.com/CIMG0005.tif
>
> > The meter number we are trying to read is on the left, 76 207 799.
> > Everything pulls across, even the "SANAGAMO" on the bottom of the
> > right meter. This software is truly impressive, I just need to find a
> > way to focus it on the meter numbers.
>
> > Any help at all would be appreciated!
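The findstr comparison against a list of known meter numbers, and the dictionary idea from the original question, can also be done as a post-processing step after Tesseract has run. This does not make Tesseract itself ignore non-dictionary text; it only filters what Tesseract already produced. Below is a minimal Python sketch, assuming every meter number has 8 digits like the sample. Whitespace is stripped before comparing, because OCR may insert or drop spaces. The fuzzy fallback uses `difflib.get_close_matches`, and its `cutoff` of 0.85 is a hypothetical starting value chosen so that one misread digit out of eight is still accepted. With 7000 similar numbers, a fuzzy hit could point at the wrong meter, so such hits are probably worth a manual check. `normalize` and `match_meter` are hypothetical helper names.

```python
from __future__ import annotations

import difflib
import re

# Eight digits, optionally separated by single spaces ("76 207 799" or
# "76207799"). Assumption: every meter number has 8 digits, like the sample.
CANDIDATE = re.compile(r"\b\d(?: ?\d){7}\b")


def normalize(number: str) -> str:
    """Remove all whitespace so '76 207 799' and '76207799' compare equal."""
    return "".join(number.split())


def match_meter(ocr_text: str, known_numbers, cutoff: float = 0.85) -> str | None:
    """Return the known meter number that appears in ocr_text, or None.

    Exact matches win. Otherwise the closest known number is accepted if its
    difflib similarity ratio is at least `cutoff` (hypothetical default).
    """
    known = sorted({normalize(k) for k in known_numbers})
    known_set = set(known)
    candidates = [normalize(m) for m in CANDIDATE.findall(ocr_text)]

    # Pass 1: exact match against the list of known meter numbers.
    for candidate in candidates:
        if candidate in known_set:
            return candidate

    # Pass 2: nearest known number, to absorb a single OCR misread.
    for candidate in candidates:
        close = difflib.get_close_matches(candidate, known, n=1, cutoff=cutoff)
        if close:
            return close[0]
    return None
```

To use it on the batch output, read each `.txt` file produced by tesseract (for example with `pathlib.Path.read_text`) and pass its contents along with the list of known meter numbers; files that return `None` are the ones to inspect by hand.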
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en