Hi Håkan,
You will probably need to preprocess the image to make sure it is as
clear as possible and at the right resolution. The text needs to be
within a certain size range to be recognized. Basically it goes like
this:

1) Pre-process image (with ImageMagick, for instance)
2) OCR with tesseract
3) Check text output (perhaps with Perl or Python regular expressions)
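The three steps can be sketched in Python by shelling out to the command-line tools. This assumes ImageMagick's `convert` (ImageMagick 6; version 7 calls it `magick`) and `tesseract` are installed and on the PATH. The preprocessing options, the number regex and the function names are my own illustrative choices, not anything built into tesseract, so treat them as a starting point for experimenting:

```python
import re
import subprocess
import tempfile
from pathlib import Path

# Digit runs, optionally with one decimal part ("56.7" or "56,7").
# This pattern is an illustrative assumption; adjust it to your numbers.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)?")


def preprocess(src: Path, dst: Path) -> None:
    """Step 1: grayscale, upscale and stretch contrast with ImageMagick.

    These options are guesses to experiment with, not tuned values.
    """
    subprocess.run(
        ["convert", str(src), "-colorspace", "Gray", "-resize", "200%",
         "-normalize", str(dst)],
        check=True,
    )


def ocr(image: Path) -> str:
    """Step 2: run the tesseract CLI, which writes <outbase>.txt."""
    with tempfile.TemporaryDirectory() as tmp:
        outbase = Path(tmp) / "out"
        subprocess.run(["tesseract", str(image), str(outbase)], check=True)
        return outbase.with_suffix(".txt").read_text(encoding="utf-8")


def extract_numbers(text: str) -> list[str]:
    """Step 3: keep only the numbers found in the OCR output."""
    return NUMBER_RE.findall(text)


def numbers_in_images(paths: list[Path]) -> dict[Path, list[str]]:
    """Run all three steps on every image, with no human in the loop."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for src in paths:
            cleaned = Path(tmp) / (src.stem + ".png")
            preprocess(src, cleaned)
            results[src] = extract_numbers(ocr(cleaned))
    return results
```

For a whole directory of phone photos you would call something like `numbers_in_images(sorted(Path("photos").glob("*.jpg")))` and get back a mapping from each file to the list of numbers found in it (empty if none). Expect some misreads (e.g. `O` for `0`), so the regex step is also a good place to add sanity checks.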

You may need to use

According to Dmitri Silaev, "For Latin-, Greek- and Cyrillic- based
alphabets characters having height of 24-72 pixels usually get
recognized decently. For character heights falling out of this range
you may need experimentation." Typically people get good results with
200-300 dpi images. We're mostly using it for printed pages, although
water meters and other random things have been photographed and OCR'ed
effectively.
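To see whether an image falls into that range, a rough calculation helps: one typographic point is 1/72 inch, so the nominal size of 12 pt text scanned at 300 dpi is 12 × 300 / 72 = 50 px. The hypothetical helpers below turn a measured character height into an ImageMagick `-resize` percentage. The nominal point size overstates the height of most glyphs, so measuring a few characters in the actual image is more reliable, and aiming for 48 px (the middle of the quoted 24-72 px range) is an arbitrary choice of mine:

```python
LOW_PX, HIGH_PX = 24, 72  # range quoted from Dmitri Silaev


def point_size_to_pixels(point_size: float, dpi: float) -> float:
    """Nominal em height in pixels, using 1 pt = 1/72 inch."""
    return point_size * dpi / 72


def resize_percent(char_height_px: float) -> int:
    """Return a -resize percentage that brings a measured character
    height into LOW_PX..HIGH_PX; 100 means leave the image alone."""
    if char_height_px <= 0:
        raise ValueError("character height must be positive")
    if LOW_PX <= char_height_px <= HIGH_PX:
        return 100
    target = (LOW_PX + HIGH_PX) / 2  # 48 px, the middle of the range
    return round(100 * target / char_height_px)


print(point_size_to_pixels(12, 300))  # 50.0
print(resize_percent(12))             # 400, i.e. -resize 400%
print(resize_percent(144))            # 33
```

The percentage can then be passed straight to ImageMagick, e.g. `convert in.jpg -resize 400% out.png`.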
Good luck! :-)
--Sven

On Tue, Aug 14, 2012 at 8:51 AM, Håkan M <[email protected]> wrote:
> Hi,
>
> I've been trying to find some software for a research project to extract
> numbers out of pictures and have been pointed to tesseract.
>
> What I would like to be able to do is conceptually rather simple:
> - Analyse an image taken with a mobile phone camera whether or not there are
> one or more numbers present in the picture anywhere and what those numbers
> are.
> - Before the analysis I do not know whether there are any numbers at all nor
> where in the image they are. The number of images to analyze is so big that
> a human interface to look at images and point to the numbers or similar is
> not feasible.
> - Feed the image to some piece of software
> - Get a simple list of all numbers that were found
>
> Is this doable with tesseract?
> Even better, is there something existing that I could use without any
> programming?
>
> Thanks for any suggestions/pointers, Håkan
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en



-- 
“All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”
