We recognize US addresses (among other things). We also tried Tesseract's
dictionary support, with little success: the dictionary seems to be taken
into account only about once in every 20 to 50 cases where it should
apply. Instead, we use regular expressions that match the expected grammar
of US addresses, including the many alternative spellings and mistakes
known to occur, and then replace each match with the correct spelling.
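
A minimal sketch of this kind of regex-based post-correction, in Python,
applied to the "City, ST ZIP" tail of an address line. The pattern, the
correction tables, and the function name are all assumptions made for
illustration; they are not the rules ScanBizCards actually uses, and the
state list is truncated to three entries.

```python
import re

# Assumed look-alike characters that OCR may produce in place of digits.
DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)

# Hypothetical misreadings of state abbreviations (after upper-casing
# and removing dots), mapped to the intended abbreviation.
STATE_FIXES = {"C4": "CA", "7X": "TX"}
VALID_STATES = {"CA", "NY", "TX"}  # truncated for the example

# Expected grammar: city, optional separator, state token, 5-digit ZIP,
# optional +4 extension. ZIP positions also accept the look-alikes above.
ADDRESS_TAIL = re.compile(
    r"(?P<city>[A-Za-z][A-Za-z .'-]*?)\s*[,.;]?\s+"
    r"(?P<state>[A-Za-z0-9.]{2,4})\s+"
    r"(?P<zip>[0-9OoIlSB]{5})(?:\s*-\s*(?P<plus4>[0-9OoIlSB]{4}))?\s*$"
)


def normalize_address_tail(line):
    """Return 'City, ST ZIP' with known misreads corrected, or None."""
    m = ADDRESS_TAIL.search(line.strip())
    if m is None:
        return None
    # Normalize the state token, then apply the correction table.
    state = m.group("state").upper().replace(".", "")
    state = STATE_FIXES.get(state, state)
    if state not in VALID_STATES:
        return None
    # Replace letter look-alikes in the ZIP code with digits.
    zip_code = m.group("zip").translate(DIGIT_FIXES)
    if m.group("plus4"):
        zip_code += "-" + m.group("plus4").translate(DIGIT_FIXES)
    return f"{m.group('city').strip()}, {state} {zip_code}"


print(normalize_address_tail("San Francisco, C4 941O5"))
print(normalize_address_tail("New York N.Y. 1OOl2-3456"))
```

Rejecting anything whose state token is not in the valid set keeps the
pattern from "correcting" lines that are not addresses at all.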

You can get a sense of what this yields by running our app on your images
and seeing what you get, then asking me which grammar rule helped in this
or that case. You'll find links to install ScanBizCards (it's free) for
iPhone or Android at www.scanbizcards.com

Patrick

On Apr 5, 10:00 pm, Amrit <[email protected]> wrote:
> Hi All,
>         I am trying to evaluate Tesseract for decoding US postal addresses
> from a set of images (English text in varying fonts). I want to extract
> the city, state, ZIP code combination from each image. Out of
> the box, Tesseract 3.01's performance is average, and I would like to
> increase the accuracy of the system by providing a custom grammar/
> word list (language model).
>        Any idea how to accomplish this? (My custom grammar/
> language model will contain only city, state, and ZIP code values.)
>
> I have tried to create a custom DAWG by following the
> 'training tesseract 3' wiki page, but this doesn't seem to work at
> all. Is there any way I can do this without training on a subset of my
> test images?
>
> Regards,
> Amrit.
