I agree: don't use a dictionary at all. IMO the best you can do is:
- use appropriate whitelists for each character position
- obtain a set of char choices for every char position
- restrict choice sets by using other semantic information you may have
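
To make the three steps above concrete, here is a rough Python sketch. It does not call Tesseract at all: the per-position candidate lists, their scores, the helper names (apply_whitelists, best_valid_string) and the truncated KNOWN_STATES set are all made up for illustration, and you would have to fill the candidates in from whatever alternative choices your OCR step gives you.

```python
from itertools import product
import string

DIGITS = set(string.digits)
UPPER = set(string.ascii_uppercase)


def apply_whitelists(choices, whitelists):
    """Drop every candidate character that is not allowed at its position."""
    return [
        [(ch, score) for ch, score in cands if ch in allowed]
        for cands, allowed in zip(choices, whitelists)
    ]


def best_valid_string(choices, is_valid):
    """Return the highest-scoring combination accepted by is_valid, or None."""
    best, best_score = None, float("-inf")
    for combo in product(*choices):
        text = "".join(ch for ch, _ in combo)
        score = sum(s for _, s in combo)
        if score > best_score and is_valid(text):
            best, best_score = text, score
    return best


# Hypothetical candidates (character, score) for a two-letter state field.
state_choices = [
    [("C", 0.6), ("G", 0.3), ("6", 0.1)],
    [("A", 0.5), ("4", 0.4), ("R", 0.1)],
]
# Semantic restriction: only real state abbreviations (truncated list, assumption).
KNOWN_STATES = {"CA", "GA", "CO"}
state = best_valid_string(
    apply_whitelists(state_choices, [UPPER, UPPER]),
    lambda text: text in KNOWN_STATES,
)

# Hypothetical candidates for a five-digit ZIP field; letters get filtered out.
zip_choices = [
    [("9", 0.9)],
    [("O", 0.7), ("0", 0.3)],
    [("2", 0.8), ("Z", 0.2)],
    [("1", 0.6), ("l", 0.4)],
    [("0", 0.5), ("O", 0.5)],
]
zip_code = best_valid_string(
    apply_whitelists(zip_choices, [DIGITS] * 5),
    lambda text: True,  # no further semantic check in this sketch
)
```

The brute-force product is only reasonable for short fields like these. If you have a table mapping ZIP codes to cities and states, the is_valid callback is the place to plug it in.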

Warm regards,
Dmitri Silaev

On Wed, Apr 6, 2011 at 6:00 AM, Amrit <[email protected]> wrote:
> Hi All,
> I am trying to evaluate Tesseract for decoding US postal addresses
> from a set of images (English text in varying fonts). I want to extract
> the city, state, and ZIP code combination from each image. Out of
> the box, Tesseract 3.01's performance is average, and I would like to
> increase the accuracy of the system by providing a custom
> grammar/word list (language model).
> Any idea how to accomplish this? (My custom grammar/language model
> will contain only city names, states, and ZIP codes.)
>
> I have tried to create a custom DAWG by following the
> 'training tesseract 3' wiki page, but this doesn't seem to work at
> all. Is there any way I can do this without training on a subset of my
> test images?
>
> Regards,
> Amrit.
>
> --
> You received this message because you are subscribed to the Google Groups 
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
>
