I agree: don't use a dictionary at all. IMO the best you can do is:
- use an appropriate whitelist for each character position
- obtain the set of candidate characters for every character position
- restrict those candidate sets using any other semantic information you have
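
The three steps above could be sketched roughly as follows. This is only an illustration in plain Python, not Tesseract code: the per-position candidate lists in `ocr_choices` are made up (in practice they would come from whatever alternative character choices your OCR engine exposes), the fixed "two-letter state + five-digit ZIP" layout is an assumption, and `ZIP_FIRST_DIGIT`, `filter_choices` and `candidates` are hypothetical names with a deliberately tiny lookup table.

```python
from itertools import product
import string

# Hypothetical per-position candidates, best guess first.
# Assumed layout: two-letter state code followed by a five-digit ZIP.
ocr_choices = [
    ["C", "G"],        # state, 1st letter
    ["A", "4"],        # state, 2nd letter
    ["9", "g"],        # ZIP digits from here on
    ["O", "0"],
    ["2", "Z"],
    ["1", "l", "I"],
    ["0", "O"],
]

# Step 1: one whitelist per character position.
whitelists = [set(string.ascii_uppercase)] * 2 + [set(string.digits)] * 5


def filter_choices(choices, whitelists):
    """Step 2: keep only the candidates allowed at each position."""
    return [[c for c in opts if c in allowed]
            for opts, allowed in zip(choices, whitelists)]


# Step 3: extra semantic knowledge. Illustrative subset only;
# a real table would cover every state and finer ZIP ranges.
ZIP_FIRST_DIGIT = {"CA": "9", "GA": "3"}


def candidates(choices, whitelists):
    """Enumerate the remaining combinations and keep the consistent ones."""
    filtered = filter_choices(choices, whitelists)
    results = []
    for combo in product(*filtered):
        state, zipcode = "".join(combo[:2]), "".join(combo[2:])
        # Keep the reading only if the state is known and its ZIP prefix matches.
        if ZIP_FIRST_DIGIT.get(state) == zipcode[0]:
            results.append((state, zipcode))
    return results


print(candidates(ocr_choices, whitelists))
```

With the sample data, the whitelists remove the letter/digit confusions (O vs 0, l vs 1), and the state/ZIP consistency check then discards the "GA" reading. If more than one combination survives, you would still need some score from the OCR engine to rank them.
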

Warm regards,
Dmitri Silaev

On Wed, Apr 6, 2011 at 6:00 AM, Amrit <[email protected]> wrote:
> Hi All,
> I am trying to evaluate tesseract to decode US postal addresses
> from a set of images (English text with varying fonts). I want to extract
> the city, state, zipcode combination from the image. In doing so, out of
> the box tesseract 3.01 performance is average and I would like to
> increase the accuracy of the system by providing a custom grammar/
> wordlist (language model).
> Any idea as to how to accomplish this? (My custom grammar/
> language model will only contain City, State and ZipCode numbers.)
>
> I have tried to create a custom dawg by following along the lines of the
> 'training tesseract 3' wiki page, but this doesn't seem to work at
> all. Is there any way I can do this without training on a subset of my
> test images?
>
> Regards,
> Amrit.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.

