Is it possible for you to get images at a higher resolution? The current resolution may be too low for Tesseract to achieve decent accuracy.
You do need to train for this specific font: Tesseract's "default" eng data is just a collection of some common computer fonts, and yours is not one of them. For town/city names you can indeed use a dictionary approach, but for the state and ZIP code I would rather use the approach I described in my earlier message (per-position whitelists and character choice sets). So the whole thing will require some programming, whereas I suppose you are currently just evaluating the executable.

Warm regards,
Dmitri Silaev

On Thu, Apr 7, 2011 at 8:29 AM, Amrit <[email protected]> wrote:
> Thanks, sending it again.
>
> On Wed, Apr 6, 2011 at 11:24 PM, Dmitri Silaev <[email protected]>
> wrote:
>>
>> To let you know,
>> can't see images yet...
>>
>> On Thu, Apr 7, 2011 at 8:17 AM, Amrit <[email protected]> wrote:
>> > Hi Dmitri/Partik,
>> > Thanks for your reply. I am sending along the preprocessed test image
>> > which I had mentioned in my response.
>> > tesseract output - SOUTHBURY~ CT DLUBB
>> >
>> > Regards,
>> > Amrit.
>> >
>> > On Wed, Apr 6, 2011 at 12:05 AM, Dmitri Silaev <[email protected]>
>> > wrote:
>> >>
>> >> Agree not to use dictionary at all. IMO the best you can do is:
>> >> - use appropriate whitelists for each character position
>> >> - obtain a set of char choices for every char position
>> >> - restrict choice sets by using other semantic information you may have
>> >>
>> >> Warm regards,
>> >> Dmitri Silaev
>> >>
>> >> On Wed, Apr 6, 2011 at 6:00 AM, Amrit <[email protected]>
>> >> wrote:
>> >> > Hi All,
>> >> > I am trying to evaluate tesseract to decode US postal address
>> >> > from a set of images (english text with varying font). I want to
>> >> > extract the city, state, zipcode combination from the image. In
>> >> > doing so, out of the box tesseract 3.01 performance is average and
>> >> > I would like to increase the accuracy of the system by providing a
>> >> > custom grammar/wordlist (language model).
>> >> > Any idea as to how to accomplish this? (My custom grammar/
>> >> > language model will only contain City, State and ZipCode numbers).
>> >> >
>> >> > I have tried to create custom dawg by following on the lines of
>> >> > 'training tesseract 3' wiki page, but this doesn't seem to work at
>> >> > all. Is there any way I can do this without training a subset of my
>> >> > test images?
>> >> >
>> >> > Regards,
>> >> > Amrit.
>> >> >
>> >> > --
>> >> > You received this message because you are subscribed to the Google
>> >> > Groups "tesseract-ocr" group.
>> >> > To post to this group, send email to [email protected].
>> >> > To unsubscribe from this group, send email to
>> >> > [email protected].
>> >> > For more options, visit this group at
>> >> > http://groups.google.com/group/tesseract-ocr?hl=en.
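
To make the suggestions in this thread a bit more concrete, here is a minimal Python sketch of the post-processing side only: a dictionary lookup for the city, and per-position whitelists plus a semantic check for the state and ZIP code. It does not call Tesseract at all. How the per-position character choices are obtained from the engine is left out, and the candidate lists, the tiny city/state tables, and all function names (`restrict`, `pick_zip`, `pick_state`, `pick_city`) are assumptions made up for illustration.

```python
import difflib
import string
from itertools import product

# Illustrative subsets only (assumption); a real system would load full lists.
KNOWN_CITIES = ["SOUTHBURY", "SOUTHBRIDGE", "SUDBURY"]
VALID_STATES = {"CT", "MA", "NY"}


def restrict(choices, whitelist):
    """Drop candidates outside the whitelist at each position, keeping order."""
    return [[c for c in position if c in whitelist] for position in choices]


def pick_zip(choices):
    """Return a 5-digit string built from the first digit candidate per position."""
    digits = restrict(choices, set(string.digits))
    if len(digits) != 5 or any(not position for position in digits):
        return None  # some position has no digit candidate at all
    return "".join(position[0] for position in digits)


def pick_state(choices):
    """Return the first candidate combination that is a known state code."""
    letters = restrict(choices, set(string.ascii_uppercase))
    for combo in product(*letters):
        candidate = "".join(combo)
        if candidate in VALID_STATES:  # semantic restriction
            return candidate
    return None


def pick_city(raw, cutoff=0.8):
    """Return the closest known city name to the raw OCR text, if any."""
    matches = difflib.get_close_matches(raw.upper(), KNOWN_CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Candidate lists are invented for illustration; they are NOT real Tesseract output.
state_choices = [["G", "C"], ["T", "I"]]
zip_choices = [["D", "0"], ["L", "1"], ["U", "0"], ["B", "8"], ["B", "8"]]

print(pick_city("SOUTHBURY~"), pick_state(state_choices), pick_zip(zip_choices))
```

The point of the design is that the dictionary is only applied where it helps (city names), while the state and ZIP fields are constrained by character class and by a list of valid values instead. Whether the chosen digits are actually correct still depends on the quality of the real candidate sets.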

