Re: Using Grammar to Improve Image Decoding Accuracy

Amrit Thu, 07 Apr 2011 02:28:26 -0700

Thanks Patrik, Dmitri,

@Patrik: Thanks for the heads up on the usage of grammar. I am looking
into the 'scanbizcard' app on my Android and will share the updates
with you on my test images.


Overall, I would like to understand how word recognition happens in
tesseract if the grammar is not being used at all. I was looking into
the language_model code but was not able to figure out how exactly a
word is formed from the individual character decodings. Any inputs?

I am coming from an automatic speech recognition background, so please
bear with me. What I was looking for was a typical language model
implementation through which I could force the output from tesseract.

An example of my current test case is an address, as follows:
image ground truth : SOUTHBURY, CT 0688
tesseract output   : SOUTHBURY~ CT DLUBB

As you can see, tesseract fails only in decoding the numbers (mainly
due to the image font and spacing). Is there a way to force tesseract
to choose only from a given list of viable results, thus avoiding such
errors entirely? That is, the decoded SOUTHBURY line would be matched
against a list like

SOUTHBURY CT 0688
XYZ CT 0689
...

resulting in the correct output.
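
To illustrate the kind of constraint I mean, here is a minimal sketch
done as post-processing outside tesseract (it is not a tesseract
feature and does not touch the language_model code). It assumes the raw
OCR string is already available, and it uses Python's standard difflib
to pick the closest entry from a list of valid results. The names
CANDIDATES, normalize and snap_to_candidates, and the 0.5 cutoff, are
made up for illustration; the list entries are the ones from the
example above.

```python
import difflib

# Hypothetical list of valid "CITY ST ZIP" strings (assumption: in practice
# this would be loaded from a postal database, not hard-coded).
CANDIDATES = [
    "SOUTHBURY CT 0688",
    "XYZ CT 0689",
]


def normalize(text):
    """Uppercase the text and replace anything that is not a letter or
    digit with a space, then collapse runs of whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.upper())
    return " ".join(cleaned.split())


def snap_to_candidates(ocr_text, candidates=CANDIDATES, cutoff=0.5):
    """Return the candidate most similar to the OCR output, or None when
    no candidate reaches the similarity cutoff (difflib's ratio)."""
    matches = difflib.get_close_matches(
        normalize(ocr_text), candidates, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


if __name__ == "__main__":
    # Raw tesseract output from the test case above.
    print(snap_to_candidates("SOUTHBURY~ CT DLUBB"))
```

With the two entries above, the misread line should snap to the
SOUTHBURY entry, since the city and state already match and only the
digits differ. This only helps when the correct answer is in the list,
and a cutoff that is too low will map unrelated text onto some entry.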

* I was not able to attach the image here; I will send it over email.

Any suggestions are much appreciated.

Regards,
Amrit.

On Apr 6, 12:05 am, Dmitri Silaev <[email protected]> wrote:
> Agree not to use dictionary at all. IMO the best you can do is:
> - use appropriate whitelists for each character position
> - obtain a set of char choices for every char position
> - restrict choice sets by using other semantic information you may have
>
> Warm regards,
> Dmitri Silaev
>
> On Wed, Apr 6, 2011 at 6:00 AM, Amrit <[email protected]> wrote:
> > Hi All,
> >        I am trying to evaluate tesseract to decode US postal address
> > from a set of images(english text with varying font).I want to extract
> > the city,state zipcode combination from the image.In doing so, out of
> > the box tesseract 3.01 performance is average and I would like to
> > increase the accuracy of the system by providing a custom grammar/
> > wordlist (language model).
> >       Any idea as to how to accomplish this?(My custom grammar/
> > language model will only contain City,State and ZipCode numbers).
>
> > I have tried to create custom dawg by following on the lines of
> > 'training tesseract 3' wiki page, but this doesn't seem to work at
> > all.Is there any way I can do this without training a subset of my
> > test images?
>
> > Regards,
> > Amrit.
>
> > --
> > You received this message because you are subscribed to the Google Groups 
> > "tesseract-ocr" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to 
> > [email protected].
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en.

