Hi Nick,
I had already read that page and also tried preprocessing the image. This
is the input image http://imgur.com/yCxOvQS,GD38rCa which, after
preprocessing, gives this: http://imgur.com/JzrDkug . I wanted to know if
there is some way to improve the results in the post-processing phase. Right
now I am using regex matching to filter out the noise, but it doesn't work in
all cases. For example, "does‘?", "That's‘his." and "their’" are words that
should not be treated entirely as noise, yet they get filtered out by the
regex matching.
Also, is there any way to retrain Tesseract to improve the results in such
cases? Is there any feedback mechanism that could help improve them?
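A minimal sketch of a gentler post-processing pass, in Python. Instead of discarding every token that contains a stray quote mark, it removes the marks first (treating a mark wedged between two letters as a lost space) and only then drops tokens that are still mostly non-letters. The STRAY_QUOTES set, the min_letters/min_ratio thresholds and the function names are hypothetical choices for illustration, not anything Tesseract provides; they would need tuning against real output.

```python
import re

# Assumed set of quote-like characters that speckle noise tends to turn into.
# Hypothetical; ASCII ' and " are left out so real apostrophes survive.
STRAY_QUOTES = "‘’“”`´»«"

# A stray mark glued between two letters ("That's‘his.") probably replaced a space.
_between_letters = re.compile(rf"(?<=[A-Za-z])[{STRAY_QUOTES}]+(?=[A-Za-z])")
# Any other run of stray marks is simply deleted.
_stray = re.compile(f"[{STRAY_QUOTES}]+")


def clean_token(token):
    """Strip stray quote marks from one token instead of rejecting the token."""
    token = _between_letters.sub(" ", token)
    return _stray.sub("", token)


def is_wordlike(token, min_letters=2, min_ratio=0.6):
    """Keep tokens with enough letters; thresholds are hypothetical defaults."""
    letters = sum(c.isalpha() for c in token)
    return letters >= min_letters and letters / len(token) >= min_ratio


def clean_line(line):
    """Clean each token, then drop whatever is still mostly punctuation."""
    words = []
    for raw in line.split():
        for tok in clean_token(raw).split():
            if is_wordlike(tok):
                words.append(tok)
    return " ".join(words)
```

On the sample output quoted below, clean_line("That's‘his. ‘ '") gives "That's his." and clean_token("does‘?") gives "does?", so those words survive instead of being filtered out. Misrecognitions such as "mqthers." are kept as-is; fixing those would need a dictionary check or better input, not more filtering.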
On Tuesday, July 1, 2014 8:52:35 PM UTC+5:30, Nick White wrote:
>
> Hi Meenal,
>
> On Tue, Jul 01, 2014 at 02:04:36AM -0700, Meenal Goyal wrote:
> > When I try to OCR an image, it also produces some noise apart from the
> > meaningful words. An example output for an image is:
> >
> > All women become
> >
> > like their’ mqthers. _ ' 1"’ '
> >
> > - —T at-{rs their tragedy. ” "R"-‘»“T‘*'-.
> > ‘ .
> >
> > /
> >
> >
> >
> > N man does“
> >
> > That's‘his. ‘ '
> >
> > os'cAR»w;L'15E ‘ 9
> >
> > So, I wanted something which removes the noise from the text, or at least
> > reduces it, and produces correct output.
>
> I see. The best plan would be to preprocess the image to clean it
> up, so that Tesseract isn't seeing all that noise in the first
> place. Check out this wiki page:
> https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
>
> If you want to send a specific example image to the mailing list, we
> can try to offer more specific advice.
>
> Nick
>
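On the preprocessing side, many of the stray ‘ ’ “ marks in the output above come from small specks that survive binarization. One common cleanup (a general technique, not necessarily what the wiki page describes) is to delete connected ink components smaller than a few pixels before running Tesseract. The sketch below assumes an already-binarized image stored as a list of rows (1 = ink, 0 = background); loading and thresholding a real file would be done with an imaging library, for example Pillow's Image.convert. remove_specks and its min_size default are hypothetical. Periods and the dots on i/j are also small components, so min_size has to stay below their size at the scan resolution.

```python
from collections import deque


def remove_specks(img, min_size=4):
    """Return a copy of img with 8-connected ink components smaller than
    min_size pixels cleared. Hypothetical helper; min_size is an assumption."""
    h, w = len(img), len(img[0]) if img else 0
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill (BFS) one connected component of ink pixels.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Components below the size threshold are treated as noise.
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out
```

The input is left unmodified; only the returned copy is cleaned, which makes it easy to compare OCR results with and without the filter at different min_size values.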
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/bcaac70d-0459-4783-9b4b-86934eb003b7%40googlegroups.com.