Re: noise output

Saurabh Gandhi Fri, 04 Mar 2011 22:24:19 -0800

Thanks for the prompt response. Will work on these and get back with more
specific doubts.


--
Regards,
Saurabh Gandhi




On Sat, Mar 5, 2011 at 11:52 AM, Dmitry Silaev <[email protected]> wrote:

> There are tons of them. And I believe no ready-made recipe can be used
> universally; this is very task-specific, especially in photographic
> images. I also believe that to do good text detection your algorithm
> should to some extent mimic human behavior, so it probably should be
> multi-stage, gradually refining results at every stage. Don't count on
> getting a working code snippet from the internet; most likely you'd have
> to write the code yourself.
>
> Here are some articles I picked out when I was self-studying this field
> of document image processing. By now there might be newer ones,
> but these can give you the basics. Apologies, I've no time to
> provide you with direct references and author names; I only listed my
> file system directory on this topic. You can Google the exact article
> titles to find links.
>
> 1990 Scale-Space and Edge Detection Using Anisotropic Diffusion.pdf
> 1998 Edge detection and ridge detection with automatic scale
>        selection.pdf
> 2001 Edge-Based Method for Text Detection from Complex Document
>        Images.pdf
> 2001 TEXT EXTRACTION FROM GREY SCALE PAGE IMAGES BY SIMPLE EDGE
>        DETECTORS.pdf
> 2002 Gaussian-Based Edge-Detection Methods - A Survey.pdf
> 2003 Fast Computation of Scale Normalised Gaussian Receptive
>        Fields.pdf
> 2003 Real-time scale selection in hybrid multi-scale
>        representations.pdf
> 2003 Recognition of text in 3-D scenes.pdf
> 2004 A method for ridge extraction.pdf
> 2004 A Review of Vessel Extraction Techniques and Algorithms.pdf
> 2004 Distinctive Image Features from Scale-Invariant Keypoints.pdf
> 2004 Scene Text Extraction in Natural Scene Images using
>        Hierarchical Feature Combining and Verification.PDF
> 2004 Text Detection from Natural Scene Images - Towards a System
>        for Visually Impaired Persons.PDF
> 2005 A novel approach for text detection in images using structural
>        features.pdf
> 2005 Color Text Extraction from Camera-based Images - the Impact of
>        the Choice of the Clustering Distance.PDF
> 2005 Improved Text-Detection Methods for a Camera-based Text
>        Reading System for Blind Persons.PDF
> 2005 Text Extraction from Gray Scale Historical Document Images
>        Using Adaptive Local Connectivity Map.pdf
> 2006 Multiscale Edge-Based Text Extraction from Complex Images.PDF
> 2006 Spatial and Color Spaces Combination for Natural Scene Text
>        Extraction.PDF
> 2008 A double-threshold image binarization method based on edge
>        detector.PDF
>
> HTH
>
> Warm regards,
> Dmitry Silaev
>
>
>
>
>
> On Sat, Mar 5, 2011 at 8:56 AM, Saurabh Gandhi <[email protected]>
> wrote:
> > Hey,
> > Any algorithm / whitepaper suggestions for text extraction, especially if
> > the text is not overlay text but part of the image itself? Most
> > algorithms I saw on the internet are compute-intensive.
> >
> > --
> > Regards,
> > Saurabh Gandhi
> >
> >
> >
> >
> > On Sat, Mar 5, 2011 at 11:20 AM, Dmitry Silaev <[email protected]>
> > wrote:
> >>
> >> Zdravko,
> >>
> >> You should do text detection before passing images to Tesseract.
> >> Text detection is the process of determining which image regions
> >> contain text. Even if an image contains no text, Tesseract will still
> >> treat it as an image of text.
> >>
> >> Before recognition Tess applies a so-called binarization algorithm,
> >> which converts an RGB image to a monochrome one (black for text and
> >> white for background). For your sample image the Otsu binarization
> >> used in Tesseract (http://en.wikipedia.org/wiki/Otsu%27s_method) would
> >> certainly give a number of skewed vertical lines resembling
> >> backslashes, and the subsequent recognition classifies them as such.
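
To illustrate why a nearly blank image still yields "text" pixels, here is a minimal pure-Python sketch of Otsu's method on a flat list of 8-bit gray levels. It is not Tesseract's implementation; the function names `otsu_threshold` and `binarize` and the toy pixel values are assumptions made for this example only.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the dark class so far
    sum0 = 0  # sum of gray levels of the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break             # bright class empty, nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Map pixels <= t to black (0, 'text') and the rest to white (255)."""
    return [0 if p <= t else 255 for p in pixels]


# Toy "almost blank" image: 95 bright pixels plus 5 slightly darker ones.
faint = [250] * 95 + [240] * 5
t = otsu_threshold(faint)
black_count = binarize(faint, t).count(0)
```

Because the threshold is chosen relative to the image's own histogram, the five slightly darker pixels in the toy data end up black even though they differ from the background by only 10 gray levels.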
> >>
> >> "textord_heavy_nr" and some other variables control size-based noise
> >> removal but work satisfactorily only when there's a significant
> >> body of good text surrounded by some amount of noise. In your image
> >> everything is noise, so it won't work.
> >>
> >> Therefore you need to extend your pre-processing so that you feed Tess
> >> only images that actually contain text. Decisions can be made based on
> >> contrast estimation, distinctive color distribution, etc.
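
One possible form of the contrast-estimation idea is a gate that skips OCR when the gray levels of a region are too tightly clustered. The sketch below is only an assumption about how such a check might look: the function names, the 5th/95th percentile choice, and the `min_spread=50` threshold are all hypothetical and would need tuning on real data.

```python
def percentile(sorted_vals, q):
    """Nearest-rank style percentile of an already sorted list, q in [0, 100]."""
    idx = round(q / 100 * (len(sorted_vals) - 1))
    return sorted_vals[idx]


def has_enough_contrast(pixels, min_spread=50):
    """Hypothetical pre-OCR gate on a flat list of 8-bit gray levels.

    Returns True when the spread between the 5th and 95th percentile
    is at least min_spread gray levels, False otherwise.
    """
    if not pixels:
        return False
    vals = sorted(pixels)
    spread = percentile(vals, 95) - percentile(vals, 5)
    return spread >= min_spread


# Toy data: an almost blank region and a region with dark strokes.
almost_blank = [250] * 95 + [240] * 5
text_like = [255] * 80 + [20] * 20

run_ocr_on_blank = has_enough_contrast(almost_blank)
run_ocr_on_text = has_enough_contrast(text_like)
```

Note the limitation of this particular sketch: a region where dark strokes cover less than about 5% of the pixels would be rejected too, so the percentiles and threshold should be chosen for the expected text density.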
> >>
> >> HTH
> >>
> >> Warm regards,
> >> Dmitry Silaev
> >>
> >>
> >>
> >>
> >>
> >> On Fri, Mar 4, 2011 at 5:25 PM, zdravco <[email protected]> wrote:
> >> > Hello,
> >> >
> >> > I am using tesseract in my project after some image pre-processing.
> >> > There are some false negatives I was hoping tesseract would eliminate
> >> > by producing no output. However, sometimes there is a strange output
> >> > that I get from almost blank images.
> >> > Here is the sample image:
> >> >
> >> > https://picasaweb.google.com/zdravco/TesseractTest#5580227257541654274
> >> >
> >> > When I run it with tesseract rev. 552 using the English language I get:
> >> > " \\\\ R \."
> >> >
> >> > Does anyone know if there are some options in tesseract that could
> >> > eliminate this noise? Or maybe if I could improve my input image with
> >> > some further pre-processing. I have also tried to recompile tesseract
> >> > with "textord_heavy_nr" set to TRUE, but then the output is:
> >> > "an \\“ R \".
> >> >
> >> > Thanks,
> >> > Zdravko
> >> >
> >> > --
> >> > You received this message because you are subscribed to the Google
> >> > Groups "tesseract-ocr" group.
> >> > To post to this group, send email to [email protected].
> >> > To unsubscribe from this group, send email to
> >> > [email protected].
> >> > For more options, visit this group at
> >> > http://groups.google.com/group/tesseract-ocr?hl=en.
> >> >
> >> >
> >>
> >>
> >
> >
>
>
>

