Thanks for the prompt response. Will work on these and get back with more specific doubts.
--
Regards,
Saurabh Gandhi

On Sat, Mar 5, 2011 at 11:52 AM, Dmitry Silaev <[email protected]> wrote:
> There are tons of them. And I believe no ready recipe can be used
> universally; this is very task-specific, especially for photographic
> images. I also believe that to do good text detection your algorithm
> should to some extent mimic human behavior, so it should probably be
> multi-stage, gradually refining the results at every stage. Don't count
> on getting a working code snippet from the internet; most likely you'll
> have to write the code yourself.
>
> Here are some articles I picked out when I was self-studying this field
> of document image processing. There may be newer ones by now, but these
> can give you the basics. Apologies, I have no time to provide direct
> references and author names; I only listed my file system directory on
> this topic. You can Google the exact article titles to find links.
>
> 1990 Scale-Space and Edge Detection Using Anisotropic Diffusion.pdf
> 1998 Edge detection and ridge detection with automatic scale selection.pdf
> 2001 Edge-Based Method for Text Detection from Complex Document Images.pdf
> 2001 TEXT EXTRACTION FROM GREY SCALE PAGE IMAGES BY SIMPLE EDGE DETECTORS.pdf
> 2002 Gaussian-Based Edge-Detection Methods - A Survey.pdf
> 2003 Fast Computation of Scale Normalised Gaussian Receptive Fields.pdf
> 2003 Real-time scale selection in hybrid multi-scale representations.pdf
> 2003 Recognition of text in 3-D scenes.pdf
> 2004 A method for ridge extraction.pdf
> 2004 A Review of Vessel Extraction Techniques and Algorithms.pdf
> 2004 Distinctive Image Features from Scale-Invariant Keypoints.pdf
> 2004 Scene Text Extraction in Natural Scene Images using Hierarchical Feature Combining and Verification.PDF
> 2004 Text Detection from Natural Scene Images - Towards a System for Visually Impaired Persons.PDF
> 2005 A novel approach for text detection in images using structural features.pdf
> 2005 Color Text Extraction from Camera-based Images - the Impact of the Choice of the Clustering Distance.PDF
> 2005 Improved Text-Detection Methods for a Camera-based Text Reading System for Blind Persons.PDF
> 2005 Text Extraction from Gray Scale Historical Document Images Using Adaptive Local Connectivity Map.pdf
> 2006 Multiscale Edge-Based Text Extraction from Complex Images.PDF
> 2006 Spatial and Color Spaces Combination for Natural Scene Text Extraction.PDF
> 2008 A double-threshold image binarization method based on edge detector.PDF
>
> HTH
>
> Warm regards,
> Dmitry Silaev
>
> On Sat, Mar 5, 2011 at 8:56 AM, Saurabh Gandhi <[email protected]> wrote:
> > Hey,
> > Any algorithm / whitepaper suggestions for text extraction, especially
> > when the text is not overlay text but part of the image itself? Most
> > algorithms I saw on the internet are compute-intensive.
> >
> > --
> > Regards,
> > Saurabh Gandhi
> >
> > On Sat, Mar 5, 2011 at 11:20 AM, Dmitry Silaev <[email protected]> wrote:
> >> Zdravko,
> >>
> >> You should do text detection before passing images to Tesseract.
> >> Text detection is the process of determining which image regions
> >> contain text. Even if an image contains no text, Tesseract will
> >> still treat it as an image of text.
> >>
> >> Before recognition, Tess applies a so-called binarization algorithm,
> >> which converts an RGB image to a monochrome one (black for text and
> >> white for background). For your sample image, the Otsu binarization
> >> used in Tesseract (http://en.wikipedia.org/wiki/Otsu%27s_method) would
> >> certainly give a number of skewed vertical lines resembling
> >> backslashes, and the recognizer then classifies them as such.
> >>
> >> "textord_heavy_nr" and some other variables control size-based noise
> >> removal, but they work satisfactorily only when there is a significant
> >> body of good text surrounded by some amount of noise. In your image
> >> everything is noise, so it won't work.
> >>
> >> Therefore you need to extend your pre-processing so that you feed Tess
> >> only images that really contain text. Decisions can be made based on
> >> contrast estimation, distinctive color distribution, etc.
> >>
> >> HTH
> >>
> >> Warm regards,
> >> Dmitry Silaev
> >>
> >> On Fri, Mar 4, 2011 at 5:25 PM, zdravco <[email protected]> wrote:
> >> > Hello,
> >> >
> >> > I am using tesseract in my project after some image pre-processing.
> >> > There are some false negatives I was hoping tesseract would eliminate
> >> > by producing no output. However, sometimes I get strange output
> >> > from almost blank images.
> >> > Here is the sample image:
> >> > https://picasaweb.google.com/zdravco/TesseractTest#5580227257541654274
> >> >
> >> > When I run it with tesseract rev. 552 using the English language I get:
> >> > " \\\\ R \."
> >> >
> >> > Does anyone know if there are options in tesseract that could
> >> > eliminate this noise? Or could I improve my input image with
> >> > some further pre-processing? I have also tried recompiling tesseract
> >> > with "textord_heavy_nr" set to TRUE, but then the output is:
> >> > "an \\“ R \".
> >> >
> >> > Thanks,
> >> > Zdravko
> >> >
> >> > --
> >> > You received this message because you are subscribed to the Google
> >> > Groups "tesseract-ocr" group.
> >> > To post to this group, send email to [email protected].
> >> > To unsubscribe from this group, send email to
> >> > [email protected].
> >> > For more options, visit this group at
> >> > http://groups.google.com/group/tesseract-ocr?hl=en.
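
Editor's sketch of the pre-processing gate suggested in the thread (decide from contrast whether an image contains text at all before handing it to Tesseract). This is only an illustration under stated assumptions, not Tesseract's own code and not the method anyone in the thread actually used. It computes an Otsu split of 8-bit grey values in plain Python and then rejects images whose two classes are too close in brightness or whose "ink" share is implausible. The function names `otsu_split` and `looks_like_text`, the thresholds `min_contrast`, `min_ink` and `max_ink`, and the dark-text-on-light-background assumption are all hypothetical and would need tuning on real data.

```python
from typing import Optional, Sequence, Tuple


def otsu_split(pixels: Sequence[int]) -> Optional[Tuple[int, float, float, float]]:
    """Split 8-bit grey values into two classes with Otsu's method.

    Returns (threshold, dark_mean, light_mean, dark_fraction), where the
    dark class holds values <= threshold. Returns None for an empty or
    single-valued image, which has no two classes to separate.
    """
    total = len(pixels)
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    grand_sum = sum(value * count for value, count in enumerate(hist))

    best = None
    best_between = 0.0
    count0 = 0  # number of pixels in the dark class so far
    sum0 = 0    # sum of grey values in the dark class so far
    for t in range(256):
        count0 += hist[t]
        sum0 += t * hist[t]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue  # one class is empty, nothing to compare
        mean0 = sum0 / count0
        mean1 = (grand_sum - sum0) / count1
        # Between-class variance, the quantity Otsu's method maximises.
        between = (count0 / total) * (count1 / total) * (mean0 - mean1) ** 2
        if between > best_between:
            best_between = between
            best = (t, mean0, mean1, count0 / total)
    return best


def looks_like_text(
    pixels: Sequence[int],
    min_contrast: float = 60.0,  # hypothetical: minimum gap between class means
    min_ink: float = 0.01,       # hypothetical: at least 1% dark pixels
    max_ink: float = 0.5,        # hypothetical: at most 50% dark pixels
) -> bool:
    """Heuristic gate, assuming dark text on a light background."""
    split = otsu_split(pixels)
    if split is None:
        return False  # blank image: do not send it to OCR
    _, dark_mean, light_mean, dark_fraction = split
    contrast = light_mean - dark_mean
    return contrast >= min_contrast and min_ink <= dark_fraction <= max_ink


if __name__ == "__main__":
    page = [240] * 900 + [20] * 100    # light page with 10% dark ink
    faint = [200] * 500 + [204] * 500  # low-contrast sensor noise
    blank = [255] * 1000               # uniformly white
    for name, img in (("page", page), ("faint", faint), ("blank", blank)):
        print(name, looks_like_text(img))
```

The point of the gate is that Otsu's method always finds some threshold, even in an image that is pure noise, so the split has to be checked for plausibility before its result is trusted. A gate this simple will not catch high-contrast non-text content; that case needs the multi-stage detection discussed above.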

