Zdravko,

You should perform text detection before passing images to Tesseract. Text detection is the process of finding the image regions that contain text. Even if an image contains no text at all, Tesseract will still treat it as an image of text.
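One crude way to act on this is to run a whole-image yes/no check before calling Tesseract and reject images whose global contrast is too low or whose share of dark pixels is implausible for text. The sketch below assumes the image is already loaded as a 2-D list of 8-bit grayscale values. The function name `likely_contains_text` and all three thresholds are hypothetical and chosen only for illustration; they are not Tesseract settings. Real text detection locates text regions; this check does not, and it will happily accept high-contrast noise.

```python
from statistics import mean, pstdev
from typing import Sequence


def likely_contains_text(gray: Sequence[Sequence[int]],
                         min_contrast: float = 20.0,
                         min_ink: float = 0.005,
                         max_ink: float = 0.5) -> bool:
    """Hypothetical whole-image pre-filter run before OCR.

    gray holds 8-bit grayscale values (0 = black, 255 = white).
    All thresholds are illustrative assumptions, not Tesseract settings.
    """
    pixels = [p for row in gray for p in row]
    if not pixels:
        return False
    # RMS contrast: population standard deviation of the intensities.
    contrast = pstdev(pixels)
    if contrast < min_contrast:
        return False  # nearly uniform image, e.g. a blank page with specks
    # Count pixels clearly darker than average as candidate "ink".
    mu = mean(pixels)
    ink = sum(1 for p in pixels if p < mu - contrast) / len(pixels)
    # Text usually covers a small but non-trivial fraction of the image.
    return min_ink <= ink <= max_ink


# Nearly blank 10x10 image with one faint speck: contrast is below 1.
blank = [[250] * 10 for _ in range(10)]
blank[3][4] = 240
print(likely_contains_text(blank))  # False

# White 10x10 image with one solid black row (10% ink).
stroke = [[255] * 10 for _ in range(10)]
stroke[5] = [0] * 10
print(likely_contains_text(stroke))  # True
```

The contrast test is what would reject an almost blank image like the sample; the ink-fraction bounds are a second, equally rough heuristic and would need tuning on real data.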
Before recognition, Tess applies a so-called binarization algorithm, which converts an RGB image to a monochrome one (black for text, white for background). For your sample image, the Otsu binarization used in Tesseract (http://en.wikipedia.org/wiki/Otsu%27s_method) would certainly yield a number of skewed vertical lines resembling backslashes, and the subsequent recognition step classifies them as such. "textord_heavy_nr" and some other variables control size-based noise removal, but they work satisfactorily only when there is a significant body of good text surrounded by some amount of noise. In your image everything is noise, so it won't work. Therefore you need to extend your pre-processing so that Tess is only fed images that actually contain text. Such decisions can be based on contrast estimation, a distinctive color distribution, etc.

HTH

Warm regards,
Dmitry Silaev

On Fri, Mar 4, 2011 at 5:25 PM, zdravco <[email protected]> wrote:
> Hello,
>
> I am using tesseract in my project after some image pre-processing.
> There are some false negatives I was hoping tesseract would eliminate
> by producing no output. However, sometimes there is a strange output
> that I get from almost blank images.
> Here is the sample image:
> https://picasaweb.google.com/zdravco/TesseractTest#5580227257541654274
>
> When I run it with tesseract rev. 552 using English language I get:
> " \\\\ R \."
>
> Does anyone know if there are some options in tesseract that could
> eliminate this noise? Or maybe if I could improve my input image with
> some further pre-processing. I have also tried to recompile tesseract
> with "textord_heavy_nr" set to TRUE, but then the output is:
> "an \\“ R \".
>
> Thanks,
> Zdravko
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
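To see why a nearly blank image still turns into "text", here is a plain global Otsu threshold in Python, following the method behind the Wikipedia link in the reply. It is a sketch, not Tesseract's actual source code. The names `otsu_threshold` and `binarize` and the synthetic 5x5 "page" are hypothetical. Because Otsu always picks the threshold that best separates two intensity classes, even a faint diagonal streak only six gray levels darker than the background ends up solid black, much like a backslash:

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return t in 0..255 maximizing between-class variance, where
    class 0 is pixels <= t and class 1 is pixels > t.
    Pixels must be ints in 0..255."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0    # pixel count of class 0
    sum0 = 0  # intensity sum of class 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # class 0 still empty
        if w1 == 0:
            break     # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, scaled by the constant total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: Sequence[Sequence[int]], t: int) -> List[List[int]]:
    """Map pixels <= t to 0 (black, "text") and the rest to 255 (white)."""
    return [[0 if p <= t else 255 for p in row] for row in gray]


# Nearly blank 5x5 page: background 250 with a faint diagonal of 244.
page = [[244 if r == c else 250 for c in range(5)] for r in range(5)]
t = otsu_threshold([p for row in page for p in row])
bw = binarize(page, t)
print(t)  # 244
for row in bw:
    print(row)  # the diagonal is now solid black
```

Nothing in the method asks whether the two classes differ meaningfully, so a contrast check has to happen before OCR rather than relying on binarization to discard noise.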

