> I believe that tesseract operates on black and white
> images. All grayscale and colour images are converted
> internally to black and white if necessary. In your
> case, you could probably do the conversion yourself,
> turning every pixel that is not black to white, since
> all of the text is black.
>
> Many people have converted numeric text, and there
> are many posts in the archive about that. I think
> some used a whitelist of numeric characters, and
> others created dictionaries containing valid combinations
> of numbers to search against. Tesseract does not
> just try to recognize each character, it also tries
> to recognize each "word" against dictionaries, so
> it helps to let tesseract know that "8008" is a
> better answer than "BOOB".
>
> Cheers,
> Rob Komar
OK, cool, very good to know. So what we will try then is to make a target list of the rooms we want to find and feed this list as a 'numeric dictionary' into Tesseract. We will keep you updated on the results, sometime next week.

Thanks again,
Rutger
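For anyone reading along, here is a rough Python sketch of the two ideas in this thread: turning every non-black pixel white before OCR, and checking the OCR result against a target list of rooms. It does not call Tesseract at all; the names `binarize`, `match_room`, `CONFUSABLE`, the room numbers, and the 0.75 cutoff are all made up for illustration, and the image is assumed to be a plain list of rows of 0-255 grayscale values.

```python
from difflib import get_close_matches

# Letters that OCR often confuses with digits (illustrative, not exhaustive).
CONFUSABLE = str.maketrans(
    {"B": "8", "O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "Z": "2"}
)


def binarize(gray, black_max=0):
    """Keep pixels at or below black_max as black (0); make the rest white (255)."""
    return [[0 if px <= black_max else 255 for px in row] for row in gray]


def match_room(ocr_text, rooms, cutoff=0.75):
    """Map a raw OCR string to the closest room in the target list, or None."""
    # Swap digit look-alike letters for digits before comparing.
    candidate = ocr_text.strip().translate(CONFUSABLE)
    if candidate in rooms:
        return candidate
    # Otherwise accept the most similar room, if it is similar enough.
    close = get_close_matches(candidate, rooms, n=1, cutoff=cutoff)
    return close[0] if close else None


rooms = ["8008", "8010", "8112"]  # hypothetical target list
print(match_room("BOOB", rooms))  # prints 8008
```

This only post-corrects whatever Tesseract returns; a numeric whitelist or a dictionary handed to Tesseract itself, as suggested above, works inside the recognition step, so the two approaches can be combined.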

