Indeed, for this particular image it's easy: open it in, say, Photoshop, crop it to the region of interest (ROI), and experiment with mixing the color channels until the text stands out clearly against the background. Then select a suitable threshold value and you're done. After that, you should have no difficulty coding it into your program. If you're too lazy to code it yourself, try googling those keywords.
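Here is a minimal sketch of that manual recipe in Python with NumPy. It assumes the cheque photo is already loaded as an H x W x 3 RGB array. The crop box, the channel weights and the threshold are hypothetical placeholders that you would tune by eye for one specific image. The idea is that black ink is dark in every channel, while a colored background is bright in at least one channel. Weighting the channel(s) where the background is brightest therefore maximizes the text/background contrast before a single global threshold is applied.

```python
import numpy as np

def crop(img: np.ndarray, top: int, left: int, bottom: int, right: int) -> np.ndarray:
    """Cut out the region of interest (ROI); bottom/right are exclusive."""
    return img[top:bottom, left:right]

def mix_channels(rgb: np.ndarray, weights=(0.0, 0.0, 1.0)) -> np.ndarray:
    """Collapse R, G, B into one gray channel using hand-picked weights.

    The default (blue only) is a hypothetical choice for a bluish background:
    the background stays bright while black ink stays dark.
    """
    w = np.asarray(weights, dtype=np.float64)
    if w.shape != (3,) or np.any(w < 0) or w.sum() <= 0:
        raise ValueError("need three non-negative weights with a positive sum")
    w = w / w.sum()  # normalise so the result stays in the 0..255 range
    return rgb[..., :3].astype(np.float64) @ w  # (H, W, 3) @ (3,) -> (H, W)

def binarize(gray: np.ndarray, threshold: float) -> np.ndarray:
    """Global threshold: darker than `threshold` -> 0 (text), else 255 (background)."""
    return np.where(gray < threshold, 0, 255).astype(np.uint8)

# Synthetic 2x2 "cheque": light-blue background pixels and black ink pixels.
img = np.array([[[120, 160, 230], [20, 20, 20]],
                [[20, 20, 20], [120, 160, 230]]], dtype=np.uint8)
roi = crop(img, 0, 0, 2, 2)
gray = mix_channels(roi, weights=(0.0, 0.0, 1.0))
mask = binarize(gray, threshold=128)
print(mask.tolist())  # [[255, 0], [0, 255]]
```

To use it on a real photo, Pillow can load the file with `np.asarray(Image.open(path).convert("RGB"))`, and `Image.fromarray(mask).save(...)` writes the black-on-white result that you then pass to Tesseract. The weights and the threshold are tuned to a single image.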
The problem arises when you want this algorithm to be fully automated. The images you pass to it can differ significantly in many respects, so fixed channel percentages and thresholds won't suffice; you'll need to implement something more intelligent.

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Tue, Jun 21, 2011 at 10:24 AM, Felipe Leal Coutinho <[email protected]> wrote:
> Hello,
>
> I'm trying to use tesseract to OCR bank cheques captured with
> digital cameras. As you can see (http://dl.dropbox.com/u/24085540/cheque-exemplo.jpg),
> these documents have black text on a colored background (there is no
> black in the background). To improve the results, I think I will need
> to do some pre-processing. Do you have any suggestions? I was thinking
> of removing the background, but I didn't find any method to do that.
>
> Regards,
>
> Felipe.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

