I'm working on a project where my source TIFF image may have background colours or images behind the text.
I've been able to train Tesseract successfully with some other fonts, and that works very well, but the background does seem to confuse it a little.

My question is: does Tesseract perform any image pre-processing of its own? If not, is it worth applying a threshold or some other optimisation to the image first? I've had a brief look through the source code, but I'm not really a C++ developer, so it was a bit hard to follow.

What I'm trying to achieve is something like reading text from a magazine, where it's all printed on top of a background image. I'd like to find out what sort of image Tesseract is actually working on, as I could then perhaps train it with a more accurate representation of the text it needs to recognise.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
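For reference, the kind of thresholding I have in mind is a global binarisation such as Otsu's method. This is only a minimal pure-Python sketch operating on a flat list of 8-bit grayscale values (in practice I'd use a library such as Pillow or OpenCV on the actual TIFF); the function names here are my own:

```python
def otsu_threshold(pixels):
    """Compute Otsu's global threshold for a list of 8-bit grayscale values.

    Returns the threshold t that maximises the between-class variance
    of the two classes {p <= t} and {p > t}.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    sum_bg = 0.0      # running sum of intensities in the "background" class
    weight_bg = 0     # running pixel count in the "background" class
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1/total**2.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Map pixels <= t to black (0) and pixels > t to white (255)."""
    return [255 if p > t else 0 for p in pixels]


# Example: dark text (value 30) on a mid-grey background (value 150).
sample = [30] * 40 + [150] * 60
t = otsu_threshold(sample)
clean = binarize(sample, t)
```

If Tesseract already does something equivalent internally, then doing this myself is presumably redundant, which is exactly what I'm trying to find out.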

