I have been messing around with Tesseract 3.00 for the past couple of days and have tried a few different approaches to training and image processing, none of which are really working. I am using the Pocket-OCR app for iPhone ( https://github.com/rcarlsen/Pocket-OCR ) to do the testing, but am training on OS X.
A sample image that I need to scan is here: http://cloud.coneybeare.net/8FFB (I only need the bottom 12 tiles). Running this untrained against English unsurprisingly comes up as garbage: http://cloud.coneybeare.net/8Fbt

I am just unclear as to what exactly I need to do to train Tesseract to detect these. I have gone through the training process, made traineddata files, and am familiar with the workflow, but I must have my training strategy all wrong. I have tried many different training images ( http://cloud.coneybeare.net/8FKj ) but I just can't get results.

Is it best to create a new font, with each "tile" representing a new letter? Or is it best to do some fancy image processing and cropping before the Tesseract scan? Are there any optimizations I can make if I know I am only dealing with uppercase letters, and no words, numbers or punctuation? What should I do to reliably train Tesseract to detect the tiles in this image?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
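On the uppercase-only point: Tesseract has a `tessedit_char_whitelist` config variable that restricts which characters the recognizer may emit. A minimal sketch of what I mean (the file name `uppercase_only` is my own choice; the usage line assumes Tesseract can find the config file, e.g. in the working directory or in tessdata/configs/):

```shell
# Write a Tesseract config that whitelists only uppercase A-Z,
# so digits, lowercase, and punctuation can never be emitted.
cat > uppercase_only <<'EOF'
tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUVWXYZ
EOF

# Usage: the config name goes after the output base, e.g.
#   tesseract tile.png result uppercase_only
```

Even with this, I assume the tile borders still confuse segmentation, which is why I am wondering about cropping first.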

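For reference, the Tesseract 3 training sequence I have been following is roughly the one from the TrainingTesseract wiki. This is only a sketch: `tiles` is a made-up language code and the filenames are placeholders for my training images, and it assumes the Tesseract training tools are on the PATH.

```
# Generate a box file from the training image, then hand-correct it:
tesseract tiles.font.exp0.tif tiles.font.exp0 batch.nochop makebox

# Run tesseract in training mode to produce a .tr feature file:
tesseract tiles.font.exp0.tif tiles.font.exp0 nobatch box.train

# Extract the character set and cluster the features:
unicharset_extractor tiles.font.exp0.box
mftraining -U unicharset -O tiles.unicharset tiles.font.exp0.tr
cntraining tiles.font.exp0.tr

# Prefix the outputs with the language code and combine:
mv inttemp tiles.inttemp
mv normproto tiles.normproto
mv pffmtable tiles.pffmtable
combine_tessdata tiles.
```

If my problem is in one of these steps rather than in the training images themselves, I would love to know which one.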
