I have been messing around with Tesseract 3.00 for the past couple of
days and have tried a few different approaches to training/image
processing, none of which are really working. I am using the Pocket-
OCR ( https://github.com/rcarlsen/Pocket-OCR ) app for iPhone to do
the testing, but am training on OSX.

A sample image that I need to scan is here: http://cloud.coneybeare.net/8FFB
( I only need the bottom 12 tiles )

Running this with the stock (untrained) English data obviously comes
up as garbage: http://cloud.coneybeare.net/8Fbt

I am just unclear as to what exactly I need to do to train Tesseract
to detect these. I have gone through the training process, made
traineddata files, and am familiar with how it works, but I must have
my training strategy all wrong. I have tried many different training
images ( http://cloud.coneybeare.net/8FKj ) but I just can't get
results. Is it best to create a new font, with each "tile"
representing a new letter? Or is it best to do some fancy image
processing and cropping before the Tesseract scan? Are there any
optimizations I can make if I know I am only dealing with uppercase
letters, and no words, numbers, or punctuation? What should I do to
reliably train Tesseract to detect the tiles in this image?
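For the uppercase-only part, one lever I am aware of is the
tessedit_char_whitelist variable, which restricts the characters
Tesseract will consider. A minimal sketch, assuming a config file I am
calling "uppercase_only" and a placeholder input image "tiles.tif"
(both names are mine, not from any standard setup):

```shell
# Write a config file that whitelists only A-Z
# (tessedit_char_whitelist is a standard Tesseract variable).
mkdir -p configs
printf 'tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUVWXYZ\n' \
    > configs/uppercase_only

cat configs/uppercase_only

# Pass the config file name on the command line after the output base.
# Guarded so the script still works on a machine without tesseract:
if command -v tesseract >/dev/null 2>&1; then
    tesseract tiles.tif tiles_out nobatch configs/uppercase_only
fi
```

This only narrows the classifier's choices; it does not replace
training, but it should at least stop lowercase letters, digits, and
punctuation from appearing in the output.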

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en