Okay, I figured this out. The small type was indeed causing the problem, but there is a workaround. In Photoshop, I create the type sample at the small size I want to train at (11px Tahoma in this case). I then flatten the image and scale it to twice its original size using nearest-neighbor resampling, and train Tesseract on the enlarged sample. When I read the actual 11px TIFs, I also rescale them to twice their original size before feeding them into Tesseract. So far, Tesseract has been 100% accurate with this method.
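
For anyone who would rather script the rescaling step than do it by hand, here is a rough sketch of what 2x nearest-neighbor scaling does to the pixels. This is not what I ran (I used Photoshop); the function name `upscale_nearest` and the tiny sample bitmap are made up for illustration, and it works on a plain list of rows rather than a real TIF. An imaging library such as Pillow can do the same thing on actual files with `Image.resize` and the nearest-neighbor filter.

```python
def upscale_nearest(pixels, factor=2):
    """Enlarge a 2D grid of pixel values by an integer factor.

    Nearest-neighbor scaling by an integer factor turns every source
    pixel into a factor x factor block of the same value, so no new
    (blurred) gray levels are introduced at the glyph edges.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    scaled = []
    for row in pixels:
        # Repeat each pixel horizontally...
        wide_row = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row vertically.
        for _ in range(factor):
            scaled.append(list(wide_row))
    return scaled


# Hypothetical 2x3 bitmap (0 = black ink, 255 = white background).
glyph = [
    [0, 255, 0],
    [255, 0, 255],
]
enlarged = upscale_nearest(glyph, factor=2)
for row in enlarged:
    print(row)
```

The point is that the same enlargement has to be applied to both the training sample and the images being read, so the glyph shapes Tesseract sees at recognition time match what it was trained on.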

