I'm a newbie to Tesseract and hoping that someone can help. I'd like to convert a screen capture image to text. Here are the steps that I'm taking:
* create a screen capture
* crop the image so that only the text is visible, with a white background
* upscale the image to 300dpi
* convert it to an 8-bit TIFF
* process it using "tesseract output.tif text-output -l eng"

After getting poor results, I started increasing the font size before taking the screen capture. However, this hasn't helped. I'm currently working from a 300dpi, high-resolution image in which the lowercase characters are about 95 pixels high. You can see the image at the following link (note that it is 9.5 MB):

http://www.bryanpayne.org/tmp/output.tif

The results that I'm getting for this image look like this:

'I`11is is El, te vv<>11ciI @r if 1 vvill I;>€ 2LI;‘>14 ];)I°()];)€I`1}7- —]irr1

Clearly not very good. So my question is: what am I doing wrong? The source image seems about as ideal as it gets, yet Tesseract is having a lot of trouble with it. I suspect user error, so I'm hoping that someone can point me in the right direction.

Thanks!
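For anyone trying to reproduce this, the numbers and the command in the steps above can be sketched in Python. This is a minimal sketch, not the poster's actual setup: it assumes the capture was taken on a 96 dpi screen (the post does not state the screen resolution) and that the `tesseract` executable is on the PATH. The helper names `upscale_size`, `x_height_points`, `tesseract_command`, and `run_ocr` are hypothetical; the cropping, resampling, and 8-bit TIFF conversion themselves are left to whatever image tool is already in use.

```python
import shutil
import subprocess

SCREEN_DPI = 96   # assumption: typical desktop screen resolution
TARGET_DPI = 300  # resolution the post upscales to
POINTS_PER_INCH = 72


def upscale_size(width, height, src_dpi=SCREEN_DPI, dst_dpi=TARGET_DPI):
    """Pixel dimensions after resampling a capture from src_dpi to dst_dpi."""
    scale = dst_dpi / src_dpi
    return round(width * scale), round(height * scale)


def x_height_points(pixels, dpi=TARGET_DPI):
    """Convert a character height in pixels at a given dpi into points."""
    return pixels * POINTS_PER_INCH / dpi


def tesseract_command(image_path, output_base, lang="eng"):
    """Argument list equivalent to: tesseract output.tif text-output -l eng"""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run tesseract and return the text it writes to output_base + '.txt'."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    subprocess.run(tesseract_command(image_path, output_base, lang), check=True)
    with open(output_base + ".txt", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    print(upscale_size(800, 200))           # size of a hypothetical 800x200 capture after upscaling
    print(x_height_points(95))              # the post's 95-pixel lowercase height, in points
    print(" ".join(tesseract_command("output.tif", "text-output")))
```

Under the 96 dpi assumption, each source pixel becomes 3.125 pixels, so an 800×200 capture becomes 2500×625. The 95-pixel lowercase height at 300dpi works out to 95 × 72 / 300 = 22.8 points.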

