I'm a newbie to tesseract and hoping that someone can help.  I'd like
to convert a screen capture image to text.  Here are the steps that I'm
taking:

* create screen capture
* crop the image so that only the text is visible, with a white
background
* upscale the image to 300dpi
* convert to 8-bit tiff
* process using "tesseract output.tif text-output -l eng"
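
In case it helps anyone reproduce this, the last step above can be
scripted.  This is only a minimal sketch in Python, assuming the
tesseract executable is on the PATH; the helper names
(build_tesseract_command, run_ocr) are my own and not part of
tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for the same call used in the steps above."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run tesseract and return the recognized text.

    tesseract appends ".txt" to the output base name, so the result
    is read back from output_base + ".txt".
    """
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,  # raise if tesseract exits with an error
    )
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()
```

Calling run_ocr("output.tif", "text-output") would then be equivalent
to the command line in the last step.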

After getting poor results, I started bumping up the font size before
taking the screen capture.  However, this hasn't helped.  Currently,
I'm working from a 300dpi, high-resolution image where the lowercase
characters are about 95 pixels high.  You can see the image at the
following link (http://www.bryanpayne.org/tmp/output.tif); note that
it is 9.5 MB.
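
To put that character size in perspective, a pixel height can be
converted to a printed size at a given resolution (1 inch = 72
points).  A rough sketch of the arithmetic, with an illustrative
function name of my own:

```python
def pixels_to_points(height_px, dpi):
    """Convert a height in pixels to typographic points at the given dpi."""
    return height_px * 72 / dpi


# Lowercase characters about 95 pixels high in a 300dpi image:
print(round(pixels_to_points(95, 300), 1))  # roughly 22.8 points
```

So the lowercase letters alone would be nearly 23 points tall if the
image were printed at 300dpi, which is far larger than ordinary body
text.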

The results that I'm getting for this image look like this:

'I`11is is El, te
vv<>11ciI @r if 1
vvill  I;>€ 2LI;‘>14
];)I°()];)€I`1}7-
—]irr1

Clearly not very good.  So my question is, what am I doing wrong?  It
seems that the source image is about as ideal as it gets, yet
tesseract is having a lot of trouble with it.  I suspect user
error, so I'm hoping that someone can point me in the right
direction.  Thanks!