I would like some advice on the general use of tesseract, because my 
experience with it falls into two extremes: either tesseract performs 
flawlessly, with no preprocessing of the image needed beyond cropping to 
the text and (most significantly) enlarging the image by a factor of 2 or 
4; or its output is riddled with errors.

Following the usual advice to improve the quality of the image (Fred's 
textcleaner script, or applying the ImageMagick functions it uses 
individually) usually produces a significant improvement in the *human 
readability* of the image, but as far as tesseract is concerned it usually 
produces no improvement, and most often an actual deterioration in 
performance.

So I am looking for another explanation of tesseract's difficulty with 
certain images. I thought its performance might depend on its trying to 
identify the particular font used, but 
https://github.com/tesseract-ocr/docs/blob/master/tesseracticdar2007.pdf 
seems to say not.

The only other possibility I can think of is that either the size or the 
aspect ratio of the text in the image has been subtly deformed. If so, it 
is not apparent to my eye, but tesseract is certainly very sensitive to 
changes in size, because, when it works, resizing the image makes such a 
dramatic improvement.
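
To make the size hypothesis concrete, here is a minimal sketch of how I 
might choose resize factors from a measured letter height instead of 
guessing 2 or 4. The function names and the target heights of 20, 30 and 
40 pixels are my own assumptions for illustration, not values taken from 
tesseract. Applying one factor to both dimensions leaves the aspect ratio 
unchanged, so size can be tested separately from deformation.

```python
def scale_for_target_height(glyph_height_px, target_height_px=30.0):
    """Return the resize factor that brings letters to the target height.

    glyph_height_px is the measured height of a capital letter in the
    source image; target_height_px is an assumed value to experiment with.
    """
    if glyph_height_px <= 0:
        raise ValueError("glyph height must be positive")
    return target_height_px / glyph_height_px


def candidate_scales(glyph_height_px, target_heights=(20, 30, 40)):
    """Return one resize factor per assumed target height."""
    return [
        round(scale_for_target_height(glyph_height_px, target), 2)
        for target in target_heights
    ]


def scaled_size(width, height, factor):
    """Scale both dimensions by the same factor, preserving aspect ratio."""
    return (round(width * factor), round(height * factor))


if __name__ == "__main__":
    # Hypothetical image: 100 x 40 pixels with capitals 10 pixels tall.
    for factor in candidate_scales(10):
        print(factor, scaled_size(100, 40, factor))
```

Running tesseract on each resized copy and comparing the output would show 
whether the failures track letter height or something else.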

Does anyone have other suggestions as to the nature of the problem? I'm not 
asking for detailed advice here, which is why I've given no image samples, 
but for general lines of attack, strategy rather than tactics. Thank you.

