Re: Different output for almost identical images

Rufus Sun, 08 Apr 2012 06:37:06 -0700

Thanks for all the responses.
It worked for me with the following preprocessing:

convert -border 500 -resample 300 -density 300 -resize 300 bad2.tiff bad2.png
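To apply the same preprocessing to all three images, a small Python wrapper along these lines could be used. It is only a sketch: it assumes ImageMagick's `convert` is on the PATH and keeps the options in the same order as the command above. The function names `preprocess_cmd` and `preprocess_all` are hypothetical, and the `run` parameter exists only so the actual call can be swapped out (for example, for a dry run).

```python
import subprocess
from pathlib import Path


def preprocess_cmd(src, dst, border=500, dpi=300, resize="300"):
    """Build the ImageMagick `convert` call from the post as an argument list.

    Options are kept in the same order as the original command; `resize`
    is passed through unchanged as an ImageMagick geometry string.
    """
    return [
        "convert",
        "-border", str(border),
        "-resample", str(dpi),
        "-density", str(dpi),
        "-resize", str(resize),
        str(src),
        str(dst),
    ]


def preprocess_all(names, run=subprocess.run):
    """Run the preprocessing on each input, writing <name>.png next to it."""
    outputs = []
    for name in names:
        src = Path(name)
        dst = src.with_suffix(".png")
        # check=True raises CalledProcessError if convert exits non-zero
        run(preprocess_cmd(src, dst), check=True)
        outputs.append(dst)
    return outputs
```

For example, `preprocess_all(["good.tiff", "bad.tiff", "bad2.tiff"])` would write good.png, bad.png and bad2.png, which can then be passed to tesseract with the same options as in the original post.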


Actually, all the images were thresholded from the same original image:
good.tiff was thresholded manually with ImageMagick at 40%
bad.tiff was thresholded by OpenCV with the Otsu algorithm
bad2.tiff was thresholded manually with ImageMagick at 50%

Maybe I should compute only the threshold value with the Otsu algorithm and
then let ImageMagick apply it to the image. ImageMagick saves the result as
black and white, whereas OpenCV stores it as grayscale. This may be one of
several factors I can change to improve recognition. And then, of course,
add a border and resize the image as described above.
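The plan above (compute only the threshold with Otsu, then let ImageMagick do the thresholding) could be sketched as follows. This is a dependency-free stand-in, not OpenCV's implementation: `otsu_threshold` picks the 8-bit level that maximises the between-class variance of a 256-bin histogram, `threshold_percent` converts that level into the percentage form used above (40%, 50%), and `binarize` shows the strictly two-level (black and white) output. In OpenCV itself, `cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)` already returns the computed threshold as its first value. All helper names here are hypothetical, and the percentage mapping assumes ImageMagick's `-threshold` treats a percentage as a fraction of the maximum pixel value, with pixels above it becoming white.

```python
def histogram(pixels):
    """Count 8-bit grayscale values (0..255) into a 256-bin histogram."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    return hist


def otsu_threshold(hist):
    """Return the Otsu threshold t for a 256-bin histogram.

    Pixels <= t form the dark class, pixels > t the light class; t is the
    level that maximises the between-class variance w0 * w1 * (mu0 - mu1)^2.
    """
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0        # number of pixels in the dark class so far
    sum0 = 0      # sum of their intensities
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        if w0 == 0:
            continue  # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break     # light class empty: no further split possible
        sum0 += t * h
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def threshold_percent(t, max_value=255):
    """Express an 8-bit threshold as a percentage of the maximum value."""
    return 100.0 * t / max_value


def binarize(pixels, t):
    """Map every pixel to 0 (black) or 255 (white): strictly two levels."""
    return [255 if p > t else 0 for p in pixels]
```

The percentage could then be formatted as `f"{threshold_percent(t):.1f}%"` and passed to ImageMagick's `-threshold` option, so that the Otsu level is computed once but the black-and-white image is written by ImageMagick.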

On Thursday, 5 April 2012 23:18:36 UTC+2, Rufus wrote:
>
> Issue: 
> good.tiff and bad.tiff are almost identical. In fact, I've overlaid the
> two images in mix.jpg to make this visible (the red text is from bad.jpg
> and the black text is copied from good.jpg).
> I fail to understand why tesseract fails in one case (bad.tiff) and
> succeeds in the other (good.tiff), although the images are almost identical.
> Is there anything you would suggest I look for, or any hints from your
> previous experience?
>
> Command used:
> tesseract bad.tiff bad -l eng -psm 8 nobatch digits
> tesseract good.tiff good -l eng -psm 8 nobatch digits
>
> My system:
> Ubuntu 11.10
> Tesseract 3.01
> Leptonica 1.68
>

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
