Re: problem by character recognition

denis56 Fri, 19 Jun 2009 03:30:43 -0700

I have made some progress getting tesseract to recognize these small
colored images (thanks to your suggestions to change the dpi and convert
the image to greyscale), but I am still puzzled by the varying degree of
accuracy I am getting.

Please see http://www.speedyshare.com/552226303.html for sample
images. While the output text from sao-paulo-v-cruzeiro gimped.tif is
legible, the output from west-ham-united-v-tottenham-hotspur-bw.tif is
not legible at all.

I used gimp to first scale the images to 800x44px and 300dpi, then to
either convert them to greyscale or to a 1-bit black and white palette.
Am I doing something wrong in the conversion (which will later be done
programmatically with javaio or imagemagick)?
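
For reference, the two gimp conversions described above (scale, then either greyscale or 1-bit black and white) might be sketched in plain Java roughly as follows. This is only an illustration under assumptions: the class name OcrPreprocess, the method names, and the fixed threshold value are hypothetical and do not come from the thread, and the sketch does not set the 300dpi resolution metadata.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper; the names and the threshold choice are assumptions.
final class OcrPreprocess {

    // Scale the source to width x height and convert it to 8-bit greyscale.
    static BufferedImage toGreyscale(BufferedImage src, int width, int height) {
        BufferedImage grey =
                new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = grey.createGraphics();
        // Bicubic interpolation tends to keep small glyph edges smoother.
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                           RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, width, height, null);
        g.dispose();
        return grey;
    }

    // Fixed global threshold: grey levels below `threshold` become black,
    // everything else becomes white, in a 1-bit image.
    static BufferedImage toBlackAndWhite(BufferedImage grey, int threshold) {
        int w = grey.getWidth();
        int h = grey.getHeight();
        BufferedImage bw =
                new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int level = grey.getRaster().getSample(x, y, 0);
                bw.setRGB(x, y, level < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return bw;
    }
}
```

A call such as OcrPreprocess.toBlackAndWhite(OcrPreprocess.toGreyscale(img, 800, 44), 128) would mirror the gimp steps. The value 128 is an arbitrary starting point, and a single fixed threshold may well behave differently on different colored backgrounds.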

Also, although the first image in the attachment was recognized, the
output has some mistakes (for instance, o->u). Could training help in
this situation?

Thanks.

On 3 Jun., 12:45, paulfeakins <[email protected]> wrote:
> You could try some thresholding / selective colour replacing so that
> you have a black and white image?