I have made some progress getting tesseract to recognize these small coloured images (thanks to your suggestions to change the DPI and convert the image to greyscale), but I am still puzzled by the differing degrees of accuracy I am getting.
Please see http://www.speedyshare.com/552226303.html for sample images. While the output text from sao-paulo-v-cruzeiro gimped.tif is legible, the output from west-ham-united-v-tottenham-hotspur-bw.tif is not legible at all. I used GIMP to first scale the images to 800x44 px at 300 dpi, and then either converted them to greyscale or converted them to a 1-bit black-and-white palette. Am I doing something wrong in the conversion? (Later the conversion will be done programmatically with Java ImageIO or ImageMagick.)

Also, although the first image in the attachment was recognized, the output contains some mistakes (for instance, "o" is read as "u"). Could training help in this situation?

Thanks.

On 3 Jun., 12:45, paulfeakins <[email protected]> wrote:
> You could try some thresholding / selective colour replacing so that
> you have a black and white image?
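For the later programmatic step, here is a minimal sketch of the scale, greyscale and threshold pipeline in Java using javax.imageio and BufferedImage, following the thresholding suggestion quoted above. It is not a tested recipe for these particular images. Assumptions: the class and method names are hypothetical; the 800x44 target size is taken from the post; the threshold is picked per image with Otsu's method (a swapped-in technique, not something proposed in this thread) instead of a fixed value chosen by hand in GIMP; and the optional invert flag assumes the text may be lighter than its coloured background, whereas tesseract generally does better with dark text on a light background. The DPI value is only metadata and is not set here, and PNG is written because older JDKs' ImageIO cannot write TIFF.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class OcrPreprocess {
    static final int BLACK = 0xFF000000, WHITE = 0xFFFFFFFF;

    /** Resizes src to width x height with bicubic interpolation. */
    static BufferedImage scale(BufferedImage src, int width, int height) {
        BufferedImage dst = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, width, height, null);
        g.dispose();
        return dst;
    }

    /** Greyscale value (BT.601 luma weights, integer arithmetic) of a packed RGB pixel. */
    static int luma(int rgb) {
        int r = (rgb >> 16) & 0xFF, gr = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (299 * r + 587 * gr + 114 * b + 500) / 1000;
    }

    /** Otsu's method: the grey level t maximising between-class variance (class 1 = values <= t). */
    static int otsuThreshold(int[] hist, int total) {
        long sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (long) i * hist[i];
        long sumB = 0;
        int wB = 0, best = 0;
        double bestVar = -1;
        for (int t = 0; t < 256; t++) {
            wB += hist[t];
            if (wB == 0) continue;
            int wF = total - wB;
            if (wF == 0) break;
            sumB += (long) t * hist[t];
            double mB = (double) sumB / wB;
            double mF = (double) (sumAll - sumB) / wF;
            double between = (double) wB * wF * (mB - mF) * (mB - mF);
            if (between > bestVar) { bestVar = between; best = t; }
        }
        return best;
    }

    /** Scales, converts to grey, and thresholds to a 1-bit image; invert swaps black and white. */
    static BufferedImage preprocess(BufferedImage src, int width, int height, boolean invert) {
        BufferedImage scaled = scale(src, width, height);
        int[][] grey = new int[height][width];
        int[] hist = new int[256];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++) {
                grey[y][x] = luma(scaled.getRGB(x, y));
                hist[grey[y][x]]++;
            }
        int t = otsuThreshold(hist, width * height);
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++) {
                boolean dark = grey[y][x] <= t;
                if (invert) dark = !dark;
                out.setRGB(x, y, dark ? BLACK : WHITE);
            }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: OcrPreprocess in.png out.png [--invert]");
            return;
        }
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) throw new IOException("unsupported image format: " + args[0]);
        boolean invert = args.length > 2 && args[2].equals("--invert");
        ImageIO.write(preprocess(src, 800, 44, invert), "png", new File(args[1]));
    }
}
```

For example, `java OcrPreprocess west-ham.png west-ham-bw.png --invert` would produce a black-and-white 800x44 PNG to pass to tesseract. Choosing the threshold per image is one way to handle the problem that a single GIMP setting may suit one image's colours and not another's; whether it actually fixes the west-ham image is untested.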

