On Thursday, 4 October 2012 17:37:27 UTC+2, Francisco Loché Costa wrote:
>
> Before the grayscale processing and after the threshold, try to dilate and
> erode the image; this way you can fill the white spaces inside the
> characters. Dilate expands the black pixels, both inside and outside the
> character outlines. Erode does the opposite operation, but if the inside
> has been filled with black, it stays black, and the outside of the outline
> is smoothed. If you run into problems with these two operations, also try
> images with more pixels.
>

Dilate/erode destroyed the image whether I used it after grayscale, after
threshold, or without any other modifications. I used a recent version of
GIMP. I also tried GIMP's edge-detect filters (all four), followed by
sharpen, and the result got worse.
> If you find that tesseract doesn't recognize most characters, you may
> need to train the font, as you would for a new language. But I think the
> key is the preprocessing. If dilate and erode don't work for you, try to
> find another image transformation that helps; there are many that may be
> useful for you (and many that I don't know yet... sorry).
>
> 2012/10/4 [email protected] <[email protected]>
>
>> Hi,
>>
>> I would like to recognize a custom font with tesseract. I've played
>> around with the screens below but did not get anything besides a few
>> characters that were recognized.
>> Any idea how to get the data from pictures like these?
>>
>> Here's the source material:
>> http://dmk-crew.dyndns.info/files/bf2-a-z.jpg
>>
>> and here with some modifications:
>> http://dmk-crew.dyndns.info/files/bf2-a-z-grayscale.jpg
>> dmk-crew.dyndns.info/files/bf2-a-z-threshold.jpg
>>
>> Is training maybe the way to go?
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>
> --
> Francisco Loché Costa,
> Technical Telecommunications Engineer, specialization in Telematics.
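To make the dilate/erode suggestion in this thread concrete, here is a minimal sketch in pure Python of the grayscale, threshold, dilate, erode sequence on a tiny in-memory image. It is not Tesseract or GIMP code. It assumes black ink on a white background, a hypothetical threshold level of 128, a 3x3 square neighbourhood, and a glyph with a white margin; all of these are illustrative choices. Dilating and then eroding like this is usually called a morphological "closing".

```python
from typing import List, Tuple

Gray = List[List[int]]    # intensities 0..255
Binary = List[List[int]]  # 1 = black ink, 0 = white background (assumed convention)

def to_grayscale(rgb: List[List[Tuple[int, int, int]]]) -> Gray:
    # Weighted luma (ITU-R BT.601 weights), rounded to an integer in 0..255.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]

def threshold(gray: Gray, level: int = 128) -> Binary:
    # Pixels darker than `level` (an assumed value) become ink (1); the rest are background (0).
    return [[1 if v < level else 0 for v in row] for row in gray]

def _window(img: Binary, y: int, x: int) -> List[int]:
    # The 3x3 neighbourhood of (y, x); pixels outside the image count as white (0).
    h, w = len(img), len(img[0])
    return [img[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def dilate(img: Binary) -> Binary:
    # A pixel becomes ink if ANY pixel in its 3x3 window is ink: strokes grow.
    return [[1 if any(_window(img, y, x)) else 0 for x in range(len(img[0]))]
            for y in range(len(img))]

def erode(img: Binary) -> Binary:
    # A pixel stays ink only if EVERY pixel in its 3x3 window is ink: strokes shrink.
    return [[1 if all(_window(img, y, x)) else 0 for x in range(len(img[0]))]
            for y in range(len(img))]

def closing(img: Binary) -> Binary:
    # Dilate, then erode: small white gaps inside a glyph are filled,
    # while the outer outline shrinks back toward its original size.
    return erode(dilate(img))

if __name__ == "__main__":
    # Hypothetical 5x5 "glyph": a black ring with a one-pixel white hole.
    ring = [[0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 1, 0, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0]]
    for row in closing(ring):
        print("".join("#" if v else "." for v in row))
```

Running it prints a solid 3x3 block of `#` surrounded by `.`: the one-pixel hole is filled and the outline keeps its size. A hole at least as large as the window stays open, and strokes separated by a gap narrower than the window get merged. On small, low-resolution glyphs this merging may be one reason these operations can wreck the image, and why trying images with more pixels was suggested. Imaging libraries usually expose the same operation directly, for example OpenCV's `cv2.morphologyEx` with `cv2.MORPH_CLOSE`. OpenCV treats non-zero (white) pixels as the foreground, so black-on-white text would normally be inverted first.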

