Hi Sven,

Repeating the character four times and using single_word segmentation (page segmentation mode 8) seems to work. I'll try it on more fonts.
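For anyone following the thread, the "repeat the glyph, then post-process" idea could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code actually used here: the bitmap is assumed to be a row-major list of 8-bit grayscale rows (255 = white), the 2px gap between copies and the majority threshold are arbitrary choices, and the helper names `add_border`, `repeat_glyph` and `vote` are made up for this example. The freetype rendering and the tesseract call itself are left out.

```python
from collections import Counter

WHITE = 255  # assumption: 8-bit grayscale, 255 = white background, 0 = black ink


def add_border(bitmap, border=5, fill=WHITE):
    """Return a copy of a row-major bitmap padded with `border` pixels of `fill`."""
    width = len(bitmap[0]) + 2 * border
    padded = [[fill] * width for _ in range(border)]
    for row in bitmap:
        padded.append([fill] * border + list(row) + [fill] * border)
    padded.extend([fill] * width for _ in range(border))
    return padded


def repeat_glyph(bitmap, repeats=4, gap=2, border=5, fill=WHITE):
    """Place `repeats` copies of the glyph side by side, then add the border."""
    tiled = []
    for row in bitmap:
        out = []
        for i in range(repeats):
            if i:  # hypothetical spacing between copies, not from the thread
                out.extend([fill] * gap)
            out.extend(row)
        tiled.append(out)
    return add_border(tiled, border, fill)


def vote(ocr_text, expected, repeats=4):
    """Post-process the OCR text returned for a repeated glyph.

    Returns (best_char, matched). best_char is the most common non-space
    character in ocr_text (None if there is none). matched is True only if
    best_char equals `expected` and was seen in more than half of `repeats`.
    """
    chars = [c for c in ocr_text if not c.isspace()]
    if not chars:
        return None, False
    best, count = Counter(chars).most_common(1)[0]
    return best, best == expected and count * 2 > repeats
```

The image returned by `repeat_glyph` would be handed to tesseract in single-word mode, and the returned text passed to `vote` together with the character the glyph is supposed to represent. A majority vote is just one possible post-processing rule; requiring all four copies to agree would be stricter.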
Thanks for your help,
-- Yann

On Tuesday, April 9, 2013 6:33:10 PM UTC+2, sventech wrote:
>
> Hi Yann,
> I would not try to train all the system fonts -- just check if you find
> some that are not recognized, then train for them. You could try a larger
> border, or repeating characters and then post-processing, but the first
> thing to try is page segmentation mode 8 (treat the image as a single word).
>
> I also noticed this Stack Overflow discussion which might be of use to you:
>
> http://stackoverflow.com/questions/1708858/automatic-font-recognition-with-python
>
> --Sven
>
> On Tue, Apr 9, 2013 at 9:57 AM, Yann ROBIN <[email protected]> wrote:
>
>> Hi,
>>
>> I'm trying to use tesseract to check that font glyphs match their
>> characters (using Python).
>> To do so I use freetype to load the font, render the glyph into a bitmap
>> and send it to tesseract.
>>
>> Looking at previous posts, I've set up tesseract like this:
>>
>> api.Init(".", "eng", tesseract.OEM_DEFAULT)
>> api.SetVariable("tessedit_char_whitelist", "abcdefghijklmnopqrstuvwxyz")
>> api.SetVariable("textord_noise_area_ratio", "1.0")
>> api.SetPageSegMode(tesseract.PSM_SINGLE_CHAR)
>>
>> I render the character at 24px, add a 5px white border around it and send
>> it to tesseract.
>>
>> For the font that I'm testing, the characters g, j and w are not
>> recognized. When I allow capital letters, n is seen as H.
>>
>> As I am using freetype and rendering the character myself, I think I
>> might be able to help tesseract, for example by giving it the baseline
>> and the face bounding box.
>>
>> But I don't know how to do that (I don't know if I can), and I don't even
>> know if it would be better.
>>
>> But maybe the real solution is to get tesseract trained on all my system
>> fonts? (Currently I'm using the default English training files.)
>>
>> What would you do?
>>
>> --
>> Yann
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>
> --
> "All that is gold does not glitter,
> not all those who wander are lost;
> the old that is strong does not wither,
> deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
> a light from the shadows shall spring;
> renewed shall be blade that was broken,
> the crownless again shall be king."

