Hi Yann,

I would not try to train tesseract on all the system fonts. Just check which fonts are not recognized, then train for those. You could try a larger border, or repeating the character several times and post-processing the result, but the first thing to try is page segmentation mode 8 (treat the image as a single word).
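To make the border and repetition ideas concrete, here is a minimal sketch in plain Python. It assumes the glyph has already been rendered to a list of pixel rows (0 for ink, 255 for white paper); the helper names add_border, repeat_glyph and majority_char are made up for illustration and are not part of tesseract or freetype.

```python
from collections import Counter

WHITE = 255  # assumption: 8-bit grayscale, white background


def add_border(bitmap, border):
    """Return a copy of bitmap surrounded by `border` white pixels on every side."""
    width = len(bitmap[0]) if bitmap else 0
    blank = [WHITE] * (width + 2 * border)
    padded = [list(blank) for _ in range(border)]
    for row in bitmap:
        padded.append([WHITE] * border + list(row) + [WHITE] * border)
    padded.extend(list(blank) for _ in range(border))
    return padded


def repeat_glyph(bitmap, count, gap):
    """Place `count` copies of the glyph side by side, `gap` white pixels apart."""
    rows = []
    for row in bitmap:
        out = []
        for i in range(count):
            if i:
                out.extend([WHITE] * gap)
            out.extend(row)
        rows.append(out)
    return rows


def majority_char(text):
    """Post-process OCR text for a repeated glyph: keep the most common character."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return None
    return Counter(chars).most_common(1)[0][0]
```

The repeated, padded bitmap would then be sent to tesseract in single-word mode (PSM_SINGLE_WORD in the C++ API; I am assuming your Python binding exposes it the same way it exposes PSM_SINGLE_CHAR), and majority_char would be applied to the text that comes back.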
I also noticed this Stackoverflow discussion, which might be of use to you:
http://stackoverflow.com/questions/1708858/automatic-font-recognition-with-python

--Sven

On Tue, Apr 9, 2013 at 9:57 AM, Yann ROBIN <[email protected]> wrote:
> Hi,
>
> I'm trying to use tesseract to check that font glyphs match characters
> (using Python).
> To do so I use freetype, load the font, print the glyph in a bitmap and
> send it to tesseract.
>
> Looking at previous posts I've set up tesseract like this:
>
> api.Init(".","eng",tesseract.OEM_DEFAULT)
> api.SetVariable("tessedit_char_whitelist", "abcdefghijklmnopqrstuvwxyz")
> api.SetVariable("textord_noise_area_ratio", "1.0")
> api.SetPageSegMode(tesseract.PSM_SINGLE_CHAR)
>
>
> I render the character at 24px, add a 5px white border around it and send
> it to tesseract.
>
> For the font that I'm testing, the characters g, j and w are not
> recognized; when I allow capital letters, n is seen as H.
>
>
> As I am using freetype and printing the character, I think I might do things
> to help tesseract, like giving the baseline and the face bounding box.
>
> But I don't know how to do that (I don't know if I can), and I don't even
> know if it will be better.
>
>
> But maybe the real solution is to get tesseract trained over all my system
> fonts? (Currently I'm using the default English training files.)
>
>
>
> What would you do?
>
>
> --
>
> Yann
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."

