Hi Yann,
I would not try to train all the system fonts -- just check if you find
some that are not recognized, then train for them. You could try a larger
border, or repeating characters and then post-processing, but the first
thing to try is page segmentation mode 8 (treat the image as a single word).
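
In case it helps, here is a rough sketch of the border and repetition ideas in plain Python. It assumes the glyph has already been rendered to a row-major list of grayscale rows (0 = black, 255 = white). The helper names pad_bitmap, repeat_glyph and majority_char are made up for illustration and are not part of tesseract or freetype. The OCR call itself is left out.

```python
from collections import Counter


def pad_bitmap(bitmap, border, background=255):
    """Return a copy of the bitmap with `border` background pixels on every side."""
    width = len(bitmap[0]) if bitmap else 0
    blank_row = [background] * (width + 2 * border)
    padded = [list(blank_row) for _ in range(border)]
    for row in bitmap:
        padded.append([background] * border + list(row) + [background] * border)
    padded.extend(list(blank_row) for _ in range(border))
    return padded


def repeat_glyph(bitmap, count, gap, background=255):
    """Tile the glyph horizontally `count` times, separated by `gap` background columns."""
    tiled = []
    for row in bitmap:
        out = []
        for i in range(count):
            if i:
                out.extend([background] * gap)
            out.extend(row)
        tiled.append(out)
    return tiled


def majority_char(ocr_text):
    """Post-process OCR output for a repeated glyph: most common non-space character."""
    chars = [c for c in ocr_text if not c.isspace()]
    if not chars:
        return None
    return Counter(chars).most_common(1)[0][0]
```

The idea would be to tile the glyph a few times, pad the result, send it to tesseract in single-word mode (PSM_SINGLE_WORD, which is mode 8), and pass the returned text to majority_char. Whether this actually helps with g, j and w is something you would have to test.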

I also noticed this Stack Overflow discussion, which might be of use to you:
http://stackoverflow.com/questions/1708858/automatic-font-recognition-with-python

--Sven

On Tue, Apr 9, 2013 at 9:57 AM, Yann ROBIN <[email protected]> wrote:

> Hi,
>
> I'm trying to use tesseract to check that font glyphs match their
> characters (using Python).
> To do so I use freetype, load the font, print the glyph in a bitmap and
> send it to tesseract.
>
> Looking at previous posts, I've set up tesseract like this:
>
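> import tesseract  # python-tesseract bindings
>
> api = tesseract.TessBaseAPI()
> api.Init(".", "eng", tesseract.OEM_DEFAULT)
> # Restrict recognition to lowercase letters.
> api.SetVariable("tessedit_char_whitelist", "abcdefghijklmnopqrstuvwxyz")
> api.SetVariable("textord_noise_area_ratio", "1.0")
> api.SetPageSegMode(tesseract.PSM_SINGLE_CHAR)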
>
>
> I render the character at 24px, add a 5px white border around it, and send
> it to tesseract.
>
> For the font that I'm testing, the characters g, j and w are not recognized,
> and when I allow capital letters, n is seen as H.
>
>
> As I am using freetype and printing the character myself, I think I might be
> able to help tesseract, for example by giving it the baseline and the face
> bounding box.
>
> But I don't know how to do that (or whether I can), and I don't even
> know if it would be better.
>
>
> But maybe the real solution is to get tesseract trained on all my system
> fonts? (Currently I'm using the default English training files.)
>
>
>
> What would you do ?
>
>
> --
>
> Yann
>
>  --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>



-- 
``All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”
