Hi,
I'm trying to use tesseract to check that a font's glyphs match their
characters (using Python).
To do so I use freetype: I load the font, render the glyph into a bitmap and
send it to tesseract.
Looking at previous posts, I've set up tesseract like this:
api.Init(".","eng",tesseract.OEM_DEFAULT)
api.SetVariable("tessedit_char_whitelist", "abcdefghijklmnopqrstuvwxyz")
api.SetVariable("textord_noise_area_ratio", "1.0")
api.SetPageSegMode(tesseract.PSM_SINGLE_CHAR)
I render the character at 24px, add a 5px white border around it, and send it
to tesseract.
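For reference, here is a minimal sketch of the border step, not my exact code.
It assumes the glyph arrives as rows of 8-bit coverage values, the way FreeType
produces a grayscale bitmap (0 = background, 255 = full ink), so it inverts the
pixels to dark-on-white before padding. The function name to_ocr_image and its
parameters are made up for illustration.

```python
def to_ocr_image(coverage, border=5, white=255):
    """Turn a coverage bitmap into a dark-on-white image with a white border.

    coverage: list of equal-length rows of ints in 0..255, where 255 means
              full ink (the convention of a FreeType grayscale bitmap).
    border:   number of white pixels to add on every side.
    """
    if not coverage or not coverage[0]:
        raise ValueError("empty glyph bitmap")
    width = len(coverage[0])

    # Invert so that ink becomes dark (0) and background becomes white (255).
    inverted = [[white - value for value in row] for row in coverage]

    # Build the padded image: blank rows on top, padded glyph rows, blank rows below.
    blank_row = [white] * (width + 2 * border)
    image = [list(blank_row) for _ in range(border)]
    for row in inverted:
        image.append([white] * border + row + [white] * border)
    image.extend(list(blank_row) for _ in range(border))
    return image
```

With border=5, a glyph bitmap of w x h pixels becomes an image of
(w + 10) x (h + 10) pixels, which is what gets handed to tesseract.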
For the font that I'm testing, the characters g, j and w are not recognized,
and when I allow capital letters, n is seen as H.
As I am using freetype and rendering the character myself, I think I might be
able to help tesseract, for example by giving it the baseline and the face
bounding box.
But I don't know how to do that (or whether I can at all), and I don't even
know if it would improve the results.
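The only way I can think of to convey the baseline is to bake it into the
image itself: draw every glyph on a canvas of the same height, with the
baseline on the same row, so that descenders such as g and j hang below it.
Below is an untested-against-tesseract sketch of that idea. It assumes a
dark-on-white glyph bitmap and uses FreeType's bitmap_top (rows from the
baseline up to the top of the bitmap); the function name place_on_baseline and
the ascent/descent parameters are hypothetical, and I don't know whether this
actually improves recognition.

```python
def place_on_baseline(glyph, bitmap_top, ascent, descent, white=255):
    """Place a glyph bitmap on a fixed-height canvas with a shared baseline.

    glyph:      list of equal-length pixel rows (dark-on-white).
    bitmap_top: rows from the baseline up to the glyph's top row,
                as reported by FreeType's glyph slot.
    ascent:     canvas rows above the baseline.
    descent:    canvas rows below the baseline.
    """
    height = ascent + descent
    width = len(glyph[0])

    # Row index on the canvas where the glyph's first row goes.
    top = ascent - bitmap_top
    if top < 0 or top + len(glyph) > height:
        raise ValueError("glyph does not fit in the ascent/descent box")

    # Start from an all-white canvas and copy the glyph rows into place.
    canvas = [[white] * width for _ in range(height)]
    for offset, row in enumerate(glyph):
        canvas[top + offset] = list(row)
    return canvas
```

The 5px white border would then be added around this canvas rather than
around the tight glyph bitmap.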
But maybe the real solution is to get tesseract trained on all my system
fonts? (Currently I'm using the default English training files.)
What would you do?
--
Yann
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en