Tuning for single character recognition

Yann ROBIN Tue, 09 Apr 2013 09:00:08 -0700

Hi,

I'm trying to use tesseract (from Python) to check that each glyph in a font 
matches the character it is mapped to.
To do so I use freetype: I load the font, render the glyph into a bitmap and 
send it to tesseract.
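
A minimal sketch of that rendering step, assuming the freetype-py bindings 
(the post does not say which FreeType binding is in use). The function names 
and the font_path argument are made up for illustration. FreeType returns ink 
coverage with 0 as background, so the values are inverted here to get dark 
text on a white background.

```python
def buffer_to_rows(buffer, width, rows, pitch):
    """Turn FreeType's flat 8-bit coverage buffer into rows of pixels.

    FreeType stores 0 for background and 255 for full ink, so each value is
    inverted to give dark text on a white (255) background. `pitch` is the
    number of bytes per row and may be larger than `width`.
    """
    return [
        [255 - buffer[y * pitch + x] for x in range(width)]
        for y in range(rows)
    ]


def render_glyph(font_path, char, size_px=24):
    """Render one character; return (pixel rows, bitmap_left, bitmap_top)."""
    # freetype-py is an assumption; it is imported here so that
    # buffer_to_rows() above stays free of third-party dependencies.
    import freetype

    face = freetype.Face(font_path)
    face.set_pixel_sizes(0, size_px)
    face.load_char(char, freetype.FT_LOAD_RENDER)
    glyph = face.glyph
    bitmap = glyph.bitmap
    pixels = buffer_to_rows(bitmap.buffer, bitmap.width, bitmap.rows, bitmap.pitch)
    return pixels, glyph.bitmap_left, glyph.bitmap_top
```

bitmap_left and bitmap_top are returned as well because they say where the 
bitmap sits relative to the pen position and the baseline.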


Looking at previous posts, I've set up tesseract like this:

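import tesseract  # python-tesseract bindings (assumed from the constants used)
api = tesseract.TessBaseAPI()
api.Init(".", "eng", tesseract.OEM_DEFAULT)
api.SetVariable("tessedit_char_whitelist", "abcdefghijklmnopqrstuvwxyz")  # lowercase only
api.SetVariable("textord_noise_area_ratio", "1.0")
api.SetPageSegMode(tesseract.PSM_SINGLE_CHAR)  # treat the image as one character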


I render the character at 24px, add a 5px white border around it, and send it 
to tesseract.
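
The padding step could look like the sketch below. It assumes the bitmap is 
held as a list of rows of 8-bit grayscale values with 255 as white; the 
function name and that representation are illustrative, not taken from the 
post.

```python
def add_white_border(pixels, border=5, white=255):
    """Return a copy of a grayscale bitmap (list of rows) padded with white."""
    width = len(pixels[0]) if pixels else 0
    blank_row = [white] * (width + 2 * border)

    # Blank rows above the glyph.
    padded = [list(blank_row) for _ in range(border)]
    # Each glyph row gets white pixels on its left and right.
    for row in pixels:
        padded.append([white] * border + list(row) + [white] * border)
    # Blank rows below the glyph.
    padded.extend(list(blank_row) for _ in range(border))
    return padded


glyph = [[0, 0],
         [0, 255]]
framed = add_white_border(glyph)  # a 2x2 glyph becomes a 12x12 image
```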

For the font that I'm testing, the characters g, j and w are not recognized, 
and when I allow capital letters, n is seen as H.


As I am using freetype and rendering the character myself, I think I could do 
things to help tesseract, like giving it the baseline and the face bounding box.
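
One way to carry that information without any special tesseract API is to 
paste every glyph into a canvas that spans the full line height, so the 
baseline always falls on the same row. The sketch below is only that: it 
assumes pixel values for the ascender, descender and advance (FreeType reports 
these as 26.6 fixed-point numbers, so they would need dividing by 64, and the 
descender is negative there), and the function name is hypothetical. Whether 
this improves recognition is untested.

```python
def place_on_line_box(pixels, bitmap_left, bitmap_top,
                      ascender, descender, advance, white=255):
    """Paste a glyph bitmap into a white canvas covering the full line height.

    `ascender` and `descender` are positive pixel distances above and below
    the baseline, so the baseline lies between rows `ascender - 1` and
    `ascender` of the returned canvas.
    """
    height = ascender + descender
    canvas = [[white] * advance for _ in range(height)]

    # bitmap_top is measured upward from the baseline to the bitmap's top edge.
    top_row = ascender - bitmap_top
    for dy, row in enumerate(pixels):
        for dx, value in enumerate(row):
            y, x = top_row + dy, bitmap_left + dx
            # Silently clip anything that falls outside the line box.
            if 0 <= y < height and 0 <= x < advance:
                canvas[y][x] = value
    return canvas


# A 1x2 stroke that rises one pixel above the baseline and dips one below it.
boxed = place_on_line_box([[0], [0]], bitmap_left=1, bitmap_top=1,
                          ascender=3, descender=2, advance=3)
```

With this layout a descender such as the tail of g or j ends up below the 
same row in every image, and a lowercase n occupies less of the line height 
than a capital H.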

But I don't know how to do that (or whether I can at all), and I don't even 
know if it would give better results.


But maybe the real solution is to train tesseract on all my system fonts? 
(Currently I'm using the default English training files.)



What would you do?


-- 

Yann
