Hi Sven,

Repeating the character four times and using single-word segmentation (page 
segmentation mode 8) seems to work. I'll try it on more fonts.
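
A minimal sketch of that approach, assuming the glyph has already been 
rendered to a grayscale bitmap held as a list of rows (0 = black, 255 = 
white). The names repeat_glyph and vote, and the gap and border sizes, are 
illustrative assumptions. The tesseract call itself is left out: the tiled 
bitmap would be sent to tesseract in single-word mode and the returned text 
passed to vote().

```python
from collections import Counter

WHITE = 255  # assumed background value for an 8-bit grayscale bitmap


def repeat_glyph(glyph, times=4, gap=4, border=5):
    """Tile a glyph bitmap horizontally, with white gaps and a white border.

    glyph is a list of equal-length rows of pixel values.
    """
    width = len(glyph[0])
    row_width = 2 * border + times * width + (times - 1) * gap
    blank = [WHITE] * row_width

    rows = [blank[:] for _ in range(border)]          # top border
    for src in glyph:
        row = [WHITE] * border                        # left border
        for i in range(times):
            if i:
                row += [WHITE] * gap                  # gap between copies
            row += list(src)
        row += [WHITE] * border                       # right border
        rows.append(row)
    rows += [blank[:] for _ in range(border)]         # bottom border
    return rows


def vote(ocr_text):
    """Post-process OCR output: return the most frequent letter, or None."""
    letters = [c for c in ocr_text if c.isalpha()]
    if not letters:
        return None
    return Counter(letters).most_common(1)[0][0]
```

Taking the most frequent letter means one misread copy out of four does not 
change the result; an empty result still marks the glyph as unrecognized.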

Thanks for your help,

-- 
Yann

On Tuesday, April 9, 2013 6:33:10 PM UTC+2, sventech wrote:
>
> Hi Yann,
> I would not try to train all the system fonts -- just check if you find 
> some that are not recognized, then train for them. You could try a larger 
> border, or repeating characters and then post-processing, but the first 
> thing to try is page segmentation mode 8 (treat the image as a single word).
>
> I also noticed this Stack Overflow discussion, which might be of use to you:
>
> http://stackoverflow.com/questions/1708858/automatic-font-recognition-with-python
>
> --Sven
>
> On Tue, Apr 9, 2013 at 9:57 AM, Yann ROBIN <[email protected]> wrote:
>
>> Hi,
>>
>> I'm trying to use tesseract (from Python) to check that each font glyph 
>> matches its character.
>> To do so, I use freetype to load the font, render the glyph into a bitmap 
>> and send it to tesseract.
>>
>> Based on previous posts, I've set up tesseract like this:
>>
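>> import tesseract  # python-tesseract bindings
>>
>> api = tesseract.TessBaseAPI()
>> api.Init(".", "eng", tesseract.OEM_DEFAULT)
>> # Restrict recognition to lowercase letters.
>> api.SetVariable("tessedit_char_whitelist", "abcdefghijklmnopqrstuvwxyz")
>> api.SetVariable("textord_noise_area_ratio", "1.0")
>> api.SetPageSegMode(tesseract.PSM_SINGLE_CHAR)  # one character per image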
>>
>>
>> I render the character at 24px, add a 5px white border around it, and send 
>> it to tesseract.
>>
>> For the font I'm testing, the characters g, j and w are not recognized; 
>> when I allow capital letters, n is read as H.
>>
>>
>> As I am using freetype to render the character, I could perhaps help 
>> tesseract, for example by giving it the baseline and the face bounding box.
>>
>> But I don't know how to do that (or whether I can), and I don't even 
>> know if it would help.
>>
>>
>> But maybe the real solution is to get tesseract trained on all my system 
>> fonts? (Currently I'm using the default English training files.)
>>
>>
>>
>> What would you do?
>>
>>
>> -- 
>>
>> Yann
>>
>>  -- 
>> -- 
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>>  
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>>  
>>  
>>
>
>
>
> -- 
> “All that is gold does not glitter,
>   not all those who wander are lost;
> the old that is strong does not wither,
>   deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
>   a light from the shadows shall spring;
> renewed shall be blade that was broken,
>   the crownless again shall be king.” 
>
