Re: Accuracy problems : alpha and numeric characters getting switched around

Sriranga(78yrsold) Tue, 23 Oct 2012 23:09:46 -0700

Please see the attached files: the output for the reference image file is in order.
I could not understand your purpose or intention.


On Wed, Oct 24, 2012 at 9:52 AM, Gaara Sabaku <
[email protected]> wrote:

> You're going to think I am crazy, but listen: stack the difficult word 10
> times into one image and observe the output.
>
> like this:
> Kage Gaara
> Kage Gaara
> Kage Gaara
> Kage Gaara
> Kage Gaara
> Kage Gaara
> Kage Gaara
> Kage Gaara
> Kage Gaara
> Kage Gaara
>
> On Tue, Oct 23, 2012 at 7:00 PM, Ryan <[email protected]> wrote:
>
>> Hi, I am using tesseract to generate unicode mappings for 'corrupt' font
>> files. While I have complete control over rendering of the characters
>> (size, positioning, colors), I am having trouble with accuracy. Mainly,
>> tesseract seems to prefer numbers over letters. In particular, lower case
>> 'l's often get detected as vertical bars or ones. Also, Latin 'o's and
>> zeros get switched around.
>>
>> For example, the attached png has the text "ByJamesMorApil20" but after
>> running the following code I get "ByJamesM0rApi12O" as the result from
>> GetUTF8Text.
>>
>> Notice that the lower case 'o' became a zero, the zero became an upper
>> case 'o', and lower case 'l' became a one.
>>
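>>> tesseract::TessBaseAPI api;
>>> api.Init("path_to_trained_files", NULL);  // NULL language defaults to "eng"
>>> api.SetPageSegMode(tesseract::PSM_SINGLE_LINE);  // set after Init, which may reset it
>>> api.SetImage((const unsigned char*)bmp, width, height, bpp, stride);  // bpp = bytes per pixel
>>> char* text = api.GetUTF8Text();  // caller owns the returned buffer
>>> std::string ocr_results(text);
>>> delete[] text;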
>>
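Because the rendered text is known in advance here, the substitutions described above (o/0, l/1) can be tallied by comparing the expected string with the recognized one. The following is only a minimal sketch: `CountSubstitutions` is a hypothetical helper, not part of Tesseract, and it assumes both strings have the same length so characters can be compared position by position.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not a Tesseract API): counts position-by-position
// substitutions between the text that was rendered and the text returned
// by the OCR engine. Assumes equal lengths, i.e. no dropped or extra
// characters; returns an empty map when the strings cannot be aligned.
std::map<std::pair<char, char>, int> CountSubstitutions(
    const std::string& expected, const std::string& recognized) {
  std::map<std::pair<char, char>, int> counts;
  if (expected.size() != recognized.size()) {
    return counts;  // lengths differ, so positions do not line up
  }
  for (std::size_t i = 0; i < expected.size(); ++i) {
    if (expected[i] != recognized[i]) {
      // Key is (expected character, recognized character).
      ++counts[std::make_pair(expected[i], recognized[i])];
    }
  }
  return counts;
}
```

Running this over many rendered samples would show which character pairs are confused most often, which may help decide where rendering changes or post-correction are worth trying.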
>>
>> I have complete control over how the characters and the image are
>> rendered (any size, spacing, colors, dpi), but I am still unable to get any
>> better accuracy than this so far.
>>
>> The only restriction is that the input characters are never going to be
>> 'real' words or sentences, just characters in random order.
>>
>> I originally tried PSM_SINGLE_CHAR mode, but that caused a lot more
>> errors, mainly with capitalization.
>>
>> Any help on increasing accuracy would be appreciated!
>>
>> Thanks
>>
>>  --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>

Kage Gaara
Kage Gaara
Kage Gaara
Kage Gaara
Kage Gaara
Kage Gaara
Kage Gaara
Kage Gaara
Kage Gaara
Kage Gaara
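
The stacking experiment suggested earlier in the thread (the same word repeated ten times in one image) can be reproduced on a raw pixel buffer. This is a minimal sketch under stated assumptions: `StackImage` is a hypothetical helper, not a Tesseract function, and it assumes the image is stored row by row with `stride` bytes per row, including any padding.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not a Tesseract API): repeats a raw image buffer
// vertically `copies` times so the same word appears on several lines.
// `stride` is the number of bytes per row, including padding.
std::vector<unsigned char> StackImage(const unsigned char* pixels,
                                      int height, int stride, int copies) {
  if (pixels == nullptr || height <= 0 || stride <= 0 || copies <= 0) {
    return std::vector<unsigned char>();  // nothing to stack
  }
  const std::size_t bytes =
      static_cast<std::size_t>(height) * static_cast<std::size_t>(stride);
  std::vector<unsigned char> stacked(bytes * copies);
  for (int i = 0; i < copies; ++i) {
    // Each copy is placed directly below the previous one.
    std::memcpy(stacked.data() + i * bytes, pixels, bytes);
  }
  return stacked;
}
```

The resulting buffer has the same width and stride as the input and `height * copies` rows, so it could be handed to `SetImage` with the new height. A multi-line image would presumably also call for a block-oriented page segmentation mode rather than `PSM_SINGLE_LINE`.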

<<attachment: sample_eng.tif>>

sample_eng.box
Description: Binary data
