Please see the attached files - the output for the reference image file is in order. I could not understand your purpose/intention.
On Wed, Oct 24, 2012 at 9:52 AM, Gaara Sabaku <[email protected]> wrote:
> You're going to think I am crazy, but listen: stack the difficult word 10
> times into one image and observe the output,
>
> like this:
> Kage Gaara
> Kage Gaara
> Kage Gaara
> Kage Gaara
> Kage Gaara
> Kage Gaara
> Kage Gaara
> Kage Gaara
> Kage Gaara
> Kage Gaara
>
> On Tue, Oct 23, 2012 at 7:00 PM, Ryan <[email protected]> wrote:
>
>> Hi, I am using tesseract to generate unicode mappings for 'corrupt' font
>> files. While I have complete control over rendering of the characters
>> (size, positioning, colors), I am having trouble with accuracy. Mainly,
>> tesseract seems to prefer numbers over letters. In particular, lower case
>> 'l's often get detected as vertical bars or ones. Also, Latin 'o's and
>> zeros get switched around.
>>
>> For example, the attached png has the text "ByJamesMorApil20" but after
>> running the following code I get "ByJamesM0rApi12O" as the result from
>> GetUTF8Text.
>>
>> Notice that the lower case 'o' became a zero, the zero became an upper
>> case 'o', and the lower case 'l' became a one.
>>
>>> TessBaseAPI api;
>>> api.SetPageSegMode(PSM_SINGLE_LINE);
>>> api.Init("path_to_trained_files", NULL);
>>> api.SetImage((const unsigned char*)bmp, width, height, bpp, stride);
>>> std::string ocr_results( api.GetUTF8Text() );
>>
>> I have complete control over how the characters and the image are
>> rendered (any size, spacing, colors, dpi), but I am still unable to get any
>> better accuracy than this so far.
>>
>> The only restriction is that the input characters are never going to be
>> 'real' words or sentences, just characters in random order.
>>
>> I originally tried PSM_SINGLE_CHAR mode, but that caused a lot more
>> errors, mainly with capitalization.
>>
>> Any help on increasing accuracy would be appreciated!
>>
>> Thanks
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
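For anyone following the thread: since the rendered text is known in advance, the swaps Ryan describes (o/0/O, l/1) can be listed mechanically by comparing the rendered string with the OCR result. Below is a minimal sketch of that comparison. The names `Mismatch` and `FindMismatches` are made up for illustration and are not part of the tesseract API; the sketch also assumes both strings line up position by position, which holds for the example in the thread but not when characters are dropped or inserted.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One position where the OCR output differs from the rendered text.
// (Hypothetical helper type, not part of tesseract.)
struct Mismatch {
    std::size_t index;  // zero-based position in the string
    char expected;      // character that was rendered
    char actual;        // character the OCR engine returned
};

// Compare the rendered text with the OCR result character by character.
// Only the common length is compared; handling insertions or deletions
// would need a real alignment, which this sketch does not attempt.
std::vector<Mismatch> FindMismatches(const std::string& expected,
                                     const std::string& actual) {
    std::vector<Mismatch> out;
    const std::size_t n = std::min(expected.size(), actual.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (expected[i] != actual[i]) {
            out.push_back({i, expected[i], actual[i]});
        }
    }
    return out;
}
```

Calling `FindMismatches("ByJamesMorApil20", "ByJamesM0rApi12O")` reports the three positions where the two strings differ, which makes it easy to tally which character pairs get confused while trying different render sizes or spacings.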
Kage Gaara
Kage Gaara
Kage Gaara
Kage Gaara
Kage Gaara
Kage Gaara
Kage Gaara
Kage Gaara
Kage Gaara
Kage Gaara
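The stacking idea from the thread can also be done directly on the pixel buffer before it is handed to SetImage, rather than by re-rendering. The following is only a sketch under assumptions: `StackVertically`, `gap_rows` and `background` are hypothetical names, the buffer is assumed to be row-major with `stride` bytes per row (as in the SetImage call quoted above), and filling the gap with a single byte value only gives a sensible background for plain white, black or gray.

```cpp
#include <cstddef>
#include <vector>

// Repeat an image `copies` times, one below the other, with `gap_rows`
// rows of `background` bytes between consecutive copies.
// `src` must point to `height` rows of `stride` bytes each.
// (Hypothetical helper, not part of tesseract.)
std::vector<unsigned char> StackVertically(const unsigned char* src,
                                           int height, int stride,
                                           int copies, int gap_rows,
                                           unsigned char background) {
    std::vector<unsigned char> out;
    if (copies <= 0 || height <= 0 || stride <= 0) {
        return out;  // nothing to stack
    }
    const std::size_t block = static_cast<std::size_t>(height) * stride;
    const std::size_t gap =
        gap_rows > 0 ? static_cast<std::size_t>(gap_rows) * stride : 0;
    out.reserve(block * copies + gap * (copies - 1));
    for (int i = 0; i < copies; ++i) {
        if (i > 0) {
            out.insert(out.end(), gap, background);  // spacer rows
        }
        out.insert(out.end(), src, src + block);     // one copy of the image
    }
    return out;
}
```

The stacked image keeps the original width and stride; its height is `height * copies + gap_rows * (copies - 1)`. Because the result contains several text lines, PSM_SINGLE_LINE would presumably no longer be the right mode, and something like PSM_SINGLE_BLOCK would be the one to try.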
<<attachment: sample_eng.tif>>
sample_eng.box
Description: Binary data

