Of course these characters can get split or misrecognized, given all those "holes" in your characters that mislead Tesseract. Preprocess the image before OCR: use blur/threshold or morphology.
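To illustrate the morphology suggestion, here is a minimal sketch of a 3x3 morphological closing (dilate, then erode) that fills small holes in a binarized glyph. It assumes the image is already thresholded into a list of rows with 1 = ink and 0 = background; that representation and the helper names `dilate`, `erode` and `close_holes` are hypothetical and chosen only for this example. In practice a library routine such as OpenCV's `cv2.morphologyEx` with `cv2.MORPH_CLOSE` would normally be used instead.

```python
def _filter3x3(img, pick):
    """Apply a 3x3 min or max filter; pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0
                for ny in (y - 1, y, y + 1)
                for nx in (x - 1, x, x + 1)
            ]
            row.append(pick(window))
        out.append(row)
    return out


def dilate(img):
    """Grow ink by one pixel in every direction (3x3 maximum)."""
    return _filter3x3(img, max)


def erode(img):
    """Shrink ink by one pixel in every direction (3x3 minimum)."""
    return _filter3x3(img, min)


def close_holes(img):
    """Morphological closing: dilate, then erode.

    The image is padded with one background pixel on each side first,
    so ink that touches the border is not eaten by the erosion step.
    """
    w = len(img[0])
    padded = [[0] * (w + 2)]
    padded += [[0] + list(row) + [0] for row in img]
    padded += [[0] * (w + 2)]
    closed = erode(dilate(padded))
    # Crop the padding away again.
    return [row[1:-1] for row in closed[1:-1]]


if __name__ == "__main__":
    # A small blob of ink with a one-pixel hole in the middle.
    glyph = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in close_holes(glyph):
        print("".join("#" if v else "." for v in row))
```

A 3x3 element only closes holes and gaps up to about two pixels wide. Larger defects need a larger element or repeated passes, at the risk of merging neighbouring characters.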
There is no way to turn off splitting or misrecognition, as both are intrinsic features of Tesseract )) At the same time, any preprocessing is a kind of custom programming.

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Mon, Sep 26, 2011 at 5:55 PM, iqyush <[email protected]> wrote:
> Hi all:
> The attached BMP should be recognized by tesseract OCR 3.0.1 as
> TULOMSAS 11-2-06377 MG 11 09 ER9 T 018
> but it's recognized as
> TU2LOMSAS 11I-2-06377 MG 11 09 ER9 T 018
> I think the char "L" is split and recognized as "2L",
> and the second char "1" is split and recognized as "1I".
>
> In my application the font size within a line is the same, so the OCR script
> feature is not needed,
> so can anyone tell me how to disable the multi-split feature from the
> configuration file?
>
> B.R. Aiqing Yu
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

