Please refer to these threads:
https://groups.google.com/d/topic/tesseract-ocr/rcsvxsxdjNY/discussion
https://groups.google.com/d/topic/tesseract-ocr/gh-bficm_2w/discussion

In brief, you'll need to write your own, fairly intelligent segmentation for formulas; then you'll be able to use Tesseract as a "glyph recognizer".

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Wed, Jun 22, 2011 at 2:40 PM, Gökhan Sever <[email protected]> wrote:
> Hello,
>
> I get this failure when I try to recognize a page which contains both
> regular English text and formulations (e.g. Greek letters, divisions,
> sub-super scripts etc..)
>
> [gsever@ccn ~]$ tesseract scanpage1.tif outputtext
> Tesseract Open Source OCR Engine v3.01 with Leptonica
> tesseract: intmatcher.cpp:1165: int
> IntegerMatcher::FindBestMatch(INT_CLASS_STRUCT*, const
> ScratchEvidence&, uinT16, uinT8, INT_RESULT_STRUCT*): Assertion
> `ClassTemplate->NumConfigs > 0' failed.
> Aborted (core dumped)
>
> Is there a trained dataset for covering cases like mine?
>
> Thanks.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
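
P.S. To make the "segment first, then recognize glyph by glyph" idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions, not how Tesseract works internally: the page is assumed to be already binarized into a list of rows of 0/1 values, the function names (find_glyph_boxes, crop, recognize_glyphs) are made up for this example, and "recognizer" is a placeholder for whatever call you use to hand a single cropped glyph to the OCR engine. Plain connected components will also split characters such as "i" or "=" into pieces and know nothing about sub/superscripts or fraction bars; grouping those is the "intelligent" part you would have to write yourself.

```python
from collections import deque


def find_glyph_boxes(image):
    """Return inclusive (left, top, right, bottom) boxes of 8-connected ink blobs.

    `image` is assumed to be a list of rows, each a list of 0 (paper) / 1 (ink).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over one blob, tracking its extent.
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    # Sort by left edge so glyphs come out roughly in reading order.
    return sorted(boxes)


def crop(image, box):
    """Cut the inclusive bounding box out of the image."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def recognize_glyphs(image, recognizer):
    """Apply `recognizer` (a hypothetical per-glyph OCR callable) to each blob."""
    return [(box, recognizer(crop(image, box)))
            for box in find_glyph_boxes(image)]
```

In practice the recognizer callable would write each crop to an image file and run the OCR engine on it in a single-character mode, and the returned boxes would feed a layout step that decides which glyphs are subscripts, superscripts, numerators and so on.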

