You can analyze the box data and easily detect that the boxes of these letters overlap.
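
As a rough sketch of that check, assuming the per-character boxes are available as text in the box-file layout (`char left bottom right top page`), something like the following would flag neighbouring characters whose boxes touch or overlap. The names `Box`, `parse_box_lines`, `touching_pairs`, and the sample coordinates are made up for illustration; they are not part of the Tesseract API.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_box_lines(text: str) -> List[Box]:
    """Parse lines of the form 'char left bottom right top [page]'."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = map(int, parts[1:5])
        boxes.append(Box(parts[0], left, bottom, right, top))
    return boxes


def touching_pairs(boxes: List[Box], tolerance: int = 0) -> List[Tuple[str, str]]:
    """Return consecutive character pairs whose boxes touch or overlap.

    A pair is reported when the two boxes share some vertical extent
    (so they sit on the same text line) and the horizontal gap between
    them is at most `tolerance` pixels.
    """
    pairs = []
    for a, b in zip(boxes, boxes[1:]):
        same_line = a.bottom < b.top and b.bottom < a.top
        gap = b.left - a.right  # negative when the boxes overlap
        if same_line and gap <= tolerance:
            pairs.append((a.char, b.char))
    return pairs


# Hypothetical coordinates: 'a' and 'e' overlap by 2 px, 'w' stands apart.
SAMPLE = """\
a 100 50 118 70 0
e 116 50 132 70 0
w 140 50 160 70 0
"""

if __name__ == "__main__":
    print(touching_pairs(parse_box_lines(SAMPLE)))
```

An 'a' immediately followed by an 'e' in the reported pairs is then a candidate for 'æ'. The tolerance probably needs tuning per font and resolution, since tightly set text can produce touching boxes without any ligature.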

On Wed, Oct 24, 2012 at 4:59 PM, Ryan <[email protected]> wrote:
> I am using GetUTF8() to get ocr results from attached image. As you can
> see from the image there is a ligature (0x00E6 æ) and tesseract-ocr returns
> the following results
>
> ():DFHJKLMTVabcdefghiklmnorstuvyaew
>
> Ideally tesseract would return 'æ' and not 'a' & 'e' separately, but since
> it doesn't, is there anyway to get from tesseract that 'a' and 'e' were
> connected?
>
> And as a more general question, how is tesseract with detecting ligatures?
> I'm using the latest traineddata files from tesseract.
>
> thanks
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

