You can analyze the bounding-box data and easily detect that the boxes of
these letters overlap.
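
As a rough sketch of that check (not anything built into tesseract): the
snippet below assumes you already have the per-character boxes as text in the
box-file layout "symbol left bottom right top page". The names CharBox,
parse_box_lines and touching_pairs are made up for illustration, and the
sample coordinates are invented, not taken from your image.

```python
from typing import List, NamedTuple, Tuple


class CharBox(NamedTuple):
    """One recognized character and its bounding box (hypothetical helper)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_box_lines(text: str) -> List[CharBox]:
    """Parse lines shaped like 'symbol left bottom right top page'."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = map(int, parts[1:5])
        boxes.append(CharBox(parts[0], left, bottom, right, top))
    return boxes


def touching_pairs(boxes: List[CharBox],
                   tolerance: int = 0) -> List[Tuple[CharBox, CharBox]]:
    """Return consecutive characters whose boxes overlap.

    'tolerance' widens the horizontal test by that many pixels, so boxes
    that nearly touch can also be reported.
    """
    pairs = []
    for a, b in zip(boxes, boxes[1:]):
        horizontal = b.left <= a.right + tolerance and a.left <= b.right + tolerance
        vertical = a.bottom <= b.top and b.bottom <= a.top
        if horizontal and vertical:
            pairs.append((a, b))
    return pairs


# Invented sample: the 'a' and 'e' boxes share a few pixels horizontally.
sample = """\
y 100 40 118 70 0
a 130 50 149 70 0
e 147 50 166 70 0
w 180 50 205 70 0
"""

pairs = touching_pairs(parse_box_lines(sample))
print([(a.char, b.char) for a, b in pairs])
```

An overlapping 'a' followed by 'e' is then a candidate for 'æ'. Overlap alone
does not prove a ligature (tight kerning or italics can also make boxes
overlap), so you may want to combine it with a list of letter pairs you expect.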

On Wed, Oct 24, 2012 at 4:59 PM, Ryan <[email protected]> wrote:

> I am using GetUTF8() to get ocr results from attached image. As you can
> see from the image there is a ligature (0x00E6 æ) and tesseract-ocr returns
> the following results
>
> ():DFHJKLMTVabcdefghiklmnorstuvyaew
>
> Ideally tesseract would return 'æ' and not 'a' & 'e' separately, but since
> it doesn't, is there any way to get from tesseract that 'a' and 'e' were
> connected?
>
> And as a more general question, how is tesseract with detecting ligatures?
> I'm using the latest traineddata files from tesseract.
>
> thanks
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
