Re: Eng lang pack translating lower-case "FL" as a single character

Patrick Questembert Mon, 08 Apr 2013 21:49:59 -0700

I don't know about the specific app you mentioned, but if you use Tesseract 
directly you can simply blacklist this Unicode character, which has a 
decent chance of getting Tesseract to do the right thing here. This character 
is called a ligature, and while you are at it you could also blacklist:


// U+FB00 ﬀ ef ac 80 LATIN SMALL LIGATURE FF

// U+FB02 ﬂ ef ac 82 LATIN SMALL LIGATURE FL
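
A minimal sketch of how this could be wired up, assuming the blacklist is 
passed through Tesseract's tessedit_char_blacklist config variable and that 
your build accepts the "-c var=value" command-line option. The helper names 
and the extra entries (U+FB01, U+FB03, U+FB04) are illustrative additions, 
not part of the list above.

```python
# Latin f-ligatures from the Alphabetic Presentation Forms block.
# U+FB00 and U+FB02 come from the list above; the others are added
# here as an assumption, since they tend to cause the same problem.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
}


def blacklist_args(chars=LIGATURES):
    """Return extra CLI arguments that ask Tesseract to avoid these characters."""
    return ["-c", "tessedit_char_blacklist=" + "".join(chars)]


def expand_ligatures(text):
    """Fallback: replace any ligature that still appears in the OCR output."""
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    return text
```

The arguments could then be appended to a normal invocation, for example 
subprocess.run(["tesseract", "page.png", "out", *blacklist_args()]), where 
page.png and out are placeholder names. The expand_ligatures fallback is 
useful when the OCR runs inside another application and you cannot change 
its Tesseract settings.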

Patrick


On Monday, April 8, 2013 1:18:23 PM UTC-4, [email protected] wrote:
>
> Does anybody know about the English language pack for Tesseract?
>
> I'm using the OCR from the Software995 folks, and when I have an "fl" 
> (lower case "FL"), it is now recognized as a single character 
> (hex #xfb01). Is there a way to make it come out as two characters?
>
> What's strange is that I think it used to work, but without changing 
> anything, it seems to work differently now (famous last words :-).
>
> Thanks!
>
>

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

