I don't know about the specific app you mentioned, but if you use Tesseract directly you can simply blacklist this Unicode character (U+FB01), and that has a decent chance of getting Tesseract to do the right thing here. This kind of character is called a ligature, and while you are at it you could also blacklist:
// U+FB00 ﬀ ef ac 80 LATIN SMALL LIGATURE FF
// U+FB02 ﬂ ef ac 82 LATIN SMALL LIGATURE FL

Patrick

On Monday, April 8, 2013 1:18:23 PM UTC-4, [email protected] wrote:
>
> Does anybody know about the english language pack for tesseract?
>
> I'm using the OCR from the Software995 folks, and when I have an "fl"
> (lower case "FL"), it recognizes now as a single character
>
> (hex #xfb01). Is there a way to make it come out as two characters?
>
> What's strange is that I think it used to work, but without changing
> anything, it seems to work differently now (famous last words :-).
>
> Thanks!
>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
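For the blacklist suggestion above: Tesseract's config variable for this is `tessedit_char_blacklist`, which can go in a config file or, in versions that support it, be passed on the command line with `-c VAR=VALUE`. Whether it is honored can depend on the Tesseract version and engine, so treat that part as something to verify. If you can't control the OCR engine (e.g. inside the Software995 app), a different technique, plain post-processing of the OCR text, also works. The sketch below shows both; `split_ligatures` and `blacklist_option` are hypothetical helper names, and the mapping covers only the three ligatures discussed in this thread.

```python
# Sketch only: `split_ligatures` and `blacklist_option` are hypothetical helpers,
# not part of Tesseract or any OCR app's API.

# The three ligature code points discussed in this thread.
LIGATURES = {
    "\ufb00": "ff",  # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",  # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",  # LATIN SMALL LIGATURE FL
}

# str.maketrans accepts a dict mapping single characters to replacement strings.
_SPLIT_TABLE = str.maketrans(LIGATURES)


def split_ligatures(text: str) -> str:
    """Post-processing fallback: replace each ligature with its separate letters."""
    return text.translate(_SPLIT_TABLE)


def blacklist_option(chars: str = "".join(LIGATURES)) -> list[str]:
    """Build the `-c` argument pair that asks the tesseract CLI to blacklist `chars`."""
    return ["-c", "tessedit_char_blacklist=" + chars]


if __name__ == "__main__":
    # Text as it might come back from OCR, with ligature characters in it.
    print(split_ligatures("\ufb02ow and of\ufb01ce"))  # flow and office
```

The blacklist approach tries to stop the ligature from being recognized at all, while `split_ligatures` just fixes the output afterwards. The explicit mapping is used instead of Unicode NFKC normalization because NFKC would also rewrite unrelated characters such as superscripts and full-width forms.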

