You appear to be a fellow Ithacan! (I no longer live there, but remember it fondly.)
Anyway, other common ligatures include ff, ffi, ffl, fb, fy, and ft:

http://ilovetypography.com/2007/09/09/decline-and-fall-of-the-ligature/

Sven

On Monday, April 29, 2013, Michael Sander wrote:

> Yes, I'm doing something similar in python. Do you know of a list of
> ligatures so I can convert them to ascii? I know fi and fl are the most
> popular, but there are probably many more.
>
> Michael Sander
> 607-227-9859
>
> On Mon, Apr 29, 2013 at 7:48 PM, Greg Dunkel wrote:
>
>> I couldn't get the config to work on Ubuntu so I wrote a post-processing
>> sed script to convert the ligatures to two characters.
>>
>> On Mon, Apr 29, 2013 at 3:45 AM, Michael Sander wrote:
>>
>>> How did you format your config file? I tried adding the following line
>>> and it doesn't seem to work:
>>>
>>> tessedit_char_blacklist fi
>>>
>>> On Sunday, April 1, 2012 5:16:59 AM UTC-4, klo wrote:
>>>>
>>>> Thanks. I added it to my tesseract configuration file and it works great
>>>>
>>>> Cheers
>>>>
>>>> On Saturday, March 31, 2012 10:12:50 PM UTC+2, zdpo wrote:
>>>>>
>>>>> On 31.03.2012 16:17, klo wrote:
>>>>>
>>>>> In my simple testing, I find this most common problem, is there a way to
>>>>> instruct tesseract not to use those glyphs without limiting it to ASCII?
>>>>>
>>>>> I use tesseract 3.01 BTW
>>>>>
>>>>> put them to blacklist with variable tessedit_char_blacklist (search
>>>>> forum if you do not know how).
>>>>>
>>>>> Zdenko
>> --
>> /greg
--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."

--
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

---
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
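
P.S. On the Python question in the quoted thread (converting ligatures to ascii), here is a minimal sketch, not a tested recipe for tesseract output. It assumes the OCR text contains the Latin ligature code points from Unicode's Alphabetic Presentation Forms block (U+FB00 to U+FB06). As far as I know, fb, fy, and ft have no code points of their own, so they should not show up as single characters in the first place. The names LIGATURES, expand_ligatures, and expand_ligatures_nfkc are made up for this example.

```python
import unicodedata

# Latin ligatures that have their own Unicode code points (U+FB00..U+FB06),
# mapped to their plain-letter spellings.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t
    "\ufb06": "st",
}

# Build the translation table once; str.translate then does a single pass.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text):
    """Replace each ligature code point with its letters; leave the rest alone."""
    return text.translate(_TABLE)


def expand_ligatures_nfkc(text):
    """Alternative: compatibility normalization.

    NFKC also expands these ligatures, but it rewrites other compatibility
    characters too (for example superscript digits), so it is broader than
    the explicit table above.
    """
    return unicodedata.normalize("NFKC", text)
```

The explicit table only touches the seven ligatures, which seems safer if the rest of the text should stay exactly as tesseract produced it. For instance, expand_ligatures("e\ufb03cient") gives "efficient".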

