Yes, I'm doing something similar in Python. Do you know of a list of ligatures so I can convert them to ASCII? I know fi and fl are the most common, but there are probably many more.
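For reference, here is a minimal sketch of that kind of post-processing step, assuming Python 3 and OCR output that is already decoded to a Unicode string. The table covers only the seven Latin ligatures in the Unicode "Alphabetic Presentation Forms" block (U+FB00 to U+FB06); it is not claimed to be everything tesseract can emit, and the names `LIGATURES`, `expand_ligatures` and `expand_compat` are made up for illustration.

```python
import unicodedata

# Latin ligatures from the Unicode "Alphabetic Presentation Forms" block
# (U+FB00..U+FB06), mapped to their plain ASCII spelling.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t ligature
    "\ufb06": "st",
}

# Translation table built once; each key is a single code point.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text):
    """Replace only the ligatures listed in LIGATURES; leave all else alone."""
    return text.translate(_TABLE)


def expand_compat(text):
    """Broader alternative: Unicode compatibility normalization (NFKC).

    This also expands the ligatures above, but it rewrites other
    compatibility characters too (superscripts, full-width forms, ...),
    so it may change more of the text than intended.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    print(expand_ligatures("\ufb01nd the o\ufb03ce on the \ufb01rst \ufb02oor"))
```

The explicit table touches nothing but the listed ligatures, which seems the safer default for OCR text. Letters such as æ and œ have no compatibility decomposition, so neither function changes them.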
Michael Sander
[email protected]
607-227-9859

On Mon, Apr 29, 2013 at 7:48 PM, Greg Dunkel <[email protected]> wrote:
> I couldn't get the config to work on Ubuntu so I wrote a post-processing
> sed script to convert the ligatures to two characters.
>
> On Mon, Apr 29, 2013 at 3:45 AM, Michael Sander <[email protected]> wrote:
>> How did you format your config file? I tried adding the following line
>> and it doesn't seem to work:
>>
>> tessedit_char_blacklist fi
>>
>> On Sunday, April 1, 2012 5:16:59 AM UTC-4, klo wrote:
>>> Thanks. I added it to my tesseract configuration file and it works great
>>>
>>> Cheers
>>>
>>> On Saturday, March 31, 2012 10:12:50 PM UTC+2, zdpo wrote:
>>>> On 31.03.2012 16:17, klo wrote:
>>>>
>>>> In my simple testing, I find this most common problem, is there a way to
>>>> instruct tesseract not to use those glyphs without limiting it to ASCII?
>>>>
>>>> I use tesseract 3.01 BTW
>>>>
>>>> put them to blacklist with variable tessedit_char_blacklist (search
>>>> forum if you do not know how).
>>>>
>>>> Zdenko
>
> --
> /greg

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
---
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

