On Monday, April 29, 2013 8:39:57 PM UTC-4, Michael Sander wrote:
> Yes, I'm doing something similar in Python. Do you know of a list of
> ligatures so I can convert them to ASCII? I know fi and fl are the most
> popular, but there are probably many more.

The list of Unicode ligatures is here: http://www.unicode.org/charts/PDF/UFB00.pdf
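For the Python side, here is a minimal sketch of the conversion, assuming the only ligatures of interest are the seven Latin ones at U+FB00 to U+FB06 in that chart. The names LIGATURES and expand_ligatures are made up for illustration; nothing here comes from tesseract itself.

```python
# Latin ligatures from the Unicode "Alphabetic Presentation Forms" block
# (U+FB00..U+FB06), mapped to their plain ASCII spellings.
LIGATURES = {
    "\uFB00": "ff",
    "\uFB01": "fi",
    "\uFB02": "fl",
    "\uFB03": "ffi",
    "\uFB04": "ffl",
    "\uFB05": "st",  # long s + t; written out as "st" here
    "\uFB06": "st",
}

# str.translate needs a table keyed by code point; maketrans builds it.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text):
    """Replace each ligature character with its multi-letter spelling."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(expand_ligatures("\uFB01nd the o\uFB03ce \uFB02oor"))
```

An explicit table leaves every other character in the OCR output untouched. unicodedata.normalize("NFKC", text) also expands these ligatures, but it rewrites other compatibility characters as well, which may or may not be what you want.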
> Go Big Red!
>
> Michael Sander
> michael...@gmail.com
> 607-227-9859
>
> On Mon, Apr 29, 2013 at 7:48 PM, Greg Dunkel <drdun...@gmail.com> wrote:
>
>> I couldn't get the config to work on Ubuntu, so I wrote a post-processing
>> sed script to convert the ligatures to two characters.
>>
>> On Mon, Apr 29, 2013 at 3:45 AM, Michael Sander <michael...@gmail.com> wrote:
>>
>>> How did you format your config file? I tried adding the following line
>>> and it doesn't seem to work:
>>>
>>> tessedit_char_blacklist fi
>>>
>>> On Sunday, April 1, 2012 5:16:59 AM UTC-4, klo wrote:
>>>>
>>>> Thanks. I added it to my tesseract configuration file and it works great.
>>>>
>>>> Cheers
>>>>
>>>> On Saturday, March 31, 2012 10:12:50 PM UTC+2, zdpo wrote:
>>>>>
>>>>> On 31.03.2012 16:17, klo wrote:
>>>>>
>>>>> In my simple testing, I find this the most common problem. Is there a way
>>>>> to instruct tesseract not to use those glyphs without limiting it to ASCII?
>>>>>
>>>>> I use tesseract 3.01, BTW.
>>>>>
>>>>> Put them on the blacklist with the variable tessedit_char_blacklist
>>>>> (search the forum if you do not know how).
>>>>>
>>>>> Zdenko
>>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to tesser...@googlegroups.com
>>> To unsubscribe from this group, send email to
>>> tesseract-oc...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>> ---
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>>
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>> --
>> /greg