Still not working. I tried attaching the config, but it won't let me because it's binary.
I made a workaround by converting all instances of ﬁ into fi in the output, but obviously it would be better to strip the Unicode ligatures in tesseract itself. On a related note, why is tesseract even generating these characters in the first place, given that I chose English as the training data?

On Monday, April 29, 2013 9:21:16 AM UTC-4, klo wrote:
>
> Michael,
>
> for example add this line in your config file:
>
> tessedit_char_blacklist ﬁﬂ
>
> I don't know how Gmail will represent these characters, but make sure the file
> is in UTF-8, I guess
>
> On Mon, Apr 29, 2013 at 9:45 AM, Michael Sander <[email protected]> wrote:
>
>> How did you format your config file? I tried adding the following line
>> and it doesn't seem to work:
>>
>> tessedit_char_blacklist ﬁ
>>
>> On Sunday, April 1, 2012 5:16:59 AM UTC-4, klo wrote:
>>>
>>> Thanks. I added it to my tesseract configuration file and it works great
>>>
>>> Cheers
>>>
>>> On Saturday, March 31, 2012 10:12:50 PM UTC+2, zdpo wrote:
>>>>
>>>> On 31.03.2012 16:17, klo wrote:
>>>>
>>>> In my simple testing, I find this most common problem, is there a way to
>>>> instruct tesseract not to use those glyphs without limiting it to ASCII?
>>>>
>>>> I use tesseract 3.01 BTW
>>>>
>>>> put them to blacklist with variable tessedit_char_blacklist (search
>>>> forum if you do not know how).
>>>>
>>>> Zdenko
>>>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
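
For reference, the output-side workaround described at the top of this message (replacing the ligature glyphs after OCR has run) could look roughly like the Python below. This is only a minimal sketch, not the script actually used: the name `expand_ligatures` and the exact set of ligatures handled are assumptions made for illustration, and it presumes the tesseract output has already been read in as UTF-8 text.

```python
# Hypothetical post-processing helper (not from the original thread):
# expands the Unicode "f" ligatures that can show up in OCR output
# into their plain ASCII letter sequences.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Return text with the ligatures above replaced by plain letters."""
    return text.translate(_TABLE)
```

Calling `expand_ligatures` on the contents of the generated `.txt` file leaves all other non-ASCII characters untouched, which is the main reason to prefer an explicit table over stripping everything outside ASCII.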

