How can I set a blacklist for text2image? Passing -c tessedit_char_blacklist=fifl doesn't work for me.
My problem is that text2image writes box entries like this:

fl 133 162 159 199 5

I tried with --ligatures=true, but the result is this:

fl 133 162 159 199 5

I'll continue with my research...

On Sunday, September 4, 2016 at 22:19:34 (UTC+2), fuzzy7k wrote:
>
> My earlier successes were definitely font related. Use a blacklist, or
> whitelist....
>
> -c tessedit_char_blacklist=fifl
>
> https://groups.google.com/d/topic/tesseract-ocr/jO_4ZMMK9xw/discussion
>
> On Saturday, September 3, 2016 at 1:45:21 PM UTC-4, fuzzy7k wrote:
>>
>> It's a language thing: https://en.wikipedia.org/wiki/Typographic_ligature
>>
>> Try specifying a specific language?
>>
>> This parameter seems like a possible association (due to the description
>> containing glyph):
>>
>> segment_penalty_dict_nonword 1.25 Score multiplier for glyph
>> fragment segmentations which do not match a dictionary word (lower is
>> better).
>>
>> Let me know what you find. I had this occur recently but have been
>> chasing other issues and haven't verified a solution.
>>
>> On Saturday, September 3, 2016 at 5:23:55 AM UTC-4, Brais Gabín Moreira
>> wrote:
>>>
>>> Hi, I'm trying to train tesseract. But text2image creates a single box
>>> for 'fi' or 'fl'. Why does it think that 'fi' or 'fl' is a single
>>> character instead of two? How can I fix this?
>>>
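
One possible workaround, not confirmed anywhere in this thread, is to post-process the generated .box file and split each ligature entry into one box per letter. The sketch below assumes the usual box line layout (glyph, left, bottom, right, top, page) and simply divides the ligature's width evenly between its letters, which is only a rough approximation since 'f', 'i' and 'l' are not equally wide. The names LIGATURES, SPLITTABLE, split_box_line and split_box_file are hypothetical helpers made up for this illustration, not part of tesseract or text2image.

```python
# Hypothetical helper: split ligature boxes in a Tesseract .box file.
# Assumption: each line is "<glyph> <left> <bottom> <right> <top> <page>".

# Unicode ligature code points and the letters they stand for.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}
# Also handle boxes whose glyph was already written as plain letters ("fl").
SPLITTABLE = {**LIGATURES, **{letters: letters for letters in LIGATURES.values()}}


def split_box_line(line):
    """Return a list of box lines; ligature boxes become one box per letter."""
    parts = line.rsplit(" ", 5)
    if len(parts) != 6:
        return [line]  # not a box line we understand, leave it alone
    glyph, left, bottom, right, top, page = parts
    letters = SPLITTABLE.get(glyph)
    if letters is None:
        return [line]  # ordinary glyph, nothing to split
    left, right = int(left), int(right)
    width = right - left
    count = len(letters)
    boxes = []
    for index, letter in enumerate(letters):
        # Even split of the horizontal extent (crude approximation).
        x0 = left + width * index // count
        x1 = left + width * (index + 1) // count
        boxes.append(f"{letter} {x0} {bottom} {x1} {top} {page}")
    return boxes


def split_box_file(src_path, dst_path):
    """Rewrite a .box file with ligature boxes split into single letters."""
    with open(src_path, encoding="utf-8") as src:
        lines = src.read().splitlines()
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in lines:
            for box in split_box_line(line):
                dst.write(box + "\n")


if __name__ == "__main__":
    for box in split_box_line("fl 133 162 159 199 5"):
        print(box)
```

Applied to the entry quoted above, this yields one box for 'f' and one for 'l' covering the same horizontal span. The split boxes would probably still need a manual check in a box editor before training.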