[tesseract-ocr] Re: text2image creates char boxes for 'fi' and 'fl'. Why?

2016-09-08 Thread Brais Gabín Moreira
How can I set a blacklist for text2image? -c tessedit_char_blacklist=fifl doesn't work for me.

My problem is that text2image writes things like this:

fl 133 162 159 199 5

I tried with --ligatures=true, but the result is the same:

fl 133 162 159 199 5

I'll continue with my research...
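To see how many entries in a generated box file are affected, a small script can help. The following is a minimal sketch, assuming the usual box-file layout of one entry per line in the form "symbol left bottom right top page". The helper names parse_box_line and find_ligature_boxes are made up for illustration and are not part of Tesseract. It only reports suspicious entries and does not try to split the boxes.

```python
import unicodedata

# Latin ligature code points in Unicode "Alphabetic Presentation Forms"
# (U+FB00..U+FB06), e.g. U+FB01 (fi) and U+FB02 (fl).
LATIN_LIGATURES = {chr(cp) for cp in range(0xFB00, 0xFB07)}


def parse_box_line(line):
    """Split one box-file line into (symbol, left, bottom, right, top, page).

    Assumes the last five space-separated fields are integers; everything
    before them is treated as the symbol.
    """
    parts = line.rstrip("\n").split(" ")
    symbol = " ".join(parts[:-5])
    left, bottom, right, top, page = (int(p) for p in parts[-5:])
    return symbol, left, bottom, right, top, page


def find_ligature_boxes(lines):
    """Return (line_number, symbol, expanded) for entries that look like ligatures.

    An entry is reported if its symbol contains a Unicode ligature code point
    or is longer than one character. The second check is a heuristic and will
    also flag symbols that carry combining marks.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        symbol, *_ = parse_box_line(line)
        has_ligature_codepoint = any(ch in LATIN_LIGATURES for ch in symbol)
        if has_ligature_codepoint or len(symbol) > 1:
            # NFKC replaces a ligature code point with its plain letters.
            expanded = unicodedata.normalize("NFKC", symbol)
            hits.append((number, symbol, expanded))
    return hits
```

To use it on a real file, something like find_ligature_boxes(open(path, encoding="utf-8")) should work, where path is the box file written by text2image.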

2016-09-04 Thread fuzzy7k
My earlier successes were definitely font related. Use a blacklist or a whitelist:

-c tessedit_char_blacklist=fifl

https://groups.google.com/d/topic/tesseract-ocr/jO_4ZMMK9xw/discussion

On Saturday, September 3, 2016 at 1:45:21 PM UTC-4, fuzzy7k wrote:
> It's a language thing:

2016-09-03 Thread fuzzy7k
It's a language thing: https://en.wikipedia.org/wiki/Typographic_ligature

Try specifying a specific language?

This parameter seems like a possible association (its description mentions glyphs):

segment_penalty_dict_nonword    1.25    Score multiplier for glyph fragment segmentations