On Monday, April 29, 2013 8:39:57 PM UTC-4, Michael Sander wrote:

> Yes, I'm doing something similar in Python. Do you know of a list of 
> ligatures so I can convert them to ASCII? I know fi and fl are the most 
> common, but there are probably many more.
>
>
The list of Unicode ligatures is here: 
http://www.unicode.org/charts/PDF/UFB00.pdf
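
For the Python post-processing step, here is a minimal sketch that covers only the Latin ligatures at the start of that chart (U+FB00 through U+FB06). The names LATIN_LIGATURES and expand_ligatures are mine, not from any library, and folding the long-s ligature U+FB05 to a plain "st" is an assumption about what you want in ASCII output. The rest of the chart (Armenian and Hebrew presentation forms) is not handled.

```python
# Latin ligatures in the Alphabetic Presentation Forms block (U+FB00-U+FB06).
# Written as escapes so the source file itself stays ASCII.
LATIN_LIGATURES = {
    "\uFB00": "ff",
    "\uFB01": "fi",
    "\uFB02": "fl",
    "\uFB03": "ffi",
    "\uFB04": "ffl",
    "\uFB05": "st",  # long s + t; folded to plain "st" (assumption)
    "\uFB06": "st",
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_TABLE = str.maketrans(LATIN_LIGATURES)


def expand_ligatures(text):
    """Replace each Latin ligature code point with its plain letters."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "\uFB01nd the \uFB02ag in the o\uFB03ce"
    print(expand_ligatures(sample))
```

unicodedata.normalize("NFKC", text) from the standard library would also expand these, but it rewrites many other compatibility characters as well, so an explicit table may be the safer choice for OCR output.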

Go Big Red!
 

>
> Michael Sander
> michael...@gmail.com
> 607-227-9859
>
>
> On Mon, Apr 29, 2013 at 7:48 PM, Greg Dunkel <drdun...@gmail.com> wrote:
>
>> I couldn't get the config to work on Ubuntu so I wrote a post-processing 
>> sed script to convert the ligatures to two characters.
>>
>>
>> On Mon, Apr 29, 2013 at 3:45 AM, Michael Sander <michael...@gmail.com> wrote:
>>
>>> How did you format your config file? I tried adding the following line 
>>> and it doesn't seem to work:
>>>
>>> tessedit_char_blacklist fi
>>>
>>>
>>> On Sunday, April 1, 2012 5:16:59 AM UTC-4, klo wrote:
>>>>
>>>> Thanks. I added it to my tesseract configuration file and it works great
>>>>
>>>> Cheers
>>>>
>>>>
>>>> On Saturday, March 31, 2012 10:12:50 PM UTC+2, zdpo wrote:
>>>>>
>>>>>
>>>>> On 31.03.2012 16:17, klo wrote:
>>>>>
>>>>> In my simple testing, I find this is the most common problem. Is there a 
>>>>> way to instruct tesseract not to use those glyphs without limiting it to 
>>>>> ASCII?
>>>>>
>>>>> I use tesseract 3.01 BTW
>>>>>
>>>>>
>>>>> Put them on the blacklist with the variable tessedit_char_blacklist 
>>>>> (search the forum if you do not know how).
>>>>>
>>>>> Zdenko
>>>>>
>>>>>   -- 
>>> -- 
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to tesser...@googlegroups.com
>>> To unsubscribe from this group, send email to
>>> tesseract-oc...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>  
>>> --- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to tesseract-oc...@googlegroups.com.
>>>
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>  
>>>  
>>>
>>
>>
>>
>> -- 
>> /greg
>>  
>>
>
>
